Pig data types fall into two categories:

  1. Scalar/Simple
  2. Complex

Scalar Types

| Data Type | Description | Example |
| --- | --- | --- |
| int | 4-byte signed integer. Mapped to java.lang.Integer | 10 |
| long | 8-byte signed integer. Mapped to java.lang.Long | 10L |
| float | 4-byte floating-point number. Mapped to java.lang.Float | 10.5F or 1050.0F |
| double | 8-byte floating-point number. Mapped to java.lang.Double | 10.5 or 1050.0 |
| chararray | Array of characters. Mapped to java.lang.String | hello world |
| bytearray | Array of bytes. Wraps a Java byte[] array | |
| boolean | Boolean values | true/false |
| datetime | Datetime values | |
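
As a rough illustration of how these scalar types appear in a script, the sketch below declares them in a LOAD schema and applies two explicit casts. The file name events.txt and all field names are hypothetical, and the boolean and datetime fields assume a Pig release recent enough to support those types.

```pig
-- Hypothetical tab-separated input file; every field name here is illustrative.
events = LOAD 'events.txt' AS (id:long, score:double, label:chararray,
                               active:boolean, created:datetime, payload);

-- 'payload' has no declared type, so Pig treats it as bytearray.
-- Explicit casts convert a value to another scalar type.
typed = FOREACH events GENERATE id,
                                (float)score AS score_f,
                                (chararray)payload AS payload_text;

-- Print the schema of the resulting relation.
DESCRIBE typed;
```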

Complex Types

| Data Type | Description | Example |
| --- | --- | --- |
| map | Set of key-value pairs | ['name'#'john', 'age'#30, 'id'#3783] |
| tuple | Ordered collection of elements | (john, 30, 3783) |
| bag | Unordered collection of tuples | {(john, 30, 3783),(kate, 29, 4121)} |

Map:

  1. A map in Pig is a chararray to data element mapping, where that element can be any Pig type, including a complex type.
  2. The chararray is called a key and is used as an index to find the element, referred to as the value.
  3. Unless the schema declares the value type, Pig does not know the type of a map value and assumes it is a bytearray. If you know the type, you can cast the value to that Pig type explicitly. Otherwise, Pig picks a type at runtime based on how the value is used in the script.

A map is written by enclosing the data in square brackets and putting a hash symbol between each key and its value, for example: ['name'#'john', 'age'#30, 'id'#3783]
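
A minimal sketch of looking up and casting map values, assuming a hypothetical input file users.txt that holds one map per line in Pig's text form (for example [name#john,age#30,id#3783]). The relation and field names are illustrative only.

```pig
-- Load each line as a single map whose value type is not declared.
users = LOAD 'users.txt' AS (info:map[]);

-- info#'age' looks up the value stored under the key 'age'.
-- It is a bytearray until cast, so the explicit (int) cast makes the comparison numeric.
adults = FILTER users BY (int)info#'age' >= 18;

-- Cast the 'name' value to chararray and give the field a name.
names = FOREACH adults GENERATE (chararray)info#'name' AS name;

DUMP names;
```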

Tuple:

  1. A tuple is a fixed-length, ordered collection of Pig data elements.
  2. Tuples are divided into fields, with each field containing one data element.
  3. These elements can be of any type—they do not all need to be the same type.
  4. A tuple is analogous to a row in SQL, with the fields being SQL columns.
  5. Because tuples are ordered, it is possible to refer to the fields by position. A tuple can, but is not required to, have a schema associated with it that describes each field’s type and provides a name for each field.
  6. A tuple is written in parentheses, with its field values separated by commas.

For example, (john, 30, 3783) is a tuple with three fields.
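
The sketch below shows the two ways of referring to tuple fields described above, by name through a schema and by position. It assumes a hypothetical tab-separated file people.txt with lines such as john, 30, 3783; the field names are illustrative.

```pig
-- The AS clause attaches a schema that names and types each field.
people = LOAD 'people.txt' AS (name:chararray, age:int, id:long);

-- Refer to fields by the names given in the schema.
by_name = FOREACH people GENERATE name, age;

-- Refer to the same fields by position; positions start at $0.
by_position = FOREACH people GENERATE $0, $1;
```

Positional references work even when no schema is declared, which is why a schema is optional.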

Bag:

  1. A bag is an unordered collection of tuples.
  2. Because it has no order, it is not possible to reference tuples in a bag by position.
  3. A bag is written by enclosing comma-separated tuples in curly braces.

For example, {(john, 30, 3783),(kate, 29, 4121)} constructs a bag with two tuples, each with three fields.
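
One common way bags arise in practice is through GROUP, which collects matching tuples into a bag. The following is a sketch under the same assumption of a hypothetical people.txt file with name, age, and id fields.

```pig
people = LOAD 'people.txt' AS (name:chararray, age:int, id:long);

-- GROUP puts all tuples that share an age into one bag.
-- Each output tuple has two fields: 'group' (the age) and 'people' (the bag).
grouped = GROUP people BY age;

-- COUNT takes a bag and returns the number of tuples in it.
counts = FOREACH grouped GENERATE group AS age, COUNT(people) AS n;

-- Print the schema to see the bag nested inside each grouped tuple.
DESCRIBE grouped;
```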

