Apache Pig - SIZE() Function
What is the SIZE() Function in Apache Pig?
- The SIZE() function in Apache Pig computes the number of elements in a value of any Pig data type.
- The SIZE() function includes NULL values in the size computation.
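One way to see that SIZE() includes NULL values is to compare it with COUNT() on the same bag. The sketch below is only an illustration: the file name `values.txt`, the relation names, and the field `v` are hypothetical, and it assumes some of the loaded values of `v` are null.

```pig
-- Hypothetical input: one value per line, some of which load as null.
vals = LOAD 'values.txt' AS (v:chararray);

-- Collect every tuple of vals into a single bag.
grp = GROUP vals ALL;

-- SIZE(vals) counts every tuple in the bag, including those whose value is null.
-- COUNT(vals) skips tuples whose first field is null.
stats = FOREACH grp GENERATE SIZE(vals) AS all_tuples, COUNT(vals) AS non_null_tuples;

DUMP stats;
```

If the input contains null values, `all_tuples` should be larger than `non_null_tuples` by the number of null rows.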
Syntax
SIZE(expression)
- The return value depends on the data type of the expression, as shown in the table below.
Data type | Return value |
---|---|
int, long, float, double | Always 1. |
chararray | The number of characters in the string. |
bytearray | The number of bytes in the array. |
Tuple | The number of fields in the tuple. |
Bag | The number of tuples in the bag. |
Map | The number of key/value pairs in the map. |
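The rows of the table can be exercised in a single statement. The following is a minimal sketch, assuming a hypothetical input file `samples.txt` whose records match the schema shown; the relation and field names are made up for illustration.

```pig
-- Hypothetical input file and schema, used only to illustrate each data type.
samples = LOAD 'samples.txt' AS (
    num:int,
    txt:chararray,
    tup:tuple(a:int, b:int, c:int),
    bg:bag{r:tuple(x:int)},
    mp:map[]
);

-- num_size:    always 1 for a numeric value
-- char_count:  number of characters in txt
-- field_count: number of fields in tup
-- tuple_count: number of tuples in bg
-- pair_count:  number of key/value pairs in mp
sizes = FOREACH samples GENERATE
    SIZE(num) AS num_size,
    SIZE(txt) AS char_count,
    SIZE(tup) AS field_count,
    SIZE(bg)  AS tuple_count,
    SIZE(mp)  AS pair_count;

DUMP sizes;
```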
Example
wikitechy_employee.txt
1,Joseph,2007-01-24,250
2,John,2007-05-27,220
3,Patrick,2007-05-06,170
3,Patrick,2007-04-06,100
4,Mill,2007-04-06,220
5,Sarah,2007-06-06,300
5,Sarah,2007-02-06,350
We load this file into Pig under the relation name employee_data, as shown below.
grunt> employee_data = LOAD 'hdfs://localhost:9000/pig_data/wikitechy_employee.txt' USING PigStorage(',')
AS (id:int, name:chararray, workdate:chararray, daily_typing_pages:int);
Calculating the Size of the Type
Now we calculate the size of the name field, that is, the number of characters in each name, as shown below:
grunt> size = FOREACH employee_data GENERATE SIZE(name);
Verification.
grunt> Dump size;
Output
(6)
(4)
(7)
(7)
(4)
(5)
(5)
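The result is easier to read when each length is shown next to the name it belongs to. The sketch below reuses the employee_data relation loaded above; the relation names `name_lengths` and `long_names` and the alias `name_length` are illustrative choices, not part of the original example.

```pig
-- Pair each name with its character count.
name_lengths = FOREACH employee_data GENERATE name, SIZE(name) AS name_length;
DUMP name_lengths;
-- For the sample file this should print:
-- (Joseph,6)
-- (John,4)
-- (Patrick,7)
-- (Patrick,7)
-- (Mill,4)
-- (Sarah,5)
-- (Sarah,5)

-- SIZE() can also be used in a condition, for example to keep
-- only the records whose name has more than five characters.
long_names = FILTER employee_data BY SIZE(name) > 5;
DUMP long_names;
```

With the sample data, the filter keeps the one Joseph record and the two Patrick records.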