Apache Pig Tutorial - Apache Pig COUNT() Function
What is the COUNT() Function in Apache Pig?
- The COUNT() function in Apache Pig returns the number of elements in a bag.
- COUNT() ignores tuples whose first field is NULL when it counts the tuples in a bag. To include those tuples, use COUNT_STAR() instead.
- COUNT() requires a preceding GROUP ALL statement for a global count and a GROUP BY statement for per-group counts.
- To count only the rows that match a criterion, FILTER the relation first and then apply COUNT().
- COUNT() returns its result as a long value.
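
The NULL-handling rule is easiest to see on a tiny input. The following is a minimal sketch, not part of the original tutorial: the file scores.txt, its contents, and the relation names scores, scores_all and counts are all hypothetical, and it assumes that PigStorage loads an empty field as NULL.

```pig
-- Hypothetical input file 'scores.txt' (comma-separated), for illustration only:
--   a,10
--   b,
--   c,30
-- The empty second field on line 2 is assumed to load as NULL.
scores = LOAD 'scores.txt' USING PigStorage(',') AS (name:chararray, score:int);

-- COUNT() and COUNT_STAR() both operate on a bag, so group everything into one bag first.
scores_all = GROUP scores ALL;

-- scores.score is a bag of one-field tuples:
--   COUNT() skips the tuples whose first field is NULL,
--   COUNT_STAR() counts every tuple.
counts = FOREACH scores_all GENERATE
             COUNT(scores.score)      AS with_score,
             COUNT_STAR(scores.score) AS all_tuples;

DUMP counts;
```

Under those assumptions COUNT() sees two non-NULL scores while COUNT_STAR() sees all three tuples.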

Syntax
grunt> COUNT(expression)
Example
wikitechy_employee_details.txt
001,Hansika,Reddy,21,9848022337,Hyderabad,89
002,Aysha,Battacharya,22,9848022338,Kolkata,78
003,Swaminathan,Khanna,22,9848022339,Delhi,90
004,Preethi,Agarwal,21,9848022330,Pune,93
005,Sruti,Mohanthy,23,9848022336,Bhuwaneshwar,75
006,Karishma,Mishra,23,9848022335,Chennai,87
007,Kamala,Nayak,24,9848022334,trivendram,83
008,Krish,Nambiayar,24,9848022333,Chennai,72
- We have loaded this file into Pig as the relation wikitechy_employee_details, as shown below:
grunt> wikitechy_employee_details = LOAD 'hdfs://localhost:9000/pig_data/wikitechy_employee_details.txt' USING PigStorage(',')
as (id:int, firstname:chararray, lastname:chararray, age:int, phone:chararray, city:chararray, gpa:int);
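
Before counting, it can help to confirm that the file was split into the expected fields. This is a small sketch rather than a required step; the relation name sample_rows is introduced here only for illustration.

```pig
-- Print the schema that the AS clause attached to the relation.
DESCRIBE wikitechy_employee_details;

-- Look at a few tuples to confirm the lines were split on commas.
-- LIMIT keeps the output short; which three tuples come back is not guaranteed.
sample_rows = LIMIT wikitechy_employee_details 3;
DUMP sample_rows;
```

These statements can be typed at the grunt> prompt in the same way as the LOAD statement.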
Calculating the Number of Tuples
- We can use the built-in function COUNT() to calculate the number of tuples in a relation.
- COUNT() operates on a bag, so we first group the relation wikitechy_employee_details with the GROUP ALL operator and store the result in the relation employee_group_all:
grunt> employee_group_all = Group wikitechy_employee_details All;
- This produces a relation with a single tuple: its first field is the group key all, and its second field is a bag holding every tuple of wikitechy_employee_details.
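
The structure of the grouped relation can be inspected directly. A minimal sketch, using only the relation created by the GROUP statement in this example:

```pig
-- Schema of the grouped relation: a field named 'group' plus a bag
-- that carries the name of the input relation.
DESCRIBE employee_group_all;

-- The single tuple has the form (all,{(...),(...),...}).
-- The order of the tuples inside the bag is not guaranteed.
DUMP employee_group_all;
```

The bag's name, wikitechy_employee_details, is what gets passed to COUNT() in the next step.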
- We can now calculate the number of tuples (records) in the relation by applying COUNT() to that bag:
grunt> employee_count = foreach employee_group_all Generate COUNT(wikitechy_employee_details.gpa);
Verification
grunt> Dump employee_count;
Output
(8)
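
The same pattern extends to per-group counts and to counts over a filtered subset. The sketch below reuses the relation loaded in this example; the names employee_by_city, city_count, high_gpa, high_gpa_all and high_gpa_count, and the threshold of 85, are chosen here purely for illustration.

```pig
-- Per-group count: one output tuple per city.
employee_by_city = GROUP wikitechy_employee_details BY city;
city_count = FOREACH employee_by_city GENERATE
                 group AS city,
                 COUNT(wikitechy_employee_details) AS employees;
DUMP city_count;

-- Count only the tuples that match a condition: filter first, then group and count.
high_gpa = FILTER wikitechy_employee_details BY gpa >= 85;
high_gpa_all = GROUP high_gpa ALL;
high_gpa_count = FOREACH high_gpa_all GENERATE COUNT(high_gpa);
DUMP high_gpa_count;
```

With the eight sample records above, Chennai is the only city that appears twice, and four employees have a gpa of 85 or more, so the second DUMP should print (4). The order of the tuples in the first DUMP is not guaranteed.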