pig tutorial - apache pig tutorial - Apache Pig Cogroup Operator - pig latin - apache pig - pig hadoop
What is COGROUP operator in Apache Pig ?
- The COGROUP operator works in much the same way as the GROUP operator.
- The only difference between the two is that the GROUP operator is normally used with one relation, while the COGROUP operator is used in statements involving two or more relations.
Grouping Two Relations using Cogroup
- Ensure that the two files student_details.txt and wikitechy_employee_details.txt are in the HDFS directory /pig_data/, with the contents given below.
student_details.txt
111,Anu,Shankar,23,9876543210,Chennai
112,Barvathi,Nambiayar,24,9876543211,Chennai
113,Kajal,Nayak,24,9876543212,Trivendram
114,Preethi,Antony,21,9876543213,Pune
115,Raj,Gopal,21,9876543214,Hyderabad
116,Yashika,Kannan,22,9876543215,Delhi
117,siddu,Narayanan,22,9876543216,Kolkata
118,Timple,Mohanthy,23,9876543217,Bhuwaneshwar
wikitechy_employee_details.txt
111,Robert,22,newyork
112,Bastin,23,Kolkata
113,Martin,23,Tokyo
114,Sangavi,25,London
115,David,23,Bhuwaneshwar
116,Arnold,22,Chennai
- Load these files into Pig with the relation names student_details and wikitechy_employee_details respectively, as given below.
grunt> student_details = LOAD 'hdfs://localhost:9000/pig_data/student_details.txt' USING PigStorage(',')
as (id:int, firstname:chararray, lastname:chararray, age:int, phone:chararray, city:chararray);
grunt> wikitechy_employee_details = LOAD 'hdfs://localhost:9000/pig_data/wikitechy_employee_details.txt' USING PigStorage(',')
as (id:int, name:chararray, age:int, city:chararray);
- Now group the records/tuples of the relations student_details and wikitechy_employee_details by the key age, as given below.
grunt> cogroup_data = COGROUP student_details BY age, wikitechy_employee_details BY age;
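Before dumping the data, it can help to inspect the schema of the new relation. The sketch below assumes the same Grunt session as above, with cogroup_data already defined. DESCRIBE is a standard Pig Latin operator; the schema it prints should contain a field named group (the age key) followed by one bag per input relation, each bag named after its relation.

```pig
-- Print the schema of the cogrouped relation (assumes cogroup_data exists in this session).
DESCRIBE cogroup_data;
```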
Verification
- Now verify the relation cogroup_data using the DUMP operator as given below.
grunt> DUMP cogroup_data;
Output
- The output shows the contents of the relation named cogroup_data, as given below.
(21,{(114,Preethi,Antony,21,9876543213,Pune),(115,Raj,Gopal,21,9876543214,Hyderabad)},{})
(22,{(116,Yashika,Kannan,22,9876543215,Delhi),(117,siddu,Narayanan,22,9876543216,Kolkata)},{(111,Robert,22,newyork),(116,Arnold,22,Chennai)})
(23,{(111,Anu,Shankar,23,9876543210,Chennai),(118,Timple,Mohanthy,23,9876543217,Bhuwaneshwar)},{(112,Bastin,23,Kolkata),(113,Martin,23,Tokyo),(115,David,23,Bhuwaneshwar)})
(24,{(112,Barvathi,Nambiayar,24,9876543211,Chennai),(113,Kajal,Nayak,24,9876543212,Trivendram)},{})
(25,{},{(114,Sangavi,25,London)})
- The COGROUP operator groups the tuples of each relation by age, so each group corresponds to one particular age value.
Example
- Consider the first tuple of the result, which is grouped by age 21. It contains two bags:
- The first bag holds all the tuples from the first relation (student_details in this case) with age 21.
- The second bag holds all the tuples from the second relation (wikitechy_employee_details in this case) with age 21.
- If a relation has no tuples with the age value 21, its bag is empty. Here, wikitechy_employee_details has no employee aged 21, so the second bag is empty.
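The two bags can also be processed further instead of only being dumped. The following is a minimal sketch, assuming cogroup_data was built from the relations student_details and wikitechy_employee_details as above; the aliases age_counts and students_only and the field names num_students and num_employees are hypothetical names introduced here for illustration. It uses the built-in COUNT and IsEmpty functions to count the tuples in each bag and to keep only the age groups that have no matching employee.

```pig
-- Count how many students and employees fall into each age group.
-- Each bag in cogroup_data is referenced by the name of its source relation.
age_counts = FOREACH cogroup_data GENERATE
    group AS age,
    COUNT(student_details) AS num_students,
    COUNT(wikitechy_employee_details) AS num_employees;

-- Keep only the age groups whose employee bag is empty.
students_only = FILTER cogroup_data BY IsEmpty(wikitechy_employee_details);

DUMP age_counts;
DUMP students_only;
```

With the sample data above, students_only should contain the groups for ages 21 and 24, since no employee has either age.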
