pig tutorial - apache pig tutorial - Apache Pig Cogroup Operator - pig latin - apache pig - pig hadoop
What is the COGROUP operator in Apache Pig?
- The COGROUP operator works in much the same way as the GROUP operator.
- The only difference between the two is that GROUP is normally used with a single relation, while COGROUP is used in statements involving two or more relations.
Grouping Two Relations using Cogroup
- Ensure that the two files student_details.txt and wikitechy_employee_details.txt are in the HDFS directory /pig_data/, as given below.
student_details.txt
wikitechy_employee_details.txt
- Load these files into Pig with the relation names student_details and wikitechy_employee_details respectively, as given below.
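- A minimal sketch of the two load statements follows. Only the age field is named in this tutorial; the other field names, the types, and the comma delimiter are assumptions made for illustration, so adjust them to match your files. The paths also assume Pig is running in MapReduce mode with HDFS as the default file system.

```pig
-- Assumed schema: only `age` is confirmed by this tutorial;
-- the remaining fields and the ',' delimiter are hypothetical.
student_details = LOAD '/pig_data/student_details.txt'
    USING PigStorage(',')
    AS (id:int, firstname:chararray, lastname:chararray,
        age:int, phone:chararray, city:chararray);

wikitechy_employee_details = LOAD '/pig_data/wikitechy_employee_details.txt'
    USING PigStorage(',')
    AS (id:int, name:chararray, age:int, city:chararray);
```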
- Now group the records/tuples of the relations student_details and wikitechy_employee_details by the key age, as given below.
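- The grouping step can be written as a single COGROUP statement. This sketch assumes both relations are already loaded and that each exposes a field named age.

```pig
-- Group both relations on age; the result has one tuple per distinct age,
-- holding the key plus one bag per input relation.
cogroup_data = COGROUP student_details BY age, wikitechy_employee_details BY age;
```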
Verification
- Now verify the relation cogroup_data using the DUMP operator as given below.
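- A sketch of the verification step, assuming cogroup_data has been defined as described above. DESCRIBE is added here as an optional extra to show the schema of the result before printing it.

```pig
-- Show the schema: the group key followed by one bag per input relation.
DESCRIBE cogroup_data;

-- Run the job and print every tuple of cogroup_data to the console.
DUMP cogroup_data;
```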
Output
- The output displays the contents of the relation named cogroup_data, as given below.
- The COGROUP operator groups the tuples from each relation by age, so each group corresponds to one particular age value.
Example
- Consider the first tuple of the result: it is grouped by age 21 and contains two bags.
- The first bag holds all the tuples from the first relation (student_details in this case) that have age 21.
- The second bag holds all the tuples from the second relation (wikitechy_employee_details in this case) that have age 21.
- If a relation has no tuples with the age value 21, its bag in that group is empty.
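- The empty-bag behaviour can be put to use with the built-in IsEmpty function. The sketch below assumes cogroup_data was built as described above; the relation names students_only and student_only_ages are hypothetical, introduced only for this illustration.

```pig
-- Keep only the groups whose employee bag is empty, i.e. ages that
-- occur in student_details but not in wikitechy_employee_details.
students_only = FILTER cogroup_data BY IsEmpty(wikitechy_employee_details);

-- Project just the grouping key and print it.
student_only_ages = FOREACH students_only GENERATE group AS age;
DUMP student_only_ages;
```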