Apache Pig Distinct Operator
What is the DISTINCT Operator in Apache Pig?
- The DISTINCT operator removes duplicate records (redundant tuples) from a relation. It compares entire records, not individual fields.
- It plays the same role as DISTINCT in a SQL SELECT statement, which filters the result set to remove duplicates.
- DISTINCT can be combined with an aggregate function, typically COUNT(), to count unique values.
- To get the unique values of a single field, project that field first and then apply DISTINCT to the projected relation.
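The points above can be sketched in Pig Latin. This is a minimal illustration, not the tutorial's original listing: the file path, the comma delimiter, and the schema (id, firstname, lastname, city) are assumptions made for the example. It shows projecting a field before applying DISTINCT, and counting unique values with a nested DISTINCT inside FOREACH.

```pig
-- Hypothetical input path, delimiter, and schema; adjust to the real file.
students = LOAD 'wikitechy_student_details.txt' USING PigStorage(',')
    AS (id:int, firstname:chararray, lastname:chararray, city:chararray);

-- DISTINCT compares whole tuples, so project the field of interest first.
cities = FOREACH students GENERATE city;
unique_cities = DISTINCT cities;

-- Count the unique cities: nested DISTINCT followed by COUNT.
grouped = GROUP students ALL;
city_count = FOREACH grouped {
    c = students.city;
    uniq = DISTINCT c;
    GENERATE COUNT(uniq) AS n_cities;
};

DUMP unique_cities;
DUMP city_count;
```

Grouping with ALL puts every record into a single bag, so the nested block yields one row holding the number of unique cities.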
Pig Operations - Deduplication
- Preserves only unique tuples
Syntax
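The general form of the statement, as given in the Pig Latin reference, is sketched below. The bracketed clauses are optional, and the aliases are placeholders rather than names from this tutorial.

```pig
-- General form; PARTITION BY and PARALLEL are optional clauses.
-- alias = DISTINCT alias [PARTITION BY partitioner] [PARALLEL n];

-- Simplest usage with placeholder relation names:
-- relation_name2 = DISTINCT relation_name1;
```

PARALLEL n sets the number of reducers for the job; DISTINCT requires a reduce phase because duplicate tuples may be spread across the input.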
Example:
- Assume we have an input file named wikitechy_student_details.txt.
- We load this file into Pig under the relation name wikitechy_student_details.
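A load statement for this step might look like the sketch below. The relation name comes from the text; the path, the comma delimiter, and the field names and types are assumptions, since the file contents are not shown here.

```pig
-- Hypothetical path, delimiter, and schema; only the relation name
-- wikitechy_student_details is taken from the tutorial text.
wikitechy_student_details = LOAD 'wikitechy_student_details.txt'
    USING PigStorage(',')
    AS (id:int, firstname:chararray, lastname:chararray, city:chararray);

-- Print the schema to confirm the relation was defined as intended.
DESCRIBE wikitechy_student_details;
```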
- We then remove the redundant tuples from wikitechy_student_details using the DISTINCT operator and store the result in another relation named distinct_data.
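The deduplication step can be sketched as follows. The relation names wikitechy_student_details and distinct_data come from the text; the load path, delimiter, and schema are assumptions repeated here so the snippet stands alone.

```pig
-- Hypothetical path, delimiter, and schema (assumed for illustration).
wikitechy_student_details = LOAD 'wikitechy_student_details.txt'
    USING PigStorage(',')
    AS (id:int, firstname:chararray, lastname:chararray, city:chararray);

-- Keep one copy of each tuple; two tuples are duplicates only when
-- every field matches.
distinct_data = DISTINCT wikitechy_student_details;

-- Print the deduplicated relation to the console.
DUMP distinct_data;
```

Pig does not guarantee the order of tuples in the output of DISTINCT; add an ORDER BY step if a particular order is needed.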