Apache Pig - PluckTuple() Function
What is the PluckTuple() Function in Apache Pig?
- PluckTuple() is a built-in Apache Pig function that plucks columns from a tuple by a string prefix or a regex pattern.
- We can use PluckTuple() after operations such as join to tell apart the columns that come from the two schemas.
- We define a string prefix, and the function keeps only the columns in the relation whose names begin with that prefix.
- Instead of a plain prefix, we can give a regex pattern, and the function keeps the columns whose names match that pattern.
- We can pass the flag 'false' as a second argument to keep the columns that do not match the prefix or regex pattern.
Syntax
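The general shape can be sketched as follows, modelled on the example in the Apache Pig built-in functions reference. The relation names a, b, c, d and e, the input paths 'a' and 'b', and the aliases pluck and pluckNegative are placeholders chosen for illustration.

```pig
-- Two placeholder relations with the same field names
a = LOAD 'a' AS (x, y);
b = LOAD 'b' AS (x, y);

-- After a join, fields are named a::x, a::y, b::x, b::y
c = JOIN a BY x, b BY x;

-- Bind a prefix to an alias: keeps columns whose names start with 'a::'
DEFINE pluck PluckTuple('a::');

-- Same prefix with the 'false' flag: keeps columns that do NOT match
DEFINE pluckNegative PluckTuple('a::', 'false');

-- Apply the alias to every field of each tuple and flatten the result
d = FOREACH c GENERATE FLATTEN(pluck(*));
e = FOREACH c GENERATE FLATTEN(pluckNegative(*));
```

The prefix is fixed when the alias is defined with DEFINE, so each prefix (or each negated prefix) needs its own alias.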
Example
- Assume that we have two files, wikitechy_employee_sales.txt and wikitechy_employee_bonus.txt, in the HDFS directory /pig_data/.
wikitechy_employee_bonus.txt
- We load these files into Pig as the relations employee_sales and employee_bonus.
employee_sales
employee_bonus
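A minimal sketch of the two LOAD statements is given here. Only the file names, the directory and the relation names come from the text above; the comma delimiter and the field list (sno, name, age, salary, dept) are assumptions made for illustration and should be replaced with the real layout of the files.

```pig
-- ASSUMPTION: comma-separated files with this hypothetical schema
employee_sales = LOAD '/pig_data/wikitechy_employee_sales.txt'
    USING PigStorage(',')
    AS (sno:int, name:chararray, age:int, salary:int, dept:chararray);

employee_bonus = LOAD '/pig_data/wikitechy_employee_bonus.txt'
    USING PigStorage(',')
    AS (sno:int, name:chararray, age:int, salary:int, dept:chararray);
```

Giving both relations the same field names is what makes the prefix-based filtering useful later, because the joined relation then contains two columns for every name.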
Next, we join these two relations with the JOIN operator and store the result in the relation join_data.
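The join might look like the sketch below. The join key sno is an assumption (a hypothetical serial-number field present in both relations); substitute whichever field the two files actually share.

```pig
-- Assumes employee_sales and employee_bonus are already loaded
-- and that both have a field named sno (hypothetical join key).
join_data = JOIN employee_sales BY sno, employee_bonus BY sno;
```

In the joined relation each field name is qualified with the relation it came from, for example employee_sales::sno and employee_bonus::sno, which is the prefix PluckTuple() filters on.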
- We can verify the contents of the relation join_data with the DUMP operator.
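For reference, the verification step is a single statement. It triggers execution of the join and prints every tuple of join_data to the console, so it is best suited to small inputs.

```pig
-- Assumes join_data has already been defined by the JOIN step.
DUMP join_data;
```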
Using PluckTuple() Function
- First, we define the expression (the prefix) by which we want to differentiate the columns, using the PluckTuple() function.
- Then, we use it to filter the columns of the join_data relation and store the result in the relation data.
- Finally, we inspect the schema of the relation data with the DESCRIBE operator in the Grunt shell.
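The three steps above can be put together as one script, sketched here under stated assumptions. The schema, the comma delimiter and the join key sno are hypothetical, as is the alias name pluck; the relation names and file paths are the ones used in this example.

```pig
-- ASSUMPTION: comma-separated files with this hypothetical schema
employee_sales = LOAD '/pig_data/wikitechy_employee_sales.txt'
    USING PigStorage(',')
    AS (sno:int, name:chararray, age:int, salary:int, dept:chararray);
employee_bonus = LOAD '/pig_data/wikitechy_employee_bonus.txt'
    USING PigStorage(',')
    AS (sno:int, name:chararray, age:int, salary:int, dept:chararray);

-- ASSUMPTION: sno is the shared key
join_data = JOIN employee_sales BY sno, employee_bonus BY sno;

-- Step 1: bind the prefix to an alias
DEFINE pluck PluckTuple('employee_sales::');

-- Step 2: keep only the columns that came from employee_sales
data = FOREACH join_data GENERATE FLATTEN(pluck(*));

-- Step 3: print the schema of the filtered relation
DESCRIBE data;
```

According to the Pig documentation, the fields reported by DESCRIBE are expected to keep their employee_sales:: names with an additional plucked:: prefix in front. To keep the employee_bonus columns instead, the same prefix can be defined with the 'false' flag, as in PluckTuple('employee_sales::', 'false').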