Apache Pig Split Operator
What is the SPLIT operator in Apache Pig?
- The SPLIT operator is used to split a relation into two or more relations.
- The SPLIT operator takes a single input relation and leaves that relation unchanged.
- It produces two or more output relations, one for each condition. Depending on the conditions, a tuple may be assigned to more than one output relation, or to none.
- It splits a relation into multiple relations based on conditions, for example:
- SPLIT users into kids if age < 18, adults if age >= 18 and age < 65, seniors otherwise;
- SPLIT data into testing if RANDOM() <= 0.10, training otherwise;
- Note that the SPLIT operator cannot handle non-deterministic functions (such as RANDOM), so a condition like the RANDOM() example above should be treated with caution.
Syntax
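The general form of the statement, as described in the Pig Latin reference, can be sketched as follows. The names input_relation, relation1, relation2 and relation3, and the three conditions, are placeholders for illustration only and do not come from this tutorial. The OTHERWISE branch is optional.

```pig
-- Placeholders only: input_relation, relation1..relation3 and the conditions are hypothetical.
SPLIT input_relation INTO
    relation1 IF (condition1),
    relation2 IF (condition2),
    relation3 OTHERWISE;   -- optional: receives tuples that matched none of the conditions
```

Each condition is an ordinary boolean expression over the fields of the input relation, of the same kind used with FILTER.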
Example
Ensure that a file named wikitechy_employee_details.txt exists in the HDFS directory /pig_data/.
- This file is loaded into Pig under the relation name wikitechy_employee_details.
- Now split the relation into two: one listing the employees younger than 23, and the other listing the employees aged between 22 and 25.
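The two steps above might look roughly like the following sketch. Only the file path and the relation names come from the text. The comma delimiter, the schema (id, firstname, lastname, age, phone, city) and the inclusive reading of "between 22 and 25" are assumptions made for illustration, since the file contents are not reproduced here.

```pig
-- Assumed: comma-separated fields and this schema; adjust both to match the real file.
wikitechy_employee_details = LOAD '/pig_data/wikitechy_employee_details.txt'
    USING PigStorage(',')
    AS (id:int, firstname:chararray, lastname:chararray,
        age:int, phone:chararray, city:chararray);

-- Assumed: "between 22 and 25" is read as inclusive on both ends.
SPLIT wikitechy_employee_details INTO
    wikitechy_employee_details1 IF age < 23,
    wikitechy_employee_details2 IF (age >= 22 AND age <= 25);
```

Under these assumed bounds, an employee aged exactly 22 satisfies both conditions and lands in both output relations, while an employee older than 25 lands in neither.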
Verification
Now verify the relations wikitechy_employee_details1 and wikitechy_employee_details2 using the DUMP operator.
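The verification step would look something like this, assuming both relations were created by a SPLIT statement earlier in the same Grunt session or script:

```pig
-- Each DUMP executes the pipeline and prints the relation's tuples to the console.
DUMP wikitechy_employee_details1;
DUMP wikitechy_employee_details2;
```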
Output
- The output displays the contents of the relations wikitechy_employee_details1 and wikitechy_employee_details2, respectively.