Apache Pig tutorial - Filter Operator
What is the Filter Operator in Apache Pig?
- The FILTER operator is a simple but powerful operation in Apache Pig.
- It selects the tuples of a relation that satisfy a condition and drops the rest, so unwanted records are removed from the data.
- Filtering early keeps only the desired data out of a large data set, so the business logic that follows runs in parallel on a smaller volume, which is much faster than running it on the full data.
- It generates a new relation by filtering the data of an existing relation.

Syntax
grunt> Relation2_name = FILTER Relation1_name BY (condition);
Example:
wikitechy_student_details.txt
001,Suresh,Reddy,21,9848022337,Hyderabad
002,harish,Battacharya,22,9848022338,Kolkata
003,Fathima,Khanna,23,9848022339,Delhi
004,Preethi,Agarwal,21,9848022330,Pune
005,Vanitha,Mohanthy,24,9848022336,Bhuwaneshwar
006,Sruti,Mishra,25,9848022335,Chennai
007,Kamal,Nayak,26,9848022334,trivendram
008,Barath,Nambiayar,27,9848022333,Chennai
- Load this file into Pig as the relation wikitechy_student_details, as shown below:
grunt> wikitechy_student_details = LOAD 'hdfs://localhost:9000/pig_data/wikitechy_student_details.txt' USING PigStorage(',')
AS (id:int, firstname:chararray, lastname:chararray, age:int, phone:chararray, city:chararray);
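The FILTER condition refers to fields by the names and types declared in the schema, so it can help to confirm the schema before filtering. A minimal sketch, assuming the LOAD statement above has already been run in the same Grunt session:

```pig
-- Print the schema of the relation (field names and types).
DESCRIBE wikitechy_student_details;
```

Since city is declared as chararray, it is compared with a quoted string such as 'Chennai', while age is an int and is compared with a bare number.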
- Now use the FILTER operator to get the details of the students who belong to the city Chennai.
grunt> filter_data = FILTER wikitechy_student_details BY city == 'Chennai';
Verification:
grunt> Dump filter_data;
Output:
(6,Sruti,Mishra,25,9848022335,Chennai)
(8,Barath,Nambiayar,27,9848022333,Chennai)
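The condition after BY can also combine several comparisons with AND, OR and NOT. The following is a minimal sketch, assuming the wikitechy_student_details relation loaded earlier; the relation names chennai_over_25, young_students and not_chennai are hypothetical and chosen only for illustration.

```pig
-- Students from Chennai who are older than 25 (AND requires both tests to hold).
chennai_over_25 = FILTER wikitechy_student_details BY (city == 'Chennai') AND (age > 25);

-- Students aged 21 or 22 (OR accepts a tuple when either test holds).
young_students = FILTER wikitechy_student_details BY (age == 21) OR (age == 22);

-- Every student except those from Chennai (NOT inverts the test).
not_chennai = FILTER wikitechy_student_details BY NOT (city == 'Chennai');

-- Run the first filter and print its result.
DUMP chennai_over_25;
```

With the sample file shown earlier, only record 008 (Barath, age 27) satisfies the first condition, so this Dump should print a single tuple.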