[Solved-1 Solution] Generate multiple outputs with Hadoop Pig?

Problem:

How can we generate multiple outputs from multiple files loaded in Pig?

Solution 1:

Why do we need SPLIT?

The SPLIT operator partitions the contents of a relation into two or more relations based on conditional expressions. Every tuple is tested against every condition, so depending on the conditions provided, either of the following can happen:

A tuple may be assigned to more than one relation
A tuple may not be assigned to any relation
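
Both outcomes can be seen in a short sketch. It assumes the same directory and three-integer schema as the LOAD statement in this post; the relation names (low_rows, mid_rows, small_rows, large_rows, rest_rows) are hypothetical, and the OTHERWISE branch assumes Pig 0.10 or later.

```pig
-- Sketch only: relation names are illustrative and not from the original post.
input_data = LOAD '/pigsamples/mfilesdata' USING PigStorage(',') AS (f1:int, f2:int, f3:int);

-- Overlapping conditions: a tuple with f1 = 3 satisfies both, so it lands in
-- low_rows AND mid_rows. A tuple with f1 = 6 satisfies neither, so it is dropped.
SPLIT input_data INTO low_rows IF f1 <= 3,
                      mid_rows IF (f1 >= 3 AND f1 <= 5);

-- OTHERWISE (assumed Pig 0.10+) collects tuples that matched no condition.
SPLIT input_data INTO small_rows IF f1 <= 2,
                      large_rows IF f1 >= 5,
                      rest_rows OTHERWISE;
```

If every tuple must end up in exactly one output, the conditions need to be mutually exclusive and together cover all values, or the last branch should be OTHERWISE.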

Suppose a directory contains multiple files that Pig will load, format and store:

[user1@localhost ~]# ls /pigsamples/mfilesdata/
file1  file2  file3

Load the directory above (Pig reads every file in it):

grunt> input_data = LOAD '/pigsamples/mfilesdata' USING PigStorage (',') AS (f1:INT, f2:INT, f3:INT);
grunt> DUMP input_data;
(1,2,3)
(2,3,1)
(3,1,2)
(4,5,6)
(5,6,4)
(6,4,5)
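
If the files share no column whose values identify the source file, one option is to have the loader record the file name. The following is a sketch, assuming Pig 0.12 or later, where PigStorage accepts the '-tagFile' option and prepends the input file name as the first field. The relation and field names (tagged_data, filename, from_file1 and so on) are hypothetical.

```pig
-- Assumes Pig 0.12+: '-tagFile' makes PigStorage prepend the source file name.
tagged_data = LOAD '/pigsamples/mfilesdata' USING PigStorage(',', '-tagFile')
              AS (filename:chararray, f1:int, f2:int, f3:int);

-- Route each tuple by the file it came from (file names as in the ls listing).
SPLIT tagged_data INTO from_file1 IF filename == 'file1',
                       from_file2 IF filename == 'file2',
                       from_file3 IF filename == 'file3';
```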

Format the data based on your requirements:

grunt> formatted_data = FOREACH input_data GENERATE f1, f2, f3;    -- pass-through projection; replace with your own transformation

Use the SPLIT operator to divide the relation into multiple relations based on conditions:

grunt> SPLIT formatted_data
INTO split1 IF f1 <= 3,
split2 IF (f1 > 3 AND f1 <= 6),
split3 IF f1 > 6;       -- split on a column whose values differ from file to file

Output:

grunt> DUMP split1;
(1,2,3)
(2,3,1)
(3,1,2)

grunt> DUMP split2;
(4,5,6)
(5,6,4)
(6,4,5)
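
To turn the split relations into separate outputs, each one can be written with its own STORE statement. This is a minimal sketch, assuming hypothetical output directories under /pigsamples/output that do not yet exist (Pig refuses to write into an existing output directory).

```pig
-- Output paths are hypothetical; choose directories that do not already exist.
STORE split1 INTO '/pigsamples/output/split1' USING PigStorage(',');
STORE split2 INTO '/pigsamples/output/split2' USING PigStorage(',');
STORE split3 INTO '/pigsamples/output/split3' USING PigStorage(',');
```

Each STORE produces its own directory of part files. When the statements run together as a script, Pig's multi-query execution can usually serve all three from a single pass over the input.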
