What is PigStorage() in Apache Pig?
- The PigStorage() function loads and stores data as structured text files.
- It takes as a parameter the delimiter by which the fields of each tuple are separated.
- By default, it uses '\t' (tab) as the delimiter.
Syntax
grunt> PigStorage(field_delimiter)
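- If the argument is omitted, PigStorage() falls back to the default tab delimiter. A minimal sketch (the file name tab_separated_data.txt is only illustrative):
grunt> tab_data = LOAD 'hdfs://localhost:9000/pig_data/tab_separated_data.txt' USING PigStorage();
- The above is equivalent to writing USING PigStorage('\t') explicitly.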
Example
- Let us suppose we have a file named wikitechy_employee_data.txt in the HDFS directory /pig_data/ with the following content.
111,Anu,Shankar,23,9876543210,Chennai
112,Barvathi,Nambiayar,24,9876543211,Chennai
113,Kajal,Nayak,24,9876543212,Trivendram
114,Preethi,Antony,21,9876543213,Pune
115,Raj,Gopal,21,9876543214,Hyderabad
116,Yashika,Kannan,22,9876543215,Delhi
117,siddu,Narayanan,22,9876543216,Kolkata
118,Timple,Mohanthy,23,9876543217,Bhuwaneshwar
- We can load the data using the PigStorage function as given below.
grunt> employee = LOAD 'hdfs://localhost:9000/pig_data/wikitechy_employee_data.txt' USING PigStorage(',')
as ( id:int, firstname:chararray, lastname:chararray, age:int, phone:chararray, city:chararray );
- In the above example, we have used the comma (',') as the delimiter, since the values of each record in the file are separated by commas.
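- Before storing, we can optionally verify what has been loaded. DESCRIBE prints the schema of the employee relation and DUMP prints its tuples; both are standard Pig diagnostic operators (DUMP launches a job, so use it only on small data sets):
grunt> DESCRIBE employee;
grunt> DUMP employee;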
- In a similar way, we can use the PigStorage() function to store the data into an HDFS directory as given below.
grunt> STORE employee INTO 'hdfs://localhost:9000/pig_Output/' USING PigStorage(',');
- This will store the data into the given directory. You can verify the data as given below.
Verification
- First of all, list the files in the output directory pig_Output using the ls command as given below.
$ hdfs dfs -ls 'hdfs://localhost:9000/pig_Output/'
Found 2 items
-rw-r--r--   1 Hadoop supergroup     0 2017-10-05 13:03 hdfs://localhost:9000/pig_Output/_SUCCESS
-rw-r--r--   1 Hadoop supergroup   224 2017-10-05 13:03 hdfs://localhost:9000/pig_Output/part-m-00000
- We can see that two files were created after executing the STORE statement.
- Then, using the cat command, view the contents of the file named part-m-00000 as given below.
$ hdfs dfs -cat 'hdfs://localhost:9000/pig_Output/part-m-00000'
111,Anu,Shankar,23,9876543210,Chennai
112,Barvathi,Nambiayar,24,9876543211,Chennai
113,Kajal,Nayak,24,9876543212,Trivendram
114,Preethi,Antony,21,9876543213,Pune
115,Raj,Gopal,21,9876543214,Hyderabad
116,Yashika,Kannan,22,9876543215,Delhi
117,siddu,Narayanan,22,9876543216,Kolkata
118,Timple,Mohanthy,23,9876543217,Bhuwaneshwar
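- Since the delimiter is just a parameter, the same relation can be stored with another separator such as the pipe character. A minimal sketch, assuming the target directory pig_Output_pipe does not already exist (STORE fails if the output directory already exists):
grunt> STORE employee INTO 'hdfs://localhost:9000/pig_Output_pipe/' USING PigStorage('|');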