Apache Pig - Handling Compression



How to Handle Compression in Apache Pig?

  • PigStorage and TextLoader support gzip and bzip compression for both read (load) and write (store).
  • BinStorage does not support compression.
  • To work with gzip compressed files, the input/output files need to have a .gz extension.
  • Gzipped files cannot be split across multiple maps; this means that the number of maps created is equal to the number of part files in the input location.
  • To work with bzip compressed files, the input/output files need to have a .bz or .bz2 extension.
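
The effect of gzip files not being splittable can be illustrated with a small Python sketch. It is a simplified model, not Pig's actual scheduling logic: the function name `estimate_maps`, the 128 MB split size, and the part file sizes are all assumptions made for illustration, and it ignores cluster settings such as split combination. It also assumes that bzip files, being block-oriented, can be split across maps.

```python
import math

# Assumed split size for this sketch (128 MB); the real value depends on
# the cluster configuration.
SPLIT_SIZE = 128 * 1024 * 1024
MB = 1024 * 1024


def estimate_maps(files, split_size=SPLIT_SIZE):
    """Estimate map tasks for a list of (name, size_in_bytes) part files.

    Simplified model: a .gz file is never split, so it always gets exactly
    one map; any other file gets one map per split-sized chunk.
    """
    total = 0
    for name, size in files:
        if name.endswith(".gz"):
            total += 1  # not splittable: one map per gzip part file
        else:
            total += max(1, math.ceil(size / split_size))
    return total


# Hypothetical input locations with two part files each (500 MB and 300 MB).
gz_parts = [("part-00000.gz", 500 * MB), ("part-00001.gz", 300 * MB)]
bz_parts = [("part-00000.bz2", 500 * MB), ("part-00001.bz2", 300 * MB)]

print(estimate_maps(gz_parts))  # 2: one map per gzip part file
print(estimate_maps(bz_parts))  # 7: ceil(500/128) + ceil(300/128) = 4 + 3
```

In this model, the gzip input is limited to two maps regardless of file size, so writing more (smaller) part files is the only way to raise parallelism for gzip input.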

Example

  • Ensure that we have a gzip compressed file named wikitechy_emp.txt.gz in the HDFS directory /pig_data/.
  • Next, we can load the compressed file into Pig as given below.
Using PigStorage:

grunt> data = LOAD 'hdfs://localhost:9000/pig_data/wikitechy_emp.txt.gz' USING PigStorage(',');

Using TextLoader:

grunt> data = LOAD 'hdfs://localhost:9000/pig_data/wikitechy_emp.txt.gz' USING TextLoader();
  • Similarly, you can store the data in compressed form as given below; the .bz extension on the output path tells Pig to write bzip compressed output.
Using PigStorage:

grunt> STORE data INTO 'hdfs://localhost:9000/pig_Output/data.bz' USING PigStorage(',');
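
To try the example, a comma-separated input file compressed with gzip and carrying the .gz extension is needed. One possible way to prepare such a file locally is sketched below in Python. The sample rows and the helper names `write_gzip_csv` and `read_gzip_csv` are hypothetical, since the tutorial does not show the real contents of the employee file.

```python
import gzip
import os
import tempfile

# Hypothetical sample rows; the actual employee data is not shown in the tutorial.
rows = ["1,Alice,Chennai", "2,Bob,Delhi"]


def write_gzip_csv(path, lines):
    """Write the given lines to a gzip compressed text file."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")


def read_gzip_csv(path):
    """Read the file back and split each line on commas, as a sanity check."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return [line.rstrip("\n").split(",") for line in f]


# Write into a temporary directory; the .gz extension is what lets Pig
# recognise the file as gzip compressed.
out_path = os.path.join(tempfile.mkdtemp(), "wikitechy_emp.txt.gz")
write_gzip_csv(out_path, rows)
print(read_gzip_csv(out_path))
```

The resulting file could then be copied into HDFS, for example with `hdfs dfs -put`, before running the LOAD statements.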
