How to Handle Compression in Apache Pig?
- PigStorage and TextLoader support gzip and bzip compression for both read (load) and write (store).
- BinStorage does not support compression.
- To work with gzip compressed files, input/output files need to have a .gz extension.
- Gzipped files cannot be split across multiple maps; this means that the number of maps created is equal to the number of part files in the input location.
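The splitting behavior above can be sketched as follows. The directory /pigdata/logs/ and its part file names are hypothetical, chosen only for illustration, and the relation name lines is likewise an assumption.

```pig
-- Hypothetical input directory (assumed layout):
--   /pigdata/logs/part-00000.gz
--   /pigdata/logs/part-00001.gz
--   /pigdata/logs/part-00002.gz
-- Each .gz file is decompressed based on its extension and is read
-- whole by a single map; no gzipped file is split across several maps.
lines = LOAD '/pigdata/logs/' USING TextLoader() AS (line:chararray);
DUMP lines;
```

Because a gzipped file is never split, parallelism for this load is bounded by the number of part files in the directory rather than by their size.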
Example
- Ensure that we have a gzip-compressed file named wikitechy_emp.txt.gz in the HDFS directory /pigdata/.
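- Next, load the compressed file into Pig with a LOAD statement; the load function decompresses it based on the file extension.
- Similarly, you can write a relation out as compressed data by giving the output location of the STORE statement a .gz extension.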
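The load and store steps above might look like the following in the Grunt shell or in a script. This is a sketch under assumptions: the file is gzip-compressed and carries the .gz extension that gzip support requires, its fields are comma-separated, and the relation name emp and the output path /pigdata/wikitechy_emp_out.gz are hypothetical.

```pig
-- Load the gzip-compressed file; the .gz extension tells PigStorage
-- to decompress it while reading.
-- Assumption: fields are comma-separated (the tutorial does not state the delimiter).
emp = LOAD '/pigdata/wikitechy_emp.txt.gz' USING PigStorage(',');

-- Store the relation as gzip-compressed output; the .gz extension on the
-- output location (a hypothetical path) enables compression on write.
STORE emp INTO '/pigdata/wikitechy_emp_out.gz' USING PigStorage(',');
```

The output location is created as a directory in HDFS, and the part files written inside it should be gzip-compressed.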