[Solved-3 Solutions] How to store gzipped files using PigStorage in Apache Pig?
Problem:
- Apache Pig v0.7 can read gzipped files with no extra effort on my part.
It processes that data and writes the output to disk.
But the output file isn't compressed.
Is there a way to make the STORE command write its output in gzip format?
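The situation described above can be sketched as follows. The file names myinput.gz and myoutput are hypothetical and only serve as an illustration.

```pig
-- Hypothetical paths, used only to illustrate the problem.
A = LOAD 'myinput.gz';     -- the gzipped input is read without any extra option
STORE A INTO 'myoutput';   -- the part files under myoutput/ come out as plain text
```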
Solution 1:
There are two ways:
Why PigStorage()
- The PigStorage() function loads and stores data as structured text files. It takes one parameter, the delimiter that separates the fields of a tuple. By default, the delimiter is '\t' (tab).
1. In the STORE statement, give the output directory a compression extension such as .gz.
Use compression
- Compression can be used to reduce the amount of data to be stored to disk and written over the network. By default, compression is turned off, both between map and reduce tasks and between MapReduce jobs.
2. Set the compression method in the script.
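A minimal sketch of the two ways listed above, assuming hypothetical paths (myinput.gz, myoutput). The two options are alternatives, so a script would normally use only one of them.

```pig
A = LOAD 'myinput.gz';

-- Way 1: a .gz suffix on the output directory asks PigStorage to gzip the part files.
STORE A INTO 'myoutput.gz';

-- Way 2 (alternative to way 1): switch compression on through script properties,
-- then store to a path without any special suffix.
SET output.compression.enabled true;
SET output.compression.codec org.apache.hadoop.io.compress.GzipCodec;
STORE A INTO 'myoutput';
```

In both cases the result is a directory of compressed part files, not a single .gz file.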
Solution 2:
Specifying the compression format using the 'STORE' statement
Pig supports three compression formats: GZip, BZip2, and LZO. The extension of the output path in the STORE statement selects the format. To get LZO working, it has to be installed separately.
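As an illustration of choosing the format through the output path, the sketch below stores the same relation twice. The relation name users, the input path, and the output paths are assumptions, not taken from the original answer. LZO is left out because it depends on a separate installation.

```pig
-- Hypothetical relation and paths.
users = LOAD 'input/users.csv' USING PigStorage(',');

-- The suffix of the output directory picks the codec.
STORE users INTO 'pigOutput/users.gz'  USING PigStorage(',');  -- GZip
STORE users INTO 'pigOutput/users.bz2' USING PigStorage(',');  -- BZip2
```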
Solution 3:
Specifying compression via job properties
- Set the properties output.compression.enabled and output.compression.codec in the Pig script.
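A short sketch of the property-based approach, with hypothetical paths. It uses the BZip2 codec class to show that the codec property controls the format. Substituting org.apache.hadoop.io.compress.GzipCodec should give gzip output instead.

```pig
-- Enable output compression and choose the codec class for this script.
SET output.compression.enabled true;
SET output.compression.codec org.apache.hadoop.io.compress.BZip2Codec;

-- Hypothetical paths; the output path needs no special suffix here.
A = LOAD 'myinput.gz';
STORE A INTO 'myoutput' USING PigStorage('\t');
```

One reason to prefer this form is that the output path stays free of a format suffix, so the codec can be changed without renaming directories.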