[Solved-1 Solution] Storing data to SequenceFile from Apache Pig?
Sequence File:
- A SequenceFile is a flat file consisting of binary key/value pairs. It is used extensively in MapReduce as an input/output format. It is also worth noting that, internally, the temporary outputs of maps are stored using SequenceFile.
Problem:
Apache Pig can load data from Hadoop sequence files using the PiggyBank SequenceFileLoader.
Is there also a library that would allow writing to Hadoop sequence files from Pig?
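For context, the loading side mentioned in the problem might look roughly like the sketch below. The jar path, the input path, the relation name `records`, and the key/value schema are placeholders rather than details from the original question; the declared types have to match what the sequence file actually contains.

```pig
-- Sketch only: the paths and the (key, value) schema are placeholders.
REGISTER /path/to/piggybank.jar;

-- Load key/value pairs from a Hadoop sequence file via the PiggyBank loader.
records = LOAD '/path/to/input.seq'
          USING org.apache.pig.piggybank.storage.SequenceFileLoader()
          AS (key:chararray, value:chararray);

DUMP records;
```

This covers reading only; the question is about the matching STORE side.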
Solution 1:
- This is possible now, although it will become a fair bit easier once Pig 0.7 comes out, as it includes a complete redesign of the Load/Store interfaces.
- The "Hadoop expansion pack" that Twitter open-sourced on GitHub includes code for generating Load and Store funcs based on Google Protocol Buffers (building on Input/Output formats for the same; we already have those for sequence files, obviously).