[Solved-1 Solution] How to Get Pig to Work with LZO Files?

Problem:

How do you get Pig to work with LZO files?

Solution 1:

You can follow these steps:

1. Clone hadoop-lzo from GitHub.
2. Compile it to get a hadoop-lzo*.jar and the native libraries. You'll need to compile this on a 64-bit machine.
3. Copy the native libraries to

$HADOOP_HOME/lib/native/Mac_OS_X-x86_64-64/

This is the platform directory on a 64-bit Mac; on 64-bit Linux it is typically Linux-amd64-64.

4. Copy the Java jar to

	$HADOOP_HOME/lib and $PIG_HOME/lib

5. Configure Hadoop and Pig so that the property java.library.path points to the LZO native libraries.

6. For Hadoop, we can do this in

$HADOOP_HOME/conf/mapred-site.xml

with:

<property>
    <name>mapred.child.env</name>
    <value>JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native/Mac_OS_X-x86_64-64/</value>
</property>

7. Start the grunt shell by running pig again, and make sure everything still works.
8. All that remains is to install elephant-bird.
9. Run ant in the elephant-bird folder to build a jar.
10. For simplicity's sake, move all relevant jars (hadoop-lzo-x.x.x.jar and elephant-bird-x.x.x.jar) to one convenient location.
11. Experiment with loading normal files and LZO files in the grunt shell: register the jars mentioned above, load a file, limit the output to a manageable number of records, and dump it. This should work the same whether you're using a normal text file or an LZO file.
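
The last step can be sketched as a small shell script that writes a Pig script and, optionally, runs it. This is only an illustration under several assumptions: the jar directory, the input path, the script name, and the RUN_PIG switch are hypothetical, the x.x.x jar names are the placeholders from step 10, and com.twitter.elephantbird.pig.load.LzoTextLoader is assumed to be the elephant-bird loader for LZO-compressed text in the version you built. Check the class name against your elephant-bird jar before relying on it.

```shell
#!/bin/sh
# Sketch only: paths, jar names, and the loader class are assumptions.
set -eu

# Hypothetical locations; replace with the jars and data you actually have.
JAR_DIR="${JAR_DIR:-$PWD/jars}"
INPUT="${INPUT:-input/sample.txt.lzo}"
SCRIPT="${SCRIPT:-lzo_test.pig}"

# Write a Pig script that registers both jars, loads the file,
# limits the result to 10 records, and dumps them.
cat > "$SCRIPT" <<EOF
REGISTER $JAR_DIR/hadoop-lzo-x.x.x.jar;
REGISTER $JAR_DIR/elephant-bird-x.x.x.jar;

-- LzoTextLoader is assumed to exist in the elephant-bird jar you built.
raw = LOAD '$INPUT' USING com.twitter.elephantbird.pig.load.LzoTextLoader();
few = LIMIT raw 10;
DUMP few;
EOF

echo "wrote $SCRIPT"

# Run the script only when asked to (RUN_PIG=1) and when pig is on the PATH.
if [ "${RUN_PIG:-0}" = "1" ] && command -v pig >/dev/null 2>&1; then
    pig "$SCRIPT"
fi
```

To compare against a plain text file, the same script can be pointed at an uncompressed input with Pig's default loader by dropping the USING clause. The statements in the generated file can also be typed one at a time in the grunt shell.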
