[Solved - 2 Solutions] Pig: Hadoop jobs fail?
What is Hadoop?
- Hadoop is an open-source, Java-based programming framework that supports the processing and storage of extremely large data sets in a distributed computing environment.
Problem:
We have a Pig script that queries data from a CSV file.
The script has been tested locally with small and large .csv files.
On a small cluster, it starts processing the script but fails after completing about 40% of the job.
The error is:
Failed to read data from "path to file"
Solution 1:
- An answer to the general problem is to change the memory settings in the configuration files, adding two properties to mapred-site.xml.
The failure is a kind of OutOfMemory exception.
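The exact two lines from the original answer are not preserved here; as a sketch, a commonly used pair of properties that raise the JVM heap for map and reduce tasks looks like the following (the property names match older mapred-era Hadoop, and the 4096m values are illustrative assumptions to tune for your cluster):

```xml
<!-- Sketch: raise the task JVM heap so Pig map/reduce tasks
     do not run out of memory while reading large CSV splits.
     The -Xmx4096m values are assumptions, not recommendations. -->
<property>
  <name>mapred.map.child.java.opts</name>
  <value>-Xmx4096m</value>
</property>
<property>
  <name>mapred.reduce.child.java.opts</name>
  <value>-Xmx4096m</value>
</property>
```

On newer Hadoop versions the equivalent keys are `mapreduce.map.java.opts` and `mapreduce.reduce.java.opts`.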
Solution 2:
- Check the logs, increasing the verbosity level if needed.
To change the memory available to Hadoop, edit the hadoop-env.sh file.
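A minimal sketch of the relevant hadoop-env.sh setting (the 4096 MB value is an assumption; pick a value that fits the machine's RAM):

```shell
# hadoop-env.sh -- maximum heap size for Hadoop daemons, in MB.
# The default is 1000; the value below is an illustrative assumption.
export HADOOP_HEAPSIZE=4096
```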
For Apache Pig, the header of the pig launcher script documents the environment variables it reads.
So we can use export to raise the Pig heap before running the script.
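As a sketch (the header comment below is paraphrased, and the 4096 MB value and script name are illustrative assumptions), the launcher documents PIG_HEAPSIZE, which can be exported before invoking Pig:

```shell
# From the header of the pig launcher script (paraphrased):
#   PIG_HEAPSIZE   The maximum amount of heap to use, in MB.
#                  Default is 1000.
#
# Raise it for memory-heavy jobs (4096 is an illustrative value):
export PIG_HEAPSIZE=4096
pig myscript.pig   # hypothetical script name
```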