[Solved-2 Solutions] Apache Pig permissions issue?
Problem:
- You are attempting to get Apache Pig up and running on your Hadoop cluster and are running into a permissions problem. Pig itself launches and connects to the cluster just fine; from within the Pig shell, you can ls through and around your HDFS directories. However, when you actually try to load data and run Pig commands, you hit permissions-related errors:
grunt> A = load 'all_annotated.txt' USING PigStorage() AS (id:long, text:chararray, lang:chararray);
grunt> DUMP A;
2011-08-24 18:11:40,961 [main] ERROR org.apache.pig.tools.grunt.Grunt - You don't have permission to perform the operation. Error from the server: org.apache.hadoop.security.AccessControlException: Permission denied: user=steven, access=WRITE, inode="":hadoop:supergroup:r-xr-xr-x
2011-08-24 18:11:40,977 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias A
Details at logfile: /Users/steven/Desktop/Hacking/hadoop/pig/pig-0.9.0/pig_1314230681326.log
grunt>
- In this case, all_annotated.txt is a file in your HDFS home directory that you created yourself and definitely have permission to read; the same problem occurs no matter what file you try to load. The real problem, as the error itself indicates, is that Pig is trying to write somewhere it does not have permission to.
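As a quick sanity check, you can list the permissions on the directories involved; the paths below are only examples, so substitute your own username and cluster layout:
hadoop fs -ls /                  # shows owner/permissions of top-level HDFS dirs such as /tmp
hadoop fs -ls /user/steven       # your home directory; all_annotated.txt should be readable here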
Any ideas as to what might be going on?
Solution 1:
This is probably your pig.temp.dir setting. It defaults to /tmp on HDFS, and Pig writes its temporary results there. If you don't have write permission to /tmp, Pig will complain. You can try to override it with -Dpig.temp.dir.
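A minimal sketch of that override, assuming your HDFS home directory is /user/steven (substitute your own username and a path your user actually owns):
hadoop fs -mkdir /user/steven/pig_tmp           # create a temp dir your user can write to
pig -Dpig.temp.dir=/user/steven/pig_tmp         # point Pig's temporary results at that dir
If you want the change to stick across sessions, the same pig.temp.dir property can also be set in Pig's conf/pig.properties file instead of on the command line.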
Solution 2:
The problem might be that hadoop.tmp.dir should point to a directory on your local filesystem, not on HDFS. Try setting that property to a local directory you own; you can run into the same error when running plain MapReduce jobs in Hadoop.
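For reference, a minimal core-site.xml sketch with hadoop.tmp.dir pointed at a local path; /var/tmp/hadoop-${user.name} is just an example location, not something taken from the original setup:
<!-- in conf/core-site.xml: hadoop.tmp.dir should be a writable local filesystem path -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/var/tmp/hadoop-${user.name}</value>
</property>
After changing it, restart the affected Hadoop daemons so the new setting takes effect.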