What is the Grunt Shell in Apache Pig?
- Grunt is the interactive command-line shell of Apache Pig.
- It is mainly used to enter and run Pig Latin statements interactively. In addition, we can invoke shell and file system commands from it by using sh and fs.
- The Grunt shell also provides a set of useful shell and utility commands, summarized below.
- fs
- Invokes any FsShell command from within a Pig script or the Grunt shell.
- fs -mkdir /tmp
- fs -copyFromLocal file-x file-y
- fs -ls file-y
- sh
- Invokes any sh shell command from within a Pig script or the Grunt shell.
- sh ls
- sh pwd
- Utility commands: clear, exec, help, history, kill
- exec
- Runs a Pig script in batch mode.
- exec [-param param_name = param_value] [-param_file file_name] [script]
- With exec there is no interaction between the script and the Grunt shell.
- Aliases defined in the script are not available to the shell.
- run
- Runs a Pig script in interactive mode.
- run [-param param_name = param_value] [-param_file file_name] script
- Aliases defined in the script are available to the shell afterwards, as if the statements had been typed at the prompt.
Shell Commands
- Besides Pig Latin statements, the Grunt shell accepts shell commands.
- There are two of them: sh, which invokes operating system shell commands, and fs, which invokes Hadoop file system (FsShell) commands.
sh Command
- The sh command invokes a shell command from the Grunt shell.
- Only real programs can be run this way. Commands that are part of the shell environment itself (for example, cd) cannot be executed with sh.
Syntax
grunt> sh shell command parameters
Sample Code:
grunt> sh ls
pig
pig_1444799121955.log
pig.cmd
pig.py
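The sh command also passes arguments through to the program it runs. The following is only a sketch: ls -l and date are assumed to exist on the machine where Pig runs, and the output is omitted because it depends on the local environment.

```pig
-- External programs on the PATH can be run with or without arguments.
grunt> sh ls -l
grunt> sh date
-- Shell built-ins such as cd are not separate programs, so a line like
-- "sh cd /tmp" is not expected to work.
```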
fs Command
- We can invoke any FsShell commands from the Grunt shell by using the fs command.
- The fs command extends the set of supported file system commands and the capabilities supported for existing commands.
Syntax
grunt> fs File System command parameters
Sample Code:
grunt> fs -ls
Found 3 items
drwxrwxrwx - Hadoop supergroup 0 2015-09-08 14:13 Hbase
drwxr-xr-x - Hadoop supergroup 0 2015-09-09 14:52 seqgen_data
drwxr-xr-x - Hadoop supergroup 0 2015-09-08 11:30 twitter_data
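Several fs commands can be combined to stage a data file in HDFS before loading it. The sketch below assumes that a local file named student.txt exists in the directory from which Pig was started, and it reuses the /pig_data directory that appears in the LOAD statements of this tutorial. The output is omitted because it depends on the cluster.

```pig
-- Create the target directory in HDFS.
grunt> fs -mkdir /pig_data
-- Copy the local file (assumed to exist) into that directory.
grunt> fs -copyFromLocal student.txt /pig_data/student.txt
-- List the directory to confirm that the file arrived.
grunt> fs -ls /pig_data
```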
Utility Commands
- The Grunt shell also provides a set of utility commands for controlling Pig.
- These are clear, help, history, quit, set, exec, kill, and run.
Clear Command
- The clear command clears the screen of the Grunt shell.
Syntax
grunt> clear
Help Command
- The help command gives us a list of Pig commands and Pig properties.
Usage
- Running the help command prints the following list of Pig commands:
grunt> help
Commands:
<pig latin statement>; - See the PigLatin manual for details:
http://hadoop.apache.org/pig
File system commands:
fs <fs arguments> - Equivalent to Hadoop dfs command:
http://hadoop.apache.org/common/docs/current/hdfs_shell.html
Diagnostic Commands:
describe <alias>[::<alias] - Show the schema for the alias.
Inner aliases can be described as A::B.
explain [-script <pigscript>] [-out <path>] [-brief] [-dot|-xml]
[-param <param_name>=<param_value>]
[-param_file <file_name>] [<alias>] -
Show the execution plan to compute the alias or for entire script.
-script - Explain the entire script.
-out - Store the output into directory rather than print to stdout.
-brief - Don't expand nested plans (presenting a smaller graph for overview).
-dot - Generate the output in .dot format. Default is text format.
-xml - Generate the output in .xml format. Default is text format.
-param <param_name - See parameter substitution for details.
-param_file <file_name> - See parameter substitution for details.
alias - Alias to explain.
dump <alias> - Compute the alias and writes the results to stdout.
Utility Commands:
exec [-param <param_name>=param_value] [-param_file <file_name>] <script> -
Execute the script with access to grunt environment including aliases.
-param <param_name - See parameter substitution for details.
-param_file <file_name> - See parameter substitution for details.
script - Script to be executed.
run [-param <param_name>=param_value] [-param_file <file_name>] <script> -
Execute the script with access to grunt environment.
-param <param_name - See parameter substitution for details.
-param_file <file_name> - See parameter substitution for details.
script - Script to be executed.
sh <shell command> - Invoke a shell command.
kill <job_id> - Kill the hadoop job specified by the hadoop job id.
set <key> <value> - Provide execution parameters to Pig. Keys and values are case sensitive.
The following keys are supported:
default_parallel - Script-level reduce parallelism. Basic input size heuristics used
by default.
debug - Set debug on or off. Default is off.
job.name - Single-quoted name for jobs. Default is PigLatin:<script name>
job.priority - Priority for jobs. Values: very_low, low, normal, high, very_high.
Default is normal
stream.skippath - String that contains the path. This is used by streaming.
any hadoop property.
help - Display this message.
history [-n] - Display the list statements in cache.
-n Hide line numbers.
quit - Quit the grunt shell.
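The diagnostic commands listed in the help output can be tried on any alias. The following is a minimal sketch that assumes the student.txt file and the schema used in the exec and run examples of this tutorial. Output is omitted because it depends on the Pig version and the data.

```pig
-- Define an alias to inspect (file path and schema taken from this tutorial).
grunt> student = LOAD 'hdfs://localhost:9000/pig_data/student.txt' USING PigStorage(',') AS (id:int, name:chararray, city:chararray);
-- Show the schema of the alias.
grunt> describe student;
-- Show the execution plan that would compute the alias.
grunt> explain student;
-- Compute the alias and write the result to stdout.
grunt> dump student;
```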
History Command
- The history command displays the list of statements executed since the Grunt shell was invoked.
Usage
- Assume we have executed the following three statements since opening the Grunt shell.
grunt> customers = LOAD 'hdfs://localhost:9000/pig_data/customers.txt' USING PigStorage(',');
grunt> orders = LOAD 'hdfs://localhost:9000/pig_data/orders.txt' USING PigStorage(',');
grunt> student = LOAD 'hdfs://localhost:9000/pig_data/student.txt' USING PigStorage(',');
- The history command then produces the following output:
grunt> history
customers = LOAD 'hdfs://localhost:9000/pig_data/customers.txt' USING PigStorage(',');
orders = LOAD 'hdfs://localhost:9000/pig_data/orders.txt' USING PigStorage(',');
student = LOAD 'hdfs://localhost:9000/pig_data/student.txt' USING PigStorage(',');
set Command
- The set command is used to show and assign values to the keys used in Pig.
Usage
- Using the set command, we can assign values to the following keys:
Key | Description and values |
---|---|
default_parallel | Sets the number of reducers for MapReduce jobs. Pass any whole number as the value. |
debug | Turns the debugging feature in Pig on or off. Pass on or off as the value. |
job.name | Sets the job name. Pass a single-quoted string as the value. |
job.priority | Sets the job priority. Pass one of the following values: very_low, low, normal, high, very_high. |
stream.skippath | For streaming, sets the path from which data is not to be transferred. Pass the desired path as a string. |
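As an illustration, the keys above might be set as follows. This is a sketch, not a definitive reference: the values (10, 'student_load', high) are made up for the example, and the exact quoting rules may differ between Pig versions.

```pig
-- Use 10 reducers for the reduce phases of subsequent jobs.
grunt> set default_parallel 10
-- Turn debug-level logging on.
grunt> set debug 'on'
-- Give the jobs a recognizable (hypothetical) name.
grunt> set job.name 'student_load'
-- Raise the job priority to one of the values listed in the table.
grunt> set job.priority high
```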
quit Command
- We can quit from the Grunt shell by using the quit command.
Syntax:
grunt> quit
exec Command
- The exec command executes a Pig script from the Grunt shell in batch mode: aliases defined in the script are not available to the shell afterwards.
Syntax
grunt> exec [-param param_name = param_value] [-param_file file_name] [script]
Example
- Assume the file student.txt in the HDFS directory /pig_data/ has the following content.
student.txt
001,Suresh,Hyderabad
002,Panitha,Malaysia
003,Pratyush,Singapore
- Assume the following script is stored in the local file system as sample_script.pig.
sample_script.pig
student = LOAD 'hdfs://localhost:9000/pig_data/student.txt' USING PigStorage(',')
as (id:int,name:chararray,city:chararray);
Dump student;
- Now execute the script from the Grunt shell by using the exec command:
grunt> exec /sample_script.pig
Output:
(1,Suresh,Hyderabad)
(2,Panitha,Malaysia)
(3,Pratyush,Singapore)
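The -param option from the syntax line lets a script take values at invocation time. The sketch below uses a hypothetical script named filter_by_city.pig and a hypothetical parameter named city; neither appears in the original example. Pig replaces $city in the script text before running it, so with the student.txt content shown above only the Hyderabad row should remain.

```pig
-- filter_by_city.pig (hypothetical script; $city is filled in by -param)
student = LOAD 'hdfs://localhost:9000/pig_data/student.txt' USING PigStorage(',')
    AS (id:int, name:chararray, city:chararray);
by_city = FILTER student BY city == '$city';
DUMP by_city;

-- From the Grunt shell, pass the parameter value when executing the script:
grunt> exec -param city=Hyderabad /filter_by_city.pig
```

Several values can also be collected in a parameter file and passed with -param_file instead of being repeated on the command line.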
kill Command
- We can kill a MapReduce job from the Grunt shell by using the kill command.
Syntax:
grunt> kill JobId
Example:
grunt> kill Id_0055
run Command
- The run command runs a Pig script from the Grunt shell in interactive mode: aliases defined in the script remain available to the shell afterwards.
Syntax
grunt> run [-param param_name = param_value] [-param_file file_name] script
Example
- Assume the file student.txt in the HDFS directory /pig_data/ has the following content.
student.txt
004,vanitha,Delhi
005,priya,Mumbai
006,supriya,Bangalore
- Assume the following script is stored in the local file system as sample_script.pig.
sample_script.pig
student = LOAD 'hdfs://localhost:9000/pig_data/student.txt' USING
PigStorage(',') as (id:int,name:chararray,city:chararray);
- Now run the script from the Grunt shell by using the run command:
grunt> run /sample_script.pig
- Because the alias student is now available in the shell, we can display its contents with the Dump operator.
grunt> Dump student;
Output:
(4,vanitha,Delhi)
(5,priya,Mumbai)
(6,supriya,Bangalore)
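The difference between exec and run is easiest to see by checking, after each command, whether an alias defined in the script is visible in the shell. The session below is a sketch that assumes /sample_script.pig defines the alias student, as in the examples of this tutorial. Error messages and output are omitted because they vary by Pig version.

```pig
-- Batch mode: the alias defined in the script stays inside the script.
grunt> exec /sample_script.pig
grunt> describe student;   -- expected to fail: student is not defined in the shell

-- Interactive mode: the statements run as if they were typed at the prompt.
grunt> run /sample_script.pig
grunt> describe student;   -- expected to work: student is now defined in the shell
```

As a rule of thumb, exec suits self-contained scripts that should not affect the session, while run suits scripts that set up aliases for further interactive work.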