How to execute Apache Pig?
- We can run Apache Pig in two modes:
- Local mode
- MapReduce mode
Local Mode
- In local mode, Pig runs in a single JVM on a single machine and reads and writes files on the local file system of the local host.
- Local mode is generally used for testing and is best suited to smaller data sets.
- To run Pig in local mode, we need access to only one machine on which all the required files are installed.
- We specify local mode with the -x flag: the command $ pig -x local starts Pig in local mode.
- In local mode, Pig always resolves the paths of the data being loaded against the local file system.
Example:
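A minimal sketch of a local-mode run, assuming a pig launcher is available on the PATH. The file names (student_data.txt, local_demo.pig), the sample records, and the two-column schema are hypothetical and chosen only for illustration. Because the mode is local, the relative path in LOAD resolves against the local file system rather than HDFS.

```shell
# Hypothetical sample data: one comma-separated record per line.
cat > student_data.txt <<'EOF'
1,alice
2,bob
3,carol
EOF

# Hypothetical Pig Latin script; the relative path is read from the local file system.
cat > local_demo.pig <<'EOF'
students = LOAD 'student_data.txt' USING PigStorage(',') AS (id:int, name:chararray);
DUMP students;
EOF

# Run the script in local mode only if Pig is actually installed.
if command -v pig >/dev/null 2>&1; then
  pig -x local local_demo.pig
else
  echo "pig not found on PATH; skipping the run"
fi
```

The guard around the pig call keeps the sketch harmless on a machine without Pig; on a machine with Pig installed, DUMP prints the loaded records to the console.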
MapReduce Mode
- MapReduce mode is used when Apache Pig loads or processes data that resides in the Hadoop Distributed File System (HDFS).
- In MapReduce mode, whenever we execute Pig Latin statements to process data, a MapReduce job is invoked in the back-end to perform the operation on the data in HDFS.
- To run Pig in MapReduce mode, we need access to a properly set up Hadoop cluster and an HDFS installation.
- MapReduce mode is Pig's default mode; we can also select it explicitly with the -x flag (pig -x mapreduce).
- Pig translates the submitted queries into MapReduce jobs and runs them on the Hadoop cluster.
- Pig Latin statements such as LOAD and STORE read data from HDFS and write the output back to it.
Example:
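A minimal sketch of a MapReduce-mode run, assuming that both the hdfs and pig commands are on the PATH and configured for a working Hadoop cluster. The HDFS directories (/user/demo/pig_data, /user/demo/pig_output), the file names, and the schema are hypothetical.

```shell
# Hypothetical local input file that will be copied into HDFS.
cat > student_data_mr.txt <<'EOF'
1,alice
2,bob
EOF

# Hypothetical script: in MapReduce mode the LOAD and STORE paths refer to HDFS.
cat > mapreduce_demo.pig <<'EOF'
students = LOAD '/user/demo/pig_data/student_data_mr.txt' USING PigStorage(',') AS (id:int, name:chararray);
STORE students INTO '/user/demo/pig_output' USING PigStorage(',');
EOF

# Copy the data to HDFS and submit the script only when both tools are present.
if command -v hdfs >/dev/null 2>&1 && command -v pig >/dev/null 2>&1; then
  hdfs dfs -mkdir -p /user/demo/pig_data
  hdfs dfs -put -f student_data_mr.txt /user/demo/pig_data/
  pig -x mapreduce mapreduce_demo.pig
else
  echo "hdfs or pig not found on PATH; skipping the run"
fi
```

STORE fails if its output directory already exists, so the hypothetical /user/demo/pig_output path would need to be removed or renamed before the script is run a second time.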
Apache Pig Execution Mechanisms
- Apache Pig scripts can be executed in three modes:
- interactive mode
- batch mode
- embedded mode
Interactive Mode
- We run Apache Pig in interactive mode by using the Grunt shell.
- In interactive mode, we enter Pig Latin statements at the shell prompt and view the output by using the Dump operator.
Batch Mode
- We can run Apache Pig in batch mode by writing the Pig Latin script in a single file with the .pig extension.
Invoking the Grunt Shell
- We can invoke the Grunt shell in the desired mode (local/MapReduce) by using the -x option, as shown in the table below.
Local mode | MapReduce mode |
---|---|
$ ./pig -x local | $ ./pig -x mapreduce |
- Both the local mode and MapReduce mode commands start the Grunt shell and display the grunt> prompt.
- We can exit the Grunt shell by pressing Ctrl + D.
- After invoking the Grunt shell, we can run Pig Latin statements by entering them directly at the prompt.
Executing Apache Pig in Batch Mode
- We can write an entire Pig Latin script in a file and execute it by passing the file to the pig command, using the -x option to choose the mode.
Sample-script.pig
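A minimal sketch of what Sample-script.pig could contain, written out here with a shell here-document. The input file student_data.txt, its three-column schema, and the filter condition are hypothetical and serve only to illustrate a complete script.

```shell
# Write a hypothetical Sample-script.pig; statements and paths are illustrative only.
cat > Sample-script.pig <<'EOF'
-- Load comma-separated records (hypothetical input file and schema).
students = LOAD 'student_data.txt' USING PigStorage(',') AS (id:int, name:chararray, city:chararray);
-- Keep only the records for one city.
in_city = FILTER students BY city == 'Chennai';
-- Print the result to the console.
DUMP in_city;
EOF

# Show the script that the batch-mode commands will execute.
cat Sample-script.pig
```

With a relative path such as the one assumed here, local mode would look for student_data.txt on the local file system, while MapReduce mode would look for it in HDFS.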
- Execute the script in the above file using one of the commands below:
Local mode | MapReduce mode |
---|---|
$ pig -x local Sample-script.pig | $ pig -x mapreduce Sample-script.pig |