Internal architecture of Apache Pig
- Apache Pig uses the Pig Latin language to analyze data stored in Hadoop.
- Pig Latin is a high-level data processing language that provides a set of data types and operators for performing operations on data.
- To perform a particular task, programmers write a Pig script and run it through one of the execution mechanisms (Grunt shell, UDFs, embedded).
- The Pig framework applies a series of transformations to these scripts to produce the desired output.
- Internally, Pig converts these scripts into a series of MapReduce jobs, which makes the programmer’s job easier.
Apache Pig Components
- There are various components in the Apache Pig framework.
Parser
- Pig scripts are handled first by the Parser. It checks the syntax of the script and performs type checking and other miscellaneous checks. Its output is a DAG (directed acyclic graph) that represents the Pig Latin statements and logical operators.
- In the DAG, the logical operators of the script are represented as nodes and the data flows are represented as edges.
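The node-and-edge structure described above can be sketched in a few lines of Python. This is only an illustration, not Pig's internal representation: the operator names and the shape of the plan are assumptions made up for the example, and the standard-library `graphlib` module (Python 3.9+) stands in for whatever Pig uses internally.

```python
from graphlib import TopologicalSorter

# Hypothetical logical plan: two inputs are loaded, one is filtered,
# the two are joined, and the result is stored.
# Keys are operators (nodes); values are the operators whose output
# they consume (incoming data-flow edges).
logical_plan = {
    "LOAD_A": set(),
    "LOAD_B": set(),
    "FILTER_A": {"LOAD_A"},
    "JOIN": {"FILTER_A", "LOAD_B"},
    "STORE": {"JOIN"},
}

# Because the plan is acyclic, it has a topological order in which
# every operator appears after all the operators that feed it.
order = list(TopologicalSorter(logical_plan).static_order())
print(order)
```

Several valid orders exist for this plan (the two loads are independent), which is one reason a DAG is a convenient form for later optimization.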
Optimizer
- The logical plan (DAG) is passed to the logical optimizer, which carries out logical optimizations such as projection and pushdown.
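To give a feel for what a pushdown does, here is a toy sketch in Python. It assumes a simplified linear plan stored as a list of `(operator, argument)` pairs and applies a single made-up rule: a filter that directly follows a sort is moved in front of it, so fewer rows need to be sorted. The function name `push_filter_up` and the plan format are hypothetical and do not reflect Pig's actual optimizer code.

```python
def push_filter_up(plan):
    """Move each FILTER step ahead of an immediately preceding ORDER step.

    Filtering does not depend on row order, so swapping the two steps
    keeps the result the same while reducing the rows that get sorted.
    """
    plan = list(plan)  # work on a copy, leave the input untouched
    changed = True
    while changed:
        changed = False
        for i in range(1, len(plan)):
            if plan[i][0] == "FILTER" and plan[i - 1][0] == "ORDER":
                plan[i - 1], plan[i] = plan[i], plan[i - 1]
                changed = True
    return plan


# Hypothetical plan: load, sort by age, keep adults, store.
plan = [
    ("LOAD", "students"),
    ("ORDER", "age"),
    ("FILTER", "age > 18"),
    ("STORE", "output"),
]
optimized = push_filter_up(plan)
print([step[0] for step in optimized])
```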
Compiler
- It compiles the optimized logical plan into a series of MapReduce jobs.
Execution engine
- Finally, the MapReduce jobs are executed on Hadoop to produce the desired results.
Pig Latin Data Model
- Pig Latin’s data model allows complex data types such as map and tuple. Its elements are described below.
Atom
- Any single value in Pig Latin, irrespective of its data type, is known as an Atom.
- The atomic types of Pig are int, long, float, double, chararray, and bytearray.
- A piece of data or a simple atomic value is known as a field. Example − ‘xxx’ or ‘30’
Tuple
- A record formed by an ordered set of fields is known as a tuple. A tuple is similar to a row in a table of an RDBMS. Example − (xxx, 30)
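As a rough analogy, atoms and tuples map naturally onto Python scalars and Python tuples. The sketch below is only a model of the idea, using the placeholder values from the examples above; Pig's own types (chararray, int, and so on) are not Python types.

```python
# Atoms: single values (modelled here as a Python str and int).
name = "xxx"
age = 30

# A tuple is an ordered set of fields, much like one row of a table.
record = (name, age)

# Fields are addressed by position; Pig Latin writes these
# positions as $0 and $1.
first_field = record[0]
second_field = record[1]
print(first_field, second_field)
```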
Bag
- A bag is an unordered set of tuples.
- In other words, a collection of tuples (not necessarily unique) is known as a bag. Each tuple can have any number of fields (flexible schema).
- There is no requirement that every tuple contain the same number of fields, or that the fields in the same position (column) have the same type.
- Example − {(xxx, 30), (yyyy, 45)}
- A bag can itself be a field of a tuple; in that context it is called an inner bag.
- Example − {xxx, 30, {98xxxxx22, [email protected]}}
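The bag properties listed above (unordered, duplicates allowed, flexible schema, nesting) can be modelled in Python as follows. This is a sketch under assumptions: a Python list stands in for a bag, the helper `same_bag` is a hypothetical name, and the contact strings are placeholders rather than real data.

```python
from collections import Counter


def same_bag(a, b):
    """Compare two bags as multisets: order is ignored, duplicates count."""
    return Counter(a) == Counter(b)


# A bag may hold duplicate tuples and tuples of different lengths.
bag = [("xxx", 30), ("yyyy", 45), ("xxx", 30), ("zzz",)]
reordered = [("zzz",), ("xxx", 30), ("xxx", 30), ("yyyy", 45)]

print(same_bag(bag, reordered))   # same tuples, different order
print(same_bag(bag, bag[1:]))     # one duplicate dropped

# A tuple whose third field is itself a bag (an inner bag).
# The contact values are placeholders.
person = ("xxx", 30, [("phone-placeholder",), ("email-placeholder",)])
inner_bag = person[2]
print(len(inner_bag))
```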
Map
- A map is a set of key-value pairs. The key must be of type chararray and must be unique; the value can be of any type.
- It is represented by ‘[]’. Example − [name#xxx, age#30]
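To make the `key#value` notation concrete, the sketch below reads a map literal like the one above into a Python dictionary. The helper `parse_map` is hypothetical and handles only this flat form; it is not Pig's parser, and it keeps every value as a string for simplicity.

```python
def parse_map(text):
    """Parse a flat map literal such as '[name#xxx, age#30]' into a dict.

    Keys are strings (chararray in Pig terms) and must be unique.
    Values are kept as strings in this simplified sketch.
    """
    body = text.strip()[1:-1].strip()  # drop the surrounding [ ]
    result = {}
    if not body:
        return result
    for pair in body.split(","):
        key, value = pair.strip().split("#", 1)
        if key in result:
            raise ValueError(f"duplicate key: {key}")
        result[key] = value
    return result


person = parse_map("[name#xxx, age#30]")
print(person["name"], person["age"])
```

A dictionary is a reasonable stand-in here because it also enforces unique keys while leaving the values unconstrained.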
Relation
- A relation is a bag of tuples. Relations in Pig Latin are unordered: there is no guarantee that tuples are processed in any particular order.