What is Pig Latin? Pig programming model: data
- Pig operations operate on relations
- A relation is a bag
- A bag is a collection of tuples
- A tuple is an ordered set of fields
- A field is any type of data
Basic data types:
- Boolean: True, False
- Int and Long: 1, 2, 3 (int); 1L, 2L (long)
- Float and Double: 2.3F, 1.4F (float); 2.3, 4.5 (double)
- Chararray: 'Hello', 'I am a string'
- DateTime: 2014-09-11T12:20:14.1234+00:00
- … and a few more (bytearray, biginteger, bigdecimal) that you probably won't use very often
![Pig Latin data model](https://wikitechy.com/tutorials/apache-pig/img/apache-pig-images/pig-latin-data-model.png)
Tuple: a catch-all data type (its fields can hold any type, including other tuples, bags and maps)
![Pig tuple data type](https://wikitechy.com/tutorials/apache-pig/img/apache-pig-images/pig-data-type.png)
Bag: a collection of tuples (unordered, duplicates allowed)
![Pig bag data type](https://wikitechy.com/tutorials/apache-pig/img/apache-pig-images/pig-data-type-bag.png)
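The nesting described above can be modeled in a few lines of Python. This is only an illustration (Pig does not store data this way): a relation is a list of tuples, a bag is a list whose order carries no meaning and may contain duplicates, and a map is a dict with string keys. The relation `visits` and its values are hypothetical.

```python
# Toy model of Pig's data model (not Pig's internal representation):
#   relation = outer bag = list of tuples
#   bag      = list of tuples (order is meaningless, duplicates allowed)
#   tuple    = Python tuple (ordered fields)
#   map      = dict with string keys
visits = [
    ("alice", 3, [("home",), ("cart",)], {"country": "FR"}),
    ("bob", 1, [("home",)], {"country": "US"}),
]

def field(t: tuple, i: int):
    """Fields are addressed by position, like $0, $1, ... in Pig Latin."""
    return t[i]

# The third field of each tuple is itself a bag of one-field tuples.
pages_of_alice = [page for (page,) in field(visits[0], 2)]
```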
Working with Data
![Methods for working with data in Pig](https://wikitechy.com/tutorials/apache-pig/img/apache-pig-images/pig-data-methods.png)
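Loading data
- LOAD reads data (from the local FS or HDFS) into a relation, which Pig processes as a distributed dataset; without a USING clause it uses PigStorage, which splits each line on tabs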
![Loading data with LOAD](https://wikitechy.com/tutorials/apache-pig/img/apache-pig-images/pig-load-data.png)
![LOAD example](https://wikitechy.com/tutorials/apache-pig/img/apache-pig-images/apache-pig-load-data.png)
![Loading tuple and map fields](https://wikitechy.com/tutorials/apache-pig/img/apache-pig-images/pig-data-type-tuple-map.png)
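As a rough mental model of `LOAD 'file' USING PigStorage(delim) AS (schema)`, the Python sketch below splits each line on a delimiter and casts each field according to a schema. It is a toy, not Pig's loader: the schema is represented as hypothetical `(name, cast_function)` pairs, and, as Pig does, missing fields and values that fail the cast become null (`None` here).

```python
from typing import Callable, Iterable, List, Tuple

def load(lines: Iterable[str],
         schema: List[Tuple[str, Callable[[str], object]]],
         delimiter: str = "\t") -> List[tuple]:
    """Toy model of LOAD ... USING PigStorage(delimiter) AS (schema)."""
    relation = []
    for line in lines:
        raw = line.rstrip("\n").split(delimiter)
        fields = []
        for i, (_name, cast) in enumerate(schema):
            if i >= len(raw):
                fields.append(None)      # missing trailing field -> null
                continue
            try:
                fields.append(cast(raw[i]))
            except ValueError:
                fields.append(None)      # value that cannot be cast -> null
        relation.append(tuple(fields))
    return relation
```

For example, `load(["alice\t30", "bob\tabc"], [("name", str), ("age", int)])` yields `[("alice", 30), ("bob", None)]`.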
Checking a relation's content
- DUMP: prints the content of a relation to standard output
![DUMP statement](https://wikitechy.com/tutorials/apache-pig/img/apache-pig-images/apache-pig-dump-statement.png)
- DESCRIBE: prints the schema of a relation to standard output
![DESCRIBE statement](https://wikitechy.com/tutorials/apache-pig/img/apache-pig-images/apache-pig-describe-statement.png)
- ILLUSTRATE: runs the statements step by step on a small sample of the data and prints the schema and example tuples of each relation
![ILLUSTRATE statement](https://wikitechy.com/tutorials/apache-pig/img/apache-pig-images/apache-pig-illustrate-statement.png)
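The text that DUMP prints follows a simple notation: tuples in parentheses, bags in curly braces, maps as `[key#value]`, and nulls as nothing at all. The toy Python formatter below reproduces that notation for the data model sketched earlier (bags as lists, maps as dicts); it is an approximation, not Pig's actual printing code.

```python
def format_field(value) -> str:
    """Render one field roughly the way DUMP prints it (toy model)."""
    if value is None:
        return ""                                    # null prints as nothing
    if isinstance(value, bool):                      # check before int: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, tuple):
        return "(" + ",".join(format_field(v) for v in value) + ")"
    if isinstance(value, list):                      # a bag of tuples
        return "{" + ",".join(format_field(t) for t in value) + "}"
    if isinstance(value, dict):                      # a map
        return "[" + ",".join(f"{k}#{format_field(v)}" for k, v in value.items()) + "]"
    return str(value)

def dump(relation: list) -> None:
    """Print one line per tuple, like DUMP on the Grunt shell."""
    for t in relation:
        print(format_field(t))
```

For instance, `dump([("bob", None)])` prints `(bob,)`: the null second field leaves an empty slot after the comma.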
Operating on relations
- FOREACH … GENERATE: generates a new relation by projecting fields (or expressions over them) of a relation
![FOREACH statement](https://wikitechy.com/tutorials/apache-pig/img/apache-pig-images/apache-pig-foreach-statement.png)
![FOREACH example](https://wikitechy.com/tutorials/apache-pig/img/apache-pig-images/pig-for-each-statement.png)
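Conceptually, FOREACH … GENERATE maps every input tuple to exactly one output tuple, with one output field per expression. A toy Python version, where each expression is a function of the input tuple (the `people` relation and its fields are hypothetical):

```python
from typing import Callable, List

def foreach_generate(relation: List[tuple],
                     *exprs: Callable[[tuple], object]) -> List[tuple]:
    """Toy FOREACH ... GENERATE: one output tuple per input tuple,
    one output field per expression."""
    return [tuple(expr(t) for expr in exprs) for t in relation]

people = [("alice", 30, "FR"), ("bob", 25, "US")]
# Roughly: B = FOREACH A GENERATE name, age + 1;
older = foreach_generate(people, lambda t: t[0], lambda t: t[1] + 1)
```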
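- Let us execute the instruction and… it seems that nothing happens!
- We saw tracing output after DUMP and ILLUSTRATE, but not after LOAD or FOREACH
Lazy evaluation
- Pig only builds a logical plan as statements are entered; the plan runs when a statement needs results: DUMP, STORE, or ILLUSTRATE (which runs on a small sample)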
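Lazy evaluation can be imitated with Python generators: each statement wraps the previous one in a new plan step, and no data is read until an output statement asks for results. The class and method names below are hypothetical; the sketch shows the idea, not how Pig compiles its logical plan into jobs.

```python
from typing import Callable, Iterator, List

class LazyRelation:
    """Toy model of lazy evaluation: foreach/filter only extend a plan,
    and nothing is read until dump() asks for results."""

    def __init__(self, source: Callable[[], Iterator[tuple]]):
        self._source = source  # zero-argument function yielding tuples on demand

    def foreach(self, fn: Callable[[tuple], tuple]) -> "LazyRelation":
        src = self._source
        return LazyRelation(lambda: (fn(t) for t in src()))

    def filter(self, pred: Callable[[tuple], bool]) -> "LazyRelation":
        src = self._source
        return LazyRelation(lambda: (t for t in src() if pred(t)))

    def dump(self) -> List[tuple]:
        return list(self._source())  # only here is the source actually read
```

Building `A.filter(...).foreach(...)` reads nothing; only calling `dump()` pulls tuples through the whole chain, much as only DUMP or STORE makes Pig launch jobs.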
Operating on relations
- FILTER: generates a new relation containing only the tuples that satisfy a condition
![FILTER operation](https://wikitechy.com/tutorials/apache-pig/img/apache-pig-images/pig-filter-operation.png)
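One subtlety of FILTER is null handling: a comparison involving null yields null, and FILTER keeps a tuple only when its condition is true, so tuples with a null in the compared field disappear. The toy Python sketch below makes that explicit; `gt` is a hypothetical helper standing in for Pig's `>` operator.

```python
from typing import Callable, List, Optional

def filter_by(relation: List[tuple],
              cond: Callable[[tuple], Optional[bool]]) -> List[tuple]:
    """Toy FILTER ... BY: keep a tuple only when the condition is true.
    A condition that evaluates to null (None) drops the tuple."""
    return [t for t in relation if cond(t) is True]

def gt(a, b) -> Optional[bool]:
    """'>' with null semantics: comparing anything with null gives null."""
    if a is None or b is None:
        return None
    return a > b

people = [("alice", 30), ("bob", None), ("carol", 18)]
# Roughly: B = FILTER A BY age > 20;
adults = filter_by(people, lambda t: gt(t[1], 20))
```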
- SPLIT: splits a relation into multiple relations based on conditions
![SPLIT operation](https://wikitechy.com/tutorials/apache-pig/img/apache-pig-images/pig-split-operation.png)
![SPLIT example](https://wikitechy.com/tutorials/apache-pig/img/apache-pig-images/apache-pig-split-operation.png)
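Unlike a partition, SPLIT evaluates every condition independently: a tuple can land in several output relations, or in none unless an OTHERWISE branch is given. The toy Python model below returns a dict of named outputs; the names and conditions are hypothetical, and null-valued conditions are not modeled.

```python
from typing import Callable, Dict, List, Optional

def split(relation: List[tuple],
          conditions: Dict[str, Callable[[tuple], bool]],
          otherwise: Optional[str] = None) -> Dict[str, List[tuple]]:
    """Toy SPLIT ... INTO: each output gets the tuples whose condition holds.
    Tuples matching no condition go to `otherwise`, if given, else are dropped."""
    out: Dict[str, List[tuple]] = {name: [] for name in conditions}
    if otherwise is not None:
        out[otherwise] = []
    for t in relation:
        matched = False
        for name, cond in conditions.items():
            if cond(t):
                out[name].append(t)
                matched = True
        if not matched and otherwise is not None:
            out[otherwise].append(t)
    return out
```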
- GROUP: creates one tuple per key, holding the key (in a field named group) and a bag of all tuples with that key
![GROUP BY operation](https://wikitechy.com/tutorials/apache-pig/img/apache-pig-images/apache-pig-group-by-operation.png)
![GROUP example](https://wikitechy.com/tutorials/apache-pig/img/apache-pig-images/apache-pig-group-operation.png)
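The shape of GROUP's output is the key idea: one `(group, bag)` pair per distinct key. A toy Python version built on a dict (the `visits` data in the usage line is hypothetical):

```python
from typing import Callable, Dict, Hashable, List, Tuple

def group_by(relation: List[tuple],
             key: Callable[[tuple], Hashable]) -> List[Tuple[Hashable, List[tuple]]]:
    """Toy GROUP ... BY: one (group, bag) pair per distinct key, where the
    bag holds every input tuple with that key."""
    groups: Dict[Hashable, List[tuple]] = {}
    for t in relation:
        groups.setdefault(key(t), []).append(t)
    return list(groups.items())  # first-seen order here; Pig guarantees no order
```

For example, grouping `[("alice", "home"), ("bob", "home"), ("alice", "cart")]` by the first field gives `("alice", [("alice", "home"), ("alice", "cart")])` and `("bob", [("bob", "home")])`.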
- Nested FOREACH: operates on the bags inside a relation (e.g. with DISTINCT, FILTER, ORDER or LIMIT) and then projects the result
![Nested FOREACH](https://wikitechy.com/tutorials/apache-pig/img/apache-pig-images/pig-nested-foreach.png)
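A common nested FOREACH counts distinct values per group, for example `C = FOREACH B { pages = DISTINCT A.page; GENERATE group, COUNT(pages); };` where `B = GROUP A BY user`. The relation `A` and its fields `(user, page)` are hypothetical. The toy Python equivalent runs the inner block once per group, on that group's bag:

```python
from typing import Hashable, List, Tuple

def distinct_pages_per_user(grouped: List[Tuple[Hashable, List[tuple]]]) -> List[tuple]:
    """Toy nested FOREACH: for each (group, bag), compute DISTINCT on the
    page field, then COUNT the result (COUNT skips null values)."""
    result = []
    for group, bag in grouped:                             # inner block, once per group
        pages = {t[1] for t in bag if t[1] is not None}    # DISTINCT A.page, nulls not counted
        result.append((group, len(pages)))                 # GENERATE group, COUNT(pages)
    return result
```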
- JOIN (inner): our classic database operator for relations!
![Inner JOIN](https://wikitechy.com/tutorials/apache-pig/img/apache-pig-images/pig-inner-join-operation.png)
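An inner JOIN concatenates every pair of tuples whose keys are equal; tuples whose key is null never match. The toy Python sketch below uses an in-memory hash index on the right relation, which is only one way to evaluate a join (Pig's default join instead shuffles both relations by key). Keys are given as field positions, and the example data is hypothetical.

```python
from typing import Dict, List

def inner_join(left: List[tuple], lkey: int,
               right: List[tuple], rkey: int) -> List[tuple]:
    """Toy JOIN left BY $lkey, right BY $rkey: concatenate matching pairs.
    Null keys never match."""
    index: Dict[object, List[tuple]] = {}
    for r in right:
        if r[rkey] is not None:
            index.setdefault(r[rkey], []).append(r)
    return [l + r
            for l in left if l[lkey] is not None
            for r in index.get(l[lkey], [])]
```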
- JOIN … LEFT OUTER: the same classic operator, but left tuples without a match are kept, padded with nulls
![LEFT OUTER JOIN](https://wikitechy.com/tutorials/apache-pig/img/apache-pig-images/pig-left-outer-join-operation.png)
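A left outer join keeps every left tuple; when no right tuple matches, the right side is filled with nulls. To know how many nulls to add, the width of the right relation must be known, which is why Pig needs a schema for the side that gets padded. A toy Python sketch (the `right_width` parameter is hypothetical):

```python
from typing import Dict, List

def left_outer_join(left: List[tuple], lkey: int,
                    right: List[tuple], rkey: int,
                    right_width: int) -> List[tuple]:
    """Toy JOIN left BY $lkey LEFT OUTER, right BY $rkey."""
    index: Dict[object, List[tuple]] = {}
    for r in right:
        if r[rkey] is not None:
            index.setdefault(r[rkey], []).append(r)
    padding = (None,) * right_width
    out: List[tuple] = []
    for l in left:
        matches = index.get(l[lkey], []) if l[lkey] is not None else []
        if matches:
            out.extend(l + r for r in matches)
        else:
            out.append(l + padding)   # unmatched left tuple kept, right side null
    return out
```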
- CROSS: Cartesian product of two or more relations
![CROSS operation](https://wikitechy.com/tutorials/apache-pig/img/apache-pig-images/pig-cross-join-operation.png)
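CROSS pairs every tuple of one relation with every tuple of the others, so its output size is the product of the input sizes; on large relations it gets expensive quickly. A toy Python version using `itertools.product` (output order here is deterministic, which Pig does not promise):

```python
from itertools import product
from typing import List

def cross(*relations: List[tuple]) -> List[tuple]:
    """Toy CROSS: concatenate one tuple from each relation, for every combination."""
    return [sum(combo, ()) for combo in product(*relations)]
```

With inputs of 1,000 and 2,000 tuples the result already holds 2,000,000 tuples.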
- UNION: merges the contents of two or more relations into a single relation
![UNION operation](https://wikitechy.com/tutorials/apache-pig/img/apache-pig-images/pig-union-operation.png)
- DISTINCT: removes duplicate tuples, keeping only unique ones
![DISTINCT operation](https://wikitechy.com/tutorials/apache-pig/img/apache-pig-images/pig-distinct-operation.png)
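UNION behaves like SQL's UNION ALL: it keeps duplicates, and Pig does not guarantee the order of its output. Applying DISTINCT afterwards removes duplicate tuples. A toy Python pair of functions (flat, hashable tuples assumed):

```python
from typing import List

def union(*relations: List[tuple]) -> List[tuple]:
    """Toy UNION: concatenate relations, keeping duplicates."""
    out: List[tuple] = []
    for rel in relations:
        out.extend(rel)
    return out

def distinct(relation: List[tuple]) -> List[tuple]:
    """Toy DISTINCT: keep one copy of each whole tuple (first-seen order here)."""
    seen = set()
    out: List[tuple] = []
    for t in relation:
        if t not in seen:
            seen.add(t)
            out.append(t)
    return out
```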
- ORDER BY: sorts a relation by one or more fields
![ORDER BY operation](https://wikitechy.com/tutorials/apache-pig/img/apache-pig-images/pig-order-by-operation.png)
- LIMIT: truncates a relation to at most n tuples
![LIMIT operation](https://wikitechy.com/tutorials/apache-pig/img/apache-pig-images/pig-limit-operation.png)
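ORDER BY and LIMIT are often combined to get a top-N. LIMIT alone returns an arbitrary n tuples (Pig does not say which, and they can change between runs); after ORDER BY, the n tuples are well defined. A toy Python sketch with field positions as sort keys (null ordering is not modeled; the `scores` data is hypothetical):

```python
from typing import List

def order_by(relation: List[tuple], *keys: int, desc: bool = False) -> List[tuple]:
    """Toy ORDER ... BY $k1, $k2, ... [DESC]."""
    return sorted(relation, key=lambda t: tuple(t[k] for k in keys), reverse=desc)

def limit(relation: List[tuple], n: int) -> List[tuple]:
    """Toy LIMIT: at most n tuples."""
    return relation[:n]

scores = [("a", 3), ("b", 9), ("c", 5)]
top2 = limit(order_by(scores, 1, desc=True), 2)   # top-2 by score
```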
- RANK: prepends to each tuple its position (rank) in the relation
![RANK operation](https://wikitechy.com/tutorials/apache-pig/img/apache-pig-images/pig-rank-operation.png)
![RANK BY example](https://wikitechy.com/tutorials/apache-pig/img/apache-pig-images/pig-rank-sort-operation.png)
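Plain `RANK A;` numbers the tuples sequentially, while `RANK A BY field` sorts first and gives tied tuples the same rank: by default the ranks after a tie skip numbers (like SQL's RANK), and with DENSE they do not. The toy Python sketch below reproduces those three behaviors; the `field` and `dense` parameter names are hypothetical.

```python
from typing import List, Optional

def rank(relation: List[tuple], field: Optional[int] = None,
         dense: bool = False) -> List[tuple]:
    """Toy RANK: prepend a rank field to each tuple."""
    if field is None:                       # RANK A; -> 1, 2, 3, ...
        return [(i,) + t for i, t in enumerate(relation, start=1)]
    ordered = sorted(relation, key=lambda t: t[field])
    out: List[tuple] = []
    current = 0
    prev = object()                         # sentinel equal to no real value
    for pos, t in enumerate(ordered, start=1):
        if t[field] != prev:                # a new value starts a new rank
            current = current + 1 if dense else pos
            prev = t[field]
        out.append((current,) + t)
    return out
```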
- SAMPLE: selects a random sample of the relation; e.g. SAMPLE A 0.1 keeps about 10% of the tuples
![SAMPLE operation](https://wikitechy.com/tutorials/apache-pig/img/apache-pig-images/pig-sample-instruction.png)
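Because SAMPLE takes a probability rather than a count, the size of the result is only approximate. The toy Python sketch below assumes per-tuple Bernoulli sampling (each tuple kept independently with probability `p`); the `seed` parameter is a hypothetical addition for reproducibility.

```python
import random
from typing import List, Optional

def sample(relation: List[tuple], p: float, seed: Optional[int] = None) -> List[tuple]:
    """Toy SAMPLE relation p: keep each tuple independently with probability p,
    so the result holds roughly p * len(relation) tuples."""
    rng = random.Random(seed)
    return [t for t in relation if rng.random() < p]
```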
- CUBE: is this really useful? Yes! It computes aggregates over all combinations of the given dimensions with just one operation
![CUBE operation](https://wikitechy.com/tutorials/apache-pig/img/apache-pig-images/pig-cube-operation.png)
- ROLLUP: like standard CUBE, but dimensions are replaced by nulls from right to left, giving hierarchical aggregates
![ROLLUP operation](https://wikitechy.com/tutorials/apache-pig/img/apache-pig-images/pig-cube-rollup-operation.png)
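The difference between CUBE and ROLLUP is which grouping sets they produce: with n dimensions, CUBE aggregates over all 2^n subsets, while ROLLUP only drops dimensions from right to left, giving n + 1 groupings. In the output, a rolled-up dimension appears as null, meaning "all". The toy Python sketch below counts tuples per grouping; the function names and the `sales` data are hypothetical, and real nulls in the data would collide with the "all" marker here.

```python
from itertools import combinations
from typing import Dict, List, Tuple

def cube_sets(n: int) -> List[Tuple[bool, ...]]:
    """All 2**n choices of which dimensions to keep (True) or roll up (False)."""
    sets = []
    for k in range(n, -1, -1):
        for kept in combinations(range(n), k):
            sets.append(tuple(i in kept for i in range(n)))
    return sets

def rollup_sets(n: int) -> List[Tuple[bool, ...]]:
    """The n + 1 prefixes: dimensions are rolled up from right to left."""
    return [tuple(i < k for i in range(n)) for k in range(n, -1, -1)]

def aggregate(relation: List[tuple], n_dims: int,
              grouping_sets: List[Tuple[bool, ...]]) -> Dict[tuple, int]:
    """Count tuples per grouping; a rolled-up dimension is None (null = 'all')."""
    counts: Dict[tuple, int] = {}
    for t in relation:
        for keep in grouping_sets:
            key = tuple(t[i] if keep[i] else None for i in range(n_dims))
            counts[key] = counts.get(key, 0) + 1
    return counts

sales = [("FR", "2014"), ("FR", "2015"), ("US", "2014")]
cube = aggregate(sales, 2, cube_sets(2))
rollup = aggregate(sales, 2, rollup_sets(2))
```

Here `cube[(None, "2014")]` counts 2014 sales across all countries, a grouping that `rollup` does not contain because ROLLUP never drops the left dimension while keeping the right one.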
- ASSERT: checks that every tuple of a relation fulfills a condition, and fails the script otherwise
- Useful for debugging
![ASSERT operation](https://wikitechy.com/tutorials/apache-pig/img/apache-pig-images/pig-assert-operation.png)
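In Pig Latin the statement looks like `ASSERT A BY age > 0, 'age must be positive';`. A toy Python equivalent passes the relation through unchanged and raises as soon as one tuple violates the condition; the function name `assert_by` is hypothetical.

```python
from typing import Callable, List

def assert_by(relation: List[tuple], cond: Callable[[tuple], bool],
              message: str = "assertion violated") -> List[tuple]:
    """Toy ASSERT ... BY cond, 'message': fail on the first violating tuple."""
    for t in relation:
        if not cond(t):
            raise AssertionError(f"{message}: {t!r}")
    return relation
```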
- STORE: writes the relation to the file system, usually HDFS (or the local FS in local mode)
![STORE operation](https://wikitechy.com/tutorials/apache-pig/img/apache-pig-images/pig-store-operation.png)
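`STORE B INTO 'output' USING PigStorage(',');` writes a directory of part files rather than a single file, with one line per tuple. The toy Python sketch below only models the line format for flat tuples (fields joined by the delimiter, tab by default, and nulls written as empty strings); it returns the text instead of writing files.

```python
from typing import List

def store(relation: List[tuple], delimiter: str = "\t") -> str:
    """Toy STORE ... USING PigStorage(delimiter): one line per tuple,
    fields joined by the delimiter, nulls written as empty strings."""
    lines = [delimiter.join("" if f is None else str(f) for f in t) for t in relation]
    return "\n".join(lines) + ("\n" if lines else "")
```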
Where to find useful Pig Latin functions (UDFs)?
- PiggyBank - Pig's repository of user-contributed functions
- load/store functions (e.g. from XML)
- datetime, text, math and statistics functions
- DataFu - LinkedIn's collection of Pig UDFs
- statistics functions (quantiles, variance etc.)
- convenient bag functions (intersection, union etc.)
- utility functions (assertions, random numbers, MD5, distance between two lat/long points) and PageRank
How to develop PigLatin scripts?
- PigEditor
- syntax/errors highlighting
- checking that alias names exist
- auto-completion of keywords and UDF names
- PigPen
- graphical visualization of scripts (box and arrows)
- Pig-Eclipse
- Plugins for Vim, Emacs, TextMate
- Usually provide syntax highlighting and code completion
How to run PigLatin scripts? Embed them in a host language (Python via Jython, JavaScript or Groovy), which:
- adds control flow constructs such as if and for
- avoids the need to invent a new language
- uses a JDBC-like compile, bind, run model