pig tutorial - apache pig tutorial - Pig latin - pig latin - apache pig - pig hadoop

Pig operations operate on relations
A relation is a bag
A bag is a collection of tuples
A tuple is an ordered set of fields
A field is any type of data

Basic data types:

Boolean: True, False
Int and Long: 1, 2, 3, 4, 5
Float and Double: 2.3, 1.4, 4.5
Chararray: 'Hello', 'I am a string'
DateTime: 2014-09-11T12:20:14.1234+00:00
… and more, but you probably won't use them very often


Tuple: A catch-all data type

Bag:
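
As a rough sketch of how tuples and bags look in practice, the snippet below declares both complex types in a schema and then reads nested fields. The file name `people.txt` and all field names are hypothetical, chosen only for illustration.

```pig
-- Hypothetical tab-separated input, one record per line, for example:
--   alice   (madrid,28001)   {(math,9),(physics,7)}
-- A tuple is written (f1,f2,...); a bag is written {(tuple),(tuple),...}.
people = LOAD 'people.txt'
         AS (name:chararray,
             address:tuple(city:chararray, zip:int),
             grades:bag{t:tuple(course:chararray, grade:int)});

-- address.city reads one field of the tuple;
-- grades.grade projects the bag down to a bag of single-field tuples.
summary = FOREACH people GENERATE name, address.city, grades.grade;
```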

Working with Data

Loading data?

Data source: Local or HDFS (usually!)

LOAD instruction:

Data is automatically loaded in a distributed relation
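
A minimal LOAD sketch, assuming a hypothetical comma-separated file `students.csv` with three fields (the file and field names are not from the original material):

```pig
-- Each input line is assumed to look like: alice,math,9
-- PigStorage(',') splits every line on commas; AS attaches names and types.
students = LOAD 'students.csv' USING PigStorage(',')
           AS (name:chararray, course:chararray, grade:int);
```

Without the USING clause, Pig falls back to PigStorage with a tab delimiter.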

Checking relations’ content

DUMP instruction:

Prints the content of a relation to standard output

DESCRIBE instruction:

Prints the schema of the relation to standard output

ILLUSTRATE instruction:

Prints the schema of the relation and example tuples to standard output
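
The three inspection instructions can be tried together as follows. This is a sketch that reuses a hypothetical `students.csv` input; the file and field names are assumptions.

```pig
students = LOAD 'students.csv' USING PigStorage(',')
           AS (name:chararray, course:chararray, grade:int);

DUMP students;        -- runs the pipeline and prints every tuple
DESCRIBE students;    -- prints only the schema; no data is read
ILLUSTRATE students;  -- prints the schema plus a few sample tuples
```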

Operating on relations

FOREACH instruction:

Generate new relations by projecting data of a relation
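
A small projection sketch, assuming the same hypothetical `students.csv` input (file and field names are illustrative only):

```pig
students = LOAD 'students.csv' USING PigStorage(',')
           AS (name:chararray, course:chararray, grade:int);

-- Keep two fields as they are and derive a third one from an expression.
scaled = FOREACH students GENERATE name, course, grade * 10 AS grade_pct;
```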

Let us execute the instruction and… it seems that nothing happens!
We had some tracing output with LOAD, DUMP, and ILLUSTRATE…

Operating on relations

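Pig employs lazy evaluation

Computation is triggered only by:

DUMP, STORE, and ILLUSTRATE (which works on a small sample of the data)

LOAD on its own only declares a relation; no data is read until one of the instructions above is executed

Pig keeps a DAG of the MR jobs needed to compute the relations (optimized!)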

Operating on relations

FILTER instruction:

Generate a new relation by filtering data on a relation
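
A filtering sketch under the same assumptions as before (hypothetical `students.csv` with name, course, and grade fields):

```pig
students = LOAD 'students.csv' USING PigStorage(',')
           AS (name:chararray, course:chararray, grade:int);

-- Conditions can be combined with AND, OR, and NOT.
passed_math = FILTER students BY grade >= 5 AND course == 'math';
```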

SPLIT instruction:

Splits a relation into multiple relations based on conditions
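
A sketch of SPLIT on the hypothetical students data; the relation names `failed` and `passed` are illustrative:

```pig
students = LOAD 'students.csv' USING PigStorage(',')
           AS (name:chararray, course:chararray, grade:int);

-- Each tuple goes to every relation whose condition it satisfies.
SPLIT students INTO failed IF grade < 5,
                    passed IF grade >= 5;
```

A tuple that matches no condition is dropped, and one that matches several conditions appears in several outputs.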

GROUP instruction:

Creates tuples with the key and a bag of the tuples that share that key value

We can also group multiple relations at once (COGROUP). This creates one bag per relation
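
The following sketch shows grouping one relation and then two relations. Both input files (`students.csv`, `teachers.csv`) and their fields are hypothetical.

```pig
students = LOAD 'students.csv' USING PigStorage(',')
           AS (name:chararray, course:chararray, grade:int);
teachers = LOAD 'teachers.csv' USING PigStorage(',')
           AS (course:chararray, teacher:chararray);

-- One relation: each output tuple is (group, {bag of student tuples}).
by_course = GROUP students BY course;
stats     = FOREACH by_course GENERATE group AS course,
                                       COUNT(students) AS n,
                                       AVG(students.grade) AS avg_grade;

-- Two relations: each output tuple is (group, {students}, {teachers}).
both = COGROUP students BY course, teachers BY course;
```

GROUP and COGROUP are the same operator; by convention COGROUP is written when more than one relation is involved.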

Nested FOREACH:

Operate on data in bags inside a relation and then project
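
As an illustration of a nested FOREACH, this sketch picks the best student of each course. The input file and field names are assumptions.

```pig
students = LOAD 'students.csv' USING PigStorage(',')
           AS (name:chararray, course:chararray, grade:int);

by_course = GROUP students BY course;

-- Inside the braces we operate on the bag of each group before projecting.
best_per_course = FOREACH by_course {
    sorted = ORDER students BY grade DESC;
    best   = LIMIT sorted 1;
    GENERATE group AS course, FLATTEN(best.name) AS best_student;
};
```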

(inner) JOIN instruction:

Our classic database operator for relations!

(left) JOIN instruction:

Like the inner join, but keeps every tuple of the left relation; fields from the right relation are null when there is no match
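
Both join flavours can be sketched as follows, assuming two hypothetical inputs that share a `course` field:

```pig
students = LOAD 'students.csv' USING PigStorage(',')
           AS (name:chararray, course:chararray, grade:int);
teachers = LOAD 'teachers.csv' USING PigStorage(',')
           AS (course:chararray, teacher:chararray);

-- Inner join: only courses present in both relations survive.
inner_j = JOIN students BY course, teachers BY course;

-- Left outer join: every student is kept; teacher fields are null
-- for courses that have no matching teacher tuple.
left_j = JOIN students BY course LEFT OUTER, teachers BY course;

-- Fields with the same name are disambiguated with the relation name.
names = FOREACH inner_j GENERATE students::name, teachers::teacher;
```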

CROSS instruction:

Cartesian product of two or more relations
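
A short CROSS sketch on the same hypothetical inputs. The output size is the product of the input sizes, so it is best kept to small relations.

```pig
students = LOAD 'students.csv' USING PigStorage(',')
           AS (name:chararray, course:chararray, grade:int);
teachers = LOAD 'teachers.csv' USING PigStorage(',')
           AS (course:chararray, teacher:chararray);

-- Every student tuple is paired with every teacher tuple.
pairs = CROSS students, teachers;
```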

UNION instruction:

Merges multiple relations into a single relation
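
A UNION sketch, assuming two hypothetical files that share the same layout:

```pig
students_2013 = LOAD 'students_2013.csv' USING PigStorage(',')
                AS (name:chararray, course:chararray, grade:int);
students_2014 = LOAD 'students_2014.csv' USING PigStorage(',')
                AS (name:chararray, course:chararray, grade:int);

-- The result holds the tuples of both inputs; duplicates are kept
-- and no particular order is guaranteed.
all_students = UNION students_2013, students_2014;
```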

DISTINCT instruction:

Only preserves unique tuples

ORDER BY instruction:

Sorts a relation by one or more fields

LIMIT instruction:

Truncates a relation to at most a given number of tuples
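
DISTINCT, ORDER BY, and LIMIT are often used together. A sketch on the hypothetical students data:

```pig
students = LOAD 'students.csv' USING PigStorage(',')
           AS (name:chararray, course:chararray, grade:int);

-- DISTINCT works on whole tuples, so project the field of interest first.
courses        = FOREACH students GENERATE course;
unique_courses = DISTINCT courses;

-- Sort by grade (highest first), breaking ties by name.
sorted = ORDER students BY grade DESC, name ASC;

-- Keep at most three tuples of the sorted relation.
top3 = LIMIT sorted 3;
```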

RANK instruction:

Prepends the position (rank) of each tuple in the relation

We can also sort and rank in a single step!
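
A RANK sketch on the hypothetical students data. In both statements the rank is added as the first field of each output tuple.

```pig
students = LOAD 'students.csv' USING PigStorage(',')
           AS (name:chararray, course:chararray, grade:int);

-- Plain RANK numbers the tuples sequentially.
numbered = RANK students;

-- RANK ... BY sorts and ranks at once; tuples with equal grades share a rank.
by_grade = RANK students BY grade DESC;
```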

SAMPLE instruction:

Selects a random sample of the relation, given a sampling fraction
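
A SAMPLE sketch, again on the hypothetical students data:

```pig
students = LOAD 'students.csv' USING PigStorage(',')
           AS (name:chararray, course:chararray, grade:int);

-- Keeps each tuple with probability 0.1, so roughly 10% of the input;
-- the exact number of tuples returned varies from run to run.
tenth = SAMPLE students 0.1;
```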

CUBE instruction:

Computes aggregates for every combination of the given dimensions
Is this really useful? Yes! Many aggregates with just one operation

CUBE/ROLLUP instruction:

Like standard CUBE, but null values are introduced from right to left, so only hierarchical combinations are aggregated
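
The difference between CUBE and ROLLUP can be sketched on a hypothetical `grades.csv` file with two dimensions (course, year) and one measure (grade). All names are assumptions.

```pig
grades = LOAD 'grades.csv' USING PigStorage(',')
         AS (course:chararray, year:int, grade:int);

-- CUBE aggregates over (course, year), (course, null),
-- (null, year), and (null, null).
cubed    = CUBE grades BY CUBE(course, year);
cube_avg = FOREACH cubed GENERATE FLATTEN(group) AS (course, year),
                                  COUNT(cube) AS n,
                                  AVG(cube.grade) AS avg_grade;

-- ROLLUP aggregates only over (course, year), (course, null),
-- and (null, null).
rolled     = CUBE grades BY ROLLUP(course, year);
rollup_avg = FOREACH rolled GENERATE FLATTEN(group) AS (course, year),
                                     AVG(cube.grade) AS avg_grade;
```

In both cases the grouped tuples are exposed in a bag named `cube`.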

ASSERT instruction:

Assert that the whole relation fulfills a condition
Useful for debugging
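
An ASSERT sketch on the hypothetical students data; the condition and message are illustrative:

```pig
students = LOAD 'students.csv' USING PigStorage(',')
           AS (name:chararray, course:chararray, grade:int);

-- The script fails with the given message if any tuple violates the condition.
ASSERT students BY grade >= 0, 'grade must not be negative';
```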

STORE instruction:

Stores the relation into the local FS or HDFS (usually!)
Useful for debugging
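
A STORE sketch, assuming a hypothetical input file and output directory:

```pig
students = LOAD 'students.csv' USING PigStorage(',')
           AS (name:chararray, course:chararray, grade:int);
passed   = FILTER students BY grade >= 5;

-- Writes comma-separated part files into the directory 'output/passed'.
-- The job fails if the output directory already exists.
STORE passed INTO 'output/passed' USING PigStorage(',');
```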

Where to find useful PigLatin scripts?

PiggyBank - Pig's repository of user-contributed functions
- load/store functions (e.g. from XML)
- datetime, text, math, and stats functions
DataFu - LinkedIn's collection of Pig UDFs
- statistics functions (quantiles, variance etc.)
- convenient bag functions (intersection, union etc.)
- utility functions (assertions, random numbers, MD5, distance between lat/long pair), PageRank
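
Using a function from such a library usually takes two steps: REGISTER the jar, then call the function by its class name or through a DEFINE alias. The sketch below assumes that `piggybank.jar` is reachable at the given path and uses PiggyBank's string `UPPER` function; the jar location, the alias `PB_UPPER`, and the input file are assumptions.

```pig
-- The jar path depends on the local installation.
REGISTER 'piggybank.jar';

-- Optional short alias for the fully qualified class name.
DEFINE PB_UPPER org.apache.pig.piggybank.evaluation.string.UPPER();

students = LOAD 'students.csv' USING PigStorage(',')
           AS (name:chararray, course:chararray, grade:int);

upper_names = FOREACH students GENERATE PB_UPPER(name) AS name_upper;
```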

How to develop PigLatin scripts?

Eclipse plugins

PigEditor

syntax/errors highlighting
check of alias name existence
auto completion of keywords, UDF names

PigPen

graphical visualization of scripts (box and arrows)

Pig-Eclipse

Plugins for Vim, Emacs, TextMate

Usually provide syntax highlighting and code completion

How to run PigLatin scripts?

PigServer Java class, a JDBC-like interface

Python and JavaScript with PigLatin code embedded

adds control flow constructs such as if and for
avoids the need to invent a new language
uses a JDBC-like compile, bind, run model
