Apache Pig - User Defined Functions
What are User Defined Functions in Apache Pig?
- In addition to the built-in functions, Apache Pig provides extensive support for User Defined Functions (UDFs).
- With UDFs, you can define your own functions and use them in Pig scripts.
![apache pig user defined functions](https://wikitechy.com/tutorials/apache-pig/img/apache-pig-images/apache-pig-user-defined-functions.gif)
Supporting languages:
- UDF support is provided in six programming languages: Java, Jython, Python, JavaScript, Ruby and Groovy.
- Complete support for writing UDFs is provided in Java; the remaining languages have limited support.
- Using Java, we can write UDFs that cover all parts of the processing, such as data load/store, column transformation, and aggregation.
- Because Apache Pig itself is written in Java, UDFs written in Java run more efficiently than those written in the other languages.
- Apache Pig also has a Java repository for UDFs named Piggybank. Using Piggybank, you can access Java UDFs written by other users and contribute your own.
![learn apache pig - apache pig tutorial - pig tutorial - apache pig examples - big data - apache pig script - apache pig program - apache pig download - apache pig example - pig user defined function](https://wikitechy.com/tutorials/apache-pig/img/apache-pig-images/pig-user-defined-function.png)
Types of UDFs in Java
When writing UDFs in Java, you can create and use the following three types of functions −
- Filter Functions − The filter functions are used as conditions in FILTER statements. These functions accept a Pig value as input and return a Boolean value.
- Eval Functions − The Eval functions are used in FOREACH-GENERATE statements. These functions accept a Pig value as input and return a Pig result.
- Algebraic Functions − The Algebraic functions act on inner bags in a FOREACH-GENERATE statement. These functions are used to perform full MapReduce operations on an inner bag.
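To make the idea behind Algebraic functions more concrete, the following is a minimal plain-Java sketch of how a count can be computed in stages. It does not use Pig's API. The class name AlgebraicCountSketch and its methods are hypothetical and only mirror the three stages that Pig's Algebraic interface exposes through getInitial(), getIntermed() and getFinal().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration only: it does not use Pig's Algebraic interface.
class AlgebraicCountSketch {

    // "Initial" stage: each map task counts the values in its own chunk of the bag.
    static long initial(List<String> chunk) {
        return chunk.size();
    }

    // "Intermediate" stage: a combiner adds up some of the partial counts.
    static long intermediate(List<Long> partialCounts) {
        long sum = 0;
        for (long c : partialCounts) {
            sum += c;
        }
        return sum;
    }

    // "Final" stage: the reducer adds up whatever partial counts remain.
    static long finalStage(List<Long> partialCounts) {
        return intermediate(partialCounts);
    }

    // Counts a bag that has been split into chunks, one chunk per hypothetical map task.
    static long count(List<List<String>> chunks) {
        List<Long> partials = new ArrayList<>();
        for (List<String> chunk : chunks) {
            partials.add(initial(chunk));
        }
        return finalStage(partials);
    }

    public static void main(String[] args) {
        List<List<String>> chunks = Arrays.asList(
                Arrays.asList("Kevin", "BOB", "Oviya"),
                Arrays.asList("Jack", "David"));
        System.out.println(count(chunks)); // prints 5
    }
}
```

Because each stage only adds numbers, the partial counts can be combined in any grouping, which is the property that lets such a function run in both the combiner and the reducer.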
Writing UDFs using Java:
- To write a UDF using Java, we have to integrate the jar file Pig-0.15.0.jar. In this section, we discuss how to write a sample UDF using Eclipse. Before proceeding further, make sure you have installed Eclipse and Maven on your system.
Follow the steps given below to write a UDF:
Step 1
- Open Eclipse and create a new project (say myproject).
Step 2
- Convert the newly created project into a Maven project.
Step 3
- Copy the following content into the pom.xml file.
- This file contains the Maven dependencies for Apache Pig and Hadoop-core jar files.
<project xmlns = "http://maven.apache.org/POM/4.0.0"
xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation = "http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>Pig_Udf</groupId>
<artifactId>Pig_Udf</artifactId>
<version>0.0.1-SNAPSHOT</version>
<build>
<sourceDirectory>src</sourceDirectory>
<plugins>
<plugin>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.3</version>
<configuration>
<source>1.7</source>
<target>1.7</target>
</configuration>
</plugin>
</plugins>
</build>
<dependencies>
<dependency>
<groupId>org.apache.pig</groupId>
<artifactId>pig</artifactId>
<version>0.15.0</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-core</artifactId>
<version>0.20.2</version>
</dependency>
</dependencies>
</project>
Step 4
- Save the file and refresh it. In the Maven Dependencies section, we can find the downloaded jar files.
Step 5
- Create a new class file with name Sample_Eval and copy the following content in it.
import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Eval UDF that converts its first argument to upper case.
public class Sample_Eval extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        // Return null when Pig passes no input.
        if (input == null || input.size() == 0)
            return null;
        String str = (String) input.get(0);
        // A null field stays null instead of throwing a NullPointerException.
        return str == null ? null : str.toUpperCase();
    }
}
When writing a UDF, the class must extend the EvalFunc class and implement the exec() function. The code that the UDF runs is written inside this function.
- In the above example, we have written the code to convert the contents of the specified column to uppercase.
- After compiling the class without errors, right-click on the Sample_Eval.java file. It gives you a menu. Select export as shown in the following screenshot.
![apache pig user defined-functions2](https://wikitechy.com/tutorials/apache-pig/img/apache-pig-images/apache-pig-user-defined-functions2.png)
- On clicking Export, you will get the following window. Click on JAR file.
![apache pig user defined-functions3](https://wikitechy.com/tutorials/apache-pig/img/apache-pig-images/apache-pig-user-defined-functions3.png)
- Proceed by clicking the Next> button. You will get another window where you need to enter the path in the local file system where the jar file should be stored.
![apache pig user defined-functions4](https://wikitechy.com/tutorials/apache-pig/img/apache-pig-images/apache-pig-user-defined-functions4.png)
- Finally click the Finish button. In the specified folder, a Jar file sample_udf.jar is created. This jar file contains the UDF written in Java.
Using the UDF:
- After writing the UDF and generating the Jar file, follow the steps given below.
Step 1:
Registering the Jar file
- After writing the UDF (in Java), you have to register the Jar file that contains the UDF using the Register operator.
- Registering the Jar file tells Apache Pig where the UDF is located.
Syntax:
The Register operator syntax is given below.
REGISTER path;
Example:
- As an example let us register the sample_udf.jar created previously in this chapter.
- Start Apache Pig in local mode and register the jar file sample_udf.jar as given below.
$ cd $PIG_HOME/bin
$ ./pig -x local
grunt> REGISTER '/$PIG_HOME/sample_udf.jar';
Note
− This assumes the Jar file is located at the path /$PIG_HOME/sample_udf.jar
Step 2:
Defining Alias
- After registering the UDF you can define an alias to it using the Define operator.
Syntax:
The syntax of the Define operator is shown below.
DEFINE alias {function | [`command` [input] [output] [ship] [cache] [stderr] ] };
Example:
Define the alias for the Sample_Eval class as shown below. The function name must match the Java class name exactly, including case.
DEFINE sample_eval Sample_Eval();
Step 3:
Using the UDF
- After defining the alias, you can use the UDF in the same way as the built-in functions. Assume there is a file named emp1.txt in the HDFS /pig_data/ directory with the following content.
11,Kevin,22,newyork
12,BOB,23,Kolkata
13,Oviya,23,Tokyo
14,Jack,25,London
15,David,23,Bhuwaneshwar
16,Maggy,22,Chennai
17,Anto,22,newyork
18,Syam,23,Kolkata
19,Mary,25,Tokyo
20,Saran,25,London
21,Stacy,25,Bhuwaneshwar
22,Kelly,22,Chennai
Ensure you have loaded this file into Pig as given below.
grunt> wikitechy_emp_data = LOAD 'hdfs://localhost:9000/pig_data/emp1.txt' USING PigStorage(',')
as (id:int, name:chararray, age:int, city:chararray);
- Now let us convert the names of the employees to upper case using the UDF sample_eval.
grunt> Upper_case = FOREACH wikitechy_emp_data GENERATE sample_eval(name);
Verification:
- Verify the contents of the relation Upper_case as given below.
grunt> Dump Upper_case;
(KEVIN)
(BOB)
(OVIYA)
(JACK)
(DAVID)
(MAGGY)
(ANTO)
(SYAM)
(MARY)
(SARAN)
(STACY)
(KELLY)
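As a rough illustration of what the FOREACH ... GENERATE statement above does with the UDF, the following plain-Java sketch applies the same upper-casing to the name field of each input line. It does not call Pig. The class UpperCaseSketch and its methods are hypothetical stand-ins, and splitting on ',' only approximates PigStorage(',').

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java stand-in for the Pig statements above; it does not call Pig.
class UpperCaseSketch {

    // Same logic as the UDF's exec(): null in, null out; otherwise upper case.
    static String toUpper(String name) {
        return name == null ? null : name.toUpperCase();
    }

    // Splits each line on ',', takes the second field (name),
    // and applies toUpper to it, producing one output row per input row.
    static List<String> upperCaseNames(List<String> lines) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            String[] fields = line.split(",");
            String name = fields.length > 1 ? fields[1] : null;
            result.add(toUpper(name));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "11,Kevin,22,newyork",
                "12,BOB,23,Kolkata",
                "13,Oviya,23,Tokyo");
        // Print each result in the tuple style that Dump uses.
        for (String name : upperCaseNames(lines)) {
            System.out.println("(" + name + ")");
        }
    }
}
```

The point of the sketch is that an Eval function is called once per input row and returns one value per call, which is why the output has exactly as many rows as the input.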
More functions: Datafu Pig
- Stats: variance, quantiles, median, etc.
- Bags: concat, append, prepend, etc.
- Sampling
- Page rank
- Session estimation
How to use UDF libraries
![learn apache pig - apache pig tutorial - pig tutorial - apache pig examples - big data - apache pig script - apache pig program - apache pig download - apache pig example - pig udf function](https://wikitechy.com/tutorials/apache-pig/img/apache-pig-images/pig-udf-functions.png)
Pig scripting
![learn apache pig - apache pig tutorial - pig tutorial - apache pig examples - big data - apache pig script - apache pig program - apache pig download - apache pig example - pig scripting](https://wikitechy.com/tutorials/apache-pig/img/apache-pig-images/pig-scripting.png)
Calling a script
![learn apache pig - apache pig tutorial - pig tutorial - apache pig examples - big data - apache pig script - apache pig program - apache pig download - apache pig example - pig calling scripting](https://wikitechy.com/tutorials/apache-pig/img/apache-pig-images/calling-pig-script.png)