Type conversion in Pig and HCatalog?
What is type conversion?
- Type conversion is the conversion of data from one type to another. It is also known as type casting.
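In Pig this is typically done with a cast. A minimal sketch, using hypothetical alias, file, and field names:

-- hypothetical input; age arrives as a chararray and is cast to int
A = LOAD 'people.txt' AS (name:chararray, age:chararray);
B = FOREACH A GENERATE name, (int)age AS age;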
Set Up
- The HCatLoader and HCatStorer interfaces are used with Pig scripts to read and write data in HCatalog-managed tables. No HCatalog-specific setup is required for these interfaces.
Running Pig
The -useHCatalog Flag
- To bring in the appropriate jars for working with HCatalog, simply include the following flag when running Pig from the shell, Hue, or other applications:
- pig -useHCatalog
Stale Content Warning
The fully qualified package name changed from org.apache.hcatalog.pig to org.apache.hive.hcatalog.pig when HCatalog was merged into Hive; the old package is no longer supported as of Hive 0.14.0.
In many older examples on the web we may still find references to the old package names, which no longer work.
| Previous versions | Hive 0.14.0+ |
|---|---|
| org.apache.hcatalog.pig.HCatLoader | org.apache.hive.hcatalog.pig.HCatLoader |
| org.apache.hcatalog.pig.HCatStorer | org.apache.hive.hcatalog.pig.HCatStorer |
HCatLoader
- HCatLoader is used with Pig scripts to read data from HCatalog-managed tables.
Usage
- HCatLoader is accessed via a Pig load statement.
- Using Pig 0.14+, a load statement looks like the example below.
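A minimal example; 'tablename' is a placeholder for an HCatalog-managed table:

/* myscript.pig */
A = LOAD 'tablename' USING org.apache.hive.hcatalog.pig.HCatLoader();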
Assumptions
- We must specify the table name in single quotes: LOAD 'tablename'. If we are using a non-default database, we must specify the input as 'dbname.tablename' (see the sketch after this list). If we are using Pig 0.9.2 or earlier, we must create the database and table before running the Pig script.
- Beginning with Pig 0.10 we can issue these create commands in Pig using the sql command. The Hive metastore lets us create tables without specifying a database; if we created tables this way, the database name is 'default' and is not required when specifying the table for HCatLoader.
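A minimal sketch; the database and table names here are hypothetical:

-- table in the default database
A = LOAD 'mytable' USING org.apache.hive.hcatalog.pig.HCatLoader();
-- table in a non-default database
B = LOAD 'mydb.mytable' USING org.apache.hive.hcatalog.pig.HCatLoader();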
HCatLoader Data Types
- Restrictions apply to the types of columns HCatLoader can read from HCatalog-managed tables. HCatLoader can read only the Hive data types listed below.
- Pig interprets each Hive data type as a corresponding Pig type (for example, a Hive string is read as a Pig chararray).
Types in Hive 0.12.0 and Earlier
Hive 0.12.0 and earlier releases support reading these Hive primitive data types with HCatLoader:
- boolean
- int
- long
- float
- double
- string
- binary
and these complex data types:
- map (the key type should be string)
- array of any type
- struct with fields of any type
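As a sketch of how the complex types surface in Pig (the table name here is hypothetical), a Hive map is read as a Pig map, an array as a bag, and a struct as a tuple:

-- 'events' is a hypothetical table with map, array, and struct columns
A = LOAD 'events' USING org.apache.hive.hcatalog.pig.HCatLoader();
DESCRIBE A;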
Running Pig with HCatalog
- Pig does not automatically pick up HCatalog jars. To bring in the necessary jars, we can either use a flag in the pig command or set the environment variables PIG_CLASSPATH and PIG_OPTS as described below.
The -useHCatalog Flag
- To bring in the appropriate jars for working with HCatalog, simply include the following flag:
- pig -useHCatalog
Jars and Configuration Files
- For Pig commands that omit -useHCatalog, we need to tell Pig where to find the HCatalog jars and the Hive jars used by the HCatalog client. To do this, we must define the environment variable PIG_CLASSPATH with the appropriate jars.
- HCatalog can tell us which jars it needs. To do this, it needs to know where Hadoop and Hive are installed. We also need to tell Pig the URI of the metastore, in the PIG_OPTS variable.
- In the case where we have installed Hadoop and Hive via tar, we can do this:
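A minimal sketch, assuming a tarball install; the install paths, metastore host and port, and exact jar names are placeholders and vary by release:

export HADOOP_HOME=<path_to_hadoop_install>
export HIVE_HOME=<path_to_hive_install>
export HCAT_HOME=<path_to_hcatalog_install>

# HCatalog and Hive client jars, plus conf directories, that Pig needs on its classpath
export PIG_CLASSPATH=$HCAT_HOME/share/hcatalog/hcatalog-core-*.jar:\
$HCAT_HOME/share/hcatalog/hcatalog-pig-adapter-*.jar:\
$HIVE_HOME/lib/hive-metastore-*.jar:$HIVE_HOME/lib/hive-exec-*.jar:\
$HIVE_HOME/lib/libthrift-*.jar:$HIVE_HOME/lib/libfb303-*.jar:\
$HIVE_HOME/conf:$HADOOP_HOME/conf

# point Pig at the Hive metastore
export PIG_OPTS=-Dhive.metastore.uris=thrift://<metastore_host>:<port>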
Load Examples
This load statement will load all partitions of the specified table.
/* myscript.pig */
A = LOAD 'tablename' USING org.apache.hive.hcatalog.pig.HCatLoader();
If only some partitions of the specified table are needed, include a partition filter statement immediately following the load statement in the data flow.
The filter statement can include conditions on partition as well as non-partition columns.
/* myscript.pig */
A = LOAD 'tablename' USING org.apache.hive.hcatalog.pig.HCatLoader();
-- date is a partition column; age is not
B = filter A by date == '20100819' and age < 30;
Notice that the schema is automatically provided to Pig; there is no need to declare name and age as fields, as we would if we were loading from a file.
Filter Operators
A filter can contain the operators 'and', 'or', '()', '==', '!=', '<', '>', '<=' and '>='.
A complex filter can have various combinations of operators, such as:
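For instance, a sketch continuing with relation A from the load example above (hypothetical columns; date is a partition column, age is not):

B = filter A by date >= '20100819' and (age < 30 or age >= 60);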
These two examples have the same effect:
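A sketch with the same hypothetical columns; the two filters differ only in the order of their conditions, so they select the same rows:

B = filter A by date == '20100819' and age < 30;
B = filter A by age < 30 and date == '20100819';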