Type conversion in Pig and HCatalog?
What is type conversion?
- Type conversion is the conversion of data from one type to another. It is also known as type casting.
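In Pig this is typically done with a cast. A minimal sketch, using hypothetical alias, file, and field names:

-- hypothetical input; age arrives as a chararray and is cast to int
A = LOAD 'people.txt' AS (name:chararray, age:chararray);
B = FOREACH A GENERATE name, (int)age AS age;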
Set Up
- The HCatLoader and HCatStorer interfaces are used with Pig scripts to read and write data in HCatalog-managed tables. No HCatalog-specific setup is required for these interfaces.
Running Pig
The -useHCatalog Flag
- To bring in the appropriate jars for working with HCatalog, simply include the following flag when running Pig from the shell, Hue, or other applications:
- pig -useHCatalog
Stale Content Warning
The fully qualified package name changed from org.apache.hcatalog.pig to org.apache.hive.hcatalog.pig when HCatalog was merged into Hive; the old package is no longer supported as of Hive 0.14.0.
In many older examples on the web we may still find references to the old package names, which no longer work.
| Previous versions | Hive 0.14.0+ |
|---|---|
| org.apache.hcatalog.pig.HCatLoader | org.apache.hive.hcatalog.pig.HCatLoader |
| org.apache.hcatalog.pig.HCatStorer | org.apache.hive.hcatalog.pig.HCatStorer |
HCatLoader
- HCatLoader is used with Pig scripts to read data from HCatalog-managed tables.
Usage
- HCatLoader is accessed via a Pig load statement.
- Using Pig 0.14+, a load statement looks like the example below.
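A minimal example; 'tablename' is a placeholder for an HCatalog-managed table:

/* myscript.pig */
A = LOAD 'tablename' USING org.apache.hive.hcatalog.pig.HCatLoader();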
Assumptions
- We must specify the table name in single quotes: LOAD 'tablename'. If we are using a non-default database, we must specify the input as 'dbname.tablename' (see the sketch after this list). If we are using Pig 0.9.2 or earlier, we must create the database and table before running the Pig script.
- Beginning with Pig 0.10 we can issue these create commands in Pig using the sql command. The Hive metastore lets us create tables without specifying a database; if we created tables this way, the database name is 'default' and is not required when specifying the table for HCatLoader.
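A minimal sketch; the database and table names here are hypothetical:

-- table in the default database
A = LOAD 'mytable' USING org.apache.hive.hcatalog.pig.HCatLoader();
-- table in a non-default database
B = LOAD 'mydb.mytable' USING org.apache.hive.hcatalog.pig.HCatLoader();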
HCatLoader Data Types
- Restrictions apply to the types of columns HCatLoader can read from HCatalog-managed tables. HCatLoader can read only the Hive data types listed below.
- Pig interprets each Hive data type as a corresponding Pig type (for example, a Hive string is read as a Pig chararray).
Types in Hive 0.12.0 and Earlier
Hive 0.12.0 and earlier releases support reading these Hive primitive data types with HCatLoader:
- boolean
- int
- long
- float
- double
- string
- binary
and these complex data types:
- map (the key type should be string)
- array of any type
- struct with fields of any type
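As a sketch of how the complex types surface in Pig (the table name here is hypothetical), a Hive map is read as a Pig map, an array as a bag, and a struct as a tuple:

-- 'events' is a hypothetical table with map, array, and struct columns
A = LOAD 'events' USING org.apache.hive.hcatalog.pig.HCatLoader();
DESCRIBE A;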
Running Pig with HCatalog
- Pig does not automatically pick up HCatalog jars. To bring in the necessary jars, we can either use a flag in the pig command or set the environment variables PIG_CLASSPATH and PIG_OPTS as described below.
The -useHCatalog Flag
- To bring in the appropriate jars for working with HCatalog, simply include the following flag:
- pig -useHCatalog
Jars and Configuration Files
- For Pig commands that omit -useHCatalog, we need to tell Pig where to find the HCatalog jars and the Hive jars used by the HCatalog client. To do this, we must define the environment variable PIG_CLASSPATH with the appropriate jars.
- HCatalog can tell us which jars it needs. To do this, it needs to know where Hadoop and Hive are installed. We also need to tell Pig the URI of the metastore, in the PIG_OPTS variable.
- In the case where we have installed Hadoop and Hive via tar, we can do this:
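A minimal sketch, assuming a tarball install; the install paths, metastore host and port, and exact jar names are placeholders and vary by release:

export HADOOP_HOME=<path_to_hadoop_install>
export HIVE_HOME=<path_to_hive_install>
export HCAT_HOME=<path_to_hcatalog_install>

# HCatalog and Hive client jars, plus conf directories, that Pig needs on its classpath
export PIG_CLASSPATH=$HCAT_HOME/share/hcatalog/hcatalog-core-*.jar:\
$HCAT_HOME/share/hcatalog/hcatalog-pig-adapter-*.jar:\
$HIVE_HOME/lib/hive-metastore-*.jar:$HIVE_HOME/lib/hive-exec-*.jar:\
$HIVE_HOME/lib/libthrift-*.jar:$HIVE_HOME/lib/libfb303-*.jar:\
$HIVE_HOME/conf:$HADOOP_HOME/conf

# point Pig at the Hive metastore
export PIG_OPTS=-Dhive.metastore.uris=thrift://<metastore_host>:<port>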
Load Examples
This load statement will load all partitions of the specified table.
/* myscript.pig */
A = LOAD 'tablename' USING org.apache.hive.hcatalog.pig.HCatLoader();
If only some partitions of the specified table are needed, include a partition filter statement immediately following the load statement in the data flow.
The filter statement can include conditions on partition as well as non-partition columns.
/* myscript.pig */
A = LOAD 'tablename' USING org.apache.hive.hcatalog.pig.HCatLoader();
-- date is a partition column; age is not
B = filter A by date == '20100819' and age < 30;
Notice that the schema is automatically provided to Pig; there is no need to declare name and age as fields, as we would if we were loading from a file.
Filter Operators
A filter can contain the operators 'and', 'or', '()', '==', '!=', '<', '>', '<=' and '>='.
A complex filter can have various combinations of operators, such as:
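For instance, a sketch continuing with relation A from the load example above (hypothetical columns; date is a partition column, age is not):

B = filter A by date >= '20100819' and (age < 30 or age >= 60);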
These two examples have the same effect:
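A sketch with the same hypothetical columns; the two filters differ only in the order of their conditions, so they select the same rows:

B = filter A by date == '20100819' and age < 30;
B = filter A by age < 30 and date == '20100819';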