Difference between Hive and Pig
Hive | Pig |
---|---|
Hive is generally used by data analysts. | Pig is generally used by researchers and programmers. |
Hive is used for structured data. | Pig is used for semi-structured data. |
Hive has a declarative, SQL-like language (HiveQL). | Pig has a procedural data flow language (Pig Latin). |
Hive provides a Thrift-based server, so clients can send queries directly to the Hive server, which executes them. | Pig has no equivalent Thrift-based server. |
Hive directly leverages SQL expertise and can therefore be learnt easily. | Pig Latin is also SQL-like but differs to a great extent, so it takes some time and effort to master. |
Hive did not originally support Avro; later releases added it through the Avro SerDe. | Pig supports Avro. |
Hive operates on the server side of a cluster. | Pig operates on the client side of a cluster. |
Hive is mainly used for creating reports. | Pig is mainly used for programming. |
Hive is helpful for ETL. | Pig is a great ETL (Extract, Transform and Load) tool for big data because of its powerful transformation and processing capabilities. |
Hive uses a variant of SQL DDL: tables are defined beforehand and the schema details are stored in a local database (the metastore). | Pig has no dedicated metadata database; schemas and data types are defined in the script itself. |
Hive has a provision for partitions, so it can process a subset of the data, for example by date or in alphabetical order. | Pig has no notion of partitions, though a similar effect can be achieved through filters. |
Hive has no equivalent of Pig's “Illustrate” function. | Pig's “Illustrate” function shows users sample data for each scenario and each step. |
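
The declarative versus procedural contrast in the table can be hard to picture without code, so here is a minimal sketch in plain Python rather than in HiveQL or Pig Latin. It is only an analogy: the standard-library `sqlite3` module stands in for Hive (schema declared up front, result stated as one SQL query), and a chain of ordinary Python steps stands in for a Pig script (schema given inline, data transformed step by step, subset selected with a filter). The `VISITS` data and both function names are made up for illustration.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample rows: (visitor, page, visit_date)
VISITS = [
    ("alice", "/home", "2024-01-01"),
    ("bob", "/home", "2024-01-01"),
    ("alice", "/docs", "2024-01-02"),
    ("carol", "/docs", "2024-01-02"),
    ("alice", "/home", "2024-01-02"),
]


def declarative_counts(rows, date):
    """Hive-like style: declare the table first, then describe the result in SQL."""
    conn = sqlite3.connect(":memory:")
    # The schema is defined before any data is loaded (compare Hive DDL).
    conn.execute("CREATE TABLE visits (visitor TEXT, page TEXT, visit_date TEXT)")
    conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", rows)
    cursor = conn.execute(
        "SELECT page, COUNT(*) FROM visits "
        "WHERE visit_date = ? GROUP BY page ORDER BY page",
        (date,),
    )
    result = cursor.fetchall()
    conn.close()
    return result


def dataflow_counts(rows, date):
    """Pig-like style: a sequence of named steps, each transforming the last."""
    # Step 1 (like LOAD ... AS): the schema is given inline, in the script itself.
    loaded = [{"visitor": v, "page": p, "visit_date": d} for v, p, d in rows]
    # Step 2 (like FILTER): with no partitions, every record is scanned.
    filtered = [r for r in loaded if r["visit_date"] == date]
    # Step 3 (like GROUP ... BY page).
    grouped = defaultdict(list)
    for r in filtered:
        grouped[r["page"]].append(r)
    # Step 4 (like FOREACH ... GENERATE group, COUNT).
    return sorted((page, len(members)) for page, members in grouped.items())


if __name__ == "__main__":
    print(declarative_counts(VISITS, "2024-01-02"))  # [('/docs', 2), ('/home', 1)]
    print(dataflow_counts(VISITS, "2024-01-02"))     # [('/docs', 2), ('/home', 1)]
```

Both functions return the same page counts. The first says what result is wanted and leaves the execution plan to the engine, while the second spells out each intermediate step, which is roughly the difference an analyst and a programmer would notice between the two tools.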