Difference between Pig, Hive and HBase
Pig | Hive | HBase |
---|---|---|
Pig is used for semi-structured data. | Hive is a query engine. | HBase is a data store, particularly for unstructured data. |
Pig is generally used by researchers and programmers. | Hive is mainly used for batch analytical processing (OLAP-style queries) and for creating reports. | HBase is widely used for OLTP-style workloads that need low-latency reads and writes of individual rows. |
Pig operates on the client side of a cluster. | Hive operates on the server side of a cluster. | Operations in HBase run in real time on the database. |
Pig supports Avro. | Hive supports Avro through its AvroSerDe. | HBase stores and returns raw bytes, so the client reading or writing the data has to handle the Avro schemas itself. |
Pig is a strong ETL tool for big data because of its powerful transformation and processing capabilities. | Hive is helpful for ETL. | HBase is helpful for ETL. |
Pig Latin is a high-level language that compiles to MapReduce jobs. | HiveQL is also a high-level language that compiles to MapReduce jobs. | HBase allows Hadoop to support row-level transactions on key-value pairs. |
Pig Latin has some SQL-like operators but differs from SQL to a great extent, so it takes some time and effort to master. | Hive directly leverages SQL expertise and can therefore be learned easily. | HBase supports quick random reads as well as sequential scans of all the data, and lets you insert, update, or delete rows in the middle of a table rather than only append. |
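
The last HBase cell contrasts random access with sequential scans and notes that rows can be inserted, updated, or deleted anywhere rather than only appended. A minimal Python sketch of that access pattern follows. It is a toy in-memory model, not the HBase client API: the class `SortedKeyValueTable` and its methods `put`, `get`, `delete`, and `scan` are hypothetical names chosen for illustration, and the only assumption carried over from HBase is that rows are kept sorted by row key.

```python
import bisect


class SortedKeyValueTable:
    """Toy in-memory stand-in for a table kept sorted by row key (hypothetical, not HBase)."""

    def __init__(self):
        self._keys = []    # row keys, kept in sorted order
        self._values = {}  # row key -> value

    def put(self, key, value):
        # Insert or update; a new key can land anywhere in the sorted order.
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        # Random read: direct lookup by row key, no scan needed.
        return self._values.get(key)

    def delete(self, key):
        # Remove a row from any position, not only from the end.
        if key in self._values:
            del self._values[key]
            self._keys.pop(bisect.bisect_left(self._keys, key))

    def scan(self, start=None, stop=None):
        # Sequential read over the half-open key range [start, stop).
        lo = 0 if start is None else bisect.bisect_left(self._keys, start)
        hi = len(self._keys) if stop is None else bisect.bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            yield key, self._values[key]


table = SortedKeyValueTable()
table.put("row1", "a")
table.put("row3", "c")
table.put("row2", "b")    # insert into the middle of the key order
table.put("row3", "c2")   # update an existing row in place
table.delete("row1")      # delete a row that is not at the end

print(table.get("row2"))                 # random read by key
print(list(table.scan("row2", "row4")))  # sequential scan over a key range
```

An append-only file on HDFS offers only the equivalent of a full `scan`; the keyed `get`, `put`, and `delete` operations are what the table row attributes to HBase.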