Difference between Hive and HBase?
Hive
- Hive is a data warehousing package built on top of Hadoop. It is mainly used for data analysis and is generally aimed at users already comfortable with Structured Query Language (SQL).
- Its query language, Hive Query Language (HQL), is similar to SQL.
- Hive manages and queries structured data, and it abstracts away the complexity of Hadoop. It also has several limitations:
- Not a full database.
- Not a real time processing system.
- Not SQL-92 compliant.
- Does not provide row-level inserts, updates, or deletes on ordinary tables (Hive 0.14 and later allow them only on tables configured as transactional).
- Transaction support is limited, as is subquery support.
- Query optimization is still evolving.
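To give a feel for the kind of SQL-like, whole-table analytical query that HQL is aimed at, here is a minimal sketch. It does not talk to Hive at all: Python's built-in `sqlite3` module is used purely as a stand-in SQL engine so the example can run anywhere. The table name `page_views`, its columns, and the function `views_per_country` are hypothetical names chosen for illustration. The `SELECT` statement itself is plain SQL of the sort that should read much the same in HQL.

```python
import sqlite3


def views_per_country(rows):
    """Count page views per country with a GROUP BY over the whole table.

    sqlite3 is only a stand-in here (assumption); in Hive the same kind of
    query would run as a batch job over data stored in Hadoop.
    """
    conn = sqlite3.connect(":memory:")
    # Hypothetical table and columns, for illustration only.
    conn.execute("CREATE TABLE page_views (user_id TEXT, country TEXT)")
    conn.executemany("INSERT INTO page_views VALUES (?, ?)", rows)
    query = """
        SELECT country, COUNT(*) AS views
        FROM page_views
        GROUP BY country
        ORDER BY views DESC, country ASC
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


sample_rows = [("u1", "IN"), ("u2", "US"), ("u3", "IN"), ("u4", "DE")]
print(views_per_country(sample_rows))
# prints [('IN', 2), ('DE', 1), ('US', 1)]
```

The query touches every row to produce a summary, which is the batch, analytical access pattern described above, as opposed to fetching or updating one record at a time.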
HBase
- HBase is a column-oriented database management system that runs on top of the Hadoop Distributed File System (HDFS).
- It is well suited for sparse data sets, which are common in many Big Data use cases.
- It is an open-source, distributed database developed under the Apache Software Foundation.
- It is modeled after Google's Bigtable and is written primarily in Java.
- It can store massive amounts of data, from terabytes to petabytes.
- It is built for low-latency operations and is used extensively for read and write operations.
- It stores large amounts of data in the form of tables.
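The points above about sparse data, tables, and low-latency reads and writes can be illustrated with a small toy model. This is a sketch under stated assumptions, not the HBase client API: the class `SparseTable` and its methods `put`, `get`, and `scan` are hypothetical names that only mimic the general shape of a column-family store, where each row is addressed by a row key, column families are declared up front, and individual columns (qualifiers) can differ from row to row.

```python
from collections import defaultdict


class SparseTable:
    """Toy in-memory model of a column-family table (not the HBase API)."""

    def __init__(self, column_families):
        # Column families are fixed when the table is created.
        self.column_families = set(column_families)
        # row key -> {(family, qualifier): value}; absent cells are simply
        # not stored, which is what makes sparse rows cheap.
        self._rows = defaultdict(dict)

    def put(self, row_key, family, qualifier, value):
        """Write a single cell; qualifiers need no prior declaration."""
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        self._rows[row_key][(family, qualifier)] = value

    def get(self, row_key):
        """Point lookup by row key; returns a copy of the row's cells."""
        return dict(self._rows.get(row_key, {}))

    def scan(self, start_key, stop_key):
        """Yield rows with start_key <= key < stop_key in sorted key order."""
        for key in sorted(self._rows):
            if start_key <= key < stop_key:
                yield key, dict(self._rows[key])


table = SparseTable(["info", "stats"])
table.put("user#001", "info", "name", "Asha")
table.put("user#001", "info", "email", "asha@example.com")
table.put("user#002", "info", "name", "Ben")
table.put("user#002", "stats", "logins", 7)

print(table.get("user#002"))
print([key for key, _ in table.scan("user#001", "user#003")])
```

Note that the two rows hold different sets of columns, and nothing is stored for the cells a row lacks. A real HBase deployment adds persistence on HDFS, distribution across region servers, and cell versioning, none of which this sketch attempts to model.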
Hive | HBase |
---|---|
Hive is a query engine. | HBase is a data store, particularly suited to unstructured and semi-structured data. |
Mainly used for batch processing. | Extensively used for transactional processing. |
Not a real-time processing system. | Supports real-time processing. |
Used only for analytical queries. | Used for real-time querying. |
Runs on top of Hadoop. | Runs on top of HDFS (Hadoop Distributed File System). |
Apache Hive is not a database. | HBase is a NoSQL database. |
It has a schema model. | It has no fixed schema; only column families are defined up front. |
Made for high-latency operations. | Made for low-latency operations. |