Apache Hadoop:
- It is a Big Data framework that uses the Hadoop Distributed File System (HDFS) to store data and the MapReduce framework to process it. Java is the native language for writing MapReduce programs.
- Apache HBase and Cassandra are both NoSQL databases.
- They do not provide strict ACID transactions and require careful data modelling to be used effectively.
- Their performance can be measured by benchmarking the database with a tool called YCSB (Yahoo! Cloud Serving Benchmark).
- Factors examined in benchmarking include read, write, and read-modify-write latency when executing queries.
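To make the three latency factors concrete, here is a minimal sketch of how they could be measured. It is not YCSB itself: the in-memory dictionary `store` is a stand-in for a database client, and the helper names (`time_op`, `summarize`) and the `user<N>` key format are assumptions made for illustration only.

```python
import statistics
import time


def time_op(op, repetitions=1000):
    """Call op(i) repeatedly and return the per-call latencies in microseconds."""
    latencies = []
    for i in range(repetitions):
        start = time.perf_counter()
        op(i)
        latencies.append((time.perf_counter() - start) * 1e6)
    return latencies


def summarize(latencies):
    """Return the mean and an approximate 95th-percentile latency."""
    ordered = sorted(latencies)
    p95_index = int(0.95 * (len(ordered) - 1))
    return {"mean_us": statistics.mean(ordered), "p95_us": ordered[p95_index]}


# Hypothetical stand-in for a database: a plain dict keyed by record id.
store = {f"user{i}": {"field0": 0} for i in range(1000)}


def read(i):
    return store[f"user{i % 1000}"]


def write(i):
    store[f"user{i % 1000}"] = {"field0": i}


def read_modify_write(i):
    # Read the record, change one field, and write the record back.
    key = f"user{i % 1000}"
    record = dict(store[key])
    record["field0"] += 1
    store[key] = record


operations = [("read", read), ("write", write), ("read-modify-write", read_modify_write)]
results = {name: summarize(time_op(op)) for name, op in operations}

for name, stats in results.items():
    print(f"{name}: mean={stats['mean_us']:.2f}us p95={stats['p95_us']:.2f}us")
```

Against a real HBase or Cassandra cluster, the three functions would issue client calls instead of dictionary lookups, and the numbers would be dominated by network and disk rather than by Python overhead.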
HBase:
- It is a master-slave NoSQL database that depends on a Hadoop cluster.
- It is well suited to read-heavy applications with relatively few writes.
Apache Cassandra:
- It has a masterless, shared-nothing, ring-based architecture that does not depend on the Hadoop framework.
- It is well suited to write-heavy applications with relatively few reads.
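The ring-based, masterless idea can be sketched roughly as follows. This is a simplified illustration of a token ring, not Cassandra's actual implementation: the `Ring` class, the node names, and the use of MD5 as the hash function are all assumptions chosen to keep the example short.

```python
import bisect
import hashlib


class Ring:
    """Toy token ring: each node owns the keys that hash at or below its token."""

    def __init__(self, nodes):
        # Every node gets one position (token) on the ring, derived from its name.
        self._entries = sorted((self._token(node), node) for node in nodes)
        self._tokens = [token for token, _ in self._entries]

    @staticmethod
    def _token(value):
        # MD5 is used here only as a convenient, stable hash for illustration.
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def owner(self, key):
        # Walk clockwise to the first node token >= the key's token,
        # wrapping around to the first node at the end of the ring.
        index = bisect.bisect_left(self._tokens, self._token(key)) % len(self._tokens)
        return self._entries[index][1]


nodes = ["node-a", "node-b", "node-c"]  # hypothetical node names
ring = Ring(nodes)

for key in ["user1", "user2", "user3"]:
    print(key, "->", ring.owner(key))
```

Because any node can compute the owner of a key from the hash alone, no master is needed to route a request, which is the property the bullets above describe.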
Apache Hive:
- It is a batch processing framework that processes data using a language called Hive Query Language (HQL).
- HQL is a SQL-like layer on top of HDFS that removes the need to write MapReduce programs in Java.
- Instead, one can use a SQL-like language for daily tasks.
- Apache Hive is mainly used for ETL and data warehousing in Hadoop.
- Typical Hive work involves creating tables and running joins, unions, aggregations, etc. Visualization reports on the results can then be developed with Tableau.
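To show what HQL saves the user from writing, the sketch below simulates the map, shuffle, and reduce phases of a word count in plain Python. It does not use Hadoop or Hive; the function names and sample lines are made up for illustration. In HQL the same aggregate would be roughly a single statement such as `SELECT word, COUNT(*) FROM words GROUP BY word;`, where `words` is a hypothetical table with one word per row.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Group all emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values for each key to get the final count per word."""
    return {key: sum(values) for key, values in grouped.items()}


lines = [
    "hive runs on hadoop",
    "hbase runs on hadoop",
    "cassandra runs alone",
]
counts = reduce_phase(shuffle_phase(map_phase(lines)))
print(counts["runs"], counts["hadoop"])  # prints: 3 2
```

A real MapReduce job in Java also needs mapper and reducer classes plus job configuration, which is the boilerplate a SQL-like query hides.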