Apache Hive
- Apache Hive is a data warehouse software project built on top of Apache Hadoop for data query and analysis.
- Hive provides an SQL-like interface for querying data stored in the various databases and file systems that integrate with Hadoop.
Data Processing Task
- Download the data
- Upload the data
- Start the Hive view
- Explore the Hive user interface (UI)
- Create table temp_drivers
- Create a query to populate the Hive table temp_drivers with the data from drivers.csv
- Create table drivers
- Create a query to extract the data from temp_drivers and store it in drivers
- Create the temp_timesheet and timesheet tables
- Filter the data (driverid, hours_logged, miles_logged)
- Join the data (driverid, name, hours_logged, miles_logged)
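
The table steps above can be sketched end to end in a few lines. This is only an illustration under stated assumptions, not the Hive queries themselves: it uses Python's built-in sqlite3 module as a stand-in for Hive so that it runs anywhere, the sample rows are made up, and the one-column staging layout (col_value) and the helper names are hypothetical. The "filter" step is read here as per-driver totals of hours_logged and miles_logged, which is also an assumption. In Hive the same flow would be written in HiveQL, whose syntax for loading files and splitting lines differs from what is shown.

```python
import csv
import sqlite3

# Hypothetical sample rows standing in for drivers.csv and the timesheet file.
DRIVERS_CSV = """driverid,name
10,Alice
11,Bob
"""

TIMESHEET_CSV = """driverid,hours_logged,miles_logged
10,40,1000
10,50,1200
11,30,800
"""


def load_raw(conn, table, text):
    """Create a one-column staging table and store each CSV line unparsed."""
    conn.execute(f"CREATE TABLE {table} (col_value TEXT)")
    conn.executemany(
        f"INSERT INTO {table} VALUES (?)",
        [(line,) for line in text.splitlines()],
    )


def staged_rows(conn, table):
    """Read the staged lines back, split them into fields, drop the header."""
    cursor = conn.execute(f"SELECT col_value FROM {table} ORDER BY rowid")
    lines = [row[0] for row in cursor]
    return list(csv.reader(lines))[1:]


def build_tables(conn):
    """Stage the raw lines (temp_*), then copy them into typed tables."""
    load_raw(conn, "temp_drivers", DRIVERS_CSV)
    conn.execute("CREATE TABLE drivers (driverid INTEGER, name TEXT)")
    conn.executemany(
        "INSERT INTO drivers VALUES (?, ?)",
        staged_rows(conn, "temp_drivers"),
    )

    load_raw(conn, "temp_timesheet", TIMESHEET_CSV)
    conn.execute(
        "CREATE TABLE timesheet "
        "(driverid INTEGER, hours_logged INTEGER, miles_logged INTEGER)"
    )
    conn.executemany(
        "INSERT INTO timesheet VALUES (?, ?, ?)",
        staged_rows(conn, "temp_timesheet"),
    )


# Assumed reading of the "filter" step: totals per driver.
FILTER_SQL = """
SELECT driverid,
       SUM(hours_logged) AS hours_logged,
       SUM(miles_logged) AS miles_logged
FROM timesheet
GROUP BY driverid
ORDER BY driverid
"""

# Join step: attach each driver's name to the per-driver totals.
JOIN_SQL = """
SELECT d.driverid, d.name, t.hours_logged, t.miles_logged
FROM drivers d
JOIN (
    SELECT driverid,
           SUM(hours_logged) AS hours_logged,
           SUM(miles_logged) AS miles_logged
    FROM timesheet
    GROUP BY driverid
) t ON d.driverid = t.driverid
ORDER BY d.driverid
"""


def run_pipeline():
    """Build all four tables in memory and return (totals, joined) rows."""
    conn = sqlite3.connect(":memory:")
    build_tables(conn)
    totals = conn.execute(FILTER_SQL).fetchall()
    joined = conn.execute(JOIN_SQL).fetchall()
    conn.close()
    return totals, joined


if __name__ == "__main__":
    totals, joined = run_pipeline()
    print(totals)
    print(joined)
```

Staging the raw lines first and copying them into typed tables afterwards mirrors the temp_drivers to drivers and temp_timesheet to timesheet split in the list: the load step stays trivial, and all parsing and typing happens in one place.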