Creating Hive Partitioned Tables Using Sqoop
How to create Hive partitioned tables using Sqoop?
- Sqoop is used to bring data from an RDBMS into Hadoop, but it has a limitation: the imported data is stored in a single HDFS folder.
- If a partitioned table needs to be created in Hive for further queries, the user has to write a Hive script to distribute the data to the appropriate partitions.
- In other words, Sqoop has no direct option for creating Hive partitioned tables.
- However, we can use Sqoop's ability to write its output to a specific directory to simulate a partitioned table structure in HDFS.
- A partitioned table has an HDFS structure in which each partition lives under <table name>/<partition column>=<value>. We can therefore run one Sqoop import per partition, selecting just the data for that partition and writing it to the matching HDFS directory.
Syntax:
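The original page does not reproduce the command, so the following is a minimal sketch of one per-partition import, assuming a MySQL database testdb and a source table table1 with partition column pt (the connection string, credentials, and the partition value value1 are all illustrative):

sqoop import \
  --connect jdbc:mysql://localhost/testdb \
  --username user1 -P \
  --table table1 \
  --where "pt = 'value1'" \
  --target-dir /home/user1/table1/pt=value1 \
  -m 1

Repeating the import with a different --where filter and --target-dir for each distinct partition value builds up the <table name>/<partition column>=<value> layout described above.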
- Now we need an external Hive table pointing to the /home/user1/table1 directory, with pt as the partition column.
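A minimal sketch of that external table, assuming the four columns used in the example below and Sqoop's default comma-delimited text output; each imported directory is then registered as a partition:

CREATE EXTERNAL TABLE table1 (id INT, name STRING, age INT, sex STRING)
PARTITIONED BY (pt STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/home/user1/table1';

ALTER TABLE table1 ADD PARTITION (pt = 'value1')
LOCATION '/home/user1/table1/pt=value1';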
Example: Sqoop to Hive partition
Step 1
- Create a table in MySQL with four fields (id, name, age, sex).
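A minimal sketch of the MySQL DDL, assuming the table is named table1 to match the Syntax section (the column types are illustrative; only the four field names are given by the tutorial):

CREATE TABLE table1 (
  id   INT PRIMARY KEY,
  name VARCHAR(50),
  age  INT,
  sex  VARCHAR(10)
);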
Step 2
- Insert data into the MySQL table from the CSV file abc.csv.
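A minimal sketch of the load, assuming abc.csv is a comma-separated file on the client machine and that the MySQL server permits LOAD DATA LOCAL INFILE:

LOAD DATA LOCAL INFILE 'abc.csv'
INTO TABLE table1
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n';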
Advantages
- This approach is independent of the source table's partition structure; the source table may or may not be partitioned.
- It can be extended to use cases where the Hive partitioning is based on multiple columns.
- The Hive table's partitioning scheme can differ from the source table's partitioning scheme.
- The multiple Sqoop commands and the Hive table creation script can be combined into one script, allowing any Hive partitioned table to be created from an RDBMS (see the sketch after this list).
- It allows us to get data into HDFS in a structure that is ready for a partitioned Hive table.
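A minimal sketch of such a combined script, reusing the table1/pt setup from the Syntax section; the partition values v1, v2, v3 and the connection details are illustrative:

#!/bin/bash
# One Sqoop import per partition value, followed by registering
# the resulting directory as a partition of the external Hive table.
for v in v1 v2 v3; do
  sqoop import \
    --connect jdbc:mysql://localhost/testdb \
    --username user1 -P \
    --table table1 \
    --where "pt = '$v'" \
    --target-dir /home/user1/table1/pt=$v \
    -m 1
  hive -e "ALTER TABLE table1 ADD PARTITION (pt = '$v') LOCATION '/home/user1/table1/pt=$v'"
done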