Sqoop Integration with Hadoop Ecosystem:
- Sqoop integration means moving data between an RDBMS and HDFS.
- The imported data may then need to be analyzed using Hive or HBase.
- Sqoop provides options to import data directly into Hive / HBase.
- Just add "--hive-import" to the command.
Example:
sqoop import \
  --connect "jdbc:mysql://localhost/training" \
  --username training -P \
  --table cityByCountry \
  --target-dir /user/where_clause \
  --where "state = 'Alaska'" \
  --hive-import \
  -m 1

How to Integrate Sqoop in a Data Ingestion Layer with Hive and Oozie in a Big Data Application:
- Sqoop is a tool that migrates and transfers data in bulk between an RDBMS and a Hadoop system.
- This post focuses on integrating Sqoop with other projects in the Hadoop ecosystem and Big Data applications.
Environment
- Java: JDK 1.7
- Cloudera version: CDH4.6
- MySQL.
Initial steps
- Assume we have some data on HDFS in our Hadoop cluster and in our MySQL database.
- Now we want to integrate the import jobs with Oozie and Hive.
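Before triggering any workflow, Oozie needs a job configuration file. A minimal job.properties sketch is shown below; the host names, ports, and HDFS application path are illustrative assumptions and must match your own cluster and the values used in the workflow XML.

```properties
# Hypothetical job.properties for submitting the Sqoop workflow with Oozie.
# Host names, ports, and paths are illustrative; adjust them to your cluster.
nameNode=hdfs://nn:8020
jobTracker=myjt:8021
queueName=default
# HDFS directory that contains workflow.xml
oozie.wf.application.path=${nameNode}/user/training/sqoop-wf
```

The workflow can then be submitted with the Oozie CLI, e.g. `oozie job -oozie http://<oozie-host>:11000/oozie -config job.properties -run` (the Oozie server URL is an assumption).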
Code walkthrough
Below is an Oozie workflow that triggers an import job to move the data from MySQL to Hadoop:
<workflow-app name="musqoop-wf" xmlns="uri:oozie:workflow:0.1">
...
<action name="mysqoopaction">
<sqoop xmlns="uri:oozie:sqoop-action:0.2">
<job-tracker>myjt:8021</job-tracker>
<name-node>nn:8020</name-node>
<command>import --table People --connect ...</command>
</sqoop>
<ok to="next"/>
<error to="error"/>
</action>
...
</workflow-app>
How to add a property for Sqoop when integrating with Oozie?
- The workflow file can also pass configuration properties to Sqoop.
For example, to set the number of statements used per transaction when moving the data:
<workflow-app name="musqoop-wf" xmlns="uri:oozie:workflow:0.1">
...
<action name="mysqoopaction">
<sqoop xmlns="uri:oozie:sqoop-action:0.2">
<job-tracker>myjt:8021</job-tracker>
<name-node>nn:8020</name-node>
<configuration>
<property>
<name>sqoop.export.statements.per.transaction</name>
<value>10</value>
</property>
</configuration>
<command>import --table People --connect ...</command>
</sqoop>
<ok to="next"/>
<error to="error"/>
</action>
...
</workflow-app>
- The command below imports the data into Hadoop and also into Hive:
sqoop import \
--connect jdbc:mysql://mysql.example.com/myintegrate \
--username hn \
--password hn \
--table student \
--hive-import
- The command below also imports the data into Hadoop and Hive, but loads it into a single specified Hive partition:
sqoop import \
--connect jdbc:mysql://mysql.example.com/myintegrate \
--username hn \
--password hn \
--table student \
--hive-import \
--hive-partition-key day \
--hive-partition-value "2016-09-11"
Verify and Result
- Trigger the Oozie workflow for the first two examples to verify that the XML in the workflow files is valid.
- For the Hive integration, after the import job completes, run "show tables;" in Hive to confirm the table exists, then query it to make sure it holds the data from MySQL:
select * from student;
select * from student where LastName = 'HN';
select * from student where LastName like '%HN%';
select count(*) from student;