How Sqoop Works
- Sqoop provides a pluggable mechanism for optimal connectivity to external systems.
- The Sqoop extension API provides a convenient framework for building new connectors which can be dropped into Sqoop installations to provide connectivity to various systems.
- Sqoop itself comes bundled with various connectors that can be used for popular database and data warehousing systems.
Internally, Sqoop's import command retrieves data from an external database as follows.
Import Data:
The import command uses the following options to import data (the Employee table) from a MySQL database:
- import: the tool that tells Sqoop to initiate an import.
- --connect <connect-string>, --username <username>, --password <password>: connection parameters used to connect to the database.
- --table <table-name>: the table to be imported.
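Putting those options together, a minimal import might look like this (the JDBC URL, database name, and credentials are placeholder values, not details from the original text):

```shell
# Import the Employee table from a MySQL database into HDFS.
# Host, database name, and credentials below are example values.
sqoop import \
  --connect jdbc:mysql://dbserver.example.com/employees_db \
  --username sqoop_user \
  --password sqoop_pass \
  --table Employee
```

In practice, passing the password on the command line is discouraged; Sqoop also accepts -P (prompt) or --password-file to keep credentials out of the shell history.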
Step 1
- Sqoop examines the database and gathers the table's metadata (column names and types) needed to retrieve the data.
Step 2
- Next, Sqoop runs a map-only Hadoop job that transfers the data to HDFS, using the metadata gathered in the first step.
- The data is imported into HDFS under a directory named after the retrieved table.
- Sqoop also provides options to specify the target location and file name in HDFS.
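For example, the --target-dir option overrides the default output location; the HDFS path below is an example value:

```shell
# Same import, but writing the output files to an explicit HDFS directory
# (/user/hadoop/employee_data is an example path).
sqoop import \
  --connect jdbc:mysql://dbserver.example.com/employees_db \
  --username sqoop_user \
  --password sqoop_pass \
  --table Employee \
  --target-dir /user/hadoop/employee_data
```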
Step 3
- The files created in HDFS contain comma-separated values for the columns, with each row placed on a new line.
- Depending on the size of the table, HDFS may hold one or more files. Sqoop supports several formats for the imported data; for example, the option --as-avrodatafile imports the data in Avro format.
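A sketch of an Avro-format import; --num-mappers, which controls how many map tasks (and therefore output files) are used, is added here for illustration:

```shell
# Import the Employee table as Avro data files instead of CSV text.
# --num-mappers sets the number of parallel map tasks / output files.
sqoop import \
  --connect jdbc:mysql://dbserver.example.com/employees_db \
  --username sqoop_user \
  --password sqoop_pass \
  --table Employee \
  --as-avrodatafile \
  --num-mappers 4
```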
Export Data:
- The export command uses the following options to export data (the Employee table) to a MySQL database:
- export: the tool that tells Sqoop to initiate an export.
- --connect <connect-string>, --username <username>, --password <password>: connection parameters used to connect to the database.
- --table <table-name>: the table to be loaded with data.
- --export-dir <directory-path>: the HDFS directory from which data will be exported.
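Combining those options, a minimal export might look like this (connection details and the HDFS path are placeholder values):

```shell
# Export data from HDFS back into the Employee table of a MySQL database.
# Host, credentials, and the HDFS path below are example values.
sqoop export \
  --connect jdbc:mysql://dbserver.example.com/employees_db \
  --username sqoop_user \
  --password sqoop_pass \
  --table Employee \
  --export-dir /user/hadoop/employee_data
```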
- Sqoop first gathers metadata for the table it should export into, and then performs a map job that transfers the data.
- Sqoop splits the input data set across several map tasks, each of which sends its share of the data to the database.
- The map job carries out this transfer over several transactions to ensure optimal throughput and minimal resource utilization.
- Some Sqoop connectors support a staging-table concept: the data set is first transferred to a staging table and only then moved to the target table.
- This prevents the target table from being left with partial data if a failure occurs, so the job can simply be rerun.
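The staging-table behavior described above is enabled with --staging-table; the staging table name here is hypothetical:

```shell
# Export via a staging table: rows are loaded into Employee_stage first,
# then moved into Employee in a single final step.
# --clear-staging-table empties the staging table before the job runs.
sqoop export \
  --connect jdbc:mysql://dbserver.example.com/employees_db \
  --username sqoop_user \
  --password sqoop_pass \
  --table Employee \
  --staging-table Employee_stage \
  --clear-staging-table \
  --export-dir /user/hadoop/employee_data
```

The staging table must already exist and have the same schema as the target table.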
How to run Sqoop:
- TOOL: indicates the operation you want to perform, e.g. import or export.
- PROPERTY_ARGS: a set of parameters entered as Java properties in the format -Dname=value.
- SQOOP_ARGS: all the other Sqoop parameters.
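The three parts above combine into the general form sqoop TOOL [PROPERTY_ARGS] [SQOOP_ARGS]; a sketch with example values:

```shell
# General form: sqoop TOOL [PROPERTY_ARGS] [SQOOP_ARGS]
# Here a Hadoop property (-D) sets the MapReduce job name; the remaining
# arguments are ordinary Sqoop options. All values are examples.
sqoop import \
  -Dmapreduce.job.name=employee_import \
  --connect jdbc:mysql://dbserver.example.com/employees_db \
  --username sqoop_user \
  --password sqoop_pass \
  --table Employee
```

Note that -D properties must appear before the tool-specific Sqoop arguments.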