Basic Commands and Syntax for Sqoop - Apache Sqoop Tutorial
Sqoop Commands - Sqoop-Import:
- Sqoop import command imports a table from an RDBMS to HDFS.
- Each row of the table is represented as a separate record in HDFS.
- Records can be stored as text files, or in binary representation as Avro or SequenceFiles.
Generic Syntax:
The Hadoop-specific generic arguments must precede any import arguments; the import arguments themselves can appear in any order.
Importing a Table into HDFS
Syntax:
Argument | Description |
---|---|
--connect | Takes a JDBC URL and connects to the database |
--table | Source table name to be imported |
--username | Username to connect to the database |
--password | Password of the connecting user |
--target-dir | Imports data to the specified HDFS directory |
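A sketch of a basic import command (the host, database, table name, and credentials below are hypothetical placeholders):

```shell
# Import the "employees" table from a hypothetical MySQL database into HDFS.
sqoop import \
  --connect jdbc:mysql://db.example.com/corp \
  --table employees \
  --username sqoop_user \
  --password sqoop_pass \
  --target-dir /user/hadoop/employees
```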
Importing Selected Data from Table:
Syntax:
Argument | Description |
---|---|
--columns | Selects a subset of columns |
--where | Retrieves only the rows which satisfy the condition |
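A hedged example that imports only two columns and filters rows (table, columns, and connection details are hypothetical):

```shell
# Import only the id and name columns of rows with id greater than 100.
sqoop import \
  --connect jdbc:mysql://db.example.com/corp \
  --table employees \
  --columns "id,name" \
  --where "id > 100" \
  --target-dir /user/hadoop/employees_subset
```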
Importing Data from Query
Syntax:
Argument | Description |
---|---|
--query | Executes the SQL query provided and imports the results |
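A sketch of a free-form query import. Sqoop requires the literal `$CONDITIONS` token in the WHERE clause (it substitutes its own split predicates there), plus either `--split-by` or a single mapper; the query and connection details are hypothetical:

```shell
# Import the result of a free-form SQL query rather than a whole table.
sqoop import \
  --connect jdbc:mysql://db.example.com/corp \
  --query 'SELECT id, name FROM employees WHERE id > 100 AND $CONDITIONS' \
  --split-by id \
  --target-dir /user/hadoop/employee_query
```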
Incremental Imports
Syntax:
Sqoop import supports two types of incremental imports:
- Append
- Lastmodified
Append mode is used when new rows are continually added with increasing key values. Specify the continually increasing column with --check-column; Sqoop imports only the rows whose value in that column is greater than the value given with --last-value.

Lastmodified mode is used when rows of the table may be updated, and each such update sets the current timestamp in a last-modified column. Rows whose check column holds a timestamp more recent than the one given with --last-value are imported.
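The two modes can be sketched as follows (table, column names, and connection details are hypothetical):

```shell
# Append mode: import only rows whose id exceeds the last imported value.
sqoop import \
  --connect jdbc:mysql://db.example.com/corp \
  --table employees \
  --incremental append \
  --check-column id \
  --last-value 1000

# Lastmodified mode: import rows whose timestamp is newer than --last-value.
sqoop import \
  --connect jdbc:mysql://db.example.com/corp \
  --table employees \
  --incremental lastmodified \
  --check-column last_updated \
  --last-value "2016-01-01 00:00:00"
```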
Notes:
- In the JDBC connection string, do not use "localhost" as the database host: Sqoop launches mappers on multiple data nodes, and each mapper would then try to connect to a database on its own machine. Use a hostname or IP address that every node in the cluster can reach.
- The "--password" parameter is insecure, as anyone can read the password from the command line or shell history. The -P option can be used instead, which prompts for the password on the console. Otherwise, it is recommended to use --password-file pointing to a file containing the password (make sure its permissions are restricted to the authorized user).
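The --password-file approach can be sketched like this (the file path and credentials are hypothetical; the file:// scheme points Sqoop at the local filesystem rather than HDFS):

```shell
# Store the password in a file readable only by its owner,
# then point Sqoop at the file instead of passing the password inline.
echo -n "sqoop_pass" > /home/sqoop_user/.db-password
chmod 400 /home/sqoop_user/.db-password

sqoop import \
  --connect jdbc:mysql://db.example.com/corp \
  --table employees \
  --username sqoop_user \
  --password-file file:///home/sqoop_user/.db-password
```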
A few other arguments helpful with Sqoop import:
Argument | Description |
---|---|
--num-mappers,-m | Mappers to Launch |
--fields-terminated-by | Field Separator |
--lines-terminated-by | End of line separator |
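These arguments combine with a plain import as in this sketch (connection details and table name are hypothetical):

```shell
# Run the import with 4 parallel mappers and explicit output delimiters.
sqoop import \
  --connect jdbc:mysql://db.example.com/corp \
  --table employees \
  --num-mappers 4 \
  --fields-terminated-by ',' \
  --lines-terminated-by '\n' \
  --target-dir /user/hadoop/employees
```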
Importing Data into Hive
The Hive arguments below are used with the sqoop import command to load data directly into Hive:
Argument | Description |
---|---|
--hive-home | Override $HIVE_HOME path |
--hive-import | Import tables into Hive |
--hive-overwrite | Overwrites existing Hive table data |
--create-hive-table | Creates Hive table and fails if that table already exists |
--hive-table | Sets the Hive table name to import |
--hive-drop-import-delims | Drops delimiters like \n, \r, and \01 from string fields |
--hive-delims-replacement | Replaces delimiters like \n, \r, and \01 in string fields with a user-defined string |
--hive-partition-key | Sets the Hive partition key |
--hive-partition-value | Sets the Hive partition value |
--map-column-hive | Overrides default mapping from SQL type datatypes to Hive datatypes |
Syntax:
When --hive-import is specified, Sqoop loads the data into a Hive table rather than an HDFS directory.
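A sketch of a Hive-bound import (database, table, and Hive table names are hypothetical):

```shell
# Import a table straight into Hive, creating the Hive table if it is missing.
sqoop import \
  --connect jdbc:mysql://db.example.com/corp \
  --table employees \
  --hive-import \
  --create-hive-table \
  --hive-table corp.employees
```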
Importing Data into HBase
The HBase arguments below are used with the sqoop import command to load data directly into HBase:
Argument | Description |
---|---|
--column-family | Sets column family for the import |
--hbase-create-table | If specified, creates missing HBase tables |
--hbase-row-key | Specifies which column to use as the row key |
--hbase-table | Imports into the specified HBase table |
Syntax:
When --hbase-table is specified, Sqoop imports the data into HBase rather than an HDFS directory.
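An HBase-bound import can be sketched as follows (table, column family, and connection details are hypothetical):

```shell
# Import into an HBase table, keyed by the source table's id column.
sqoop import \
  --connect jdbc:mysql://db.example.com/corp \
  --table employees \
  --hbase-table employees \
  --column-family info \
  --hbase-row-key id \
  --hbase-create-table
```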
Sqoop-Import-all-Tables
The import-all-tables tool imports all tables in an RDBMS database to HDFS. Data from each table is stored in a separate directory in HDFS. The following conditions must be met in order to use sqoop-import-all-tables:
- Each table should have a single-column primary key.
- You should import all columns of each table.
- You must not use a splitting column, and must not restrict rows with a WHERE clause.
Generic Syntax:
The Sqoop-specific arguments are similar to those of the sqoop-import tool, but a few options such as --table, --split-by, --columns, and --where are invalid here.
Syntax:
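A minimal sketch, assuming a hypothetical database and credentials:

```shell
# Import every table of the database into HDFS, one directory per table.
sqoop import-all-tables \
  --connect jdbc:mysql://db.example.com/corp \
  --username sqoop_user \
  --password sqoop_pass
```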
Sqoop-Export
The Sqoop export command exports a set of files in an HDFS directory back to an RDBMS table. The target table must already exist in the database.
Generic Syntax:
The Sqoop export command prepares INSERT statements from the input data and runs them against the database. It is meant for exporting new records: if the table has a unique constraint, such as a primary key, and a duplicate row is exported, the INSERT statement fails and the export job fails with it. If you have updates instead, you can use the --update-key option; Sqoop then prepares UPDATE statements that modify the existing rows rather than INSERTs.
Syntax:
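Both variants can be sketched as follows (target table, key column, and connection details are hypothetical):

```shell
# Export files under an HDFS directory back into an existing RDBMS table.
sqoop export \
  --connect jdbc:mysql://db.example.com/corp \
  --table employees_backup \
  --export-dir /user/hadoop/employees

# Update existing rows (matched on the id column) instead of inserting.
sqoop export \
  --connect jdbc:mysql://db.example.com/corp \
  --table employees_backup \
  --export-dir /user/hadoop/employees \
  --update-key id
```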
Sqoop-Job
The Sqoop job command allows us to create a saved job. A job remembers the parameters used to create it, so it can be invoked at any time with the same arguments.
Generic Syntax:
Sqoop-job makes work easy when we are using incremental import: the last imported value is stored in the job configuration, so the next execution reads it from there and imports only the new data.
Sqoop-job options:
Argument | Description |
---|---|
--create | Defines a new job with the specified job-id (name). The actual sqoop import command should be separated by "--" |
--delete | Deletes a saved job. |
--exec | Executes the saved job. |
--show | Shows the saved job configuration |
--list | Lists all the saved jobs |
Syntax:
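A sketch of a saved incremental-import job (job name, table, and connection details are hypothetical):

```shell
# Define a saved job; the actual import command follows the "--" separator.
sqoop job --create employee_import -- import \
  --connect jdbc:mysql://db.example.com/corp \
  --table employees \
  --incremental append \
  --check-column id \
  --last-value 0

sqoop job --list                  # list all saved jobs
sqoop job --show employee_import  # inspect the saved configuration
sqoop job --exec employee_import  # run it; --last-value is updated after each run
```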
Sqoop-Codegen
The Sqoop-codegen command generates Java class files which encapsulate and interpret imported records. The Java definition of a record is instantiated as part of the import process, but can also be performed separately: for example, if the Java source is lost, it can be recreated, or new versions of a class can be generated which use different delimiters between fields, and so on.
Generic Syntax:
Syntax:
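A minimal sketch, assuming a hypothetical database and table:

```shell
# Regenerate the Java class for a table (e.g. if the original source was lost).
sqoop codegen \
  --connect jdbc:mysql://db.example.com/corp \
  --table employees
```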
Sqoop-Eval
The Sqoop-eval command allows users to quickly run simple SQL queries against a database; the results are printed to the console.
Generic Syntax:
Syntax:
Using this, users can be sure that they are importing the data as expected.
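Such a check can be sketched as follows (query and connection details are hypothetical):

```shell
# Run an ad-hoc query to verify the data before importing it.
sqoop eval \
  --connect jdbc:mysql://db.example.com/corp \
  --query "SELECT COUNT(*) FROM employees"
```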
Sqoop-List-Databases
Used to list all the databases available on the RDBMS server.
Generic Syntax:
Syntax:
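A minimal sketch, assuming a hypothetical server and credentials:

```shell
# List every database visible to this user on the server.
sqoop list-databases \
  --connect jdbc:mysql://db.example.com/ \
  --username sqoop_user \
  --password sqoop_pass
```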
Sqoop-List-Tables
Used to list all the tables in a specified database.
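A minimal sketch, assuming a hypothetical database and credentials:

```shell
# List all tables in the specified database.
sqoop list-tables \
  --connect jdbc:mysql://db.example.com/corp \
  --username sqoop_user \
  --password sqoop_pass
```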