Sqoop with Oracle
Sqoop with Oracle - Reference data in RDBMS
![Sqoop with Oracle: reference data in RDBMS](https://wikitechy.com/tutorials/sqoop/img/sqoop-images/reference-data-into-rdbms.png)
Sqoop with Oracle - Hadoop for off-line analytics
![Sqoop with Oracle: Hadoop for off-line analytics](https://wikitechy.com/tutorials/sqoop/img/sqoop-images/hadoop-for-off-line-analytics.png)
Sqoop with Oracle - Hadoop for RDBMS archive
![Sqoop with Oracle: Hadoop for RDBMS archive](https://wikitechy.com/tutorials/sqoop/img/sqoop-images/hadoop-for-rdbms.png)
Sqoop with Oracle - MapReduce results to RDBMS
![Sqoop with Oracle: MapReduce results to RDBMS](https://wikitechy.com/tutorials/sqoop/img/sqoop-images/mapreduce-results-to-rdbms.png)
SQOOP Details
- On import, divides the table into ranges using the primary key's min/max values
- Creates a mapper for each range
- Mappers write to multiple HDFS nodes
- Creates text or sequence files
- Generates a Java class for the resulting HDFS file
- Generates a Hive definition and auto-loads into Hive
- On export, reads files in an HDFS directory via MapReduce
- Performs a bulk parallel insert into the database table
- Compatible with almost any JDBC-enabled database
- Auto-load into Hive
- HBase support
- Special handling for database LOBs
- Job management
- Cluster configuration (jar file distribution)
- WHERE clause support
- Open source, and included in Cloudera distributions
- Fast paths: invokes mysqldump and mysqlimport for MySQL jobs
- Similar fast paths for PostgreSQL
- Extensibility architecture for third parties (like Quest)
- Teradata, Netezza, etc.
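The range-splitting step described in the list above can be sketched as follows. This is a minimal illustration, assuming an integer primary key; it is not Sqoop's actual splitter code, and the function names `split_ranges` and `range_queries`, the table `SALES`, and the column `SALE_ID` are hypothetical.

```python
from typing import List, Tuple


def split_ranges(min_id: int, max_id: int, num_mappers: int) -> List[Tuple[int, int]]:
    """Divide the inclusive key range [min_id, max_id] into at most
    num_mappers contiguous, non-overlapping inclusive ranges."""
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    if max_id < min_id:
        return []  # empty table: nothing to split
    total = max_id - min_id + 1
    n = min(num_mappers, total)        # never create more ranges than keys
    base, extra = divmod(total, n)     # the first `extra` ranges get one more key
    ranges = []
    lo = min_id
    for i in range(n):
        size = base + (1 if i < extra else 0)
        hi = lo + size - 1
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges


def range_queries(table: str, key: str, ranges: List[Tuple[int, int]]) -> List[str]:
    """Build one SELECT per range; each mapper would run exactly one of them."""
    return [
        f"SELECT * FROM {table} WHERE {key} >= {lo} AND {key} <= {hi}"
        for lo, hi in ranges
    ]


if __name__ == "__main__":
    # In practice the bounds would come from a query such as
    # SELECT MIN(SALE_ID), MAX(SALE_ID) FROM SALES; here they are hard-coded.
    for query in range_queries("SALES", "SALE_ID", split_ranges(1, 100, 4)):
        print(query)
```

Even splits like this assume the key values are roughly uniformly distributed; a skewed key leaves some mappers with far more rows than others.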
Working with Oracle
- vs. faster multi-block table scans
- Pollutes the buffer cache, increasing I/O for other users
- Caching is of limited help to Sqoop, since rows are read only once
Oracle – parallelism
![Sqoop and Oracle parallelism](https://wikitechy.com/tutorials/sqoop/img/sqoop-images/sqoop-oracle-parallelism.png)
![Oracle parallelism](https://wikitechy.com/tutorials/sqoop/img/sqoop-images/oracle-parallelism.png)
![Sqoop and Oracle data parallelism](https://wikitechy.com/tutorials/sqoop/img/sqoop-images/sqoop-oracle-data-parallelism.png)
Index range scans
![Oracle index range scans](https://wikitechy.com/tutorials/sqoop/img/sqoop-images/oracle-index-scans.png)
Oracle Ideal architecture
![Oracle ideal architecture](https://wikitechy.com/tutorials/sqoop/img/sqoop-images/oracle-ideal-architecture.png)
SQOOP/OraOop best practices
- Set inline-lob-limit
- Can’t rely on mapred.max.maps.per.node
- Leads to duplicate DB reads
- Keeps the mappers streaming to HDFS
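As one way to apply the `inline-lob-limit` advice above, the sketch below assembles a `sqoop import` command line in Python. The flags used (`--connect`, `--username`, `-P`, `--table`, `--target-dir`, `--split-by`, `--num-mappers`, `--inline-lob-limit`, `--where`) are standard Sqoop 1 import arguments; the helper name `build_sqoop_import_args` and every value passed to it (host, schema, table, column, limit) are hypothetical and chosen only for illustration.

```python
from typing import List, Optional


def build_sqoop_import_args(
    connect: str,
    username: str,
    table: str,
    target_dir: str,
    split_by: str,
    num_mappers: int,
    inline_lob_limit: int,
    where: Optional[str] = None,
) -> List[str]:
    """Return an argv list for `sqoop import`; nothing is executed here."""
    args = [
        "sqoop", "import",
        "--connect", connect,
        "--username", username,
        "-P",                                   # prompt for the password at run time
        "--table", table,
        "--target-dir", target_dir,
        "--split-by", split_by,                 # column used to divide the table into ranges
        "--num-mappers", str(num_mappers),      # explicit degree of parallelism
        "--inline-lob-limit", str(inline_lob_limit),  # max size in bytes of an inline LOB
    ]
    if where:
        args += ["--where", where]              # optional row filter
    return args


if __name__ == "__main__":
    # All values below are placeholders, not recommendations.
    argv = build_sqoop_import_args(
        connect="jdbc:oracle:thin:@//dbhost.example.com:1521/ORCL",
        username="SCOTT",
        table="SALES",
        target_dir="/data/sales",
        split_by="SALE_ID",
        num_mappers=4,
        inline_lob_limit=1048576,
        where="SALE_DATE >= DATE '2012-01-01'",
    )
    print(" ".join(argv))
```

Building the argument list rather than a single shell string keeps values with spaces, such as the `--where` predicate, intact when the list is later handed to something like `subprocess.run`.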