Sqoop with Oracle
Typical usage patterns (example commands below):
- Reference data in RDBMS
- Hadoop for off-line analytics
- Hadoop for RDBMS archive
- MapReduce results to RDBMS
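These patterns map onto Sqoop's two basic operations, import and export. A minimal sketch of each, assuming a hypothetical Oracle service ORCL on db-host, a SALES source table, a SALES_SUMMARY target table, and HDFS paths that are purely illustrative:

  # Import an Oracle table into HDFS for off-line analytics or archiving
  sqoop import \
    --connect jdbc:oracle:thin:@//db-host:1521/ORCL \
    --username SCOTT -P \
    --table SALES \
    --target-dir /data/sales

  # Export MapReduce results from HDFS back into an Oracle table
  sqoop export \
    --connect jdbc:oracle:thin:@//db-host:1521/ORCL \
    --username SCOTT -P \
    --table SALES_SUMMARY \
    --export-dir /results/sales_summary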
SQOOP Details
Import path (see the split sketch after this list):
- Divides the table into ranges using the primary key's min/max values
- Creates a mapper for each range
- Mappers write to multiple HDFS nodes
- Creates text or sequence files
- Generates a Java class for the resulting HDFS file
- Generates a Hive definition and auto-loads into Hive
Export path:
- Reads files in an HDFS directory via MapReduce
- Bulk parallel insert into the database table
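A hedged sketch of the range-splitting behaviour, assuming a SALES table with a numeric primary key SALE_ID; the boundary values in the comments are invented for illustration:

  # Sqoop first asks the database for the split boundaries, roughly:
  #   SELECT MIN(SALE_ID), MAX(SALE_ID) FROM SALES
  # and then gives each mapper one sub-range, e.g. with 4 mappers:
  #   mapper 1: WHERE SALE_ID >= 1     AND SALE_ID < 25001
  #   mapper 2: WHERE SALE_ID >= 25001 AND SALE_ID < 50001
  #   ...
  sqoop import \
    --connect jdbc:oracle:thin:@//db-host:1521/ORCL \
    --username SCOTT -P \
    --table SALES \
    --split-by SALE_ID \
    --num-mappers 4 \
    --hive-import
  # Output is text files by default (--as-sequencefile is available when not
  # combined with --hive-import); Sqoop also code-generates a SALES.java record
  # class for parsing the files, and --hive-import creates and loads the Hive table.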
Other features:
- Compatible with almost any JDBC-enabled database
- Auto-load into Hive
- HBase support
- Special handling for database LOBs
- Job management
- Cluster configuration (jar file distribution)
- WHERE clause support (see the filtered-import sketch below)
- Open source, and included in Cloudera distributions
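A hedged sketch combining the WHERE-clause and HBase features; the table, column, and column-family names are hypothetical:

  # Import only a filtered subset of rows, landing directly in an HBase table
  sqoop import \
    --connect jdbc:oracle:thin:@//db-host:1521/ORCL \
    --username SCOTT -P \
    --table SALES \
    --where "SALE_DATE >= TO_DATE('2012-01-01','YYYY-MM-DD')" \
    --hbase-table sales \
    --column-family d \
    --hbase-row-key SALE_ID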
Database-specific fast paths (example below):
- Invokes mysqldump / mysqlimport for MySQL jobs
- Similar fast paths for PostgreSQL
- Extensibility architecture for 3rd parties (like Quest)
- Teradata, Netezza, etc.
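These fast paths are selected with Sqoop's --direct flag. A minimal sketch against a hypothetical MySQL database (the Oracle case is covered by OraOop below):

  # --direct makes Sqoop shell out to mysqldump instead of reading over JDBC
  sqoop import \
    --connect jdbc:mysql://db-host/shop \
    --username app -P \
    --table orders \
    --direct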
Working with Oracle
Index range scans (what Sqoop's default primary-key range splits produce on Oracle):
- Single-block reads vs. faster multi-block table scans
- Reads go through the buffer cache, polluting it and increasing IO for other users
- Caching is of limited help to Sqoop since rows are only read once
Oracle parallelism
Oracle ideal architecture (see the OraOop sketch below)
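OraOop, Quest's Oracle connector for Sqoop (later contributed to Apache Sqoop as the Data Connector for Oracle and Hadoop), was built to address these points: it splits the table by ROWID/block ranges and uses direct-path reads that bypass the buffer cache. A minimal sketch, assuming the connector is installed and recognises --direct for Oracle connect strings; names and paths are illustrative:

  # With the Oracle connector, splits follow data blocks rather than
  # primary-key values, and reads avoid polluting the buffer cache
  sqoop import \
    --direct \
    --connect jdbc:oracle:thin:@//db-host:1521/ORCL \
    --username SCOTT -P \
    --table SCOTT.SALES \
    --num-mappers 8 \
    --target-dir /data/sales_direct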
SQOOP/OraOop best practices
- Set --inline-lob-limit (controls whether LOBs are stored inline in the record or in separate files; see the sketch below)
- Can't rely on mapred.max.maps.per.node to cap the number of mappers hitting the database
- Speculative or re-run map tasks lead to duplicate DB reads
- Keep the mappers streaming to HDFS
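A hedged sketch of the first two points; the LOB limit, mapper count, and connection details are illustrative values, not recommendations from the original material:

  # Size the inline-LOB threshold explicitly (in bytes) and control parallelism
  # with --num-mappers rather than per-node scheduler settings
  sqoop import \
    --connect jdbc:oracle:thin:@//db-host:1521/ORCL \
    --username SCOTT -P \
    --table DOCS \
    --inline-lob-limit 16777216 \
    --num-mappers 4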