Sqoop Metastore:
- Sqoop metastore is an internally implemented database that can hold the definition of a given job.
- It includes details like which database,which table,which approach,what query to extract from,what target to write to and so on.
- Consider an array of SQL databases that have the same structure, but are partitioned by for performance reasons.
- Thus we have the same sqoop job running against multiple database instances at the same time, all unified at a single table in Hadoop.
- In order to get parallelization, we generate a job that manages the Sqoop sub-workflows. This results in 500 distinct sqoop jobs.
- Metastore is used to handle these complexities.