Sqoop vs HDFS - Apache Sqoop tutorial
What is HDFS?
- Hadoop Distributed File System (HDFS) is a distributed file system that stores data on commodity machines and provides very high aggregate bandwidth across the cluster.
Difference between Sqoop and HDFS:
Sqoop | HDFS |
---|---|
Sqoop imports data from structured data sources such as an RDBMS. | HDFS is the distributed file system that the Hadoop ecosystem uses to store data. |
Sqoop has a connector-based architecture. Each connector knows how to connect to its data source and fetch the data. | HDFS has a distributed architecture in which data is spread across multiple data nodes. |
Sqoop writes the data it imports into HDFS. | HDFS is the final destination for stored data. |
Sqoop can import and export table data, and an import can be restricted to matching rows with a WHERE clause. | HDFS simply stores whatever data it is given, by whatever means it arrives. |
To import data from structured data sources into Hadoop, Sqoop is the standard choice, because its connectors know how to interact with those sources and fetch data from them. | HDFS provides built-in shell commands for storing data in it, but it cannot ingest streaming data by itself. |
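
The two routes into HDFS compared in the table can be sketched as a short shell script: a Sqoop import filtered by a WHERE clause, and a plain copy using the HDFS shell. This is a minimal sketch, not a reference setup. The JDBC URL, database, table, user name, password file, local file, and HDFS paths are all hypothetical placeholders, and the `run` helper and `DRY_RUN` switch are only there so the script can be read and tried without a cluster.

```shell
#!/bin/sh
# Sketch only: every host, database, table, user, and path below is hypothetical.
set -eu

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands; 0 = also execute them

# Print a command and execute it only when DRY_RUN=0.
# The printed form drops shell quoting, so it is for reading, not copy-paste.
run() {
  printf '%s\n' "$*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

# Sqoop route: a connector reads rows from an RDBMS table, keeps only the rows
# matching the WHERE clause, and writes them into an HDFS directory.
sqoop_import_filtered() {
  run sqoop import \
    --connect "jdbc:mysql://db.example.com/sales" \
    --username "report_user" \
    --password-file "/user/hadoop/.db_password" \
    --table "orders" \
    --where "order_date >= '2020-01-01'" \
    --target-dir "/data/sales/orders" \
    --num-mappers 4
}

# HDFS route: the built-in shell copies a file that already exists locally.
# HDFS does not connect to the database itself; it stores what it is handed.
hdfs_put_local_file() {
  run hdfs dfs -mkdir -p "/data/sales/manual"
  run hdfs dfs -put "orders.csv" "/data/sales/manual/"
}

sqoop_import_filtered
hdfs_put_local_file
```

By default the script only prints the three commands. On a machine where the Sqoop and Hadoop clients are installed and the placeholders have been replaced with real values, running it with `DRY_RUN=0` would execute them.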