Difference between Pig and Sqoop in Hadoop
| Pig | Sqoop |
|---|---|
| Apache Pig is an analytics tool used to analyze data stored in HDFS. | Apache Sqoop is a tool for importing structured data from an RDBMS into HDFS and for exporting data from HDFS back to an RDBMS. |
| Data can be imported from SQL databases into Hive, but not from NoSQL databases. | It can integrate HDFS with external data sources: SQL databases, NoSQL stores, and data warehouses. It is also bidirectional: the same tool handles both import and export. |
| Pig is typically used for ETL data pipelines and for research on raw data. | Important Sqoop control arguments for importing RDBMS data are `--append`, `--columns`, and `--where`. |
| Pig has no metastore of its own; it reads table metadata through HCatalog, which is backed by the Hive metastore. Spark SQL queries can run against that same metastore. | The Sqoop metastore hosts a shared metadata repository; multiple local and remote users can define and execute saved jobs stored in it. |
| The scalar data types in Pig are int, long, float, double, chararray, and bytearray; the complex data types are map, tuple, and bag. | Sqoop basically converts CHAR(x), VARCHAR(x), and NUMERIC(x,y) columns to String (with length 32,767), and DATETIME columns to BIGINT. |
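The import control arguments and the bidirectional nature of Sqoop mentioned above can be sketched as follows. This is a minimal illustration, not a production setup: the JDBC connection string, credentials, table names, and HDFS paths are all hypothetical.

```shell
# Import selected columns of matching rows from an RDBMS into HDFS.
# --columns restricts the fields, --where filters the rows, and
# --append adds new files to an already-populated target directory.
sqoop import \
  --connect jdbc:mysql://dbhost/sales \
  --username etl --password-file /user/etl/.pw \
  --table orders \
  --columns "id,customer_id,total" \
  --where "order_date >= '2023-01-01'" \
  --target-dir /data/orders \
  --append

# The same tool works in the opposite direction: push HDFS data back
# into an RDBMS table (the table must already exist in the database).
sqoop export \
  --connect jdbc:mysql://dbhost/sales \
  --username etl --password-file /user/etl/.pw \
  --table orders_summary \
  --export-dir /data/orders_summary
```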
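A Pig ETL pipeline of the kind described above, exercising both scalar and complex data types, might look like this sketch (run here through the `pig` launcher; the input path, schema, and field names are hypothetical):

```shell
pig <<'EOF'
-- Scalar types in the schema: int, chararray, double
emp = LOAD '/data/employees' USING PigStorage(',')
      AS (id:int, name:chararray, dept:chararray, salary:double);

-- Transform step of the ETL pipeline: keep one department
eng = FILTER emp BY dept == 'eng';

-- GROUP yields complex types: each row is a tuple of (group key, bag of tuples)
by_dept = GROUP eng BY dept;
avg_pay = FOREACH by_dept GENERATE group, AVG(eng.salary);

-- Load step: write the result back to HDFS
STORE avg_pay INTO '/data/avg_pay' USING PigStorage(',');
EOF
```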
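The shared Sqoop metastore works through saved jobs: a job definition is stored once in the repository and any user who can reach the metastore can execute it. A sketch, assuming a metastore running on a hypothetical host `metastore-host` (16000 is Sqoop's default metastore port):

```shell
# Define a saved import job in the shared metastore.
sqoop job \
  --meta-connect jdbc:hsqldb:hsql://metastore-host:16000/sqoop \
  --create nightly-orders \
  -- import \
  --connect jdbc:mysql://dbhost/sales \
  --table orders --target-dir /data/orders --append

# Other (including remote) users can list and run the saved job.
sqoop job --meta-connect jdbc:hsqldb:hsql://metastore-host:16000/sqoop --list
sqoop job --meta-connect jdbc:hsqldb:hsql://metastore-host:16000/sqoop --exec nightly-orders
```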