Workflow when using Pig & R
- The Pig programming language is used for obtaining and manipulating data, with UDFs (user-defined functions) covering operations the built-in operators do not handle.
- Pig is used for filtering, cleaning, transforming and otherwise preparing large data sets.
- Depending on the workflow, a user might put a Hive table on top of the Pig output and then use Hive to prepare subsets of the data for R.
- R is used for analysis and modeling, and also for prototyping models.
- There are ways to connect directly to Hive from R using the Hive JDBC drivers.
- We use R for early data exploration and examination of results.
- In a normal workflow, Pig performs the ETL (Extract, Transform, Load) step and loads the transformed data into HDFS. Analysts then use Hive, or extensions such as RHive, to analyze the large volumes of data available in HDFS.
- A direct comparison between Pig and R is not meaningful, because Pig is designed for large-scale data processing and R is not.
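
The division of labor described above (heavy filtering and aggregation first, analysis on a small prepared result afterwards) can be sketched in plain Python. This is only a stand-in to show the shape of the workflow: in practice the first stage would be a Pig script running over HDFS and the second an R session. The sample rows, field names, and the `prepare` and `analyze` functions are all hypothetical.

```python
from statistics import mean

# Hypothetical raw records standing in for a large data set stored in HDFS.
RAW_ROWS = [
    {"user": "a", "amount": "10.5"},
    {"user": "b", "amount": ""},      # missing value, dropped during cleaning
    {"user": "a", "amount": "4.5"},
    {"user": "", "amount": "3.0"},    # missing key, dropped during cleaning
    {"user": "c", "amount": "7.0"},
]


def prepare(rows):
    """Stage 1 (the role Pig plays): filter, clean, and aggregate raw rows."""
    totals = {}
    for row in rows:
        if not row["user"] or not row["amount"]:
            continue  # filter out incomplete records
        # Clean: convert the text field to a number, then aggregate per user.
        totals[row["user"]] = totals.get(row["user"], 0.0) + float(row["amount"])
    return totals


def analyze(totals):
    """Stage 2 (the role R plays): explore the small prepared result."""
    values = list(totals.values())
    return {
        "users": len(values),
        "mean_total": mean(values),
        "max_total": max(values),
    }


prepared = prepare(RAW_ROWS)
summary = analyze(prepared)
print(prepared)
print(summary)
```

The point of the split is that only `prepare` ever touches the full data set. `analyze` sees one row per user, which is the kind of reduced input that fits comfortably in an R session.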