Difference between ‘select * from table’ and ‘select column from table’ in Hive
- A table in Hive is stored as a directory in HDFS.
- For ‘select * from table’, the Hive query processor simply goes to that directory, which holds one or more files that follow the table schema, and reads them as they are.
- This is fine when the data is very small, for example less than a gigabyte.
- On a real cluster the table may hold terabytes of data, so running ‘select * from table’ and displaying the result will take a long time.
- Any data processing you do in Hive is carried out by a sequence of MapReduce programs that read the table data stored on HDFS.
- Hive's query processing engine is based on MapReduce.
- Tables often have a large number of columns representing different values. To perform ‘select column from table’, the MapReduce program scans every row and extracts the requested column.
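
The difference described in the points above can be sketched in plain Python. This is only an illustration, not how Hive is implemented: a temporary local directory stands in for the table directory on HDFS, and the schema, file names, tab delimiter, and function names (`select_star`, `select_column`) are all assumptions made up for the example.

```python
import os
import tempfile

# Hypothetical schema for an example table; not taken from any real dataset.
SCHEMA = ["id", "name", "city"]
DELIM = "\t"  # assumed field delimiter for this sketch


def write_table(table_dir):
    """Create two small data files inside the table directory."""
    parts = {
        "part-00000": ["1\tasha\tpune", "2\tben\tleeds"],
        "part-00001": ["3\tchen\twuhan"],
    }
    for name, rows in parts.items():
        with open(os.path.join(table_dir, name), "w") as f:
            f.write("\n".join(rows) + "\n")


def select_star(table_dir):
    """Like 'select *': return every line of every file, without parsing."""
    lines = []
    for name in sorted(os.listdir(table_dir)):
        with open(os.path.join(table_dir, name)) as f:
            lines.extend(line.rstrip("\n") for line in f)
    return lines


def select_column(table_dir, column):
    """Like 'select column': scan all rows, split each one, keep one field."""
    idx = SCHEMA.index(column)
    values = []
    for name in sorted(os.listdir(table_dir)):
        with open(os.path.join(table_dir, name)) as f:
            for line in f:
                fields = line.rstrip("\n").split(DELIM)
                values.append(fields[idx])
    return values


def demo():
    """Build the example table and run both kinds of query against it."""
    with tempfile.TemporaryDirectory() as table_dir:
        write_table(table_dir)
        return select_star(table_dir), select_column(table_dir, "name")


if __name__ == "__main__":
    all_rows, names = demo()
    print(all_rows)
    print(names)
```

In this sketch `select_star` only reads the files back, while `select_column` has to parse every row before it can pick out a single field, which is the per-row work that the MapReduce program does in Hive.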