Difference between a normal Hive table and a Hive table stored as ORC
Normal Hive table | Hive table stored as ORC |
---|---|
Hive data is organized into databases, which act as namespaces to avoid naming conflicts for tables, views, partitions, columns, and so on. | ORC (Optimized Row Columnar) provides an efficient way to store Hive data. It was designed to overcome the limitations of other Hive file formats. Using ORC files improves performance when Hive is reading, writing, and processing data. |
Databases can also be used to enforce security for a user or group of users. | Advantage: each task produces a single output file, which reduces the NameNode’s load. |
Hive has two types of tables: managed and external. The difference shows when you drop a table: for a managed table Hive deletes both the data and the metadata; for an external table Hive deletes only the metadata. | An ORC file contains groups of row data called stripes, along with auxiliary information in a file footer. At the end of the file, a postscript holds the compression parameters and the size of the compressed footer. |
This distinction affects how data is loaded, controlled, and managed in Hive. Use external tables when the data is also used outside of Hive. | The default stripe size is 250 MB. Large stripe sizes enable large, efficient reads from HDFS. |
For example, the data files are read and processed by an existing program that doesn’t lock the files, or the data needs to remain in the underlying location even after a DROP TABLE. | The file footer contains a list of stripes and the number of rows per stripe; for each column it also records the count, min, max, and sum. |
Storing data in Hive ultimately means storing data in HDFS (Hadoop). | The stripe footer contains a directory of stream locations. Row data is used in table scans. |
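
The ORC layout described in the table (rows grouped into stripes, plus a file footer holding stripe row counts and per-column count, min, max, and sum) can be pictured with a small toy model. This is only an illustrative sketch in Python, not the real ORC on-disk format or any Hive API: the names `ColumnStats` and `build_toy_orc` are hypothetical, and stripes are cut by row count here purely for simplicity, whereas real ORC stripes are sized in bytes.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ColumnStats:
    """Column-level aggregates kept in the toy file footer (hypothetical type)."""
    count: int = 0
    minimum: Optional[float] = None
    maximum: Optional[float] = None
    total: float = 0

    def update(self, value: float) -> None:
        # Fold one value into the running aggregates.
        self.count += 1
        self.minimum = value if self.minimum is None else min(self.minimum, value)
        self.maximum = value if self.maximum is None else max(self.maximum, value)
        self.total += value


def build_toy_orc(rows: List[Dict[str, float]], rows_per_stripe: int):
    """Split rows into stripes and build a footer describing them.

    Assumption: stripes are cut every `rows_per_stripe` rows, which is a
    simplification made only for this sketch.
    """
    stripes = [rows[i:i + rows_per_stripe]
               for i in range(0, len(rows), rows_per_stripe)]
    footer = {
        "stripe_row_counts": [len(stripe) for stripe in stripes],
        "column_stats": {},
    }
    for row in rows:
        for name, value in row.items():
            footer["column_stats"].setdefault(name, ColumnStats()).update(value)
    return stripes, footer


rows = [{"id": i, "amount": i * 10} for i in range(1, 6)]
stripes, footer = build_toy_orc(rows, rows_per_stripe=2)

print(footer["stripe_row_counts"])       # [2, 2, 1]
print(footer["column_stats"]["amount"])  # count=5, minimum=10, maximum=50, total=150
```

Because the footer already carries these aggregates, a reader can answer simple questions such as a row count or a column's maximum from the footer alone, without scanning the row data in the stripes.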