Apache Hive - Hive Sort By vs Order By
- Hive's SORT BY and ORDER BY clauses both return data in sorted order. The main differences between them are given below.
Sort by
- Hive uses the columns in SORT BY to sort the rows before feeding the rows to a reducer.
- The sort order will be dependent on the column types. If the column is of numeric type, then the sort order is also in numeric order.
- If the column is of string type, then the sort order will be lexicographical order.
- May use multiple reducers for final output.
- Only guarantees ordering of rows within a reducer.
- May give a partially ordered result.
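The type-dependent ordering described in the list above can be seen in a small sketch. This is plain Python, not Hive, and the sample values are made up for illustration; it only shows how the same digits sort differently when compared as numbers and as strings.

```python
# Toy illustration (not Hive code): the same values sorted as numbers
# and as strings. The sample values are hypothetical.
values_as_int = [10, 9, 100, 2]
values_as_str = [str(v) for v in values_as_int]

numeric_order = sorted(values_as_int)   # compares numeric magnitude
lexical_order = sorted(values_as_str)   # compares character by character

print(numeric_order)  # [2, 9, 10, 100]
print(lexical_order)  # ['10', '100', '2', '9']
```

In practice this means a numeric value stored in a string column will sort as text unless it is cast to a numeric type first.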
Ordering: Data is ordered within each of the N reducers, but the reducers can hold overlapping ranges of data.
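To make the per-reducer ordering concrete, here is a minimal Python model of the behaviour. It is a sketch under stated assumptions, not Hive's implementation: the function name `sort_by` is hypothetical, and the round-robin split merely stands in for however Hive actually assigns rows to reducers.

```python
def sort_by(rows, num_reducers):
    """Model SORT BY: split rows across reducers, then sort each share."""
    # Assumption: a round-robin split stands in for Hive's real
    # distribution of rows to reducers.
    shares = [rows[i::num_reducers] for i in range(num_reducers)]
    # Each reducer sorts only the rows it received.
    return [sorted(share) for share in shares]


salaries = [50, 10, 40, 20, 30, 60]
outputs = sort_by(salaries, num_reducers=2)
print(outputs)  # [[30, 40, 50], [10, 20, 60]]
```

Each inner list is sorted, but the two ranges (30 to 50 and 10 to 60) overlap, so concatenating the reducer outputs does not give a globally sorted result.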
Order by
- This is similar to ORDER BY in standard SQL.
- In Hive, ORDER BY guarantees a total ordering of the data, but to do so all rows must pass through a single reducer, which can take unacceptably long on large data sets. In strict mode (hive.mapred.mode=strict), Hive therefore makes it compulsory to use LIMIT with ORDER BY so that the single reducer is not overburdened.
- Uses a single reducer to guarantee total order in the output.
- LIMIT can be used to reduce sort time.
Ordering: Totally ordered data.
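The single-reducer behaviour and the effect of LIMIT can be modelled the same way. This is a Python sketch, not Hive code: the function name `order_by` is hypothetical, and using a heap for the limited case is only one way to show why keeping the first few rows is cheaper than a full sort. It makes no claim about how Hive executes LIMIT internally.

```python
import heapq


def order_by(rows, limit=None):
    """Model ORDER BY: one reducer sees every row, so output is totally ordered."""
    if limit is None:
        # Full sort of all rows in a single place.
        return sorted(rows)
    # With a limit, only the smallest `limit` rows need to be tracked.
    return heapq.nsmallest(limit, rows)


salaries = [50, 10, 40, 20, 30, 60]
print(order_by(salaries))           # [10, 20, 30, 40, 50, 60]
print(order_by(salaries, limit=3))  # [10, 20, 30]
```

Unlike the per-reducer case, the result here is globally sorted, at the cost of funnelling every row through one worker.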