apache hive - HiveQL Select Order By - hive tutorial - hadoop hive - hadoop hive - hiveql
What is Select-Order By in HiveQL ?
- The familiar ORDER BY clause of a SELECT statement sorts the result set based on the values from one or more columns.
- The ORDER BY clause is used to retrieve the details based on one column and sort the result set by ascending or descending order.
- For distributed queries, this is a relatively expensive operation, because the entire result set must be produced and transferred to one node before the sorting can happen. This can require more memory capacity than a query without ORDER BY.
apache hive related article tags - hive tutorial - hadoop hive - hadoop hive - hiveql - hive hadoop - learnhive - hive sql
Syntax of Order By:
- The ORDER BY syntax in Hive QL is similar to the syntax of ORDER BY in SQL language.
- There are some limitations in the "order by" clause.
- In the strict mode (i.e., hive.mapred.mode=strict), the order by clause has to be followed by a "limit" clause.
- The limit clause is not necessary if you set hive.mapred.mode to nonstrict.
- The reason is that in order to impose total order of all results, there has to be one reducer to sort the final output.
- If the number of rows in the output is too large, the single reducer could take a very long time to finish.
- Note that columns are specified by name, not by position number. However in Hive 0.11.0 and later, columns can be specified by position when configured as follows:
- For Hive 0.11.0 through 2.1.x, set hive.groupby.orderby.position.alias to true (the default is false).
- For Hive 2.2.0 and later, hive.orderby.position.alias is true by default.
learn hive - hive tutorial - hive sql datatypes - hive programs - hive examples
The default sorting order is ascending (ASC):
- In Hive 2.1.0 and later, specifying the null sorting order for each of the columns in the "order by" clause is supported. The default null sorting order for ASC order is NULLS FIRST, while the default null sorting order for DESC order is NULLS LAST.
- In Hive 3.0.0 and later, order by without limit in subqueries and views will be removed by the optimizer. To disable it, set hive.remove.orderby.in.subquery to false.
Example:
- Let us take an example for SELECT...ORDER BY clause.
- Assume employee table as given below, with the fields named Id, Name, Salary, Designation, and Dept.
- Generate a query to retrieve the employee details in order by using Department name.
ID | Name | Salary | Designation | Dept |
---|---|---|---|---|
1201 | Aarthi | 45000 | Technical manager | TP |
1202 | Boomi | 45000 | Proofreader | PR |
1203 | Dharsanya | 40000 | Technical writer | TP |
1204 | Harikka | 40000 | Hr Admin | HR |
1205 | Mirunalini | 30000 | Op Admin | Admin |
- The following query retrieves the employee details using the above scenario:
- On successful execution of the query, you get to see the following response:
ID | Name | Salary | Designation | Dept |
---|---|---|---|---|
1205 | Mirunalini | 30000 | Op Admin | Admin |
1204 | Harikka | 40000 | Hr Admin | HR |
1202 | Boomi | 45000 | Proofreader | PR |
1201 | Aarthi | 45000 | Technical manager | TP |
1203 | Dharsanya | 40000 | Technical writer | TP |
apache hive related article tags - hive tutorial - hadoop hive - hadoop hive - hiveql - hive hadoop - learnhive - hive sql
JDBC Program:
Here is the JDBC program to apply Order By clause for the given example.
- Save the program in a file named WikitechyOrderBy.java. Use the following commands to compile and execute this program.
Output:
ID | Name | Salary | Designation | Dept |
---|---|---|---|---|
1205 | Mirunalini | 30000 | Op Admin | Admin |
1204 | Harikka | 40000 | Hr Admin | HR |
1202 | Boomi | 45000 | Proofreader | PR |
1201 | Aarthi | 45000 | Technical manager | TP |
1203 | Dharsanya | 40000 | Technical writer | TP |
1204 | Harikka | 40000 | Hr Admin | HR |