Section 10: 119. Sorting Data within Groups Using DISTRIBUTE BY and SORT BY

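In Hive, DISTRIBUTE BY and SORT BY together let you sort the rows of each group on one or more columns. DISTRIBUTE BY decides which reducer receives each row: all rows with the same value in the DISTRIBUTE BY column(s) go to the same reducer. SORT BY then sorts the rows inside each reducer. The output is not globally sorted (that requires ORDER BY), but every group is processed by a single reducer and sorted there. This is useful when later processing needs each group's rows together and in order. Because one reducer can receive several groups, list the DISTRIBUTE BY column(s) first in SORT BY if you want each group's rows to come out as one contiguous run.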

To use them, add a DISTRIBUTE BY clause followed by a SORT BY clause to a SELECT statement. DISTRIBUTE BY lists the grouping column(s), and SORT BY lists the column(s) to sort on.
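The following minimal Python sketch models what a query such as `SELECT region, amount FROM sales DISTRIBUTE BY region SORT BY region, amount` does on the reducer side. It is not Hive code. The table `sales`, its columns `region` and `amount`, and the helpers `reducer_for` and `distribute_and_sort` are hypothetical. Reducers are modelled as Python lists, and `zlib.crc32` is an assumed stand-in for Hive's hash partitioner, so the reducer numbers will not match a real cluster.

```python
import zlib
from typing import Any

Row = dict[str, Any]


def reducer_for(key: Any, num_reducers: int) -> int:
    # Assumption: crc32 of the key's text stands in for Hive's hash partitioner,
    # so the reducer numbers here will not match a real cluster.
    return zlib.crc32(str(key).encode("utf-8")) % num_reducers


def distribute_and_sort(rows: list[Row], distribute_col: str,
                        sort_cols: list[str], num_reducers: int) -> list[list[Row]]:
    """Model of: SELECT ... DISTRIBUTE BY <distribute_col> SORT BY <sort_cols>."""
    reducers: list[list[Row]] = [[] for _ in range(num_reducers)]
    # DISTRIBUTE BY: every row with the same key goes to the same reducer.
    for row in rows:
        reducers[reducer_for(row[distribute_col], num_reducers)].append(row)
    # SORT BY: each reducer sorts only its own rows; there is no global order.
    for bucket in reducers:
        bucket.sort(key=lambda r: tuple(r[c] for c in sort_cols))
    return reducers


# Hypothetical table `sales`; the HiveQL being modelled is:
#   SELECT region, amount FROM sales DISTRIBUTE BY region SORT BY region, amount;
sales = [
    {"region": "east", "amount": 50},
    {"region": "west", "amount": 20},
    {"region": "east", "amount": 10},
    {"region": "west", "amount": 70},
    {"region": "north", "amount": 30},
]
for i, bucket in enumerate(distribute_and_sort(sales, "region", ["region", "amount"], 2)):
    print(f"reducer {i}:", [(r["region"], r["amount"]) for r in bucket])
```

Because `region` is listed first in the sort columns, each reducer emits the rows of one region as a contiguous run ordered by `amount`.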

By default, the sorting is done in ascending order. You can specify descending order by adding the DESC keyword after the column name.
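Each SORT BY column can carry its own direction. The Python sketch below models only the sorting step inside a single reducer, assuming the rows have already been distributed. The helper `sort_by`, the `(column, direction)` pair format, and the sample rows are hypothetical illustrations, not a Hive API.

```python
from typing import Any

Row = dict[str, Any]


def sort_by(rows: list[Row], spec: list[tuple[str, str]]) -> list[Row]:
    """Model of a SORT BY clause applied to the rows of one reducer.

    `spec` lists (column, direction) pairs, e.g. [("region", "ASC"), ("amount", "DESC")].
    """
    result = list(rows)
    # Python's sort is stable (also with reverse=True), so sorting by the last
    # column first and the first column last gives a correct multi-column order.
    for column, direction in reversed(spec):
        result.sort(key=lambda r: r[column], reverse=(direction.upper() == "DESC"))
    return result


# Rows that one reducer received after DISTRIBUTE BY; the HiveQL being modelled is:
#   SELECT region, amount FROM sales DISTRIBUTE BY region SORT BY region ASC, amount DESC;
rows = [
    {"region": "west", "amount": 20},
    {"region": "east", "amount": 10},
    {"region": "east", "amount": 50},
    {"region": "west", "amount": 70},
]
print([(r["region"], r["amount"]) for r in sort_by(rows, [("region", "ASC"), ("amount", "DESC")])])
# → [('east', 50), ('east', 10), ('west', 70), ('west', 20)]
```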

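When you want to distribute and sort on the same columns, you can use CLUSTER BY instead: CLUSTER BY col is shorthand for DISTRIBUTE BY col SORT BY col, sorted in ascending order. CLUSTER BY does not limit the number of rows returned for each group. To keep only the first N rows of each group, use a window function such as ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...) and filter on its result.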
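CLUSTER BY col behaves like DISTRIBUTE BY col SORT BY col. The Python sketch below models that behaviour for a query such as `SELECT region, amount FROM sales CLUSTER BY region`. The helper `cluster_by`, the table `sales`, and its columns are hypothetical, and `zlib.crc32` is an assumed stand-in for Hive's hash partitioner.

```python
import zlib
from itertools import groupby
from typing import Any

Row = dict[str, Any]


def cluster_by(rows: list[Row], col: str, num_reducers: int) -> list[list[Row]]:
    """Model of CLUSTER BY col, i.e. DISTRIBUTE BY col SORT BY col."""
    reducers: list[list[Row]] = [[] for _ in range(num_reducers)]
    for row in rows:
        # Assumption: crc32 stands in for Hive's hash partitioner.
        idx = zlib.crc32(str(row[col]).encode("utf-8")) % num_reducers
        reducers[idx].append(row)
    for bucket in reducers:
        # Only `col` is sorted. Hive defines no order among rows with equal `col`;
        # Python's stable sort simply keeps their input order here.
        bucket.sort(key=lambda r: r[col])
    return reducers


# Hypothetical table `sales`; the HiveQL being modelled is:
#   SELECT region, amount FROM sales CLUSTER BY region;
sales = [
    {"region": "east", "amount": 50},
    {"region": "west", "amount": 20},
    {"region": "east", "amount": 10},
    {"region": "west", "amount": 70},
    {"region": "north", "amount": 30},
]
for i, bucket in enumerate(cluster_by(sales, "region", 2)):
    # Every region lands on exactly one reducer, as one contiguous run.
    runs = [region for region, _ in groupby(r["region"] for r in bucket)]
    print(f"reducer {i}: {runs}")
```

All five input rows come back out; the model only changes which reducer emits each row and in what order.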
