Analytic Functions – Ranking

Topic
Materials

Let us see how we can assign ranks using different rank functions.

  • If we have to assign ranks globally, we just need to specify ORDER BY

  • If we have to assign ranks with in a key then we need to specify PARTITION BY and then ORDER BY.

  • By default ORDER BY will sort the data in ascending order. We can change the order by passing DESC after order by.

  • We have 3 main functions to assign ranks - rankdense_rank and row_number. We will see the differences between the 3 in a moment.

%load_ext sql

Copy to clipboard

The sql extension is already loaded. To reload it, use:
  %reload_ext sql

Copy to clipboard

%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db

Copy to clipboard

env: DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db

Copy to clipboard

Note

Here is an example to assign sparse ranks using daily_product_revenue with in each day based on revenue.

%%sql

SELECT t.*,
    rank() OVER (
        PARTITION BY order_date
        ORDER BY revenue DESC
    ) AS rnk
FROM daily_product_revenue t
ORDER BY order_date, revenue DESC
LIMIT 30

Copy to clipboard

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
30 rows affected.

Copy to clipboard

order_date order_item_product_id revenue rnk
2013-07-25 00:00:00 1004 5599.72 1
2013-07-25 00:00:00 191 5099.49 2
2013-07-25 00:00:00 957 4499.70 3
2013-07-25 00:00:00 365 3359.44 4
2013-07-25 00:00:00 1073 2999.85 5
2013-07-25 00:00:00 1014 2798.88 6
2013-07-25 00:00:00 403 1949.85 7
2013-07-25 00:00:00 502 1650.00 8
2013-07-25 00:00:00 627 1079.73 9
2013-07-25 00:00:00 226 599.99 10
2013-07-25 00:00:00 24 319.96 11
2013-07-25 00:00:00 821 207.96 12
2013-07-25 00:00:00 625 199.99 13
2013-07-25 00:00:00 705 119.99 14
2013-07-25 00:00:00 572 119.97 15
2013-07-25 00:00:00 666 109.99 16
2013-07-25 00:00:00 725 108.00 17
2013-07-25 00:00:00 134 100.00 18
2013-07-25 00:00:00 906 99.96 19
2013-07-25 00:00:00 828 95.97 20
2013-07-25 00:00:00 810 79.96 21
2013-07-25 00:00:00 924 79.95 22
2013-07-25 00:00:00 926 79.95 22
2013-07-25 00:00:00 93 74.97 24
2013-07-25 00:00:00 835 63.98 25
2013-07-25 00:00:00 897 49.98 26
2013-07-26 00:00:00 1004 10799.46 1
2013-07-26 00:00:00 365 7978.67 2
2013-07-26 00:00:00 957 6899.54 3
2013-07-26 00:00:00 191 6799.32 4

Note

Here is another example to assign sparse ranks using employees data set with in each department. Make sure to restart kernel as you might have connected to retail database.

%load_ext sql

Copy to clipboard

%env DATABASE_URL=postgresql://itversity_hr_user:hr_password@localhost:5432/itversity_hr_db

Copy to clipboard

env: DATABASE_URL=postgresql://itversity_hr_user:hr_password@localhost:5432/itversity_hr_db

Copy to clipboard

%%sql

SELECT employee_id, department_id, salary FROM employees 
ORDER BY department_id,
    salary DESC
LIMIT 10

Copy to clipboard

10 rows affected.

Copy to clipboard

employee_id department_id salary
200 10 4400.00
201 20 13000.00
202 20 6000.00
114 30 11000.00
115 30 3100.00
116 30 2900.00
117 30 2800.00
118 30 2600.00
119 30 2500.00
203 40 6500.00
%%sql

SELECT employee_id, department_id, salary,
    rank() OVER (
        PARTITION BY department_id 
        ORDER BY salary DESC
    ) AS rnk
FROM employees
LIMIT 20

Copy to clipboard

 * postgresql://itversity_hr_user:***@localhost:5432/itversity_hr_db
20 rows affected.

Copy to clipboard

employee_id department_id salary rnk
200 10 4400.00 1
201 20 13000.00 1
202 20 6000.00 2
114 30 11000.00 1
115 30 3100.00 2
116 30 2900.00 3
117 30 2800.00 4
118 30 2600.00 5
119 30 2500.00 6
203 40 6500.00 1
121 50 8200.00 1
120 50 8000.00 2
122 50 7900.00 3
123 50 6500.00 4
124 50 5800.00 5
184 50 4200.00 6
185 50 4100.00 7
192 50 4000.00 8
193 50 3900.00 9
188 50 3800.00 10

Note

Here is an example to assign dense ranks using employees data set with in each department.

%%sql

SELECT employee_id, department_id, salary,
    dense_rank() OVER (
        PARTITION BY department_id 
        ORDER BY salary DESC
    ) AS drnk
FROM employees
LIMIT 20

Copy to clipboard

 * postgresql://itversity_hr_user:***@localhost:5432/itversity_hr_db
20 rows affected.

Copy to clipboard

employee_id department_id salary drnk
200 10 4400.00 1
201 20 13000.00 1
202 20 6000.00 2
114 30 11000.00 1
115 30 3100.00 2
116 30 2900.00 3
117 30 2800.00 4
118 30 2600.00 5
119 30 2500.00 6
203 40 6500.00 1
121 50 8200.00 1
120 50 8000.00 2
122 50 7900.00 3
123 50 6500.00 4
124 50 5800.00 5
184 50 4200.00 6
185 50 4100.00 7
192 50 4000.00 8
193 50 3900.00 9
188 50 3800.00 10

Note

Here is an example for global rank based on salary. If all the salaries are unique, we can use LIMIT but when they are not unique, we have to go with analytic functions.

%%sql

SELECT employee_id, department_id, salary,
    rank() OVER (
        ORDER BY salary DESC
    ) AS rnk,
    dense_rank() OVER (
        ORDER BY salary DESC
    ) AS drnk
FROM employees
LIMIT 20

Copy to clipboard

 * postgresql://itversity_hr_user:***@localhost:5432/itversity_hr_db
20 rows affected.

Copy to clipboard

employee_id department_id salary rnk drnk
100 90 24000.00 1 1
101 90 17000.00 2 2
102 90 17000.00 2 2
145 80 14000.00 4 3
146 80 13500.00 5 4
201 20 13000.00 6 5
205 110 12000.00 7 6
147 80 12000.00 7 6
108 100 12000.00 7 6
168 80 11500.00 10 7
148 80 11000.00 11 8
174 80 11000.00 11 8
114 30 11000.00 11 8
149 80 10500.00 14 9
162 80 10500.00 14 9
169 80 10000.00 16 10
204 70 10000.00 16 10
150 80 10000.00 16 10
156 80 10000.00 16 10
170 80 9600.00 20 11

Let us understand the difference between rankdense_rank and row_number.

  • We can use either of the functions to generate ranks when the rank field does not have duplicates.

  • When rank field have duplicates then row_number should not be used as it generate unique number for each record with in the partition.

  • rank will skip the ranks in between if multiple people get the same rank while dense_rank continue with the next number.

%load_ext sql

Copy to clipboard

The sql extension is already loaded. To reload it, use:
  %reload_ext sql

Copy to clipboard

%env DATABASE_URL=postgresql://itversity_hr_user:hr_password@localhost:5432/itversity_hr_db

Copy to clipboard

env: DATABASE_URL=postgresql://itversity_hr_user:hr_password@localhost:5432/itversity_hr_db

Copy to clipboard

%%sql

SELECT
    employee_id,
    department_id,
    salary,
    rank() OVER (
        PARTITION BY department_id
        ORDER BY salary DESC
      ) rnk,
    dense_rank() OVER (
        PARTITION BY department_id
        ORDER BY salary DESC
      ) drnk,
    row_number() OVER (
        PARTITION BY department_id
        ORDER BY salary DESC, employee_id
      ) rn
FROM employees
ORDER BY department_id, salary DESC
LIMIT 50

Copy to clipboard

 * postgresql://itversity_hr_user:***@localhost:5432/itversity_hr_db
50 rows affected.

Copy to clipboard

employee_id department_id salary rnk drnk rn
200 10 4400.00 1 1 1
201 20 13000.00 1 1 1
202 20 6000.00 2 2 2
114 30 11000.00 1 1 1
115 30 3100.00 2 2 2
116 30 2900.00 3 3 3
117 30 2800.00 4 4 4
118 30 2600.00 5 5 5
119 30 2500.00 6 6 6
203 40 6500.00 1 1 1
121 50 8200.00 1 1 1
120 50 8000.00 2 2 2
122 50 7900.00 3 3 3
123 50 6500.00 4 4 4
124 50 5800.00 5 5 5
184 50 4200.00 6 6 6
185 50 4100.00 7 7 7
192 50 4000.00 8 8 8
193 50 3900.00 9 9 9
188 50 3800.00 10 10 10
137 50 3600.00 11 11 11
189 50 3600.00 11 11 12
141 50 3500.00 13 12 13
186 50 3400.00 14 13 14
129 50 3300.00 15 14 15
133 50 3300.00 15 14 16
125 50 3200.00 17 15 17
138 50 3200.00 17 15 18
180 50 3200.00 17 15 19
194 50 3200.00 17 15 20
142 50 3100.00 21 16 21
181 50 3100.00 21 16 22
196 50 3100.00 21 16 23
187 50 3000.00 24 17 24
197 50 3000.00 24 17 25
134 50 2900.00 26 18 26
190 50 2900.00 26 18 27
130 50 2800.00 28 19 28
183 50 2800.00 28 19 29
195 50 2800.00 28 19 30
126 50 2700.00 31 20 31
139 50 2700.00 31 20 32
143 50 2600.00 33 21 33
198 50 2600.00 33 21 34
199 50 2600.00 33 21 35
131 50 2500.00 36 22 36
140 50 2500.00 36 22 37
144 50 2500.00 36 22 38
182 50 2500.00 36 22 39
191 50 2500.00 36 22 40

Note

Here is another example to with respect to all 3 functions. Make sure to restart kernel as you might have connected to HR database.

%load_ext sql

Copy to clipboard

%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db

Copy to clipboard

env: DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db

Copy to clipboard

%%sql

SELECT
    t.*,
    rank() OVER (
        PARTITION BY order_date
        ORDER BY revenue DESC
    ) rnk,
    dense_rank() OVER (
        PARTITION BY order_date
        ORDER BY revenue DESC
    ) drnk,
    row_number() OVER (
        PARTITION BY order_date
        ORDER BY revenue DESC
    ) rn
FROM daily_product_revenue AS t
ORDER BY order_date, revenue DESC
LIMIT 30

Copy to clipboard

30 rows affected.

Copy to clipboard

order_date order_item_product_id revenue rnk drnk rn
2013-07-25 00:00:00 1004 5599.72 1 1 1
2013-07-25 00:00:00 191 5099.49 2 2 2
2013-07-25 00:00:00 957 4499.70 3 3 3
2013-07-25 00:00:00 365 3359.44 4 4 4
2013-07-25 00:00:00 1073 2999.85 5 5 5
2013-07-25 00:00:00 1014 2798.88 6 6 6
2013-07-25 00:00:00 403 1949.85 7 7 7
2013-07-25 00:00:00 502 1650.00 8 8 8
2013-07-25 00:00:00 627 1079.73 9 9 9
2013-07-25 00:00:00 226 599.99 10 10 10
2013-07-25 00:00:00 24 319.96 11 11 11
2013-07-25 00:00:00 821 207.96 12 12 12
2013-07-25 00:00:00 625 199.99 13 13 13
2013-07-25 00:00:00 705 119.99 14 14 14
2013-07-25 00:00:00 572 119.97 15 15 15
2013-07-25 00:00:00 666 109.99 16 16 16
2013-07-25 00:00:00 725 108.00 17 17 17
2013-07-25 00:00:00 134 100.00 18 18 18
2013-07-25 00:00:00 906 99.96 19 19 19
2013-07-25 00:00:00 828 95.97 20 20 20
2013-07-25 00:00:00 810 79.96 21 21 21
2013-07-25 00:00:00 924 79.95 22 22 22
2013-07-25 00:00:00 926 79.95 22 22 23
2013-07-25 00:00:00 93 74.97 24 23 24
2013-07-25 00:00:00 835 63.98 25 24 25
2013-07-25 00:00:00 897 49.98 26 25 26
2013-07-26 00:00:00 1004 10799.46 1 1 1
2013-07-26 00:00:00 365 7978.67 2 2 2
2013-07-26 00:00:00 957 6899.54 3 3 3
2013-07-26 00:00:00 191 6799.32 4 4 4