Solutions – SQL Analytics or Windowing Functions using Postgresql

Analytics Functions

Let us take care of the exercises related to analytics functions. We will be using HR database for the same.

  • Get all the employees who is making more than average salary with in each department.
  • Get cumulative salary for one of the department along with department name.
  • Get top 3 paid employees with in each department by salary (use dense_rank)
  • Get top 3 products sold in the month of 2014 January by revenue.
  • Get top 3 products in each category sold in the month of 2014 January by revenue.

Prepare HR Database

Here are the steps to prepare HR database.

  • Connect to HR DB using psql or SQL Workbench. Here is the sample psql command.
psql -h localhost \
    -p 5432 \
    -d itversity_hr_db \
    -U itversity_hr_user \
    -W
  • Run scripts to create tables and load the data. You can also drop the tables if they already exists.
\i /data/hr_db/drop_tables_pg.sql
\i /data/hr_db/create_tables_pg.sql
\i /data/hr_db/load_data_pg.sql
  • Validate to ensure that data is available in the tables by running these queries.
In [1]:
%load_ext sql
In [2]:
%env DATABASE_URL=postgresql://itversity_hr_user:hr_password@localhost:5432/itversity_hr_db
env: DATABASE_URL=postgresql://itversity_hr_user:hr_password@localhost:5432/itversity_hr_db
In [3]:
%sql SELECT * FROM employees LIMIT 10
10 rows affected.
Out[3]:
employee_id first_name last_name email phone_number hire_date job_id salary commission_pct manager_id department_id
100 Steven King SKING 515.123.4567 1987-06-17 AD_PRES 24000.00 None None 90
101 Neena Kochhar NKOCHHAR 515.123.4568 1989-09-21 AD_VP 17000.00 None 100 90
102 Lex De Haan LDEHAAN 515.123.4569 1993-01-13 AD_VP 17000.00 None 100 90
103 Alexander Hunold AHUNOLD 590.423.4567 1990-01-03 IT_PROG 9000.00 None 102 60
104 Bruce Ernst BERNST 590.423.4568 1991-05-21 IT_PROG 6000.00 None 103 60
105 David Austin DAUSTIN 590.423.4569 1997-06-25 IT_PROG 4800.00 None 103 60
106 Valli Pataballa VPATABAL 590.423.4560 1998-02-05 IT_PROG 4800.00 None 103 60
107 Diana Lorentz DLORENTZ 590.423.5567 1999-02-07 IT_PROG 4200.00 None 103 60
108 Nancy Greenberg NGREENBE 515.124.4569 1994-08-17 FI_MGR 12000.00 None 101 100
109 Daniel Faviet DFAVIET 515.124.4169 1994-08-16 FI_ACCOUNT 9000.00 None 108 100
In [4]:
%%sql 

SELECT * FROM departments 
ORDER BY manager_id NULLS LAST
LIMIT 10
 * postgresql://itversity_hr_user:***@localhost:5432/itversity_hr_db
10 rows affected.
Out[4]:
department_id department_name manager_id location_id
90 Executive 100 1700
60 IT 103 1400
100 Finance 108 1700
30 Purchasing 114 1700
50 Shipping 121 1500
80 Sales 145 2500
10 Administration 200 1700
20 Marketing 201 1800
40 Human Resources 203 2400
70 Public Relations 204 2700

Prepare Retail Database

Make sure to drop and recreate the tables before taking care of the exercises related to retail database.

  • Ensure that we have required database and user for retail data. We might provide the database as part of our labs. Here are the instructions to use psql for setting up the required database (if required) and tables.
psql -U postgres -h localhost -p 5432 -W
CREATE DATABASE itversity_retail_db;
CREATE USER itversity_retail_user WITH ENCRYPTED PASSWORD 'retail_password';
GRANT ALL ON DATABASE itversity_retail_db TO itversity_retail_user;
  • Create Tables using the script provided. You can either use psql or SQL Workbench.
psql -U itversity_retail_user \
  -h localhost \
  -p 5432 \
  -d itversity_retail_db \
  -W
  • You can drop the existing tables.
DROP TABLE IF EXISTS order_items CASCADE;
DROP TABLE IF EXISTS orders CASCADE;
DROP TABLE IF EXISTS customers CASCADE;
DROP TABLE IF EXISTS products CASCADE;
DROP TABLE IF EXISTS categories CASCADE;
DROP TABLE IF EXISTS departments CASCADE;
  • Once the tables are dropped you can run below script to create the tables for the purpose of exercises.
\i /data/retail_db/create_db_tables_pg.sql
  • Data shall be loaded using the script provided.
\i /data/retail_db/load_db_tables_pg.sql
  • Run queries to validate we have data in all the 3 tables.
In [1]:
%load_ext sql
In [2]:
%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db
env: DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db
In [3]:
%sql SELECT * FROM orders LIMIT 10
10 rows affected.
Out[3]:
order_id order_date order_customer_id order_status
1 2013-07-25 00:00:00 11599 CLOSED
2 2013-07-25 00:00:00 256 PENDING_PAYMENT
3 2013-07-25 00:00:00 12111 COMPLETE
4 2013-07-25 00:00:00 8827 CLOSED
5 2013-07-25 00:00:00 11318 COMPLETE
6 2013-07-25 00:00:00 7130 COMPLETE
7 2013-07-25 00:00:00 4530 COMPLETE
8 2013-07-25 00:00:00 2911 PROCESSING
9 2013-07-25 00:00:00 5657 PENDING_PAYMENT
10 2013-07-25 00:00:00 5648 PENDING_PAYMENT
In [4]:
%sql SELECT count(1) FROM orders
 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
1 rows affected.
Out[4]:
count
68883
In [5]:
%sql SELECT * FROM order_items LIMIT 10
 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
10 rows affected.
Out[5]:
order_item_id order_item_order_id order_item_product_id order_item_quantity order_item_subtotal order_item_product_price
1 1 957 1 299.98 299.98
2 2 1073 1 199.99 199.99
3 2 502 5 250.0 50.0
4 2 403 1 129.99 129.99
5 4 897 2 49.98 24.99
6 4 365 5 299.95 59.99
7 4 502 3 150.0 50.0
8 4 1014 4 199.92 49.98
9 5 957 1 299.98 299.98
10 5 365 5 299.95 59.99
In [6]:
%sql SELECT count(1) FROM order_items
 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
1 rows affected.
Out[6]:
count
172198
In [7]:
%sql SELECT * FROM products LIMIT 10
 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
10 rows affected.
Out[7]:
product_id product_category_id product_name product_description product_price product_image
1 2 Quest Q64 10 FT. x 10 FT. Slant Leg Instant U 59.98 http://images.acmesports.sports/Quest+Q64+10+FT.+x+10+FT.+Slant+Leg+Instant+Up+Canopy
2 2 Under Armour Men’s Highlight MC Football Clea 129.99 http://images.acmesports.sports/Under+Armour+Men%27s+Highlight+MC+Football+Cleat
3 2 Under Armour Men’s Renegade D Mid Football Cl 89.99 http://images.acmesports.sports/Under+Armour+Men%27s+Renegade+D+Mid+Football+Cleat
4 2 Under Armour Men’s Renegade D Mid Football Cl 89.99 http://images.acmesports.sports/Under+Armour+Men%27s+Renegade+D+Mid+Football+Cleat
5 2 Riddell Youth Revolution Speed Custom Footbal 199.99 http://images.acmesports.sports/Riddell+Youth+Revolution+Speed+Custom+Football+Helmet
6 2 Jordan Men’s VI Retro TD Football Cleat 134.99 http://images.acmesports.sports/Jordan+Men%27s+VI+Retro+TD+Football+Cleat
7 2 Schutt Youth Recruit Hybrid Custom Football H 99.99 http://images.acmesports.sports/Schutt+Youth+Recruit+Hybrid+Custom+Football+Helmet+2014
8 2 Nike Men’s Vapor Carbon Elite TD Football Cle 129.99 http://images.acmesports.sports/Nike+Men%27s+Vapor+Carbon+Elite+TD+Football+Cleat
9 2 Nike Adult Vapor Jet 3.0 Receiver Gloves 50.0 http://images.acmesports.sports/Nike+Adult+Vapor+Jet+3.0+Receiver+Gloves
10 2 Under Armour Men’s Highlight MC Football Clea 129.99 http://images.acmesports.sports/Under+Armour+Men%27s+Highlight+MC+Football+Cleat
In [8]:
%sql SELECT count(1) FROM products
 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
1 rows affected.
Out[8]:
count
1345

Exercise 1

Get all the employees who is making more than average salary with in each department.

  • Use HR database employees and department tables for this problem.
  • Compute average salary expense for each department and get those employee details who are making more salary than average salary.
  • Make sure average salary expense per department is rounded off to 2 decimals.
  • Output should contain employee_id, department_name, salary and avg_salary_expense (derived field).
  • Data should be sorted in ascending order by department_id and descending order by salary.
employee_id department_name salary avg_salary_expense
201 Marketing 13000.00 9500.00
114 Purchasing 11000.00 4150.00
121 Shipping 8200.00 3475.56
120 Shipping 8000.00 3475.56
122 Shipping 7900.00 3475.56
123 Shipping 6500.00 3475.56
124 Shipping 5800.00 3475.56
184 Shipping 4200.00 3475.56
185 Shipping 4100.00 3475.56
192 Shipping 4000.00 3475.56
193 Shipping 3900.00 3475.56
188 Shipping 3800.00 3475.56
137 Shipping 3600.00 3475.56
189 Shipping 3600.00 3475.56
141 Shipping 3500.00 3475.56
103 IT 9000.00 5760.00
104 IT 6000.00 5760.00
145 Sales 14000.00 8955.88
146 Sales 13500.00 8955.88
147 Sales 12000.00 8955.88
168 Sales 11500.00 8955.88
148 Sales 11000.00 8955.88
174 Sales 11000.00 8955.88
149 Sales 10500.00 8955.88
162 Sales 10500.00 8955.88
156 Sales 10000.00 8955.88
150 Sales 10000.00 8955.88
169 Sales 10000.00 8955.88
170 Sales 9600.00 8955.88
163 Sales 9500.00 8955.88
151 Sales 9500.00 8955.88
157 Sales 9500.00 8955.88
158 Sales 9000.00 8955.88
152 Sales 9000.00 8955.88
100 Executive 24000.00 19333.33
108 Finance 12000.00 8600.00
109 Finance 9000.00 8600.00
205 Accounting 12000.00 10150.00

Solution

In [1]:
%load_ext sql
In [2]:
%env DATABASE_URL=postgresql://itversity_hr_user:hr_password@localhost:5432/itversity_hr_db
env: DATABASE_URL=postgresql://itversity_hr_user:hr_password@localhost:5432/itversity_hr_db
In [13]:
%%sql

SELECT e.employee_id,
    e.department_name,
    e.salary,
    e.avg_salary_expense
FROM (
    SELECT e.employee_id,
        e.department_id,
        d.department_name, 
        e.salary,
        round(avg(e.salary) OVER (PARTITION BY e.department_id), 2) AS avg_salary_expense
    FROM employees AS e
        JOIN departments AS d
            ON e.department_id = d.department_id
) AS e
WHERE e.salary > e.avg_salary_expense
ORDER BY e.department_id, 
    e.salary DESC
 * postgresql://itversity_hr_user:***@localhost:5432/itversity_hr_db
38 rows affected.
Out[13]:
employee_id department_name salary avg_salary_expense
201 Marketing 13000.00 9500.00
114 Purchasing 11000.00 4150.00
121 Shipping 8200.00 3475.56
120 Shipping 8000.00 3475.56
122 Shipping 7900.00 3475.56
123 Shipping 6500.00 3475.56
124 Shipping 5800.00 3475.56
184 Shipping 4200.00 3475.56
185 Shipping 4100.00 3475.56
192 Shipping 4000.00 3475.56
193 Shipping 3900.00 3475.56
188 Shipping 3800.00 3475.56
189 Shipping 3600.00 3475.56
137 Shipping 3600.00 3475.56
141 Shipping 3500.00 3475.56
103 IT 9000.00 5760.00
104 IT 6000.00 5760.00
145 Sales 14000.00 8955.88
146 Sales 13500.00 8955.88
147 Sales 12000.00 8955.88
168 Sales 11500.00 8955.88
174 Sales 11000.00 8955.88
148 Sales 11000.00 8955.88
149 Sales 10500.00 8955.88
162 Sales 10500.00 8955.88
156 Sales 10000.00 8955.88
150 Sales 10000.00 8955.88
169 Sales 10000.00 8955.88
170 Sales 9600.00 8955.88
163 Sales 9500.00 8955.88
157 Sales 9500.00 8955.88
151 Sales 9500.00 8955.88
158 Sales 9000.00 8955.88
152 Sales 9000.00 8955.88
100 Executive 24000.00 19333.33
108 Finance 12000.00 8600.00
109 Finance 9000.00 8600.00
205 Accounting 12000.00 10150.00

Exercise 2

Get cumulative salary with in each department for Finance and IT department along with department name.

  • Use HR database employees and department tables for this problem.
  • Compute cumulative salary expense for Finance as well as IT departments with in respective departments.
  • Make sure cumulative salary expense per department is rounded off to 2 decimals.
  • Output should contain employee_id, department_name, salary and cum_salary_expense (derived field).
  • Data should be sorted in ascending order by department_name and then salary.
employee_id department_name salary cum_salary_expense
113 Finance 6900.00 6900.00
111 Finance 7700.00 14600.00
112 Finance 7800.00 22400.00
110 Finance 8200.00 30600.00
109 Finance 9000.00 39600.00
108 Finance 12000.00 51600.00
107 IT 4200.00 4200.00
106 IT 4800.00 9000.00
105 IT 4800.00 13800.00
104 IT 6000.00 19800.00
103 IT 9000.00 28800.00
In [8]:
%%sql

SELECT e.employee_id,
    d.department_name, 
    e.salary,
    round(sum(e.salary) OVER (
        PARTITION BY e.department_id
        ORDER BY salary
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ), 2) AS cum_salary_expense
FROM employees AS e
    JOIN departments AS d
        ON e.department_id = d.department_id
    WHERE d.department_name IN ('Finance', 'IT')
ORDER BY d.department_name,
    e.salarydz
 * postgresql://itversity_hr_user:***@localhost:5432/itversity_hr_db
11 rows affected.
Out[8]:
employee_id department_name salary cum_salary_expense
113 Finance 6900.00 6900.00
111 Finance 7700.00 14600.00
112 Finance 7800.00 22400.00
110 Finance 8200.00 30600.00
109 Finance 9000.00 39600.00
108 Finance 12000.00 51600.00
107 IT 4200.00 4200.00
106 IT 4800.00 13800.00
105 IT 4800.00 9000.00
104 IT 6000.00 19800.00
103 IT 9000.00 28800.00

Exercise 3

Get top 3 paid employees with in each department by salary (use dense_rank)

  • Use HR database employees and department tables for this problem.
  • Highest paid employee should be ranked first.
  • Output should contain employee_id, department_id, department_name, salary and employee_rank (derived field).
  • Data should be sorted in ascending order by department_id in ascending order and then salary in descending order.
employee_id department_id department_name salary employee_rank
200 10 Administration 4400.00 1
201 20 Marketing 13000.00 1
202 20 Marketing 6000.00 2
114 30 Purchasing 11000.00 1
115 30 Purchasing 3100.00 2
116 30 Purchasing 2900.00 3
203 40 Human Resources 6500.00 1
121 50 Shipping 8200.00 1
120 50 Shipping 8000.00 2
122 50 Shipping 7900.00 3
103 60 IT 9000.00 1
104 60 IT 6000.00 2
105 60 IT 4800.00 3
106 60 IT 4800.00 3
204 70 Public Relations 10000.00 1
145 80 Sales 14000.00 1
146 80 Sales 13500.00 2
147 80 Sales 12000.00 3
100 90 Executive 24000.00 1
101 90 Executive 17000.00 2
102 90 Executive 17000.00 2
108 100 Finance 12000.00 1
109 100 Finance 9000.00 2
110 100 Finance 8200.00 3
205 110 Accounting 12000.00 1
206 110 Accounting 8300.00 2
In [12]:
%%sql

SELECT *
FROM (
    SELECT e.employee_id,
        e.department_id,
        d.department_name, 
        e.salary,
        dense_rank() OVER (
            PARTITION BY e.department_id
            ORDER BY e.salary DESC
        ) AS employee_rank
    FROM employees AS e
        JOIN departments AS d
            ON e.department_id = d.department_id
) AS e
WHERE e.employee_rank <= 3
ORDER BY e.department_id, 
    e.salary DESC
 * postgresql://itversity_hr_user:***@localhost:5432/itversity_hr_db
26 rows affected.
Out[12]:
employee_id department_id department_name salary employee_rank
200 10 Administration 4400.00 1
201 20 Marketing 13000.00 1
202 20 Marketing 6000.00 2
114 30 Purchasing 11000.00 1
115 30 Purchasing 3100.00 2
116 30 Purchasing 2900.00 3
203 40 Human Resources 6500.00 1
121 50 Shipping 8200.00 1
120 50 Shipping 8000.00 2
122 50 Shipping 7900.00 3
103 60 IT 9000.00 1
104 60 IT 6000.00 2
105 60 IT 4800.00 3
106 60 IT 4800.00 3
204 70 Public Relations 10000.00 1
145 80 Sales 14000.00 1
146 80 Sales 13500.00 2
147 80 Sales 12000.00 3
100 90 Executive 24000.00 1
101 90 Executive 17000.00 2
102 90 Executive 17000.00 2
108 100 Finance 12000.00 1
109 100 Finance 9000.00 2
110 100 Finance 8200.00 3
205 110 Accounting 12000.00 1
206 110 Accounting 8300.00 2

Exercise 4

Get top 3 products sold in the month of 2014 January by revenue.

  • Use retail database tables such as orders, order_items and products.
  • Highest revenue generating product should come at top.
  • Consider only those orders which are either in COMPLETE or CLOSED status.
  • Output should contain product_id, product_name, revenue, product_rank. revenue and product_rank are derived fields.
  • Data should be sorted in descending order by revenue.
product_id product_name revenue product_rank
1004 Field & Stream Sportsman 16 Gun Fire Safe 250787.46 1
365 Perfect Fitness Perfect Rip Deck 151474.75 2
957 Diamondback Women’s Serene Classic Comfort Bi 148190.12 3
In [1]:
%load_ext sql
In [2]:
%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db
env: DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db
In [9]:
%%sql

WITH product_revenue AS (
    SELECT p.product_id,
        p.product_name,
        round(sum(oi.order_item_subtotal)::numeric, 2) AS revenue
    FROM orders AS o
        JOIN order_items oi
            ON o.order_id = oi.order_item_order_id
        JOIN products AS p
            ON p.product_id = oi.order_item_product_id
    WHERE to_char(o.order_date, 'yyyy-MM') = '2014-01'
        AND o.order_status IN ('COMPLETE', 'CLOSED')
    GROUP BY p.product_id,
        p.product_name
) SELECT * FROM (
    SELECT pr.*,
        dense_rank() OVER (
            ORDER BY revenue DESC
        ) AS product_rank
    FROM product_revenue AS pr
) AS q
WHERE product_rank <= 3
ORDER BY revenue DESC
 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
3 rows affected.
Out[9]:
product_id product_name revenue product_rank
1004 Field & Stream Sportsman 16 Gun Fire Safe 250787.46 1
365 Perfect Fitness Perfect Rip Deck 151474.75 2
957 Diamondback Women’s Serene Classic Comfort Bi 148190.12 3

Exercise 5

Get top 3 products sold in the month of 2014 January under selected categories by revenue. The categories are Cardio Equipment and Strength Training.

  • Use retail database tables such as orders, order_items, products as well as categories.
  • Highest revenue generating product should come at top.
  • Consider only those orders which are either in COMPLETE or CLOSED status.
  • Output should contain category_id, category_name, product_id, product_name, revenue, product_rank. revenue and product_rank are derived fields.
  • Data should be sorted in ascending order by category_id and descending order by revenue.
category_id category_name product_id product_name revenue product_rank
9 Cardio Equipment 191 Nike Men’s Free 5.0+ Running Shoe 132286.77 1
9 Cardio Equipment 172 Nike Women’s Tempo Shorts 870.00 2
10 Strength Training 208 SOLE E35 Elliptical 1999.99 1
10 Strength Training 203 GoPro HERO3+ Black Edition Camera 1199.97 2
10 Strength Training 216 Yakima DoubleDown Ace Hitch Mount 4-Bike Rack 189.00 3
In [12]:
%%sql

WITH product_revenue AS (
    SELECT c.category_id,
        c.category_name,
        p.product_id,
        p.product_name,
        round(sum(oi.order_item_subtotal)::numeric, 2) AS revenue
    FROM orders AS o
        JOIN order_items oi
            ON o.order_id = oi.order_item_order_id
        JOIN products AS p
            ON p.product_id = oi.order_item_product_id
        JOIN categories AS c
            ON c.category_id = p.product_category_id
    WHERE to_char(o.order_date, 'yyyy-MM') = '2014-01'
        AND o.order_status IN ('COMPLETE', 'CLOSED')
        AND c.category_name IN ('Cardio Equipment', 'Strength Training')
    GROUP BY c.category_id,
        c.category_name,
        p.product_id,
        p.product_name
) SELECT * FROM (
    SELECT pr.*,
        dense_rank() OVER (
            PARTITION BY category_id
            ORDER BY revenue DESC
        ) AS product_rank
    FROM product_revenue AS pr
) AS q
WHERE product_rank <= 3
ORDER BY category_id,
    revenue DESC
 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
5 rows affected.
Out[12]:
category_id category_name product_id product_name revenue product_rank
9 Cardio Equipment 191 Nike Men’s Free 5.0+ Running Shoe 132286.77 1
9 Cardio Equipment 172 Nike Women’s Tempo Shorts 870.00 2
10 Strength Training 208 SOLE E35 Elliptical 1999.99 1
10 Strength Training 203 GoPro HERO3+ Black Edition Camera 1199.97 2
10 Strength Training 216 Yakima DoubleDown Ace Hitch Mount 4-Bike Rack 189.00 3

Share this post