Analytic Functions – Windowing

Topic
Materials

Let us go through the list of Windowing functions supported by Postgres.

  • lead and lag

  • first_value and last_value

  • We can either use ORDER BY sort_column or PARTITION BY partition_column ORDER BY sort_column while using Windowing Functions.

%load_ext sql

Copy to clipboard

The sql extension is already loaded. To reload it, use:
  %reload_ext sql

Copy to clipboard

%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db

Copy to clipboard

env: DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db

Copy to clipboard

Getting LEAD and LAG values

Let us understand LEAD and LAG functions to get column values from following or prior records.

Note

Here is the example to get values from either immediate prior or following record along with values from curent record. We will get values from prior or following record based on ORDER BY within OVER Clause.

%%sql

SELECT t.*,
    lead(order_date) OVER (ORDER BY order_date DESC) AS prior_date,
    lead(revenue) OVER (ORDER BY order_date DESC) AS prior_revenue,
    lag(order_date) OVER (ORDER BY order_date) AS lag_prior_date,
    lag(revenue) OVER (ORDER BY order_date) AS lag_prior_revenue
FROM daily_revenue AS t
ORDER BY order_date DESC
LIMIT 10

Copy to clipboard

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
10 rows affected.

Copy to clipboard

order_date revenue prior_date prior_revenue lag_prior_date lag_prior_revenue
2014-07-24 00:00:00 50885.19 2014-07-23 00:00:00 38795.23 2014-07-23 00:00:00 38795.23
2014-07-23 00:00:00 38795.23 2014-07-22 00:00:00 36717.24 2014-07-22 00:00:00 36717.24
2014-07-22 00:00:00 36717.24 2014-07-21 00:00:00 51427.70 2014-07-21 00:00:00 51427.70
2014-07-21 00:00:00 51427.70 2014-07-20 00:00:00 60047.45 2014-07-20 00:00:00 60047.45
2014-07-20 00:00:00 60047.45 2014-07-19 00:00:00 38420.99 2014-07-19 00:00:00 38420.99
2014-07-19 00:00:00 38420.99 2014-07-18 00:00:00 43856.60 2014-07-18 00:00:00 43856.60
2014-07-18 00:00:00 43856.60 2014-07-17 00:00:00 36384.77 2014-07-17 00:00:00 36384.77
2014-07-17 00:00:00 36384.77 2014-07-16 00:00:00 43011.92 2014-07-16 00:00:00 43011.92
2014-07-16 00:00:00 43011.92 2014-07-15 00:00:00 53480.23 2014-07-15 00:00:00 53480.23
2014-07-15 00:00:00 53480.23 2014-07-14 00:00:00 29937.52 2014-07-14 00:00:00 29937.52

Note

Here is the example to get values from either prior or following 7th record along with values from current record.

%%sql

SELECT t.*,
    lead(order_date, 7) OVER (ORDER BY order_date DESC) AS prior_date,
    lead(revenue, 7) OVER (ORDER BY order_date DESC) AS prior_revenue
FROM daily_revenue t
ORDER BY order_date DESC
LIMIT 10

Copy to clipboard

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
10 rows affected.

Copy to clipboard

order_date revenue prior_date prior_revenue
2014-07-24 00:00:00 50885.19 2014-07-17 00:00:00 36384.77
2014-07-23 00:00:00 38795.23 2014-07-16 00:00:00 43011.92
2014-07-22 00:00:00 36717.24 2014-07-15 00:00:00 53480.23
2014-07-21 00:00:00 51427.70 2014-07-14 00:00:00 29937.52
2014-07-20 00:00:00 60047.45 2014-07-13 00:00:00 40410.99
2014-07-19 00:00:00 38420.99 2014-07-12 00:00:00 38449.77
2014-07-18 00:00:00 43856.60 2014-07-11 00:00:00 29596.32
2014-07-17 00:00:00 36384.77 2014-07-10 00:00:00 47826.02
2014-07-16 00:00:00 43011.92 2014-07-09 00:00:00 36929.91
2014-07-15 00:00:00 53480.23 2014-07-08 00:00:00 50434.81

Note

For values related to non existing prior or following record, we will get nulls.

%%sql

SELECT t.*,
    lead(order_date, 7) OVER (ORDER BY order_date DESC) AS prior_date,
    lead(revenue, 7) OVER (ORDER BY order_date DESC) AS prior_revenue
FROM daily_revenue t
ORDER BY order_date
LIMIT 10

Copy to clipboard

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
10 rows affected.

Copy to clipboard

order_date revenue prior_date prior_revenue
2013-07-25 00:00:00 31547.23 None None
2013-07-26 00:00:00 54713.23 None None
2013-07-27 00:00:00 48411.48 None None
2013-07-28 00:00:00 35672.03 None None
2013-07-29 00:00:00 54579.70 None None
2013-07-30 00:00:00 49329.29 None None
2013-07-31 00:00:00 59212.49 None None
2013-08-01 00:00:00 49160.08 2013-07-25 00:00:00 31547.23
2013-08-02 00:00:00 50688.58 2013-07-26 00:00:00 54713.23
2013-08-03 00:00:00 43416.74 2013-07-27 00:00:00 48411.48

Note

We can replace nulls by passing relevant values as 3rd argument. However, the data type of the values should be compatible with the columns on which lead or lag is applied.

%%sql

SELECT t.*,
    lead(order_date, 7) OVER (ORDER BY order_date DESC) AS prior_date,
    lead(revenue, 7, 0.0) OVER (ORDER BY order_date DESC) AS prior_revenue
FROM daily_revenue t
ORDER BY order_date
LIMIT 10

Copy to clipboard

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
10 rows affected.

Copy to clipboard

order_date revenue prior_date prior_revenue
2013-07-25 00:00:00 31547.23 None 0.0
2013-07-26 00:00:00 54713.23 None 0.0
2013-07-27 00:00:00 48411.48 None 0.0
2013-07-28 00:00:00 35672.03 None 0.0
2013-07-29 00:00:00 54579.70 None 0.0
2013-07-30 00:00:00 49329.29 None 0.0
2013-07-31 00:00:00 59212.49 None 0.0
2013-08-01 00:00:00 49160.08 2013-07-25 00:00:00 31547.23
2013-08-02 00:00:00 50688.58 2013-07-26 00:00:00 54713.23
2013-08-03 00:00:00 43416.74 2013-07-27 00:00:00 48411.48
%%sql

SELECT * FROM daily_product_revenue 
ORDER BY order_date, revenue DESC
LIMIT 10

Copy to clipboard

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
10 rows affected.

Copy to clipboard

order_date order_item_product_id revenue
2013-07-25 00:00:00 1004 5599.72
2013-07-25 00:00:00 191 5099.49
2013-07-25 00:00:00 957 4499.70
2013-07-25 00:00:00 365 3359.44
2013-07-25 00:00:00 1073 2999.85
2013-07-25 00:00:00 1014 2798.88
2013-07-25 00:00:00 403 1949.85
2013-07-25 00:00:00 502 1650.00
2013-07-25 00:00:00 627 1079.73
2013-07-25 00:00:00 226 599.99
%%sql

SELECT t.*,
    LEAD(order_item_product_id) OVER (
        PARTITION BY order_date 
        ORDER BY revenue DESC
    ) next_product_id,
    LEAD(revenue) OVER (
        PARTITION BY order_date 
        ORDER BY revenue DESC
    ) next_revenue
FROM daily_product_revenue t
ORDER BY order_date, revenue DESC
LIMIT 30

Copy to clipboard

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
30 rows affected.

Copy to clipboard

order_date order_item_product_id revenue next_product_id next_revenue
2013-07-25 00:00:00 1004 5599.72 191 5099.49
2013-07-25 00:00:00 191 5099.49 957 4499.70
2013-07-25 00:00:00 957 4499.70 365 3359.44
2013-07-25 00:00:00 365 3359.44 1073 2999.85
2013-07-25 00:00:00 1073 2999.85 1014 2798.88
2013-07-25 00:00:00 1014 2798.88 403 1949.85
2013-07-25 00:00:00 403 1949.85 502 1650.00
2013-07-25 00:00:00 502 1650.00 627 1079.73
2013-07-25 00:00:00 627 1079.73 226 599.99
2013-07-25 00:00:00 226 599.99 24 319.96
2013-07-25 00:00:00 24 319.96 821 207.96
2013-07-25 00:00:00 821 207.96 625 199.99
2013-07-25 00:00:00 625 199.99 705 119.99
2013-07-25 00:00:00 705 119.99 572 119.97
2013-07-25 00:00:00 572 119.97 666 109.99
2013-07-25 00:00:00 666 109.99 725 108.00
2013-07-25 00:00:00 725 108.00 134 100.00
2013-07-25 00:00:00 134 100.00 906 99.96
2013-07-25 00:00:00 906 99.96 828 95.97
2013-07-25 00:00:00 828 95.97 810 79.96
2013-07-25 00:00:00 810 79.96 924 79.95
2013-07-25 00:00:00 924 79.95 926 79.95
2013-07-25 00:00:00 926 79.95 93 74.97
2013-07-25 00:00:00 93 74.97 835 63.98
2013-07-25 00:00:00 835 63.98 897 49.98
2013-07-25 00:00:00 897 49.98 None None
2013-07-26 00:00:00 1004 10799.46 365 7978.67
2013-07-26 00:00:00 365 7978.67 957 6899.54
2013-07-26 00:00:00 957 6899.54 191 6799.32
2013-07-26 00:00:00 191 6799.32 1014 4798.08

Getting first and last values

Let us see how we can get first and last value based on the criteria. min or max can be used to get only the min or max of the metric we are interested in, however we cannot get other attributes of those records.

Here is the example of using first_value.

%%sql

SELECT t.*,
    first_value(order_item_product_id) OVER (
        PARTITION BY order_date ORDER BY revenue DESC
    ) first_product_id,
    first_value(revenue) OVER (
        PARTITION BY order_date ORDER BY revenue DESC
    ) first_revenue,
    max(revenue) OVER (
        PARTITION BY order_date
    ) max_revenue
FROM daily_product_revenue t
ORDER BY order_date, revenue DESC
LIMIT 10

Copy to clipboard

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
10 rows affected.

Copy to clipboard

order_date order_item_product_id revenue first_product_id first_revenue max_revenue
2013-07-25 00:00:00 1004 5599.72 1004 5599.72 5599.72
2013-07-25 00:00:00 191 5099.49 1004 5599.72 5599.72
2013-07-25 00:00:00 957 4499.70 1004 5599.72 5599.72
2013-07-25 00:00:00 365 3359.44 1004 5599.72 5599.72
2013-07-25 00:00:00 1073 2999.85 1004 5599.72 5599.72
2013-07-25 00:00:00 1014 2798.88 1004 5599.72 5599.72
2013-07-25 00:00:00 403 1949.85 1004 5599.72 5599.72
2013-07-25 00:00:00 502 1650.00 1004 5599.72 5599.72
2013-07-25 00:00:00 627 1079.73 1004 5599.72 5599.72
2013-07-25 00:00:00 226 599.99 1004 5599.72 5599.72

Let us see an example with last_value. While using last_value we need to specify ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING.

  • By default it uses ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

  • The last value with in UNBOUNDED PRECEDING AND CURRENT ROW will be current record.

  • To get the right value, we have to change the windowing clause to ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING.

%%sql

SELECT t.*,
    last_value(order_item_product_id) OVER (
        PARTITION BY order_date ORDER BY revenue    
        ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
    ) last_product_id,
    max(revenue) OVER (
        PARTITION BY order_date
    ) last_revenue
FROM daily_product_revenue AS t
ORDER BY order_date, revenue DESC
LIMIT 30

Copy to clipboard

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
30 rows affected.

Copy to clipboard

order_date order_item_product_id revenue last_product_id last_revenue
2013-07-25 00:00:00 1004 5599.72 1004 5599.72
2013-07-25 00:00:00 191 5099.49 1004 5599.72
2013-07-25 00:00:00 957 4499.70 1004 5599.72
2013-07-25 00:00:00 365 3359.44 1004 5599.72
2013-07-25 00:00:00 1073 2999.85 1004 5599.72
2013-07-25 00:00:00 1014 2798.88 1004 5599.72
2013-07-25 00:00:00 403 1949.85 1004 5599.72
2013-07-25 00:00:00 502 1650.00 1004 5599.72
2013-07-25 00:00:00 627 1079.73 1004 5599.72
2013-07-25 00:00:00 226 599.99 1004 5599.72
2013-07-25 00:00:00 24 319.96 1004 5599.72
2013-07-25 00:00:00 821 207.96 1004 5599.72
2013-07-25 00:00:00 625 199.99 1004 5599.72
2013-07-25 00:00:00 705 119.99 1004 5599.72
2013-07-25 00:00:00 572 119.97 1004 5599.72
2013-07-25 00:00:00 666 109.99 1004 5599.72
2013-07-25 00:00:00 725 108.00 1004 5599.72
2013-07-25 00:00:00 134 100.00 1004 5599.72
2013-07-25 00:00:00 906 99.96 1004 5599.72
2013-07-25 00:00:00 828 95.97 1004 5599.72
2013-07-25 00:00:00 810 79.96 1004 5599.72
2013-07-25 00:00:00 924 79.95 1004 5599.72
2013-07-25 00:00:00 926 79.95 1004 5599.72
2013-07-25 00:00:00 93 74.97 1004 5599.72
2013-07-25 00:00:00 835 63.98 1004 5599.72
2013-07-25 00:00:00 897 49.98 1004 5599.72
2013-07-26 00:00:00 1004 10799.46 1004 10799.46
2013-07-26 00:00:00 365 7978.67 1004 10799.46
2013-07-26 00:00:00 957 6899.54 1004 10799.46
2013-07-26 00:00:00 191 6799.32 1004 10799.46