Cumulative or Moving Aggregations

Let us understand how we can take care of cumulative or moving aggregations using Analytic Functions.

  • When it comes to Windowing or Analytic Functions we can also specify window spec using ROWS BETWEEN clause.

  • Even when we do not specify window spec, the default window spec is used. For most of the functions the default window spec is UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING. You also have special clauses such as CURRENT ROW.

  • Here are some of the examples with respect to ROWS BETWEEN.

    • ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING

    • ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

    • ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING

    • ROWS BETWEEN 3 PRECEDING AND CURRENT ROW - moving aggregations using current record and previous 3 records.

    • ROWS BETWEEN CURRENT ROW AND 3 FOLLOWING - moving aggregations using current record and following 3 records.

    • ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING - moving aggregations based up on 7 records (current record, 3 previous records and 3 following records)

  • We can leverage ROWS BETWEEN for cumulative aggregations or moving aggregations.

  • Here is an example of cumulative sum.

Warning

If you are using Jupyter based environment make sure to restart the kernel, as the session might have been already connected with retail database.

%load_ext sql

Copy to clipboard

%env DATABASE_URL=postgresql://itversity_hr_user:hr_password@localhost:5432/itversity_hr_db

Copy to clipboard

env: DATABASE_URL=postgresql://itversity_hr_user:hr_password@localhost:5432/itversity_hr_db

Copy to clipboard

Note

Even though it is not mandatory to specify ORDER BY as per syntax for cumulative aggregations, it is a must to specify. If not, you will end up getting incorrect results.

%%sql

SELECT e.employee_id, e.department_id, e.salary,
    sum(e.salary) OVER (
        PARTITION BY e.department_id
        ORDER BY e.salary
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS sum_sal_expense
FROM employees e
ORDER BY e.department_id, e.salary DESC
LIMIT 10

Copy to clipboard

10 rows affected.

Copy to clipboard

employee_id department_id salary sum_sal_expense
200 10 4400.00 4400.00
201 20 13000.00 19000.00
202 20 6000.00 6000.00
114 30 11000.00 24900.00
115 30 3100.00 13900.00
116 30 2900.00 10800.00
117 30 2800.00 7900.00
118 30 2600.00 5100.00
119 30 2500.00 2500.00
203 40 6500.00 6500.00

Warning

If you are using Jupyter based environment make sure to restart the kernel, as the session might have been already connected with hr database.

%load_ext sql

Copy to clipboard

%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db

Copy to clipboard

env: DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db

Copy to clipboard

Note

Here is the example for cumulative sum for every month using daily_product_revenue in retail database.

%%sql

SELECT t.*,
    round(sum(t.revenue) OVER (
        PARTITION BY to_char(order_date, 'yyyy-MM')
        ORDER BY order_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ), 2) AS cumulative_daily_revenue
FROM daily_revenue t
ORDER BY to_char(order_date, 'yyyy-MM'),
    order_date
LIMIT 10

Copy to clipboard

10 rows affected.

Copy to clipboard

order_date revenue cumulative_daily_revenue
2013-07-25 00:00:00 31547.23 31547.23
2013-07-26 00:00:00 54713.23 86260.46
2013-07-27 00:00:00 48411.48 134671.94
2013-07-28 00:00:00 35672.03 170343.97
2013-07-29 00:00:00 54579.70 224923.67
2013-07-30 00:00:00 49329.29 274252.96
2013-07-31 00:00:00 59212.49 333465.45
2013-08-01 00:00:00 49160.08 49160.08
2013-08-02 00:00:00 50688.58 99848.66
2013-08-03 00:00:00 43416.74 143265.40

Note

Here are examples for 3 day moving sum as well as average using daily_revenue in retail database.

%%sql

SELECT t.*,
    round(sum(t.revenue) OVER (
        ORDER BY order_date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ), 2) AS moving_3day_revenue
FROM daily_revenue t
ORDER BY order_date
LIMIT 20

Copy to clipboard

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
20 rows affected.

Copy to clipboard

order_date revenue moving_3day_revenue
2013-07-25 00:00:00 31547.23 31547.23
2013-07-26 00:00:00 54713.23 86260.46
2013-07-27 00:00:00 48411.48 134671.94
2013-07-28 00:00:00 35672.03 138796.74
2013-07-29 00:00:00 54579.70 138663.21
2013-07-30 00:00:00 49329.29 139581.02
2013-07-31 00:00:00 59212.49 163121.48
2013-08-01 00:00:00 49160.08 157701.86
2013-08-02 00:00:00 50688.58 159061.15
2013-08-03 00:00:00 43416.74 143265.40
2013-08-04 00:00:00 35093.01 129198.33
2013-08-05 00:00:00 34025.27 112535.02
2013-08-06 00:00:00 57843.89 126962.17
2013-08-07 00:00:00 45525.59 137394.75
2013-08-08 00:00:00 33549.47 136918.95
2013-08-09 00:00:00 29225.16 108300.22
2013-08-10 00:00:00 46435.04 109209.67
2013-08-11 00:00:00 31155.50 106815.70
2013-08-12 00:00:00 59014.74 136605.28
2013-08-13 00:00:00 17956.88 108127.12
%%sql

SELECT t.*,
    round(sum(t.revenue) OVER (
        ORDER BY order_date
        ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING
    ), 2) AS moving_3day_revenue
FROM daily_revenue t
ORDER BY order_date
LIMIT 20

Copy to clipboard

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
20 rows affected.

Copy to clipboard

order_date revenue moving_3day_revenue
2013-07-25 00:00:00 31547.23 134671.94
2013-07-26 00:00:00 54713.23 170343.97
2013-07-27 00:00:00 48411.48 224923.67
2013-07-28 00:00:00 35672.03 242705.73
2013-07-29 00:00:00 54579.70 247204.99
2013-07-30 00:00:00 49329.29 247953.59
2013-07-31 00:00:00 59212.49 262970.14
2013-08-01 00:00:00 49160.08 251807.18
2013-08-02 00:00:00 50688.58 237570.90
2013-08-03 00:00:00 43416.74 212383.68
2013-08-04 00:00:00 35093.01 221067.49
2013-08-05 00:00:00 34025.27 215904.50
2013-08-06 00:00:00 57843.89 206037.23
2013-08-07 00:00:00 45525.59 200169.38
2013-08-08 00:00:00 33549.47 212579.15
2013-08-09 00:00:00 29225.16 185890.76
2013-08-10 00:00:00 46435.04 199379.91
2013-08-11 00:00:00 31155.50 183787.32
2013-08-12 00:00:00 59014.74 196605.61
2013-08-13 00:00:00 17956.88 199737.25
%%sql

SELECT t.*,
    round(avg(t.revenue) OVER (
        ORDER BY order_date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ), 2) AS moving_3day_revenue
FROM daily_revenue t
ORDER BY order_date
LIMIT 20

Copy to clipboard

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
20 rows affected.

Copy to clipboard

order_date revenue moving_3day_revenue
2013-07-25 00:00:00 31547.23 31547.23
2013-07-26 00:00:00 54713.23 43130.23
2013-07-27 00:00:00 48411.48 44890.65
2013-07-28 00:00:00 35672.03 46265.58
2013-07-29 00:00:00 54579.70 46221.07
2013-07-30 00:00:00 49329.29 46527.01
2013-07-31 00:00:00 59212.49 54373.83
2013-08-01 00:00:00 49160.08 52567.29
2013-08-02 00:00:00 50688.58 53020.38
2013-08-03 00:00:00 43416.74 47755.13
2013-08-04 00:00:00 35093.01 43066.11
2013-08-05 00:00:00 34025.27 37511.67
2013-08-06 00:00:00 57843.89 42320.72
2013-08-07 00:00:00 45525.59 45798.25
2013-08-08 00:00:00 33549.47 45639.65
2013-08-09 00:00:00 29225.16 36100.07
2013-08-10 00:00:00 46435.04 36403.22
2013-08-11 00:00:00 31155.50 35605.23
2013-08-12 00:00:00 59014.74 45535.09
2013-08-13 00:00:00 17956.88 36042.37