Overview of Analytic Functions

Topic
Materials

Let us get an overview of Analytics or Windowing Functions as part of SQL.

  • Aggregate Functions (summinmaxavg)

  • Window Functions (leadlagfirst_valuelast_value)

  • Rank Functions (rankdense_rankrow_number etc)

  • For all the functions when used as part of Analytic or Windowing functions we use OVER clause.

  • For aggregate functions we typically use PARTITION BY

  • For global ranking and windowing functions we can use ORDER BY sort_column and for ranking and windowing with in a partition or group we can use PARTITION BY partition_column ORDER BY sort_column.

  • Here is how the syntax will look like.

    • Aggregate - func() OVER (PARTITION BY partition_column)

    • Global Rank - func() OVER (ORDER BY sort_column DESC)

    • Rank in a partition - func() OVER (PARTITION BY partition_column ORDER BY sort_column DESC)

  • We can also get cumulative or moving metrics by adding ROWS BETWEEN clause. We will see details later.

Prepare Tables

Let us create couple of tables which will be used for the demonstrations of Windowing and Ranking functions.

  • We have ORDERS and ORDER_ITEMS tables in our retail database.

  • Let us take care of computing daily revenue as well as daily product revenue.

  • As we will be using same data set several times, let us create the tables to pre compute the data.

  • daily_revenue will have the order_date and revenue, where data is aggregated using order_date as partition key.

  • daily_product_revenue will have order_dateorder_item_product_id and revenue. In this case data is aggregated using order_date and order_item_product_id as partition keys.

Note

Let us create table using CTAS to save daily revenue.

%load_ext sql

Copy to clipboard

The sql extension is already loaded. To reload it, use:
  %reload_ext sql

Copy to clipboard

%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db

Copy to clipboard

env: DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db

Copy to clipboard

%%sql

DROP TABLE IF EXISTS daily_revenue

Copy to clipboard

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
Done.

Copy to clipboard

[]

Copy to clipboard

%%sql

CREATE TABLE daily_revenue
AS
SELECT o.order_date,
    round(sum(oi.order_item_subtotal)::numeric, 2) AS revenue
FROM orders o JOIN order_items oi
ON o.order_id = oi.order_item_order_id
WHERE o.order_status IN ('COMPLETE', 'CLOSED')
GROUP BY o.order_date

Copy to clipboard

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
364 rows affected.

Copy to clipboard

[]

Copy to clipboard

%%sql

SELECT * FROM daily_revenue
ORDER BY order_date
LIMIT 10

Copy to clipboard

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
10 rows affected.

Copy to clipboard

order_date revenue
2013-07-25 00:00:00 31547.23
2013-07-26 00:00:00 54713.23
2013-07-27 00:00:00 48411.48
2013-07-28 00:00:00 35672.03
2013-07-29 00:00:00 54579.70
2013-07-30 00:00:00 49329.29
2013-07-31 00:00:00 59212.49
2013-08-01 00:00:00 49160.08
2013-08-02 00:00:00 50688.58
2013-08-03 00:00:00 43416.74

Note

Let us create table using CTAS to save daily product revenue.

%%sql

DROP TABLE IF EXISTS daily_product_revenue

Copy to clipboard

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
Done.

Copy to clipboard

[]

Copy to clipboard

%%sql

CREATE TABLE daily_product_revenue
AS
SELECT o.order_date,
    oi.order_item_product_id,
    round(sum(oi.order_item_subtotal)::numeric, 2) AS revenue
FROM orders o JOIN order_items oi
ON o.order_id = oi.order_item_order_id
WHERE o.order_status IN ('COMPLETE', 'CLOSED')
GROUP BY o.order_date, oi.order_item_product_id

Copy to clipboard

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
9120 rows affected.

Copy to clipboard

[]

Copy to clipboard

%%sql

SELECT * FROM daily_product_revenue
ORDER BY order_date, revenue DESC
LIMIT 10

Copy to clipboard

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
10 rows affected.

Copy to clipboard

order_date order_item_product_id revenue
2013-07-25 00:00:00 1004 5599.72
2013-07-25 00:00:00 191 5099.49
2013-07-25 00:00:00 957 4499.70
2013-07-25 00:00:00 365 3359.44
2013-07-25 00:00:00 1073 2999.85
2013-07-25 00:00:00 1014 2798.88
2013-07-25 00:00:00 403 1949.85
2013-07-25 00:00:00 502 1650.00
2013-07-25 00:00:00 627 1079.73
2013-07-25 00:00:00 226 599.99