Merging or Upserting Data

At times we need to merge or upsert the data (update existing records and insert new records)

  • One of the way to achieve merge or upsert is to develop 2 statements - one to update and other to insert.

  • The queries in both the statements (update and insert) should return mutually exclusive results.

  • Even though the statements can be executed in any order, updating first and then inserting perform better in most of the cases (as update have to deal with lesser number of records with this approach)

  • We can also take care of merge or upsert using INSERT with ON CONFLICT (columns) DO UPDATE.

  • Postgres does not have either MERGE or UPSERT as part of the SQL syntax.

%load_ext sql

Copy to clipboard

The sql extension is already loaded. To reload it, use:
  %reload_ext sql

Copy to clipboard

%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db

Copy to clipboard

env: DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db

Copy to clipboard

%sql DROP TABLE IF EXISTS customer_order_metrics_dly

Copy to clipboard

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
Done.

Copy to clipboard

[]

Copy to clipboard

%%sql

CREATE TABLE customer_order_metrics_dly (
    customer_id INT,
    order_date DATE,
    order_count INT,
    order_revenue FLOAT
)

Copy to clipboard

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
Done.

Copy to clipboard

[]

Copy to clipboard

%%sql

ALTER TABLE customer_order_metrics_dly
    ADD PRIMARY KEY (customer_id, order_date)

Copy to clipboard

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
Done.

Copy to clipboard

[]

Copy to clipboard

Note

Let us go through the 2 statement approach. Here we are inserting data for the month of August 2013.

%%sql

INSERT INTO customer_order_metrics_dly
SELECT o.order_customer_id,
    o.order_date,
    count(1) order_count,
    NULL
FROM orders o 
    JOIN order_items oi
        ON o.order_id = oi.order_item_order_id
WHERE o.order_date BETWEEN '2013-08-01' AND '2013-08-31'
GROUP BY o.order_customer_id,
    o.order_date

Copy to clipboard

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
4708 rows affected.

Copy to clipboard

[]

Copy to clipboard

Note

Now we want to merge data into the table using 2013 August to 2013 October. As we are using 2 statement approach, first we should update and then we should insert

%%sql

UPDATE customer_order_metrics_dly comd
SET 
    (order_count, order_revenue) = (
        SELECT count(1),
            round(sum(oi.order_item_subtotal)::numeric, 2)
        FROM orders o 
            JOIN order_items oi
                ON o.order_id = oi.order_item_order_id
        WHERE o.order_date BETWEEN '2013-08-01' AND '2013-10-31'
            AND o.order_customer_id = comd.customer_id
            AND o.order_date = comd.order_date
        GROUP BY o.order_customer_id,
            o.order_date
    )
WHERE comd.order_date BETWEEN '2013-08-01' AND '2013-10-31'

Copy to clipboard

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
4708 rows affected.

Copy to clipboard

[]

Copy to clipboard

%%sql

SELECT * FROM customer_order_metrics_dly
ORDER BY order_date, customer_id
LIMIT 10

Copy to clipboard

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
10 rows affected.

Copy to clipboard

customer_id order_date order_count order_revenue
34 2013-08-01 4 789.92
109 2013-08-01 3 799.9
174 2013-08-01 5 654.89
267 2013-08-01 4 559.97
478 2013-08-01 5 729.9
553 2013-08-01 2 399.9
692 2013-08-01 2 479.92
696 2013-08-01 2 649.88
800 2013-08-01 5 609.95
835 2013-08-01 5 589.9
%%sql

SELECT to_char(order_date, 'yyyy-MM'), count(1) FROM customer_order_metrics_dly
GROUP BY to_char(order_date, 'yyyy-MM')
LIMIT 10

Copy to clipboard

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
1 rows affected.

Copy to clipboard

to_char count
2013-08 4708
%%sql

INSERT INTO customer_order_metrics_dly
SELECT o.order_customer_id AS customer_id,
    o.order_date,
    count(1) order_count,
    round(sum(order_item_subtotal)::numeric, 2)
FROM orders o 
    JOIN order_items oi
        ON o.order_id = oi.order_item_order_id
WHERE o.order_date BETWEEN '2013-08-01' AND '2013-10-31'
    AND NOT EXISTS (
        SELECT 1 FROM customer_order_metrics_dly codm
        WHERE o.order_customer_id = codm.customer_id
            AND o.order_date = codm.order_date
    )
GROUP BY o.order_customer_id,
    o.order_date

Copy to clipboard

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
9265 rows affected.

Copy to clipboard

[]

Copy to clipboard

%%sql

SELECT * FROM customer_order_metrics_dly
WHERE order_date::varchar ~ '2013-09'
ORDER BY order_date, customer_id
LIMIT 10

Copy to clipboard

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
10 rows affected.

Copy to clipboard

customer_id order_date order_count order_revenue
19 2013-09-01 5 839.92
95 2013-09-01 5 969.85
136 2013-09-01 4 639.94
247 2013-09-01 2 639.94
383 2013-09-01 5 729.9
437 2013-09-01 4 829.97
543 2013-09-01 4 1489.83
601 2013-09-01 2 159.99
689 2013-09-01 2 419.96
842 2013-09-01 4 954.87
%%sql

SELECT to_char(order_date, 'yyyy-MM'), count(1) FROM customer_order_metrics_dly
GROUP BY to_char(order_date, 'yyyy-MM')
LIMIT 10

Copy to clipboard

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
3 rows affected.

Copy to clipboard

to_char count
2013-08 4708
2013-10 4417
2013-09 4848

Note

Let us see how we can upsert or merge the data using INSERT with ON CONFLICT (columns) DO UPDATE. We will first insert data for the month of August 2013 and then upsert or merge for the months of August 2013 to October 2013.

%sql TRUNCATE TABLE customer_order_metrics_dly

Copy to clipboard

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
Done.

Copy to clipboard

[]

Copy to clipboard

%%sql

INSERT INTO customer_order_metrics_dly
SELECT o.order_customer_id,
    o.order_date,
    count(1) order_count,
    NULL
FROM orders o 
    JOIN order_items oi
        ON o.order_id = oi.order_item_order_id
WHERE o.order_date BETWEEN '2013-08-01' AND '2013-08-31'
GROUP BY o.order_customer_id,
    o.order_date

Copy to clipboard

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
4708 rows affected.

Copy to clipboard

[]

Copy to clipboard

Note

We need to have unique or primary key constraint on the columns specified as part of ON CONFLICT clause.

%%sql

ALTER TABLE customer_order_metrics_dly DROP CONSTRAINT customer_order_metrics_dly_pkey

Copy to clipboard

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
Done.

Copy to clipboard

[]

Copy to clipboard

%%sql

ALTER TABLE customer_order_metrics_dly
    ADD PRIMARY KEY (customer_id, order_date)

Copy to clipboard

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
Done.

Copy to clipboard

[]

Copy to clipboard

%%sql

INSERT INTO customer_order_metrics_dly
SELECT o.order_customer_id,
    o.order_date,
    count(1) order_count,
    round(sum(order_item_subtotal)::numeric, 2) AS order_revenue
FROM orders o 
    JOIN order_items oi
        ON o.order_id = oi.order_item_order_id
WHERE o.order_date BETWEEN '2013-08-01' AND '2013-10-31'
GROUP BY o.order_customer_id,
    o.order_date
ON CONFLICT (customer_id, order_date) DO UPDATE SET
    order_count = EXCLUDED.order_count,
    order_revenue = EXCLUDED.order_revenue

Copy to clipboard

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
13973 rows affected.

Copy to clipboard

[]

Copy to clipboard

%%sql

SELECT * FROM customer_order_metrics_dly
WHERE order_date::varchar ~ '2013-09'
ORDER BY order_date, customer_id
LIMIT 10

Copy to clipboard

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
10 rows affected.

Copy to clipboard

customer_id order_date order_count order_revenue
19 2013-09-01 5 839.92
95 2013-09-01 5 969.85
136 2013-09-01 4 639.94
247 2013-09-01 2 639.94
383 2013-09-01 5 729.9
437 2013-09-01 4 829.97
543 2013-09-01 4 1489.83
601 2013-09-01 2 159.99
689 2013-09-01 2 419.96
842 2013-09-01 4 954.87
%%sql

SELECT to_char(order_date, 'yyyy-MM'), count(1) FROM customer_order_metrics_dly
GROUP BY to_char(order_date, 'yyyy-MM')
LIMIT 10

Copy to clipboard

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
3 rows affected.

Copy to clipboard

to_char count
2013-08 4708
2013-10 4417
2013-09 4848