Performance Tuning using Indexes

Topic
Materials

Let us understand how we can improve the performance of the query by creating index on order_items.order_item_order_id.

  • We have order level details in orders and item level details in order_items.

  • When customer want to review their orders, they need details about order_items. In almost all the scenarios in order management system, we prefer to get both order as well as order_items details by passing order_id of pending or outstanding orders.

  • Let us review the explain plan for the query with out index on order_items.order_item_order_id.

EXPLAIN
SELECT o.*,
    oi.order_item_subtotal
FROM orders o JOIN order_items oi
    ON o.order_id = oi.order_item_order_id
WHERE o.order_id = 2;

Copy to clipboard

  • Develop piece of code to randomly pass 2000 order ids and calculate time.

!pip install psycopg2

Copy to clipboard

import psycopg2

Copy to clipboard

%%time
connection = psycopg2.connect(
    host='localhost',
    port='5432',
    database='itversity_retail_db',
    user='itversity_retail_user',
    password='retail_password'
)
cursor = connection.cursor()
query = '''SELECT count(1) 
FROM orders o JOIN order_items oi 
    ON o.order_id = oi.order_item_order_id
WHERE o.order_id = %s
'''
ctr = 0
while True:
    if ctr == 2000:
        break
    cursor.execute(query, (1,))
    ctr += 1
cursor.close()
connection.close()

Copy to clipboard

  • Create index on order_items.order_item_order_id

%load_ext sql

Copy to clipboard

%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db

Copy to clipboard

%%sql

CREATE INDEX order_items_order_id_idx 
ON order_items(order_item_order_id);

Copy to clipboard

  • Run explain plan after creating index on order_items.order_item_order_id

EXPLAIN
SELECT o.*,
    oi.order_item_subtotal
FROM orders o JOIN order_items oi
    ON o.order_id = oi.order_item_order_id
WHERE o.order_id = 2;

Copy to clipboard

                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.71..16.81 rows=3 width=34)
   ->  Index Scan using orders_pkey on orders o  (cost=0.29..8.31 rows=1 width=26)
         Index Cond: (order_id = 2)
   ->  Index Scan using order_items_order_id_idx on order_items oi  (cost=0.42..8.47 rows=3 width=12)
         Index Cond: (order_item_order_id = 2)
(5 rows)

Copy to clipboard

  • Run the code again to see how much time, it get the results for 2000 random orders.

import psycopg2

Copy to clipboard

%%time

from random import randrange
connection = psycopg2.connect(
    host='localhost',
    port='5432',
    database='itversity_retail_db',
    user='itversity_retail_user',
    password='retail_password'
)
cursor = connection.cursor()
query = '''SELECT count(1) 
FROM orders o JOIN order_items oi 
    ON o.order_id = oi.order_item_order_id
WHERE o.order_id = %s
'''
ctr = 0
while True:
    if ctr == 2000:
        break
    order_id = randrange(1, 68883)
    cursor.execute(query, (order_id,))
    ctr += 1
cursor.close()
connection.close()

Copy to clipboard

Warning

Keep in mind that having indexes on tables can have negative impact on write operations.