Criteria for Indexing

Let us go through some of the criteria to create indexes on tables.

  • Indexes are required to enforce constraints such as Primary Key, Unique etc. Indexes will be automatically created, when we define a column(s) Primary Key or Unique.

  • Too many indexes on a given table, can slow down the performance of inserts, updates and deletes on that table. Hence, you need to make sure to strike right balance by creating indexes only when they are required.

  • Thorough analysis need to be done about how the queries will hit the table from the application.

  • We might have to create indexes on foreign key columns of the child table.

  • When we have tables with multiple parents, we need to be due diligent about how the index should be created.

    • Shall we create 2 indexes?

    • Shall we create 1 index with both the columns pointing to 2 tables?

    • If we want to create 1 index with both the columns what should be the order?

  • Here are some of the scenarios from the application perspective based upon which we can consider creating indexes.

    • Customer checking all his orders.

      • We need to get the data from orders using customer id and hence we need to add index on orders.order_customer_id.

    • Customer checking order details for a given order which include order_item_subtotal as well as product names.

      • We need to join orders, order_items as well as products.

      • order_items is child table for both orders and products.

      • We can create composite index on order_items.order_item_order_id and order_items.order_item_product_id.

    • Customer care executive to check all the order details placed by customer using at least first 3 characters of customer’s first name.

      • We can consider creating index on customers.customer_fname using upper or lower. You can also consider adding customer_id to the index along with customer_fname.

      • Also to get all the order details for a given customer, we have to ensure that there is an index on orders.order_customer_id.

%load_ext sql

Copy to clipboard

%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db

Copy to clipboard

%%sql

DROP INDEX order_items_order_id_idx

Copy to clipboard

%%sql

SELECT min(customer_id), max(customer_id), count(1)
FROM customers

Copy to clipboard

import psycopg2

Copy to clipboard

%%time

from random import randrange
connection = psycopg2.connect(
    host='localhost',
    port='5432',
    database='itversity_retail_db',
    user='itversity_retail_user',
    password='retail_password'
)
cursor = connection.cursor()
query = '''SELECT count(1) 
FROM orders o
WHERE order_customer_id = %s
'''
ctr = 0
while True:
    if ctr == 2000:
        break
    customer_id = randrange(10950, 12435)
    cursor.execute(query, (customer_id,))
    ctr += 1
cursor.close()
connection.close()

Copy to clipboard

%%sql

CREATE INDEX orders_customer_id_idx
ON orders(order_customer_id)

Copy to clipboard

%%time

from random import randrange
connection = psycopg2.connect(
    host='localhost',
    port='5432',
    database='itversity_retail_db',
    user='itversity_retail_user',
    password='retail_password'
)
cursor = connection.cursor()
query = '''SELECT count(1) 
FROM orders o
WHERE order_customer_id = %s
'''
ctr = 0
while True:
    if ctr == 2000:
        break
    customer_id = randrange(10950, 12435)
    cursor.execute(query, (customer_id,))
    ctr += 1
cursor.close()
connection.close()

Copy to clipboard

%%time

from random import randrange
connection = psycopg2.connect(
    host='localhost',
    port='5432',
    database='itversity_retail_db',
    user='itversity_retail_user',
    password='retail_password'
)
cursor = connection.cursor()
query = '''SELECT count(1) 
FROM orders o
    JOIN order_items oi
        ON o.order_id = oi.order_item_order_id
    JOIN products p
        ON p.product_id = oi.order_item_product_id
WHERE order_id = %s
'''
ctr = 0
while True:
    if ctr == 2000:
        break
    order_id = randrange(1, 68883)
    cursor.execute(query, (order_id,))
    ctr += 1
cursor.close()
connection.close()

Copy to clipboard

%%sql

CREATE INDEX order_items_oid_pid_idx 
ON order_items(order_item_order_id, order_item_product_id);

Copy to clipboard

%%time

from random import randrange
connection = psycopg2.connect(
    host='localhost',
    port='5432',
    database='itversity_retail_db',
    user='itversity_retail_user',
    password='retail_password'
)
cursor = connection.cursor()
query = '''SELECT count(1) 
FROM orders o
    JOIN order_items oi
        ON o.order_id = oi.order_item_order_id
    JOIN products p
        ON p.product_id = oi.order_item_product_id
WHERE order_id = %s
'''
ctr = 0
while True:
    if ctr == 2000:
        break
    order_id = randrange(1, 68883)
    cursor.execute(query, (order_id,))
    ctr += 1
cursor.close()
connection.close()

Copy to clipboard

Note

As our products table only have handful of records there will not be significant difference in performance between the 2 approaches.

  • Index on order_items.order_item_order_id

  • Index on order_items.order_item_order_id, order_items.order_item_product_id

Howeever if you create index using product id as driving field then the performance will not be as good as above 2 approaches.

%%sql

DROP INDEX order_items_oid_pid_idx

Copy to clipboard

%%sql

CREATE INDEX order_items_pid_oid_idx 
ON order_items(order_item_product_id, order_item_order_id);

Copy to clipboard

%%time

from random import randrange
connection = psycopg2.connect(
    host='localhost',
    port='5432',
    database='itversity_retail_db',
    user='itversity_retail_user',
    password='retail_password'
)
cursor = connection.cursor()
query = '''SELECT count(1) 
FROM orders o
    JOIN order_items oi
        ON o.order_id = oi.order_item_order_id
    JOIN products p
        ON p.product_id = oi.order_item_product_id
WHERE order_id = %s
'''
ctr = 0
while True:
    if ctr == 2000:
        break
    order_id = randrange(1, 68883)
    cursor.execute(query, (order_id,))
    ctr += 1
cursor.close()
connection.close()

Copy to clipboard

Note

Here are the indexes to tune the performance of comparing with at least first 3 characters of customer first name.

%%sql

DROP INDEX IF EXISTS orders_customer_id_idx

Copy to clipboard

%%sql

DROP INDEX IF EXISTS customers_customer_fname_idx

Copy to clipboard

  • Explain plan for query with out indexes.

EXPLAIN
SELECT * 
FROM orders o JOIN customers c
    ON o.order_customer_id = c.customer_id
WHERE upper(c.customer_fname) = upper('mar');

Copy to clipboard

                               QUERY PLAN
-------------------------------------------------------------------------
 Hash Join  (cost=42.38..1437.09 rows=40 width=99)
   Hash Cond: (o.order_customer_id = c.customer_id)
   ->  Seq Scan on orders o  (cost=0.00..1213.83 rows=68883 width=26)
   ->  Hash  (cost=42.29..42.29 rows=7 width=73)
         ->  Seq Scan on customers c  (cost=0.00..42.29 rows=7 width=73)
               Filter: (upper((customer_fname)::text) = 'MAR'::text)
(6 rows)

Copy to clipboard

%%sql

CREATE INDEX customers_customer_fname_idx
ON customers(upper(customer_fname))

Copy to clipboard

%%sql

CREATE INDEX orders_customer_id_idx
ON orders(order_customer_id)

Copy to clipboard

  • Explain plan for query with indexes. Check the cost, it is significantly low when compared to the plan generated for the same query with out indexes.

EXPLAIN
SELECT * 
FROM orders o JOIN customers c
    ON o.order_customer_id = c.customer_id
WHERE upper(c.customer_fname) = upper('mar');

Copy to clipboard

                                           QUERY PLAN
-------------------------------------------------------------------------------------------------
 Nested Loop  (cost=8.67..204.43 rows=40 width=99)
   ->  Bitmap Heap Scan on customers c  (cost=4.33..18.58 rows=7 width=73)
         Recheck Cond: (upper((customer_fname)::text) = 'MAR'::text)
         ->  Bitmap Index Scan on customers_customer_fname_idx  (cost=0.00..4.33 rows=7 width=0)
               Index Cond: (upper((customer_fname)::text) = 'MAR'::text)
   ->  Bitmap Heap Scan on orders o  (cost=4.34..26.49 rows=6 width=26)
         Recheck Cond: (order_customer_id = c.customer_id)
         ->  Bitmap Index Scan on orders_customer_id_idx  (cost=0.00..4.34 rows=6 width=0)
               Index Cond: (order_customer_id = c.customer_id)
(9 rows)