Let us go through some of the criteria to create indexes on tables.
Indexes are required to enforce constraints such as Primary Key, Unique etc. Indexes will be automatically created, when we define a column(s) Primary Key or Unique.
Too many indexes on a given table, can slow down the performance of inserts, updates and deletes on that table. Hence, you need to make sure to strike right balance by creating indexes only when they are required.
Thorough analysis need to be done about how the queries will hit the table from the application.
We might have to create indexes on foreign key columns of the child table.
When we have tables with multiple parents, we need to be due diligent about how the index should be created.
Shall we create 2 indexes?
Shall we create 1 index with both the columns pointing to 2 tables?
If we want to create 1 index with both the columns what should be the order?
Here are some of the scenarios from the application perspective based upon which we can consider creating indexes.
Customer checking all his orders.
We need to get the data from orders using customer id and hence we need to add index on orders.order_customer_id.
Customer checking order details for a given order which include order_item_subtotal as well as product names.
We need to join orders, order_items as well as products.
order_items is child table for both orders and products.
We can create composite index on order_items.order_item_order_id and order_items.order_item_product_id.
Customer care executive to check all the order details placed by customer using at least first 3 characters of customer’s first name.
We can consider creating index on customers.customer_fname using upper or lower. You can also consider adding customer_id to the index along with customer_fname.
Also to get all the order details for a given customer, we have to ensure that there is an index on orders.order_customer_id.
%%time
from random import randrange
connection = psycopg2.connect(
host='localhost',
port='5432',
database='itversity_retail_db',
user='itversity_retail_user',
password='retail_password'
)
cursor = connection.cursor()
query = '''SELECT count(1)
FROM orders o
WHERE order_customer_id = %s
'''
ctr = 0
while True:
if ctr == 2000:
break
customer_id = randrange(10950, 12435)
cursor.execute(query, (customer_id,))
ctr += 1
cursor.close()
connection.close()
%%time
from random import randrange
connection = psycopg2.connect(
host='localhost',
port='5432',
database='itversity_retail_db',
user='itversity_retail_user',
password='retail_password'
)
cursor = connection.cursor()
query = '''SELECT count(1)
FROM orders o
WHERE order_customer_id = %s
'''
ctr = 0
while True:
if ctr == 2000:
break
customer_id = randrange(10950, 12435)
cursor.execute(query, (customer_id,))
ctr += 1
cursor.close()
connection.close()
%%time
from random import randrange
connection = psycopg2.connect(
host='localhost',
port='5432',
database='itversity_retail_db',
user='itversity_retail_user',
password='retail_password'
)
cursor = connection.cursor()
query = '''SELECT count(1)
FROM orders o
JOIN order_items oi
ON o.order_id = oi.order_item_order_id
JOIN products p
ON p.product_id = oi.order_item_product_id
WHERE order_id = %s
'''
ctr = 0
while True:
if ctr == 2000:
break
order_id = randrange(1, 68883)
cursor.execute(query, (order_id,))
ctr += 1
cursor.close()
connection.close()
%%time
from random import randrange
connection = psycopg2.connect(
host='localhost',
port='5432',
database='itversity_retail_db',
user='itversity_retail_user',
password='retail_password'
)
cursor = connection.cursor()
query = '''SELECT count(1)
FROM orders o
JOIN order_items oi
ON o.order_id = oi.order_item_order_id
JOIN products p
ON p.product_id = oi.order_item_product_id
WHERE order_id = %s
'''
ctr = 0
while True:
if ctr == 2000:
break
order_id = randrange(1, 68883)
cursor.execute(query, (order_id,))
ctr += 1
cursor.close()
connection.close()
%%time
from random import randrange
connection = psycopg2.connect(
host='localhost',
port='5432',
database='itversity_retail_db',
user='itversity_retail_user',
password='retail_password'
)
cursor = connection.cursor()
query = '''SELECT count(1)
FROM orders o
JOIN order_items oi
ON o.order_id = oi.order_item_order_id
JOIN products p
ON p.product_id = oi.order_item_product_id
WHERE order_id = %s
'''
ctr = 0
while True:
if ctr == 2000:
break
order_id = randrange(1, 68883)
cursor.execute(query, (order_id,))
ctr += 1
cursor.close()
connection.close()
Explain plan for query with out indexes.
QUERY PLAN
-------------------------------------------------------------------------
Hash Join (cost=42.38..1437.09 rows=40 width=99)
Hash Cond: (o.order_customer_id = c.customer_id)
-> Seq Scan on orders o (cost=0.00..1213.83 rows=68883 width=26)
-> Hash (cost=42.29..42.29 rows=7 width=73)
-> Seq Scan on customers c (cost=0.00..42.29 rows=7 width=73)
Filter: (upper((customer_fname)::text) = 'MAR'::text)
(6 rows)
Explain plan for query with indexes. Check the cost, it is significantly low when compared to the plan generated for the same query with out indexes.
QUERY PLAN
-------------------------------------------------------------------------------------------------
Nested Loop (cost=8.67..204.43 rows=40 width=99)
-> Bitmap Heap Scan on customers c (cost=4.33..18.58 rows=7 width=73)
Recheck Cond: (upper((customer_fname)::text) = 'MAR'::text)
-> Bitmap Index Scan on customers_customer_fname_idx (cost=0.00..4.33 rows=7 width=0)
Index Cond: (upper((customer_fname)::text) = 'MAR'::text)
-> Bitmap Heap Scan on orders o (cost=4.34..26.49 rows=6 width=26)
Recheck Cond: (order_customer_id = c.customer_id)
-> Bitmap Index Scan on orders_customer_id_idx (cost=0.00..4.34 rows=6 width=0)
Index Cond: (order_customer_id = c.customer_id)
(9 rows)