Indexes on Tables¶
Let us go through the details related to indexes supported in RDBMS such as Postgres.
- An index can be unique or non unique.
- Unique Index – Data will be sorted in ascending order and uniqueness is enforced.
- Non Unique Index – Data will be sorted in ascending order and uniqueness is not enforced.
- Unless specified all indexes are of type B Tree.
- For sparsely populated columns, we tend to create B Tree indexes. B Tree indexes are the most commonly used ones.
- For densely populated columns such as gender, month etc with very few distinct values we can leverage bit map index. However bitmap indexes are not used quite extensively in typical web or mobile applications.
- Write operations will become relatively slow as data have to be managed in index as well as table.
- We need to be careful while creating indexes on the tables as write operations can become slow as more indexes are added to the table.
- Here are some of the criteria for creating indexes.
- Create unique indexes when you want to enforce uniqueness. If you define unique constraint or primary key constraint, it will create unique index internally.
- If we are performing joins between 2 tables based on a value, then the foreign key column in the child table should be indexed.
- Typically as part of order management system, we tend to get all the order details for a given order using order id.
- In our case we will be able to improve the query performance by adding index on order_items.order_item_order_id.
- However, write operation will become a bit slow. But it is acceptable and required to create index on order_items.order_item_order_id as we write once and read many times over the life of the order.
- Let us perform tasks related to indexes.
- Drop and recreate retail db tables.
- Load data into retail db tables.
- Compute statistics (Optional). It is typically taken care automatically by the schedules defined by DBAs.
- Use code to randomly fetch 2000 orders and join with order_items – compute time.
- Create index for order_items.order_item_order_id and compute statistics
- Use code to randomly fetch 2000 orders and join with order_items – compute time.
- Script to create tables and load data in case there are no tables in retail database.
psql -U itversity_retail_user \
-h pg.itversity.com \
-p 5432 \
-d itversity_retail_db \
-W
DROP TABLE order_items;
DROP TABLE orders;
DROP TABLE products;
DROP TABLE categories;
DROP TABLE departments;
DROP TABLE customers;
\i /data/retail_db/create_db_tables_pg.sql
\i /data/retail_db/load_db_tables_pg.sql
In [1]:
!pip install psycopg2
Defaulting to user installation because normal site-packages is not writeable Collecting psycopg2 Downloading psycopg2-2.9.3.tar.gz (380 kB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 380.6/380.6 KB 38.2 MB/s eta 0:00:00 Preparing metadata (setup.py) ... done Building wheels for collected packages: psycopg2 Building wheel for psycopg2 (setup.py) ... done Created wheel for psycopg2: filename=psycopg2-2.9.3-cp38-cp38-linux_x86_64.whl size=476635 sha256=c487739926e464649572a2469dad7080d5ebc30eb7e5c0c3f0da2eeada87a14f Stored in directory: /home/itversity/.cache/pip/wheels/f3/dc/e2/b8e0e2142eff7fd680295ecd2d92e3bfbb90195523e43da161 Successfully built psycopg2 Installing collected packages: psycopg2 Successfully installed psycopg2-2.9.3
In [2]:
import psycopg2
In [3]:
%%time
from random import randrange
connection = psycopg2.connect(
host='pg.itversity.com',
port='5432',
database='itversity_retail_db',
user='itversity_retail_user',
password='retail_password'
)
cursor = connection.cursor()
query = '''SELECT *
FROM orders o JOIN order_items oi
ON o.order_id = oi.order_item_order_id
WHERE o.order_id = %s
'''
ctr = 0
while True:
if ctr == 2000:
break
order_id = randrange(1, 68883)
cursor.execute(query, (order_id,))
ctr += 1
cursor.close()
connection.close()
CPU times: user 160 ms, sys: 31.3 ms, total: 192 ms Wall time: 22.9 s
In [4]:
%load_ext sql
In [5]:
%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@pg.itversity.com:5432/itversity_retail_db
env: DATABASE_URL=postgresql://itversity_retail_user:retail_password@pg.itversity.com:5432/itversity_retail_db
In [6]:
%%sql
CREATE INDEX order_items_oid_idx
ON order_items(order_item_order_id)
Done.
Out[6]:
[]
In [7]:
%%time
from random import randrange
connection = psycopg2.connect(
host='pg.itversity.com',
port='5432',
database='itversity_retail_db',
user='itversity_retail_user',
password='retail_password'
)
cursor = connection.cursor()
query = '''SELECT *
FROM orders o JOIN order_items oi
ON o.order_id = oi.order_item_order_id
WHERE o.order_id = %s
'''
ctr = 0
while True:
if ctr == 2000:
break
order_id = randrange(1, 68883)
cursor.execute(query, (order_id,))
ctr += 1
cursor.close()
connection.close()
CPU times: user 111 ms, sys: 27.9 ms, total: 139 ms Wall time: 421 ms