Managing Database Objects¶
This exercise is primarily to assess your capabilities related to put all important DDL concepts in practice by coming up with solution for a typical data migration problem from one database (mysql) to another (postgres).
- Here are the high level steps for database migration from one type of database to another type of database.
- Extract DDL Statements from source database (MySQL).
- Extract the data in the form of delimited files and ship them to target database.
- Refactor scripts as per target database (Postgres).
- Create tables in the target database.
- Execute pre-migration steps (disable constraints, drop indexes etc).
- Load the data using native utilities.
- Execute post-migration steps (enable constraints, create or rebuild indexes, reset sequences etc).
- Sanity checks with basic queries.
- Make sure all the impacted applications are validated thoroughly.
- We have scripts and data set available in our GitHub repository. If you are using our environment the repository is already cloned under /data/retail_db.
- It have scripts to create tables with primary keys. Those scripts are generated from MySQL tables and refactored for Postgres.
- Script to create tables: create_db_tables_pg.sql
- Load data into tables: load_db_tables_pg.sql
- Here are the steps you need to perform to take care of this exercise.
- Create tables
- Load data
- All the tables have surrogate primary keys. Here are the details.
- orders.order_id
- order_items.order_item_id
- customers.customer_id
- products.product_id
- categories.category_id
- departments.department_id
- Get the maximum value from all surrogate primary key fields.
- Create sequences for all surrogate primary key fields using maximum value. Make sure to use standard naming conventions for sequences.
- Ensure sequences are mapped to the surrogate primary key fields.
- Create foreign key constraints based up on this information.
- orders.order_customer_id to customers.customer_id
- order_items.order_item_order_id to orders.order_id
- order_items.order_item_product_id to products.product_id
- Insert few records in
departments
to ensure that sequence generated numbers are used fordepartment_id
.
- Here are the commands to launch
psql
and run scripts to create tables as well as load data into tables.
psql -U itversity_retail_user \
-h localhost \
-p 5432 \
-d itversity_retail_db \
-W
DROP TABLE IF EXISTS order_items CASCADE;
DROP TABLE IF EXISTS orders CASCADE;
DROP TABLE IF EXISTS customers CASCADE;
DROP TABLE IF EXISTS products CASCADE;
DROP TABLE IF EXISTS categories CASCADE;
DROP TABLE IF EXISTS departments CASCADE;
\i /data/retail_db/create_db_tables_pg.sql
\i /data/retail_db/load_db_tables_pg.sql
- We use this approach of creating tables, loading data and then adding constraints as well as resetting sequences for large volume data migrations from one database to another database.
- Here are the commands or queries you need to come up with to solve this problem.
%load_ext sql
%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db
env: DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db
Exercise 1¶
Queries to get maximum values from surrogate primary keys.
%%sql
SELECT max(category_id) FROM categories;
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db 1 rows affected.
max |
---|
58 |
%%sql
SELECT max(customer_id) FROM customers;
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db 1 rows affected.
max |
---|
12435 |
%%sql
SELECT max(department_id) FROM departments;
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db 1 rows affected.
max |
---|
7 |
%%sql
SELECT max(order_item_id) FROM order_items;
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db 1 rows affected.
max |
---|
172198 |
%%sql
SELECT max(order_id) FROM orders;
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db 1 rows affected.
max |
---|
68883 |
%%sql
SELECT max(product_id) FROM products;
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db 1 rows affected.
max |
---|
1345 |
Exercise 2¶
Commands to add sequences with START WITH
pointing to the maximum value for the corresponding surrogate primary key fields. Make sure to use meaningful names to sequences TABLENAME_SURROGATEFIELD_seq (example: users_user_id_seq for users.user_id)
%%sql
CREATE SEQUENCE categories_category_id_seq
START WITH 59
INCREMENT BY 1;
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db Done.
[]
%%sql
CREATE SEQUENCE customers_customer_id_seq
START WITH 12436
INCREMENT BY 1;
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db Done.
[]
%%sql
CREATE SEQUENCE departments_department_id_seq
START WITH 8
INCREMENT BY 1;
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db Done.
[]
%%sql
CREATE SEQUENCE orders_order_id_seq
START WITH 68884
INCREMENT BY 1;
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db Done.
[]
%%sql
CREATE SEQUENCE order_items_order_item_id_seq
START WITH 172199
INCREMENT BY 1;
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db Done.
[]
%%sql
CREATE SEQUENCE products_product_id_seq
START WITH 1346
INCREMENT BY 1;
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db Done.
[]
Exercise 3¶
Commands to alter sequences to bind them to corresponding surrogate primary key fields.
%%sql
ALTER TABLE categories
ALTER COLUMN category_id
SET DEFAULT nextval('categories_category_id_seq');
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db Done.
[]
%%sql
ALTER TABLE customers
ALTER COLUMN customer_id
SET DEFAULT nextval('customers_customer_id_seq');
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db Done.
[]
%%sql
ALTER TABLE departments
ALTER COLUMN department_id
SET DEFAULT nextval('departments_department_id_seq');
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db Done.
[]
%%sql
ALTER TABLE orders
ALTER COLUMN order_id
SET DEFAULT nextval('orders_order_id_seq');
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db Done.
[]
%%sql
ALTER TABLE order_items
ALTER COLUMN order_item_id
SET DEFAULT nextval('order_items_order_item_id_seq');
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db Done.
[]
%%sql
ALTER TABLE products
ALTER COLUMN product_id
SET DEFAULT nextval('products_product_id_seq');
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db Done.
[]
Exercise 4¶
Add Foreign Key constraints to the tables.
- Validate if the tables have data violataing foreign key constraints (Hint: You can use left outer join to find rows in child table but not in parent table)
- Alter tables to add foreign keys as specified.
- Here are the relationships for your reference.
- orders.order_customer_id to customers.customer_id
- order_items.order_item_order_id to orders.order_id
- order_items.order_item_product_id to products.product_id
- Solution should contain the following:
- Commands to add foreign keys to the tables.
%%sql
ALTER TABLE orders
ADD CONSTRAINT orders_customers_fk
FOREIGN KEY (order_customer_id)
REFERENCES customers(customer_id);
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db Done.
[]
%%sql
ALTER TABLE order_items
ADD CONSTRAINT order_items_orders_fk
FOREIGN KEY (order_item_order_id)
REFERENCES orders(order_id);
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db Done.
[]
%%sql
ALTER TABLE order_items
ADD CONSTRAINT order_items_products_fk
FOREIGN KEY (order_item_product_id)
REFERENCES products(product_id);
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db Done.
[]
Exercise 5¶
Queries to validate whether constraints are created or not. You can come up with queries against information_schema
tables such as columns
, sequences
etc.
%%sql
SELECT * FROM information_schema.sequences LIMIT 10
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db 8 rows affected.
sequence_catalog | sequence_schema | sequence_name | data_type | numeric_precision | numeric_precision_radix | numeric_scale | start_value | minimum_value | maximum_value | increment | cycle_option |
---|---|---|---|---|---|---|---|---|---|---|---|
itversity_retail_db | public | users_user_id_seq | integer | 32 | 2 | 0 | 1 | 1 | 2147483647 | 1 | NO |
itversity_retail_db | public | user_logins_user_login_id_seq | integer | 32 | 2 | 0 | 1 | 1 | 2147483647 | 1 | NO |
itversity_retail_db | public | categories_category_id_seq | bigint | 64 | 2 | 0 | 59 | 1 | 9223372036854775807 | 1 | NO |
itversity_retail_db | public | customers_customer_id_seq | bigint | 64 | 2 | 0 | 12436 | 1 | 9223372036854775807 | 1 | NO |
itversity_retail_db | public | departments_department_id_seq | bigint | 64 | 2 | 0 | 8 | 1 | 9223372036854775807 | 1 | NO |
itversity_retail_db | public | orders_order_id_seq | bigint | 64 | 2 | 0 | 68884 | 1 | 9223372036854775807 | 1 | NO |
itversity_retail_db | public | order_items_order_item_id_seq | bigint | 64 | 2 | 0 | 172199 | 1 | 9223372036854775807 | 1 | NO |
itversity_retail_db | public | products_product_id_seq | bigint | 64 | 2 | 0 | 1346 | 1 | 9223372036854775807 | 1 | NO |
%%sql
SELECT * FROM information_schema.columns
WHERE table_name = 'orders';
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db 4 rows affected.
table_catalog | table_schema | table_name | column_name | ordinal_position | column_default | is_nullable | data_type | character_maximum_length | character_octet_length | numeric_precision | numeric_precision_radix | numeric_scale | datetime_precision | interval_type | interval_precision | character_set_catalog | character_set_schema | character_set_name | collation_catalog | collation_schema | collation_name | domain_catalog | domain_schema | domain_name | udt_catalog | udt_schema | udt_name | scope_catalog | scope_schema | scope_name | maximum_cardinality | dtd_identifier | is_self_referencing | is_identity | identity_generation | identity_start | identity_increment | identity_maximum | identity_minimum | identity_cycle | is_generated | generation_expression | is_updatable |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
itversity_retail_db | public | orders | order_id | 1 | nextval(‘orders_order_id_seq’::regclass) | NO | integer | None | None | 32 | 2 | 0 | None | None | None | None | None | None | None | None | None | None | None | None | itversity_retail_db | pg_catalog | int4 | None | None | None | None | 1 | NO | NO | None | None | None | None | None | NO | NEVER | None | YES |
itversity_retail_db | public | orders | order_date | 2 | None | NO | timestamp without time zone | None | None | None | None | None | 6 | None | None | None | None | None | None | None | None | None | None | None | itversity_retail_db | pg_catalog | timestamp | None | None | None | None | 2 | NO | NO | None | None | None | None | None | NO | NEVER | None | YES |
itversity_retail_db | public | orders | order_customer_id | 3 | None | NO | integer | None | None | 32 | 2 | 0 | None | None | None | None | None | None | None | None | None | None | None | None | itversity_retail_db | pg_catalog | int4 | None | None | None | None | 3 | NO | NO | None | None | None | None | None | NO | NEVER | None | YES |
itversity_retail_db | public | orders | order_status | 4 | None | NO | character varying | 45 | 180 | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | itversity_retail_db | pg_catalog | varchar | None | None | None | None | 4 | NO | NO | None | None | None | None | None | NO | NEVER | None | YES |
%%sql
SELECT * FROM information_schema.table_constraints
WHERE table_name = 'orders'
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db 6 rows affected.
constraint_catalog | constraint_schema | constraint_name | table_catalog | table_schema | table_name | constraint_type | is_deferrable | initially_deferred | enforced |
---|---|---|---|---|---|---|---|---|---|
itversity_retail_db | public | orders_pkey | itversity_retail_db | public | orders | PRIMARY KEY | NO | NO | YES |
itversity_retail_db | public | orders_customers_fk | itversity_retail_db | public | orders | FOREIGN KEY | NO | NO | YES |
itversity_retail_db | public | 2200_25581_1_not_null | itversity_retail_db | public | orders | CHECK | NO | NO | YES |
itversity_retail_db | public | 2200_25581_2_not_null | itversity_retail_db | public | orders | CHECK | NO | NO | YES |
itversity_retail_db | public | 2200_25581_3_not_null | itversity_retail_db | public | orders | CHECK | NO | NO | YES |
itversity_retail_db | public | 2200_25581_4_not_null | itversity_retail_db | public | orders | CHECK | NO | NO | YES |