This exercise primarily assesses your ability to put important DDL concepts into practice by solving a typical data migration problem: moving data from one database (MySQL) to another (Postgres).
Here are the high-level steps for migrating from one type of database to another.
Extract DDL statements from the source database (MySQL).
Extract the data as delimited files and ship them to the target database.
Refactor the scripts for the target database (Postgres).
Create tables in the target database.
Execute pre-migration steps (disable constraints, drop indexes, etc.).
Load the data using native utilities.
Execute post-migration steps (enable constraints, create or rebuild indexes, reset sequences, etc.).
Run sanity checks with basic queries.
Make sure all impacted applications are validated thoroughly.
The scripts and data set are available in our GitHub repository. If you are using our environment, the repository is already cloned under /data/retail_db.
It has scripts to create tables with primary keys. These scripts were generated from the MySQL tables and refactored for Postgres.
Script to create tables: create_db_tables_pg.sql
Script to load data into tables: load_db_tables_pg.sql
Here are the steps you need to perform to complete this exercise.
Create tables
Load data
All the tables have surrogate primary keys. Here are the details.
orders.order_id
order_items.order_item_id
customers.customer_id
products.product_id
categories.category_id
departments.department_id
Get the maximum value of each surrogate primary key field.
Create a sequence for each surrogate primary key field, starting after its maximum value. Make sure to use standard naming conventions for sequences.
Ensure the sequences are mapped to the surrogate primary key fields.
Create foreign key constraints based on the following relationships.
orders.order_customer_id to customers.customer_id
order_items.order_item_order_id to orders.order_id
order_items.order_item_product_id to products.product_id
products.product_category_id to categories.category_id
categories.category_department_id to departments.department_id
Insert a few records into departments to ensure that sequence-generated numbers are used for department_id.
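The sequence steps can be sketched as a small Python helper that emits the Postgres statements for one surrogate key, assuming the maximum value has already been read with the generated SELECT max(...) query. START WITH is set to the maximum plus one so that the first generated value does not collide with an existing key, and an empty table (where max() returns NULL) starts at 1. SET DEFAULT nextval(...) is what makes inserts that omit the key use the sequence, and OWNED BY ties the sequence to the column. The names SURROGATE_KEYS, max_query and sequence_statements, and the maximum of 100, are hypothetical.

```python
# Surrogate primary keys listed in the exercise: (table, column)
SURROGATE_KEYS = [
    ("orders", "order_id"),
    ("order_items", "order_item_id"),
    ("customers", "customer_id"),
    ("products", "product_id"),
    ("categories", "category_id"),
    ("departments", "department_id"),
]


def max_query(table, column):
    """Query returning the current maximum value of a surrogate key."""
    return f"SELECT max({column}) FROM {table};"


def sequence_statements(table, column, max_value):
    """Postgres DDL creating TABLE_COLUMN_seq after max_value and binding it to the column."""
    seq = f"{table}_{column}_seq"  # TABLENAME_SURROGATEFIELD_seq naming convention
    start = (max_value or 0) + 1   # max() is NULL (None) on an empty table
    return [
        f"CREATE SEQUENCE {seq} START WITH {start};",
        # Inserts that omit the column now draw values from the sequence
        f"ALTER TABLE {table} ALTER COLUMN {column} SET DEFAULT nextval('{seq}');",
        # Tie the sequence to the column so it is dropped together with it
        f"ALTER SEQUENCE {seq} OWNED BY {table}.{column};",
    ]


for table, column in SURROGATE_KEYS:
    print(max_query(table, column))

# Hypothetical maximum of 100 for departments.department_id
for stmt in sequence_statements("departments", "department_id", 100):
    print(stmt)
```

To check the departments step, insert a row without supplying department_id, for example INSERT INTO departments (department_name) VALUES ('Testing') RETURNING department_id; (this assumes the table has a department_name column). With the hypothetical maximum of 100, the returned id would be 101.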
Here are the commands to launch psql and run scripts to create tables as well as load data into tables.
We use this approach (create tables, load data, then add constraints and reset sequences) for large-volume data migrations from one database to another.
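Resetting a sequence after a bulk load can be sketched as one generated statement per surrogate key. This sketch assumes the sequence is already associated with the column (for example through ALTER SEQUENCE ... OWNED BY, or a serial column), because pg_get_serial_sequence only finds sequences linked that way. Passing false as setval's third argument (is_called) makes the next nextval return exactly max + 1, and COALESCE makes an empty table restart at 1. The helper name reset_sequence_statement is hypothetical.

```python
# Surrogate keys whose sequences need resetting after a bulk reload: (table, column)
SURROGATE_KEYS = [
    ("orders", "order_id"),
    ("order_items", "order_item_id"),
    ("customers", "customer_id"),
    ("products", "product_id"),
    ("categories", "category_id"),
    ("departments", "department_id"),
]


def reset_sequence_statement(table, column):
    """Postgres statement moving the sequence owned by table.column to max + 1."""
    # is_called = false: the next nextval() returns exactly the value passed in.
    # COALESCE(..., 0) + 1 makes an empty table restart at 1.
    return (
        f"SELECT setval(pg_get_serial_sequence('{table}', '{column}'), "
        f"COALESCE((SELECT max({column}) FROM {table}), 0) + 1, false);"
    )


for table, column in SURROGATE_KEYS:
    print(reset_sequence_statement(table, column))
```

String formatting is acceptable here only because the table and column names are fixed, trusted identifiers.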
Here are the commands or queries you need to come up with to solve this problem.
Queries to get the maximum value of each surrogate primary key.
Commands to add sequences with START WITH set from the maximum value of the corresponding surrogate primary key field (the maximum plus one, so that new values do not collide with existing keys). Use meaningful sequence names of the form TABLENAME_SURROGATEFIELD_seq (example: users_user_id_seq for users.user_id).
Commands to alter the sequences to bind them to the corresponding surrogate primary key fields.
Add foreign key constraints to the tables.
Validate whether the tables contain data violating the foreign key constraints. (Hint: you can use a left outer join to find rows in the child table that have no matching row in the parent table.)
Alter tables to add foreign keys as specified.
Here are the relationships for your reference.
orders.order_customer_id to customers.customer_id
order_items.order_item_order_id to orders.order_id
order_items.order_item_product_id to products.product_id
products.product_category_id to categories.category_id
categories.category_department_id to departments.department_id
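The validation and foreign key steps can be sketched in Python. The orphan check is the left outer join from the hint; it is plain SQL that Postgres also accepts, so it is demonstrated here against an in-memory SQLite database with a few hypothetical rows. Rows whose foreign key column is NULL are excluded because a NULL foreign key does not violate the constraint. The constraint names follow Postgres's default TABLE_COLUMN_fkey pattern, which is a naming choice rather than a requirement; orphan_query and add_fk_statement are hypothetical helper names.

```python
import sqlite3

# Child-to-parent relationships from the exercise:
# (child_table, child_column, parent_table, parent_column)
RELATIONSHIPS = [
    ("orders", "order_customer_id", "customers", "customer_id"),
    ("order_items", "order_item_order_id", "orders", "order_id"),
    ("order_items", "order_item_product_id", "products", "product_id"),
    ("products", "product_category_id", "categories", "category_id"),
    ("categories", "category_department_id", "departments", "department_id"),
]


def orphan_query(child, child_col, parent, parent_col):
    """Count child rows whose non-NULL foreign key has no matching parent row."""
    return (
        f"SELECT count(*) FROM {child} c "
        f"LEFT OUTER JOIN {parent} p ON c.{child_col} = p.{parent_col} "
        f"WHERE p.{parent_col} IS NULL AND c.{child_col} IS NOT NULL"
    )


def add_fk_statement(child, child_col, parent, parent_col):
    """Postgres DDL adding a foreign key named <table>_<column>_fkey."""
    return (
        f"ALTER TABLE {child} ADD CONSTRAINT {child}_{child_col}_fkey "
        f"FOREIGN KEY ({child_col}) REFERENCES {parent} ({parent_col});"
    )


# Demo on hypothetical rows: category 3 points at a department that does not exist.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (department_id INTEGER PRIMARY KEY);
    CREATE TABLE categories (category_id INTEGER PRIMARY KEY,
                             category_department_id INTEGER);
    INSERT INTO departments VALUES (2), (3);
    INSERT INTO categories VALUES (1, 2), (2, 3), (3, 99), (4, NULL);
""")
orphans = conn.execute(orphan_query(*RELATIONSHIPS[4])).fetchone()[0]
print("orphaned categories rows:", orphans)

for rel in RELATIONSHIPS:
    print(add_fk_statement(*rel))
```

In Postgres, each orphan query should return 0 before the matching ALTER TABLE is run; adding a foreign key checks the existing rows and fails if any violate it.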
The solution should contain the following:
Commands to add foreign keys to the tables.
Queries to validate whether the constraints were created. You can write queries against information_schema views such as columns, sequences, etc.
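A minimal sketch of the validation queries, kept as Python strings so that a small helper can compare the foreign key query's result rows against the five required relationships. It assumes the rows arrive as (child table, child column, parent table, parent column) tuples, as a database driver's fetchall() would typically return; the driver itself is not shown, and the names FK_QUERY, DEFAULT_QUERY, SEQUENCE_QUERY and missing_foreign_keys are hypothetical. Some information_schema views only list objects owned by (or granted to) the current role, so run the queries as the table owner.

```python
# Foreign keys with child and parent columns, from Postgres information_schema views.
FK_QUERY = """
SELECT kcu.table_name, kcu.column_name, ccu.table_name, ccu.column_name
FROM information_schema.table_constraints tc
JOIN information_schema.key_column_usage kcu
  ON kcu.constraint_schema = tc.constraint_schema
 AND kcu.constraint_name = tc.constraint_name
JOIN information_schema.constraint_column_usage ccu
  ON ccu.constraint_schema = tc.constraint_schema
 AND ccu.constraint_name = tc.constraint_name
WHERE tc.constraint_type = 'FOREIGN KEY'
"""

# Columns whose default draws from a sequence (sequences bound to columns).
DEFAULT_QUERY = """
SELECT table_name, column_name, column_default
FROM information_schema.columns
WHERE column_default LIKE 'nextval%'
"""

# Sequences and their START WITH values.
SEQUENCE_QUERY = """
SELECT sequence_name, start_value
FROM information_schema.sequences
"""

# Relationships the exercise asks for: (child_table, child_column, parent_table, parent_column)
EXPECTED_FKS = {
    ("orders", "order_customer_id", "customers", "customer_id"),
    ("order_items", "order_item_order_id", "orders", "order_id"),
    ("order_items", "order_item_product_id", "products", "product_id"),
    ("products", "product_category_id", "categories", "category_id"),
    ("categories", "category_department_id", "departments", "department_id"),
}


def missing_foreign_keys(rows):
    """Return expected relationships absent from the rows returned by FK_QUERY."""
    return EXPECTED_FKS - {tuple(row) for row in rows}


# Hypothetical partial result in which only one foreign key exists so far.
sample_rows = [("orders", "order_customer_id", "customers", "customer_id")]
for rel in sorted(missing_foreign_keys(sample_rows)):
    print("missing:", rel)
```

Once all five constraints exist, missing_foreign_keys should return an empty set, DEFAULT_QUERY should list the six surrogate key columns, and SEQUENCE_QUERY should list the six TABLENAME_SURROGATEFIELD_seq sequences.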