Data Loading Utilities for the Postgres Database
Let us understand how we can load data into databases using the utilities they provide.
- Most databases provide data loading utilities.
- One of the most common ways of getting data into database tables is by using the data loading utilities provided by the underlying database technology.
- We can load delimited files into database tables using these utilities.
- Here are the steps we can follow to load the delimited data into the table.
  - Make sure the files are available on the server from which we are trying to load them.
  - Ensure the database and table are created for the data to be loaded.
  - Run the relevant command to load the data into the table.
  - Make sure to validate by running queries.
- Let us see a demo by loading a sample file into a table in the Postgres database.
Loading Data
We can use the `COPY` command from `psql` to copy the data into the table.
- Make sure the database is created along with a user that has the right permissions. The user who wants to use the `COPY` command also needs to have the `pg_read_server_files` role assigned.
- Connect to the Postgres database on pg.itversity.com as the superuser postgres and run the relevant `GRANT` command to grant the permissions.
docker-compose exec pg.itversity.com psql -U postgres
- Run this command to grant the user itversity_sms_user the required permission to run the `COPY` command. An optional check to confirm the grant is shown after these commands.
GRANT pg_read_server_files TO itversity_sms_user;
\q
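To confirm the grant took effect, you can optionally run this check while still connected as postgres (before `\q`). It uses Postgres' built-in pg_has_role function and should return true for itversity_sms_user.

-- Optional check: returns true if itversity_sms_user is a member of pg_read_server_files
SELECT pg_has_role('itversity_sms_user', 'pg_read_server_files', 'member');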
- Use docker-compose exec against pg.itversity.com to run the `mkdir` command and create the required folder or directory structure.
docker-compose exec pg.itversity.com mkdir -p /data/sms_db
docker-compose exec pg.itversity.com ls -ltr /data
- You can find the users.csv file under data/sms_db in the data-engineering-spark repository directory. You just have to copy the file to pg.itversity.com under /data/sms_db. The file contains the below data.
user_first_name,user_last_name,user_email_id,user_role,created_dt
Gordan,Bradock,gbradock0@barnesandnoble.com,A,2020-01-10
Tobe,Lyness,tlyness1@paginegialle.it,U,2020-02-10
Addie,Mesias,amesias2@twitpic.com,U,2020-03-05
Corene,Kohrsen,ckohrsen3@buzzfeed.com,U,2020-04-15
Darill,Halsall,dhalsall4@intel.com,U,2020-10-10
- Copy users.csv into pg.itversity.com.
docker cp data/sms_db/users.csv data-engineering-spark-pg.itversity.com-1:/data/sms_db
docker-compose exec pg.itversity.com ls -ltr /data/sms_db
If you are using Docker Compose on AWS Cloud9 and the `docker cp` command is failing, try the below commands.
docker cp data/sms_db/users.csv data-engineering-spark_pg.itversity.com_1:/data/sms_db
docker-compose exec pg.itversity.com ls -ltr /data/sms_db
- Connect to the database.
docker-compose exec pg.itversity.com psql -U itversity_sms_user -d itversity_sms_db
- Create the `users` table. You can verify its structure afterwards, as shown below the CREATE TABLE statement.
CREATE TABLE users (
user_id SERIAL PRIMARY KEY,
user_first_name VARCHAR(30) NOT NULL,
user_last_name VARCHAR(30) NOT NULL,
user_email_id VARCHAR(50) NOT NULL,
user_email_validated BOOLEAN DEFAULT FALSE,
user_password VARCHAR(200),
user_role VARCHAR(1) NOT NULL DEFAULT 'U', --U and A
is_active BOOLEAN DEFAULT FALSE,
created_dt DATE DEFAULT CURRENT_DATE
);
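Optionally, you can describe the table from psql to confirm the columns, defaults, and the primary key before loading the data.

\d users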
- Use the `COPY` command to load the data. A client-side alternative using `\copy` is noted after the command.
COPY users(user_first_name, user_last_name,
user_email_id, user_role, created_dt
) FROM '/data/sms_db/users.csv'
DELIMITER ','
CSV HEADER;
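If granting pg_read_server_files is not an option, psql's client-side `\copy` meta-command is a common alternative. It reads the file from the machine where psql is running rather than from the database server, so this sketch assumes the same path is accessible from the psql client. Note that `\copy` must be written on a single line.

\copy users(user_first_name, user_last_name, user_email_id, user_role, created_dt) FROM '/data/sms_db/users.csv' DELIMITER ',' CSV HEADER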
- Validate by running queries.
SELECT * FROM users;
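Beyond inspecting the rows, we can also confirm the number of records loaded and the role distribution. Assuming the sample users.csv shown above, the count should be 5, with one A and four U records.

-- Confirm the number of records loaded from users.csv
SELECT count(*) FROM users;

-- Confirm the distribution by user_role (expected: 1 A, 4 U)
SELECT user_role, count(*) AS user_count
FROM users
GROUP BY user_role;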