Solutions – Loading Data into Postgres Tables

Getting Started

Here is the exercise related to getting started module.

Exercise – Loading Data

As part of this exercise, you need to take care of loading data using COPY Command.

  • You can connect to the database using following details in the environment provided by us.
    • Host: localhost
    • Port: 5342
    • Database Name: YOUR_OS_USER_hr_db
    • User Name: YOUR_OS_USER_hr_user
    • Password: YOUR_OS_USER_PASSWORD (provided by us).
  • If you are using your own environment, make sure to create database for storing HR Data.
    • Database Name: hr_db
    • User Name: hr_user
    • You can create user with password of your choice.
CREATE DATABASE hr_db;
CREATE USER hr_user WITH PASSWORD 'hr_password';
GRANT ALL ON DATABASE hr_db TO hr_user;
GRANT pg_read_server_files TO hr_user;
  • Create table using this script.
CREATE TABLE employees
   ( employee_id INTEGER
   , first_name VARCHAR(20)
   , last_name VARCHAR(25)
   , email VARCHAR(25)
   , phone_number VARCHAR(20)
   , hire_date DATE
   , job_id VARCHAR(10)
   , salary NUMERIC(8,2)
   , commission_pct NUMERIC(2,2)
   , manager_id INTEGER
   , department_id INTEGER
   ) ;
CREATE UNIQUE INDEX emp_emp_id_pk
         ON employees (employee_id) ;
ALTER TABLE employees ADD
   PRIMARY KEY (employee_id);
  • Understand data.
    • Check for delimiters (record as well as field).
    • Check whether header exists or not.
    • Ensure number of fields for the table and data being loaded are same or not.
  • Load data into the table using COPY Command.
  • Validate by running these queries. You can also use SQL Workbench to run the queries to validate whether data is loaded successfully or not.
SELECT * FROM employees LIMIT 10;
SELECT count(1) FROM employees;

Solution

Here is the solution for the exercise.

  • Connect to psql CLI.
    psql -h localhost \
      -p 5432 \
      -d itversity_hr_db \
      -U itversity_hr_user \
      -W
    
  • Make sure table is created using the script provided as part of the question.
TRUNCATE TABLE employees;

COPY employees(employee_id, first_name, last_name, email, phone_number, hire_date, job_id, salary, commission_pct,
manager_id, department_id
) FROM '/data/hr_db/employees/part-m-00000'
DELIMITER E'\t'
CSV;
In [1]:
%load_ext sql
In [5]:
%env DATABASE_URL=postgresql://itversity_hr_user:hr_password@localhost:5432/itversity_hr_db
env: DATABASE_URL=postgresql://itversity_hr_user:hr_password@localhost:5432/itversity_hr_db
In [7]:
%%sql

SELECT * FROM employees 
ORDER BY commission_pct NULLS LAST
LIMIT 10
 * postgresql://itversity_hr_user:***@localhost:5432/itversity_hr_db
10 rows affected.
Out[7]:
employee_id first_name last_name email phone_number hire_date job_id salary commission_pct manager_id department_id
165 David Lee DLEE 011.44.1346.529268 2000-02-23 SA_REP 6800.00 0.10 147 80
173 Sundita Kumar SKUMAR 011.44.1343.329268 2000-04-21 SA_REP 6100.00 0.10 148 80
164 Mattea Marvins MMARVINS 011.44.1346.329268 2000-01-24 SA_REP 7200.00 0.10 147 80
179 Charles Johnson CJOHNSON 011.44.1644.429262 2000-01-04 SA_REP 6200.00 0.10 149 80
167 Amit Banda ABANDA 011.44.1346.729268 2000-04-21 SA_REP 6200.00 0.10 147 80
166 Sundar Ande SANDE 011.44.1346.629268 2000-03-24 SA_REP 6400.00 0.10 147 80
171 William Smith WSMITH 011.44.1343.629268 1999-02-23 SA_REP 7400.00 0.15 148 80
155 Oliver Tuvault OTUVAULT 011.44.1344.486508 1999-11-23 SA_REP 7000.00 0.15 145 80
163 Danielle Greene DGREENE 011.44.1346.229268 1999-03-19 SA_REP 9500.00 0.15 147 80
172 Elizabeth Bates EBATES 011.44.1343.529268 1999-03-24 SA_REP 7300.00 0.15 148 80
In [4]:
%%sql

SELECT count(1) FROM employees
 * postgresql://itversity_hr_user:***@localhost:5432/itversity_hr_db
1 rows affected.
Out[4]:
count
107
In [ ]:
 

Share this post