Getting Started¶
Here is the exercise related to getting started module.
Exercise – Loading Data¶
As part of this exercise, you need to take care of loading data using COPY
Command.
- You can connect to the database using following details in the environment provided by us.
- Host: localhost
- Port: 5342
- Database Name: YOUR_OS_USER_hr_db
- User Name: YOUR_OS_USER_hr_user
- Password: YOUR_OS_USER_PASSWORD (provided by us).
- If you are using your own environment, make sure to create database for storing HR Data.
- Database Name: hr_db
- User Name: hr_user
- You can create user with password of your choice.
CREATE DATABASE hr_db;
CREATE USER hr_user WITH PASSWORD 'hr_password';
GRANT ALL ON DATABASE hr_db TO hr_user;
GRANT pg_read_server_files TO hr_user;
- Create table using this script.
CREATE TABLE employees
( employee_id INTEGER
, first_name VARCHAR(20)
, last_name VARCHAR(25)
, email VARCHAR(25)
, phone_number VARCHAR(20)
, hire_date DATE
, job_id VARCHAR(10)
, salary NUMERIC(8,2)
, commission_pct NUMERIC(2,2)
, manager_id INTEGER
, department_id INTEGER
) ;
CREATE UNIQUE INDEX emp_emp_id_pk
ON employees (employee_id) ;
ALTER TABLE employees ADD
PRIMARY KEY (employee_id);
- Understand data.
- Check for delimiters (record as well as field).
- Check whether header exists or not.
- Ensure number of fields for the table and data being loaded are same or not.
- Load data into the table using
COPY
Command. - Validate by running these queries. You can also use SQL Workbench to run the queries to validate whether data is loaded successfully or not.
SELECT * FROM employees LIMIT 10;
SELECT count(1) FROM employees;
Solution¶
Here is the solution for the exercise.
- Connect to
psql
CLI.psql -h localhost \ -p 5432 \ -d itversity_hr_db \ -U itversity_hr_user \ -W
- Make sure table is created using the script provided as part of the question.
TRUNCATE TABLE employees;
COPY employees(employee_id, first_name, last_name, email, phone_number, hire_date, job_id, salary, commission_pct,
manager_id, department_id
) FROM '/data/hr_db/employees/part-m-00000'
DELIMITER E'\t'
CSV;
In [1]:
%load_ext sql
In [5]:
%env DATABASE_URL=postgresql://itversity_hr_user:hr_password@localhost:5432/itversity_hr_db
env: DATABASE_URL=postgresql://itversity_hr_user:hr_password@localhost:5432/itversity_hr_db
In [7]:
%%sql
SELECT * FROM employees
ORDER BY commission_pct NULLS LAST
LIMIT 10
* postgresql://itversity_hr_user:***@localhost:5432/itversity_hr_db 10 rows affected.
Out[7]:
employee_id | first_name | last_name | phone_number | hire_date | job_id | salary | commission_pct | manager_id | department_id | |
---|---|---|---|---|---|---|---|---|---|---|
165 | David | Lee | DLEE | 011.44.1346.529268 | 2000-02-23 | SA_REP | 6800.00 | 0.10 | 147 | 80 |
173 | Sundita | Kumar | SKUMAR | 011.44.1343.329268 | 2000-04-21 | SA_REP | 6100.00 | 0.10 | 148 | 80 |
164 | Mattea | Marvins | MMARVINS | 011.44.1346.329268 | 2000-01-24 | SA_REP | 7200.00 | 0.10 | 147 | 80 |
179 | Charles | Johnson | CJOHNSON | 011.44.1644.429262 | 2000-01-04 | SA_REP | 6200.00 | 0.10 | 149 | 80 |
167 | Amit | Banda | ABANDA | 011.44.1346.729268 | 2000-04-21 | SA_REP | 6200.00 | 0.10 | 147 | 80 |
166 | Sundar | Ande | SANDE | 011.44.1346.629268 | 2000-03-24 | SA_REP | 6400.00 | 0.10 | 147 | 80 |
171 | William | Smith | WSMITH | 011.44.1343.629268 | 1999-02-23 | SA_REP | 7400.00 | 0.15 | 148 | 80 |
155 | Oliver | Tuvault | OTUVAULT | 011.44.1344.486508 | 1999-11-23 | SA_REP | 7000.00 | 0.15 | 145 | 80 |
163 | Danielle | Greene | DGREENE | 011.44.1346.229268 | 1999-03-19 | SA_REP | 9500.00 | 0.15 | 147 | 80 |
172 | Elizabeth | Bates | EBATES | 011.44.1343.529268 | 1999-03-24 | SA_REP | 7300.00 | 0.15 | 148 | 80 |
In [4]:
%%sql
SELECT count(1) FROM employees
* postgresql://itversity_hr_user:***@localhost:5432/itversity_hr_db 1 rows affected.
Out[4]:
count |
---|
107 |
In [ ]: