Here are the exercises to ensure our understanding related to Pre-Defined Functions.
We will use users table as well as other tables we got as part of retail database.
Information will be provided with each exercise.
%%sql
insert into users (
user_first_name, user_last_name, user_email_id, user_gender,
user_unique_id, user_phone_no, user_dob, created_ts
) VALUES
('Giuseppe', 'Bode', 'gbode0@imgur.com', 'M', '88833-8759',
'+86 (764) 443-1967', '1973-05-31', '2018-04-15 12:13:38'),
('Lexy', 'Gisbey', 'lgisbey1@mail.ru', 'F', '262501-029',
'+86 (751) 160-3742', '2003-05-31', '2020-12-29 06:44:09'),
('Karel', 'Claringbold', 'kclaringbold2@yale.edu', 'F', '391-33-2823',
'+62 (445) 471-2682', '1985-11-28', '2018-11-19 00:04:08'),
('Marv', 'Tanswill', 'mtanswill3@dedecms.com', 'F', '1195413-80',
'+62 (497) 736-6802', '1998-05-24', '2018-11-19 16:29:43'),
('Gertie', 'Espinoza', 'gespinoza4@nationalgeographic.com', 'M', '471-24-6869',
'+249 (687) 506-2960', '1997-10-30', '2020-01-25 21:31:10'),
('Saleem', 'Danneil', 'sdanneil5@guardian.co.uk', 'F', '192374-933',
'+63 (810) 321-0331', '1992-03-08', '2020-11-07 19:01:14'),
('Rickert', 'O''Shiels', 'roshiels6@wikispaces.com', 'M', '749-27-47-52',
'+86 (184) 759-3933', '1972-11-01', '2018-03-20 10:53:24'),
('Cybil', 'Lissimore', 'clissimore7@pinterest.com', 'M', '461-75-4198',
'+54 (613) 939-6976', '1978-03-03', '2019-12-09 14:08:30'),
('Melita', 'Rimington', 'mrimington8@mozilla.org', 'F', '892-36-676-2',
'+48 (322) 829-8638', '1995-12-15', '2018-04-03 04:21:33'),
('Benetta', 'Nana', 'bnana9@google.com', 'M', '197-54-1646',
'+420 (934) 611-0020', '1971-12-07', '2018-10-17 21:02:51'),
('Gregorius', 'Gullane', 'ggullanea@prnewswire.com', 'F', '232-55-52-58',
'+62 (780) 859-1578', '1973-09-18', '2020-01-14 23:38:53'),
('Una', 'Glayzer', 'uglayzerb@pinterest.com', 'M', '898-84-336-6',
'+380 (840) 437-3981', '1983-05-26', '2019-09-17 03:24:21'),
('Jamie', 'Vosper', 'jvosperc@umich.edu', 'M', '247-95-68-44',
'+81 (205) 723-1942', '1972-03-18', '2020-07-23 16:39:33'),
('Calley', 'Tilson', 'ctilsond@issuu.com', 'F', '415-48-894-3',
'+229 (698) 777-4904', '1987-06-12', '2020-06-05 12:10:50'),
('Peadar', 'Gregorowicz', 'pgregorowicze@omniture.com', 'M', '403-39-5-869',
'+7 (267) 853-3262', '1996-09-21', '2018-05-29 23:51:31'),
('Jeanie', 'Webling', 'jweblingf@booking.com', 'F', '399-83-05-03',
'+351 (684) 413-0550', '1994-12-27', '2018-02-09 01:31:11'),
('Yankee', 'Jelf', 'yjelfg@wufoo.com', 'F', '607-99-0411',
'+1 (864) 112-7432', '1988-11-13', '2019-09-16 16:09:12'),
('Blair', 'Aumerle', 'baumerleh@toplist.cz', 'F', '430-01-578-5',
'+7 (393) 232-1860', '1979-11-09', '2018-10-28 19:25:35'),
('Pavlov', 'Steljes', 'psteljesi@macromedia.com', 'F', '571-09-6181',
'+598 (877) 881-3236', '1991-06-24', '2020-09-18 05:34:31'),
('Darn', 'Hadeke', 'dhadekej@last.fm', 'M', '478-32-02-87',
'+370 (347) 110-4270', '1984-09-04', '2018-02-10 12:56:00'),
('Wendell', 'Spanton', 'wspantonk@de.vu', 'F', null,
'+84 (301) 762-1316', '1973-07-24', '2018-01-30 01:20:11'),
('Carlo', 'Yearby', 'cyearbyl@comcast.net', 'F', null,
'+55 (288) 623-4067', '1974-11-11', '2018-06-24 03:18:40'),
('Sheila', 'Evitts', 'sevittsm@webmd.com', null, '830-40-5287',
null, '1977-03-01', '2020-07-20 09:59:41'),
('Sianna', 'Lowdham', 'slowdhamn@stanford.edu', null, '778-0845',
null, '1985-12-23', '2018-06-29 02:42:49'),
('Phylys', 'Aslie', 'paslieo@qq.com', 'M', '368-44-4478',
'+86 (765) 152-8654', '1984-03-22', '2019-10-01 01:34:28')
Get all the number of users created per year.
Use users table for this exercise.
Output should contain 4 digit year and count.
Use date specific functions to get the year using created_ts.
Make sure you define aliases to the columns as created_year and user_count respectively.
Data should be sorted in ascending order by created_year.
When you run the query using Jupyter environment, it might have decimals for integers. Hence you can display results even with decimal points.
Here is the sample output.
created_year |
user_count |
---|---|
2018 |
13 |
2019 |
4 |
2020 |
8 |
Get the day name of the birth days for all the users born in the month of June.
Use users table for this exercise.
Output should contain user_id, user_dob, user_email_id and user_day_of_birth.
Use date specific functions to get the month using user_dob.
user_day_of_birth should be full day with first character in upper case such as Tuesday
Data should be sorted by day with in the month of May.
user_id |
user_dob |
user_email_id |
user_day_of_birth |
---|---|---|---|
4 |
1998-05-24 |
mtanswill3@dedecms.com |
Sunday |
12 |
1983-05-26 |
uglayzerb@pinterest.com |
Thursday |
1 |
1973-05-31 |
gbode0@imgur.com |
Thursday |
2 |
2003-05-31 |
lgisbey1@mail.ru |
Saturday |
Get the names and email ids of users added in year 2019.
Use users table for this exercise.
Output should contain user_id, user_name, user_email_id, created_ts, created_year.
Use date specific functions to get the year using created_ts.
user_name is a derived column by concatenating user_first_name and user_last_name with space in between.
user_name should have values in upper case.
Data should be sorted in ascending order by user_name
user_id |
user_name |
user_email_id |
created_ts |
created_year |
---|---|---|---|---|
8 |
CYBIL LISSIMORE |
clissimore7@pinterest.com |
2019-12-09 14:08:30 |
2019.0 |
25 |
PHYLYS ASLIE |
paslieo@qq.com |
2019-10-01 01:34:28 |
2019.0 |
12 |
UNA GLAYZER |
uglayzerb@pinterest.com |
2019-09-17 03:24:21 |
2019.0 |
17 |
YANKEE JELF |
yjelfg@wufoo.com |
2019-09-16 16:09:12 |
2019.0 |
Get the number of users by gender.
Use users table for this exercise.
Output should contain gender and user_count.
For males the output should display Male and for females the output should display Female.
If gender is not specified, then it should display Not Specified.
Data should be sorted in descending order by user_count.
user_gender |
user_count |
---|---|
Female |
13 |
Male |
10 |
Not Specified |
2 |
Get last 4 digits of unique ids.
Use users table for this exercise.
Output should contain user_id, user_unique_id and user_unique_id_last4
Unique ids are either null or not null.
Unique ids contain numbers and hyphens and are of different length.
We need to get last 4 digits discarding hyphens only when the number of digits are at least 9.
If unique id is null, then you should dispaly Not Specified.
After discarding hyphens, if unique id have less than 9 digits then you should display Invalid Unique Id.
Data should be sorted by user_id. You might see None or null for those user ids where there is no unique id for user_unique_id
user_id |
user_unique_id |
user_unique_id_last4 |
---|---|---|
1 |
88833-8759 |
8759 |
2 |
262501-029 |
1029 |
3 |
391-33-2823 |
2823 |
4 |
1195413-80 |
1380 |
5 |
471-24-6869 |
6869 |
6 |
192374-933 |
4933 |
7 |
749-27-47-52 |
4752 |
8 |
461-75-4198 |
4198 |
9 |
892-36-676-2 |
6762 |
10 |
197-54-1646 |
1646 |
11 |
232-55-52-58 |
5258 |
12 |
898-84-336-6 |
3366 |
13 |
247-95-68-44 |
6844 |
14 |
415-48-894-3 |
8943 |
15 |
403-39-5-869 |
5869 |
16 |
399-83-05-03 |
0503 |
17 |
607-99-0411 |
0411 |
18 |
430-01-578-5 |
5785 |
19 |
571-09-6181 |
6181 |
20 |
478-32-02-87 |
0287 |
21 |
|
Not Specified |
22 |
|
Not Specified |
23 |
830-40-5287 |
5287 |
24 |
778-0845 |
Invalid Unique Id |
25 |
368-44-4478 |
4478 |
Get the count of users based up on country code.
Use users table for this exercise.
Output should contain country code and count.
There should be no +
in the country code. It should only contain digits.
Data should be sorted as numbers by country code.
We should discard user_phone_no with null values.
Here is the desired output:
country_code |
user_count |
---|---|
1 |
1 |
7 |
2 |
48 |
1 |
54 |
1 |
55 |
1 |
62 |
3 |
63 |
1 |
81 |
1 |
84 |
1 |
86 |
4 |
229 |
1 |
249 |
1 |
351 |
1 |
370 |
1 |
380 |
1 |
420 |
1 |
598 |
1 |
Let us validate if we have invalid order_item_subtotal as part of order_items table.
order_items table have 6 fields.
order_item_id
order_item_order_id
order_item_product_id
order_item_quantity
order_item_subtotal
order_item_product_price
order_item_subtotal is nothing but product of order_item_quantity and order_item_product_price. It means order_item_subtotal is compute by multiplying order_item_quantity and order_item_product_price for each item.
You need to get the count of order_items where order_item_subtotal is not equal to the product of order_item_quantity and order_item_product_price.
There can be issues related to rounding off. Make sure it is taken care using appropriate function.
Output should be 0 as there are no such records.
count |
---|
0 |
Get number of orders placed on weekdays and weekends in the month of January 2014.
orders have 4 fields
order_id
order_date
order_customer_id
order_status
Use order date to determine the day on which orders are placed.
Output should contain 2 columns - day_type and order_count.
day_type should have 2 values Week days and Weekend days.
Here is the desired output.
day_type |
order_count |
---|---|
Weekend days |
1505 |
Week days |
4403 |