Performing Total Aggregations¶
We have pre-existing functions to take care of aggregations such as len
, sum
, min
, max
etc. Let us understand how they are typically implemented.
- Get total number of records for a given month using orders.
- Generate total revenue for a given order id using order items.
- Use order items data set and get total number of items sold as well as total revenue generated for a given product_id.
- Create a collection with sales and commission percentage. Using that collection compute total commission amount. If the commission percent is None or not present, treat it as 0.
In [1]:
%run 07_preparing_data_sets.ipynb
In [2]:
orders[:10]
Out[2]:
['1,2013-07-25 00:00:00.0,11599,CLOSED', '2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT', '3,2013-07-25 00:00:00.0,12111,COMPLETE', '4,2013-07-25 00:00:00.0,8827,CLOSED', '5,2013-07-25 00:00:00.0,11318,COMPLETE', '6,2013-07-25 00:00:00.0,7130,COMPLETE', '7,2013-07-25 00:00:00.0,4530,COMPLETE', '8,2013-07-25 00:00:00.0,2911,PROCESSING', '9,2013-07-25 00:00:00.0,5657,PENDING_PAYMENT', '10,2013-07-25 00:00:00.0,5648,PENDING_PAYMENT']
In [3]:
len(orders)
Out[3]:
68883
In [4]:
order_items[:10]
Out[4]:
['1,1,957,1,299.98,299.98', '2,2,1073,1,199.99,199.99', '3,2,502,5,250.0,50.0', '4,2,403,1,129.99,129.99', '5,4,897,2,49.98,24.99', '6,4,365,5,299.95,59.99', '7,4,502,3,150.0,50.0', '8,4,1014,4,199.92,49.98', '9,5,957,1,299.98,299.98', '10,5,365,5,299.95,59.99']
In [5]:
len(order_items)
Out[5]:
172198
Task 1¶
Use orders and get total number of records for a given month (201401).
- Develop a function which take orders collection and month as arguments.
- Month will be passed as integer in the form of yyyyMM (example 201401).
- Return the order count
In [6]:
orders[:10]
Out[6]:
['1,2013-07-25 00:00:00.0,11599,CLOSED', '2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT', '3,2013-07-25 00:00:00.0,12111,COMPLETE', '4,2013-07-25 00:00:00.0,8827,CLOSED', '5,2013-07-25 00:00:00.0,11318,COMPLETE', '6,2013-07-25 00:00:00.0,7130,COMPLETE', '7,2013-07-25 00:00:00.0,4530,COMPLETE', '8,2013-07-25 00:00:00.0,2911,PROCESSING', '9,2013-07-25 00:00:00.0,5657,PENDING_PAYMENT', '10,2013-07-25 00:00:00.0,5648,PENDING_PAYMENT']
In [7]:
order = '1,2013-07-25 00:00:00.0,11599,CLOSED'
In [8]:
order.split(',')
Out[8]:
['1', '2013-07-25 00:00:00.0', '11599', 'CLOSED']
In [9]:
order.split(',')[1]
Out[9]:
'2013-07-25 00:00:00.0'
In [10]:
order.split(',')[1][:7]
Out[10]:
'2013-07'
In [11]:
order.split(',')[1][:7].replace('-', '')
Out[11]:
'201307'
In [12]:
int(order.split(',')[1][:7].replace('-', ''))
Out[12]:
201307
In [13]:
def get_order_count(orders, order_month):
order_count = 0
for order in orders:
l_order_month = int(order.split(',')[1][:7].replace('-', ''))
if l_order_month == order_month: order_count += 1
return order_count
In [14]:
get_order_count(orders, 201401)
Out[14]:
5908
Task 2¶
Use order items data set and compute total revenue generated for a given product_id.
- Define a function which takes order_items and a product_id as arguments.
- product_id will be passed as integer
- Compute revenue generated for a given product id using subtotal (5th field)
- Return the computed product revenue
In [15]:
order_items[:10]
Out[15]:
['1,1,957,1,299.98,299.98', '2,2,1073,1,199.99,199.99', '3,2,502,5,250.0,50.0', '4,2,403,1,129.99,129.99', '5,4,897,2,49.98,24.99', '6,4,365,5,299.95,59.99', '7,4,502,3,150.0,50.0', '8,4,1014,4,199.92,49.98', '9,5,957,1,299.98,299.98', '10,5,365,5,299.95,59.99']
In [16]:
order_item = '1,1,957,1,299.98,299.98'
In [17]:
order_item.split(',')
Out[17]:
['1', '1', '957', '1', '299.98', '299.98']
In [18]:
order_item.split(',')[2]
Out[18]:
'957'
In [19]:
int(order_item.split(',')[2])
Out[19]:
957
In [20]:
float(order_item.split(',')[4])
Out[20]:
299.98
In [21]:
def get_product_revenue(order_items, product_id):
product_revenue = 0.0
for order_item in order_items:
l_product_id = int(order_item.split(',')[2])
order_item_subtotal = float(order_item.split(',')[4])
if l_product_id == product_id: product_revenue += order_item_subtotal
return product_revenue
In [22]:
get_product_revenue(order_items, 502)
Out[22]:
3147800.0
Task 3¶
Use order items data set and get total number of items sold as well as total revenue generated for a given product_id.
- Define a function which takes order_items and a product_id as arguments.
- product_id will be passed as integer
- Get number of items sold for a given product id using quantity (4th field)
- Compute revenue generated for a given product id using subtotal (5th field)
- Return the number of items sold as well as revenue generated. When we return more than one value, then the type of the returned results will be tuple.
In [23]:
t1 = (1, 200.0)
In [24]:
t2 = (2, 300.0)
In [25]:
res = (0, 0.0)
In [26]:
res = (res[0] + t1[0], res[1] + t1[1])
In [27]:
res
Out[27]:
(1, 200.0)
In [28]:
res = (res[0] + t2[0], res[1] + t2[1])
In [29]:
res
Out[29]:
(3, 500.0)
In [30]:
def get_product_metrics(order_items, product_id):
product_metrics = (0, 0.0)
for order_item in order_items:
l_product_id = int(order_item.split(',')[2])
order_metric = (int(order_item.split(',')[3]), float(order_item.split(',')[4]))
if l_product_id == product_id:
product_metrics = (product_metrics[0] + order_metric[0], product_metrics[1] + order_metric[1])
return product_metrics
In [31]:
get_product_metrics(order_items, 502)
Out[31]:
(62956, 3147800.0)
{note}
Alternative approach to get the product metrics. We can return multiple values as part of the return statement.
In [32]:
def get_product_metrics(order_items, product_id):
product_count, product_revenue = 0, 0.0
for order_item in order_items:
l_product_id = int(order_item.split(',')[2])
order_metric = (int(order_item.split(',')[3]), float(order_item.split(',')[4]))
if l_product_id == product_id:
product_count += order_metric[0]
product_revenue += order_metric[1]
return product_count, product_revenue
In [33]:
get_product_metrics(order_items, 502)
Out[33]:
(62956, 3147800.0)
In [34]:
type(get_product_metrics(order_items, 502))
Out[34]:
tuple
Task 4¶
Create a collection with sales and commission percentage. Using that collection compute total commission amount. If the commission percent is None or not present, treat it as 0.
- Each element in the collection should be a tuple.
- First element is the sales amount and second element is commission percentage.
- Commission for each sale can be computed by multiplying commission percentage with sales (make sure to divide commission percentage by 100).
- Some of the records does not have commission percentage, in that case commission amount for that sale shall be 0
- Function should take a collection of tuples and return commission amount which is of type float.
In [35]:
transactions = [(376.0, 8),
(548.23, 14),
(107.93, 8),
(838.22, 14),
(846.85, 21),
(234.84,),
(850.2, 21),
(992.2, 21),
(267.01,),
(958.91, 21),
(412.59,),
(283.14,),
(350.01, 14),
(226.95,),
(132.7, 14)]
In [36]:
type(transactions)
Out[36]:
list
In [37]:
transactions[:6]
Out[37]:
[(376.0, 8), (548.23, 14), (107.93, 8), (838.22, 14), (846.85, 21), (234.84,)]
In [38]:
sale = transactions[0]
In [39]:
type(sale)
Out[39]:
tuple
In [40]:
commission_amount = round(sale[0] * (sale[1] / 100), 2)
In [41]:
commission_amount
Out[41]:
30.08
In [42]:
sale = (234.84,)
In [43]:
commission_amount = round(sale[0] * (sale[1] / 100), 2) # errors out
--------------------------------------------------------------------------- IndexError Traceback (most recent call last) Input In [43], in <cell line: 1>() ----> 1 commission_amount = round(sale[0] * (sale[1] / 100), 2) IndexError: tuple index out of range
In [44]:
len(sale)
Out[44]:
1
In [45]:
commission_pct = sale[1] / 100 if len(sale) == 2 else 0
In [46]:
commission_pct
Out[46]:
0
In [47]:
def get_commission_amount(sales):
commission_amount = 0.0
for sale in sales:
sale_amount = sale[0]
commission_pct = round(sale[1]/100, 2) if len(sale) == 2 else 0
commission_amount += sale_amount * commission_pct
return round(commission_amount, 2)
In [48]:
get_commission_amount(transactions)
Out[48]:
1066.51