Day: March 24, 2022

Preparing Data Sets for Understanding Map-Reduce Libraries

We will primarily use the orders and order_items data sets to understand how to manipulate collections.

orders is available at the path /data/retail_db/orders/part-00000
order_items is available at the path /data/retail_db/order_items/part-00000

orders – columns
order_id – integer and unique
order_date – can be treated as a string
order_customer_id – it is of …
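As a minimal sketch of how one comma-separated orders record maps onto the columns described above, the snippet below splits a single line and converts only order_id to an integer. The helper name parse_order_prefix is hypothetical (it does not come from the notebooks), and the remaining fields are deliberately left as raw strings because their types are not stated in this excerpt.

```python
# Sample record in the same comma-separated layout as the orders file.
sample_order = "1,2013-07-25 00:00:00.0,11599,CLOSED"


def parse_order_prefix(line):
    """Hypothetical helper: split a record and convert only the documented columns."""
    fields = line.split(",")
    order_id = int(fields[0])  # order_id: integer and unique
    order_date = fields[1]     # order_date: treated as a string
    remaining = fields[2:]     # left as raw strings; their types are not assumed here
    return order_id, order_date, remaining


order_id, order_date, remaining = parse_order_prefix(sample_order)
print(order_id, order_date, remaining)
```

Leaving the trailing fields untyped keeps the sketch honest about what the excerpt actually specifies.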


Validating MyReduceByKey Function

Let us perform a few tasks to validate the myReduceByKey function.

In [1]: %run 04_develop_myMap_function.ipynb
In [2]: %run 08_develop_myReduceByKey_function.ipynb

Use the function to get the count by date from orders.

In [3]: orders_path = "/data/retail_db/orders/part-00000"
In [4]: orders = open(orders_path).read().splitlines()
In [5]: orders[:10]
Out[5]: ['1,2013-07-25 00:00:00.0,11599,CLOSED',
 '2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT',
 '3,2013-07-25 00:00:00.0,12111,COMPLETE',
 '4,2013-07-25 00:00:00.0,8827,CLOSED',
 '5,2013-07-25 00:00:00.0,11318,COMPLETE',
 '6,2013-07-25 …
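The count-by-date task can be sketched without the notebooks. The signature of myReduceByKey is not shown in this excerpt, so the function reduce_by_key_sketch below is a hypothetical stand-in that assumes the input is a list of (key, value) pairs plus a two-argument combining function; the real notebook function may differ. The sample records are the first three lines of the output shown above.

```python
# First three records copied from the orders output shown above.
sample_orders = [
    "1,2013-07-25 00:00:00.0,11599,CLOSED",
    "2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT",
    "3,2013-07-25 00:00:00.0,12111,COMPLETE",
]


def reduce_by_key_sketch(pairs, func):
    """Hypothetical stand-in, not the notebook's myReduceByKey.

    Combines the values of each key pairwise using func.
    """
    result = {}
    for key, value in pairs:
        if key in result:
            result[key] = func(result[key], value)
        else:
            result[key] = value
    return result


# Map each record to (order_date, 1), then add up the ones per date.
date_pairs = [(line.split(",")[1], 1) for line in sample_orders]
count_by_date = reduce_by_key_sketch(date_pairs, lambda total, one: total + one)
print(count_by_date)  # {'2013-07-25 00:00:00.0': 3}
```

Emitting a 1 per record and summing per key is the usual way to express a count as a reduce-by-key operation.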
