Processing Collections using Map Reduce APIs

As part of this class, we have covered:

  • Operations on Set
  • Understanding reduce
  • Aggregate functions such as sum, min, and max
  • A review of groupBy
  • Sorting data using sorted and sortBy

myReduce using loops
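
The notes name a loop-based myReduce but do not show it, so the following is only a minimal sketch of what such a function might look like. The signature (a List[Int] plus a two-argument function) and the sample values are assumptions made for illustration, not the version used in class.

```scala
// Hypothetical helper: a loop-based reduce over a non-empty List[Int].
def myReduce(l: List[Int], f: (Int, Int) => Int): Int = {
  require(l.nonEmpty, "myReduce needs at least one element")
  var result = l.head        // start with the first element
  for (e <- l.tail) {        // combine the remaining elements, left to right
    result = f(result, e)
  }
  result
}

val nums = List(1, 2, 5, 6, 2, 3, 1)
val total = myReduce(nums, (a, b) => a + b)                   // 20
val smallest = myReduce(nums, (a, b) => if (a < b) a else b)  // 1
val largest = myReduce(nums, (a, b) => if (a > b) a else b)   // 6
```

Passing a different function is enough to turn the same loop into sum, min, or max, which is the idea behind the built-in reduce.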

Sorting Data using sorted

  • sorted sorts the data in the natural order of the elements in the collection
  • An implicit Ordering must be available for the element type of the collection
val l = List(1, 2, 5, 6, 2, 3, 1)
l.sorted
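
To illustrate the Ordering requirement, here is a small sketch. The Category case class and the byId ordering are hypothetical names introduced only for this example; Ordering[Int].reverse and Ordering.by are part of the Scala standard library.

```scala
val l = List(1, 2, 5, 6, 2, 3, 1)

// Natural order: uses the implicit Ordering[Int] from the standard library.
val ascending = l.sorted                          // List(1, 1, 2, 2, 3, 5, 6)

// An Ordering can also be passed explicitly; reverse flips the natural order.
val descending = l.sorted(Ordering[Int].reverse)  // List(6, 5, 3, 2, 2, 1, 1)

// Hypothetical type: it has no natural order, so sorted only compiles
// once an implicit Ordering[Category] is in scope.
case class Category(id: Int, name: String)
implicit val byId: Ordering[Category] = Ordering.by((c: Category) => c.id)

val categories = List(Category(3, "Golf"), Category(1, "Fitness"), Category(2, "Footwear"))
val sortedCategories = categories.sorted          // ordered by id: 1, 2, 3
```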

Sorting Data using sortBy

Problem Statement: Sort the orders data by customer id (the 3rd field in the orders data)
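
One possible approach is sketched below. The sample records are made up for illustration, and their layout (order id, order date, customer id, order status) is an assumption; only the position of the customer id as the 3rd field comes from the problem statement.

```scala
// Hypothetical sample records, assumed layout:
// order_id,order_date,order_customer_id,order_status
val orders = List(
  "1,2013-07-25 00:00:00.0,11599,CLOSED",
  "2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT",
  "3,2013-07-25 00:00:00.0,12111,COMPLETE",
  "4,2013-07-25 00:00:00.0,8827,CLOSED"
)

// Key on the 3rd field (index 2), converted to Int so that the sort
// is numeric rather than lexicographic.
val ordersByCustomer = orders.sortBy(o => o.split(",")(2).toInt)
ordersByCustomer.foreach(println)
```

sortBy takes a function that extracts the sort key from each element, so the records themselves stay unchanged and only their order differs. Without toInt, the keys would be compared as strings and "11599" would sort before "256".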

Exercises

  • Sort data by product price in descending order
    • Location: /data/retail_db/products/part-00000
    • Price is the 5th element in the data
    • Exclude the record with product_id 685
  • Sort data by product category id in ascending order
    • Location: /data/retail_db/products/part-00000
    • Category id is the 2nd element
  • Sort data in ascending order by category id and in descending order by product price
    • Location: /data/retail_db/products/part-00000
    • Category id is the 2nd element and product price is the 5th element
    • Exclude the record with product_id 685
  • Compute order revenue for each order id and sort data in descending order by order revenue
    • Location: /data/retail_db/order_items/part-00000
    • Order id is the 2nd element and order item subtotal is the 5th element
    • First compute revenue for each order id, then sort the data in descending order by revenue
    • Output should have only order_id and computed revenue
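
As a starting point for the last two exercises, the sketch below shows two techniques on small in-memory samples: a tuple key with a negated value for mixed ascending and descending order, and groupBy followed by a sum for per-key revenue. All records and the fields other than the 2nd and 5th are hypothetical, and using BigDecimal for amounts is a choice made here, not something the exercises require.

```scala
// Hypothetical product records; only category id (2nd) and price (5th) matter here.
val products = List(
  "1,2,Product A,,59.98,img_a",
  "2,2,Product B,,129.99,img_b",
  "3,3,Product C,,89.99,img_c",
  "4,3,Product D,,199.99,img_d"
)

// Tuple key: category id ascending, then price descending (via negation).
val byCategoryThenPrice = products.sortBy { p =>
  val fields = p.split(",")
  (fields(1).toInt, -BigDecimal(fields(4)))
}

// Hypothetical order item records; order id is 2nd, subtotal is 5th.
val orderItems = List(
  "1,1,957,1,299.98,299.98",
  "2,2,1073,1,199.99,199.99",
  "3,2,502,5,250.00,50.00",
  "4,2,403,1,129.99,129.99",
  "5,4,897,2,49.98,24.99"
)

// Extract (order id, subtotal), total per order id, then sort by revenue descending.
val revenuePerOrder = orderItems
  .map { r => val f = r.split(","); (f(1).toInt, BigDecimal(f(4))) }
  .groupBy(_._1)
  .map { case (orderId, items) => (orderId, items.map(_._2).sum) }
  .toList
  .sortBy { case (_, revenue) => -revenue }

revenuePerOrder.foreach(println)
```

Negating the key works for numeric values only; for other key types, an explicit reversed Ordering passed to sortBy is one alternative.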