Filtering data using filter

Filter transformation

  • filter takes a function as its argument. The function must return a Boolean, and only the elements for which it returns true are kept.
  • val orders = sc.textFile("/Users/itversity/Research/data/retail_db/orders")
  • Find orders whose status is “COMPLETE”
    • val ordersCompleted = orders.filter( e => e.split(",")(3) == "COMPLETE")
    • This creates a new RDD
  • Find orders whose status is either “COMPLETE” or “CLOSED”
    • val ordersCompleted = orders.filter( e => e.split(",")(3) == "COMPLETE" || e.split(",")(3) == "CLOSED")
  • Find orders placed in July 2013
    • val orders2013 = orders.filter( e => e.split(",")(1).startsWith("2013-07"))
  • Find orders placed in July 2013 whose status is either “CLOSED” or “COMPLETE”
    • val orders2013 = orders.filter( e => { e.split(",")(1).startsWith("2013-07") && e.split(",")(3) == "COMPLETE" || e.split(",")(3) == "CLOSED" })
    • This can give wrong results: && binds more tightly than ||, so every CLOSED order passes the filter regardless of its date. The safer way is to split the record once and parenthesize the status check:
      • val orders2013 = orders.filter( e => { val o = e.split(","); o(1).startsWith("2013-07") && (o(3) == "COMPLETE" || o(3) == "CLOSED") })
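The effect of the missing parentheses in the last example can be checked without a cluster. The sketch below is a minimal illustration that uses a plain Scala List in place of the RDD (filter takes the same kind of predicate on both). The object name and the five sample rows are hypothetical and only assume the comma-separated layout used above, with the date in field 1 and the status in field 3.

```scala
object FilterPrecedenceSketch {
  // Hypothetical sample rows: order_id,order_date,customer_id,order_status
  val orders: List[String] = List(
    "1,2013-07-25 00:00:00.0,11599,CLOSED",
    "2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT",
    "3,2013-07-25 00:00:00.0,12111,COMPLETE",
    "4,2013-08-01 00:00:00.0,8827,CLOSED",
    "5,2013-08-02 00:00:00.0,333,COMPLETE"
  )

  // Without parentheses, && binds tighter than ||, so this is evaluated as
  // (July 2013 AND COMPLETE) OR CLOSED: the August CLOSED order slips through.
  val unparenthesized: List[String] = orders.filter { e =>
    val o = e.split(",")
    o(1).startsWith("2013-07") && o(3) == "COMPLETE" || o(3) == "CLOSED"
  }

  // With parentheses the date check applies to both statuses.
  val parenthesized: List[String] = orders.filter { e =>
    val o = e.split(",")
    o(1).startsWith("2013-07") && (o(3) == "COMPLETE" || o(3) == "CLOSED")
  }

  def main(args: Array[String]): Unit = {
    println(unparenthesized.map(_.split(",")(0))) // List(1, 3, 4)
    println(parenthesized.map(_.split(",")(0)))   // List(1, 3)
  }
}
```

Order 4 was placed in August, yet it appears in the first result because its status is CLOSED. Splitting each record once into o also avoids calling split several times per element.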