filter takes a function as its argument. That function must return a Boolean, and only the elements for which it returns true are kept.
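A minimal sketch of the same contract on an ordinary Scala collection, so it runs without Spark: Seq.filter, like RDD.filter, keeps exactly the elements for which the predicate returns true. The numbers and the name isEven are arbitrary illustration, not part of the original notes.

```scala
// The predicate must return a Boolean; filter keeps only the elements where it is true.
val numbers = Seq(1, 2, 3, 4, 5, 6)
val isEven: Int => Boolean = n => n % 2 == 0

val evens = numbers.filter(isEven)
println(evens) // List(2, 4, 6)
```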
val orders = sc.textFile("/Users/itversity/Research/data/retail_db/orders")
Find orders whose status is "COMPLETE":
val ordersCompleted = orders.filter(e => e.split(",")(3) == "COMPLETE") // split the CSV line; field 3 is the order status
filter does not modify orders; it returns a new RDD that contains only the matching lines.
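That filter produces a new dataset instead of changing the old one can be checked with a plain Scala Seq standing in for the RDD. The three lines below are made up, using an assumed layout of order_id, order_date, customer_id, status (the examples above only rely on field 1 being the date and field 3 the status). One difference to keep in mind: RDD.filter is lazy and only runs when an action such as count() is called, whereas Seq.filter runs immediately.

```scala
// Hypothetical sample lines standing in for the orders file (assumed layout: id,date,customer,status).
val sampleOrders = Seq(
  "1,2013-07-25 00:00:00.0,11599,CLOSED",
  "2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT",
  "3,2013-07-25 00:00:00.0,12111,COMPLETE"
)

// Same predicate as the RDD example: keep lines whose field 3 is "COMPLETE".
val ordersCompleted = sampleOrders.filter(e => e.split(",")(3) == "COMPLETE")

println(ordersCompleted.size) // 1
println(sampleOrders.size)    // 3, the source collection is unchanged
```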
Find orders whose status is either "COMPLETE" or "CLOSED":
val ordersCompletedOrClosed = orders.filter(e => e.split(",")(3) == "COMPLETE" || e.split(",")(3) == "CLOSED")
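As the list of accepted statuses grows, a chain of == checks joined by || gets long. A common Scala alternative, not the approach used in these notes, is to split the line once and test membership in a Set; the same lambda can be passed to RDD.filter. A sketch on hypothetical sample lines; the name wantedStatuses is illustrative.

```scala
// Hypothetical sample lines (assumed layout: id,date,customer,status).
val sampleOrders = Seq(
  "1,2013-07-25 00:00:00.0,11599,CLOSED",
  "2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT",
  "3,2013-07-25 00:00:00.0,12111,COMPLETE"
)

// Statuses to keep; Set.contains replaces the chain of == comparisons.
val wantedStatuses = Set("COMPLETE", "CLOSED")

val ordersCompletedOrClosed = sampleOrders.filter(e => wantedStatuses.contains(e.split(",")(3)))
```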
Find orders placed in July 2013, i.e. whose order date (field 1) starts with "2013-07":
val orders2013 = orders.filter(e => e.split(",")(1).startsWith("2013-07"))
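Because the order date is stored as text that begins with year and month, a plain string prefix test is enough to select one month. A sketch on hypothetical lines from two different months (the date format is assumed to match the "2013-07" prefix the notes use):

```scala
// Hypothetical sample lines (assumed layout: id,date,customer,status).
val sampleOrders = Seq(
  "1,2013-07-25 00:00:00.0,11599,CLOSED",
  "2,2013-08-01 00:00:00.0,256,COMPLETE",
  "3,2013-07-31 00:00:00.0,12111,COMPLETE"
)

// Field 1 is the date as a string, so the prefix "2013-07" matches every day in July 2013.
val ordersJuly2013 = sampleOrders.filter(e => e.split(",")(1).startsWith("2013-07"))
```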
Find orders placed in July 2013 whose status is either "COMPLETE" or "CLOSED". A first attempt:
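val orders2013 = orders.filter(e => { e.split(",")(1).startsWith("2013-07") && e.split(",")(3) == "COMPLETE" || e.split(",")(3) == "CLOSED" }) // wrong: parsed as (July && COMPLETE) || CLOSED
This gives wrong results. In Scala, && binds more tightly than ||, so the condition is evaluated as (July 2013 && COMPLETE) || CLOSED, which also keeps CLOSED orders from any other month. It also splits each line three times. A correct version splits each line once and puts parentheses around the status check: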
val orders2013 = orders.filter(e => {
  val o = e.split(",") // split once; o(1) is the order date, o(3) the status
  o(1).startsWith("2013-07") && (o(3) == "COMPLETE" || o(3) == "CLOSED")
})
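To see the precedence problem concretely, run both predicates on a few hypothetical lines with a plain Scala Seq in place of the RDD. The unparenthesized version here splits once instead of three times, which does not change its meaning: it still reads as (July && COMPLETE) || CLOSED, so the August CLOSED order slips through.

```scala
// Hypothetical sample lines (assumed layout: id,date,customer,status).
val sampleOrders = Seq(
  "1,2013-07-25 00:00:00.0,11599,CLOSED",   // July, CLOSED
  "2,2013-08-01 00:00:00.0,256,CLOSED",     // August, CLOSED: should be excluded
  "3,2013-07-31 00:00:00.0,12111,COMPLETE", // July, COMPLETE
  "4,2013-07-31 00:00:00.0,8827,PENDING"    // July, other status
)

// Unparenthesized: && binds tighter, so this is (isJuly && isComplete) || isClosed.
val wrong = sampleOrders.filter(e => {
  val o = e.split(",")
  o(1).startsWith("2013-07") && o(3) == "COMPLETE" || o(3) == "CLOSED"
})

// Parenthesized status check: isJuly && (isComplete || isClosed).
val right = sampleOrders.filter(e => {
  val o = e.split(",")
  o(1).startsWith("2013-07") && (o(3) == "COMPLETE" || o(3) == "CLOSED")
})

println(wrong.map(_.split(",")(0))) // List(1, 2, 3)
println(right.map(_.split(",")(0))) // List(1, 3)
```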