val revenuePerOrder = order_items.groupBy($"order_item_order_id").sum("order_item_subtotal")
Write the output to the file system in a format such as JSON, CSV, Parquet, etc.
revenuePerOrder.write.json(args(2))
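The same DataFrameWriter API supports the other formats mentioned above. A minimal sketch, assuming the output path is still taken from the command-line arguments:

// Write the aggregated result as Parquet (columnar, compressed)
revenuePerOrder.write.parquet(args(2))
// Or as CSV, with a header row
revenuePerOrder.write.option("header", "true").csv(args(2))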
By default, Spark SQL uses spark.sql.shuffle.partitions partitions for aggregations and joins, which is 200. So, for a smaller dataset, we should lower the number of tasks, e.g. to 2.
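A minimal sketch of changing this setting at runtime, assuming a SparkSession named spark is already in scope:

// Reduce the shuffle partition count for small data (default is 200)
spark.conf.set("spark.sql.shuffle.partitions", "2")

The same property can also be passed at submit time, e.g. spark-submit --conf spark.sql.shuffle.partitions=2, so the application code does not need to change.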