Writing a DataFrame to the File System

  • val revenuePerOrder = order_items.groupBy($"order_item_order_id").sum("order_item_subtotal")
  • Write the output to the file system in a format such as CSV, Parquet, or JSON:
    • revenuePerOrder.write.json(args(2))
  • By default, Spark SQL uses spark.sql.shuffle.partitions partitions for aggregations and joins, which is 200 by default. For a smaller dataset, reduce the number of tasks, e.g. to 2 (see the sketch after this list):
    • spark.conf.set("spark.sql.shuffle.partitions", "2")
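
Putting the steps together, below is a minimal runnable sketch. It assumes a retail_db-style order_items CSV with order_item_order_id and order_item_subtotal columns, and the convention that args(0) is the input path and args(2) is the output path; the app name, input path, and CSV read options are assumptions not specified in the notes above.

    import org.apache.spark.sql.SparkSession

    object RevenuePerOrderApp {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("RevenuePerOrder") // assumed app name
          .getOrCreate()
        import spark.implicits._

        // Reduce shuffle partitions for a small dataset (default is 200)
        spark.conf.set("spark.sql.shuffle.partitions", "2")

        // args(0): input path (assumed convention), read as CSV with a header
        val order_items = spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv(args(0))

        // Aggregate revenue per order
        val revenuePerOrder = order_items
          .groupBy($"order_item_order_id")
          .sum("order_item_subtotal")

        // Write the result as JSON to the output path
        revenuePerOrder.write.json(args(2))

        spark.stop()
      }
    }

With spark.sql.shuffle.partitions set to 2, the groupBy aggregation produces 2 shuffle tasks instead of 200, so the write also emits 2 output part files rather than up to 200 mostly empty ones.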