Writing a DataFrame to the File System

  • val revenuePerOrder = order_items.groupBy($"order_item_order_id").sum("order_item_subtotal")
  • Write the output to the file system in a format such as CSV, Parquet, or JSON:
    • revenuePerOrder.write.json(args(2))
  • By default, Spark SQL uses spark.sql.shuffle.partitions partitions for aggregations and joins, which is 200 by default. For a smaller dataset, reduce the number of tasks, e.g. to 2 (see the sketch after this list):
    • spark.conf.set("spark.sql.shuffle.partitions", "2")
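
Putting the steps together, below is a minimal runnable sketch. It assumes a retail_db-style order_items CSV with order_item_order_id and order_item_subtotal columns, and the convention that args(0) is the input path and args(2) is the output path; the app name, input path, and CSV read options are assumptions not specified in the notes above.

    import org.apache.spark.sql.SparkSession

    object RevenuePerOrderApp {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("RevenuePerOrder") // assumed app name
          .getOrCreate()
        import spark.implicits._

        // Reduce shuffle partitions for a small dataset (default is 200)
        spark.conf.set("spark.sql.shuffle.partitions", "2")

        // args(0): input path (assumed convention), read as CSV with a header
        val order_items = spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv(args(0))

        // Aggregate revenue per order
        val revenuePerOrder = order_items
          .groupBy($"order_item_order_id")
          .sum("order_item_subtotal")

        // Write the result as JSON to the output path
        revenuePerOrder.write.json(args(2))

        spark.stop()
      }
    }

With spark.sql.shuffle.partitions set to 2, the groupBy aggregation produces 2 shuffle tasks instead of 200, so the write also emits 2 output part files rather than up to 200 mostly empty ones.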