Using Spark SQL and DataFrames programmatically

Here we compute revenue per order using DataFrame operations: for each order in order_items, we add up its order_item_subtotal values to get the order-level revenue.
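
For illustration, assume order_items is stored as JSON Lines, one record per order item. The field names below match the columns used later in this walkthrough; the specific values are made up:

    {"order_item_id": 1, "order_item_order_id": 1, "order_item_subtotal": 299.98}
    {"order_item_id": 2, "order_item_order_id": 1, "order_item_subtotal": 129.99}
    {"order_item_id": 3, "order_item_order_id": 2, "order_item_subtotal": 199.99}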

  • In Spark 2.0, we no longer need to create a SQLContext; SparkSession is the single entry point to the program.
  • First, create the SparkSession object:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession
      .builder()
      .appName("SparkSQLExample")
      .master("local")
      .getOrCreate()
  • Import the session's implicits so the $"columnName" syntax can be used below:

    import spark.implicits._
  • Read the JSON file (Spark infers the schema from the data):

    val order_items = spark.read.json("/file/path")
  • Group by the order id and sum the subtotals (a consolidated, runnable version of all the steps follows below):

    order_items.groupBy($"order_item_order_id").sum("order_item_subtotal").show
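
Putting the steps together, here is a minimal, self-contained sketch. The input path is a placeholder, and the column names assume the order_items layout shown earlier:

    import org.apache.spark.sql.SparkSession

    object OrderRevenue {
      def main(args: Array[String]): Unit = {
        // SparkSession is the single entry point in Spark 2.0+
        val spark = SparkSession
          .builder()
          .appName("SparkSQLExample")
          .master("local")
          .getOrCreate()

        import spark.implicits._

        // Schema is inferred from the JSON data
        val order_items = spark.read.json("/file/path")

        // Revenue per order: sum the item subtotals within each order
        order_items
          .groupBy($"order_item_order_id")
          .sum("order_item_subtotal")
          .show()

        spark.stop()
      }
    }

With the hypothetical sample rows shown earlier, this would print 429.97 for order 1 and 199.99 for order 2.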
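
By default the aggregated column is named sum(order_item_subtotal). If a cleaner column name or rounded values are wanted, an agg-based variant (an optional refinement, not required by the steps above) could look like this:

    import org.apache.spark.sql.functions.{round, sum}

    order_items
      .groupBy($"order_item_order_id")
      .agg(round(sum($"order_item_subtotal"), 2).alias("order_revenue"))
      .show()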