Get Top N Products Per Day – Create application and read data

DataFrame operations in detail:

  • First, create the SparkSession object, import the necessary classes, and do the following:
    • val spark = SparkSession.builder.appName("Get Top N per day").master(args(0)).getOrCreate
    • import spark.implicits._   to bring the implicit conversions (encoders, the $"column" syntax) into scope
    • Set the log level: spark.sparkContext.setLogLevel("ERROR")
    • Set the number of shuffle partitions: spark.conf.set("spark.sql.shuffle.partitions", "2")
    • val inputBaseDir = args(1)
  • Now create a DataFrame from the JSON data:
    • val ordersDF = spark.read.json(inputBaseDir + "/orders")
    • ordersDF.show
  • Now go to Run, then Edit Configurations, and enter the program arguments (the Spark master first, then the input base directory): local /Users/itversity/Reasearch/data/retail_db_json
  • Run the class GetTopNProductsPerDay
  • Similarly, create DataFrames for order items and products (orderItemsDF and productsDF), and preview both datasets using show.
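
Put together, the steps above might look roughly like the following. This is a minimal sketch, not the definitive application: the subdirectory names order_items and products are assumptions about how the data under the base directory is laid out, and the final spark.stop() is an addition that is not part of the steps listed.

```scala
import org.apache.spark.sql.SparkSession

object GetTopNProductsPerDay {
  def main(args: Array[String]): Unit = {
    // args(0): Spark master (for example, local)
    // args(1): base directory that holds the JSON data sets
    val spark = SparkSession.builder
      .appName("Get Top N per day")
      .master(args(0))
      .getOrCreate()

    // Brings encoders and the $"column" syntax into scope for later steps
    import spark.implicits._

    // Keep console output readable and use few shuffle partitions for small data
    spark.sparkContext.setLogLevel("ERROR")
    spark.conf.set("spark.sql.shuffle.partitions", "2")

    val inputBaseDir = args(1)

    // Each read infers the schema from the JSON files in the directory
    val ordersDF = spark.read.json(inputBaseDir + "/orders")
    // Assumed directory names; adjust them to match the actual data layout
    val orderItemsDF = spark.read.json(inputBaseDir + "/order_items")
    val productsDF = spark.read.json(inputBaseDir + "/products")

    // Preview the first 20 rows of each data set
    ordersDF.show()
    orderItemsDF.show()
    productsDF.show()

    // Not in the steps above: release resources when the program finishes
    spark.stop()
  }
}
```

Note that Scala indexes arrays with parentheses, so the arguments are read as args(0) and args(1), and that string literals need straight double quotes rather than typographic ones.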