DataFrame Operations in Detail -
- First, create the SparkSession object, import the necessary classes, and do the following (a consolidated sketch of these steps is shown right after them) -
- val spark = SparkSession.builder.appName("Get Top N per day").master(args(0)).getOrCreate
- import spark.implicits._ so that we get the necessary implicit conversions and APIs
- Set log level - spark.sparkContext.setLogLevel("ERROR")
- Set shuffle partitions - spark.conf.set("spark.sql.shuffle.partitions", "2")
- val inputBaseDir = args(1)
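Putting the setup steps above together, here is a minimal runnable sketch. The object name GetTopNProductsPerDay matches the class mentioned later in these notes; everything else follows the steps as listed.

```scala
import org.apache.spark.sql.SparkSession

object GetTopNProductsPerDay {
  def main(args: Array[String]): Unit = {
    // args(0) = master (e.g. local), args(1) = input base directory
    val spark = SparkSession.
      builder.
      appName("Get Top N per day").
      master(args(0)).
      getOrCreate()

    // Brings in implicit conversions such as toDF and the $"column" syntax
    import spark.implicits._

    // Reduce log noise and keep shuffle partitions small for local runs
    spark.sparkContext.setLogLevel("ERROR")
    spark.conf.set("spark.sql.shuffle.partitions", "2")

    val inputBaseDir = args(1)

    // Subsequent steps read the JSON data sets under inputBaseDir
  }
}
```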
- Now create a DataFrame from the orders JSON data (a short sketch follows these two steps) -
- val ordersDF = spark.read.json(inputBaseDir + "/orders")
- ordersDF.show
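A short sketch of the read step above; printSchema is not in the notes but is a standard way to confirm the schema Spark infers from the JSON files.

```scala
// Read the orders JSON files; Spark infers the schema from the data.
val ordersDF = spark.read.json(inputBaseDir + "/orders")

// printSchema (added here for illustration) shows the inferred columns; show previews records.
ordersDF.printSchema()
ordersDF.show()
```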
- Now go to Run -> Edit Configurations and enter the program arguments - local /Users/itversity/Reasearch/data/retail_db_json (args(0) is the master and args(1) is the input base directory)
- Run the class GetTopNProductsPerDay
- Similarly, create DataFrames for order items and products (orderItemsDF and productsDF) and preview both data sets using show (see the sketch below).
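Following the same pattern as ordersDF, a sketch for the other two DataFrames; the folder names order_items and products are assumed from the retail_db_json layout.

```scala
// Order items and products, read the same way as orders.
val orderItemsDF = spark.read.json(inputBaseDir + "/order_items")
orderItemsDF.show()

val productsDF = spark.read.json(inputBaseDir + "/products")
productsDF.show()
```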