Initialize a Spark application

Run Spark job using spark-shell

Using spark-shell, we can run ad hoc code to confirm that it works as expected. Getting results back also confirms that the installation is successful.

  • Run spark-shell
  • Execute the following code and make sure it returns results (the expected output format is noted after the code)
// Load the order_items dataset as an RDD of lines
val orderItems = sc.textFile("C:\\data\\retail_db\\order_items")

// Key each record by order_id (field 1) with its subtotal (field 4),
// sum the subtotals per order, and format each result as "order_id,revenue"
val revenuePerOrder = orderItems.
  map(oi => (oi.split(",")(1).toInt, oi.split(",")(4).toFloat)).
  reduceByKey(_ + _).
  map(oi => oi._1 + "," + oi._2)

// Print the first 10 results to verify the shell is working
revenuePerOrder.take(10).foreach(println)
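
Each printed line should contain an order id and the total revenue for that order, separated by a comma.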

On Windows, spark-shell might throw an exception after printing the output; this is a known quirk of running Spark on Windows and does not affect the results themselves.

Run Spark application using spark-submit

We can validate the jar file by running it with spark-submit.

  • spark-submit is the main command to submit the job
  • --class retail_db.GetRevenuePerOrder passes the fully qualified name of the class whose main method should run
  • By default the master is local; to override it, use --master
  • After spark-submit and its control arguments, we pass the jar file name followed by the application arguments (a sketch of such a class follows the command below)
spark-submit --class retail_db.GetRevenuePerOrder <PATH_TO_JAR> local <INPUT_PATH> <OUTPUT_PATH>
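
For reference, here is a minimal sketch of what a retail_db.GetRevenuePerOrder class might look like. It assumes the application reads the master URL, input path, and output path from its arguments in that order, matching the invocation above; the actual class packaged in your jar may differ.

package retail_db

import org.apache.spark.{SparkConf, SparkContext}

object GetRevenuePerOrder {
  def main(args: Array[String]): Unit = {
    // args(0): master URL (e.g. local), args(1): input path, args(2): output path
    val conf = new SparkConf().setMaster(args(0)).setAppName("Get Revenue Per Order")
    val sc = new SparkContext(conf)

    // Same logic as the spark-shell example: total the subtotals per order_id
    val orderItems = sc.textFile(args(1))
    val revenuePerOrder = orderItems.
      map(oi => (oi.split(",")(1).toInt, oi.split(",")(4).toFloat)).
      reduceByKey(_ + _).
      map(oi => oi._1 + "," + oi._2)

    // Write the results as text files under the output path
    revenuePerOrder.saveAsTextFile(args(2))

    sc.stop()
  }
}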