Develop first Spark application and build a JAR file using sbt

Develop first Spark application – Get revenue for each order from order_items


In this topic, we will learn how to develop a Spark application.

  • Create a package named retail_db and an object named GetRevenuePerOrder.
  • Define the main function and import the required Spark classes:
     import org.apache.spark.{SparkConf, SparkContext}
  • Go to src/main/scala
  • Right-click and choose New -> Package
  • Give the package name as retail_db
  • Right-click on retail_db and choose New -> Scala Class
    • Name: GetRevenuePerOrder
    • Type: Object
  • Replace the generated code with the following snippet:
package retail_db

import org.apache.spark.{SparkConf, SparkContext}

object GetRevenuePerOrder {
  def main(args: Array[String]): Unit = {
    // args(0) = execution mode, args(1) = input path, args(2) = output path
    val conf = new SparkConf().
      setMaster(args(0)).
      setAppName("Get revenue per order")
    val sc = new SparkContext(conf)
    sc.setLogLevel("ERROR")

    // Read order_items records, extract (order_id, subtotal) pairs,
    // and sum the subtotals for each order id
    val orderItems = sc.textFile(args(1))
    val revenuePerOrder = orderItems.
      map(oi => (oi.split(",")(1).toInt, oi.split(",")(4).toFloat)).
      reduceByKey(_ + _).
      map(oi => oi._1 + "," + oi._2) // format each result as "order_id,revenue"

    revenuePerOrder.saveAsTextFile(args(2))
  }
}
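The field indexes in the map step assume the standard retail_db order_items layout: order_item_id, order_item_order_id, order_item_product_id, order_item_quantity, order_item_subtotal, order_item_product_price. Here is a minimal sketch of how one record is parsed (the sample values are hypothetical; you can paste this into the Scala REPL):

val oi = "1,1,957,1,299.98,299.98"       // hypothetical order_items record
val orderId = oi.split(",")(1).toInt     // order_item_order_id -> 1
val subtotal = oi.split(",")(4).toFloat  // order_item_subtotal -> 299.98
// (orderId, subtotal) pairs are then summed per order id by reduceByKey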
  • The program takes three arguments:
    • args(0) -> execution mode (e.g. local)
    • args(1) -> input path
    • args(2) -> output path
  • Running the application
    • Go to Run menu -> Edit Configurations
    • Click + and add a new Application configuration
    • Give application name GetRevenuePerOrder
    • Choose main class: retail_db.GetRevenuePerOrder
    • Program arguments: local <input_path> <output_path>
    • Use classpath for module: Choose spark2demo
    • Click on Apply and then OK
  • Now you can run the application by right-clicking and choosing Run “GetRevenuePerOrder”
  • Go to the output path and check whether the output files were created
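For example, with hypothetical program arguments (substitute your own paths):

local /data/retail_db/order_items /data/output/revenue_per_order

saveAsTextFile creates the output path as a directory containing part files (such as part-00000) and a _SUCCESS marker. Each output line has the form order_id,revenue; the values below are hypothetical:

1,299.98
2,579.98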

Build JAR file using sbt


In this topic, we will see how to build a JAR file using sbt.

  • We have to build the JAR file and validate it on our local PC before running it on the cluster
  • To build the JAR file we use sbt, and to validate it we use spark-submit (see the sketch at the end of this section)
  • Copy the path by right-clicking the project in IntelliJ
  • Go to the command prompt and cd to that path
  • Check the directory structure; you should see:
    • src directory
    • build.sbt (a minimal example is sketched at the end of this section)
  • Run sbt package
  • It will build the JAR file and show its path
  • It will typically be <project_directory>/target/scala-2.11/spark2demo_2.11-0.1.jar
  • We can also run the application using sbt "run-main":
sbt "run-main retail_db.GetRevenuePerOrder local <input_path> <output_path>"