Develop first Spark application – Get revenue for each order from order_items
In this topic, we will learn how to develop a Spark application.
- Create a package named retail_db, and create an object named “GetRevenuePerOrder”.
- Define the main function and import the Spark classes the application needs, like this.
import org.apache.spark.{SparkConf, SparkContext}
- Go to src/main/scala
- Right click and click on New -> Package
- Give the package name as retail_db
- Right click on retail_db and click on New -> Scala Class
- Name: GetRevenuePerOrder
- Type: Object
- Replace the code with this code snippet
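package retail_db

import org.apache.spark.{SparkConf, SparkContext}

object GetRevenuePerOrder {
  def main(args: Array[String]): Unit = {
    // args(0) = execution mode, args(1) = input path, args(2) = output path
    val conf = new SparkConf().
      setMaster(args(0)).
      setAppName("Get revenue per order")
    val sc = new SparkContext(conf)

    val orderItems = sc.textFile(args(1))
    // Pair each record's order id (index 1) with its revenue (index 4), then sum per order
    val revenuePerOrder = orderItems.
      map(oi => (oi.split(",")(1).toInt, oi.split(",")(4).toFloat)).
      reduceByKey(_ + _).
      map(oi => oi._1 + "," + oi._2)

    // Write one "order_id,revenue" line per order to the output path
    revenuePerOrder.saveAsTextFile(args(2))
  }
}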
- The program takes 3 arguments
- args(0) -> execution mode
- args(1) -> input path
- args(2) -> output path
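The three positional arguments above can be checked before any Spark object is created. The following is a minimal sketch in plain Scala, not part of the original application: the `JobArgs` case class, its `parse` helper, and the usage message are hypothetical names introduced here for illustration.

```scala
// Hypothetical holder for the three positional arguments described above.
final case class JobArgs(executionMode: String, inputPath: String, outputPath: String)

object JobArgs {
  // Returns Right(JobArgs) for exactly three arguments,
  // otherwise Left with a usage message.
  def parse(args: Array[String]): Either[String, JobArgs] = args match {
    case Array(mode, input, output) => Right(JobArgs(mode, input, output))
    case _ =>
      Left("Usage: GetRevenuePerOrder <execution_mode> <input_path> <output_path>")
  }
}
```

Inside `main`, one option would be to match on `JobArgs.parse(args)` and print the message and exit on `Left`, so that a missing argument produces a readable error instead of an `ArrayIndexOutOfBoundsException` from `args(1)` or `args(2)`.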
- Running the application
- Go to Run menu -> Edit Configurations
- Add new application
- Give application name GetRevenuePerOrder
- Choose main class: retail_db.GetRevenuePerOrder
- Program arguments: local <input_path> <output_path>
- Use classpath for module: Choose spark2demo
- Click on Apply and then OK
- Now you can run the application by right-clicking and choosing Run “GetRevenuePerOrder”
- Go to the output path and check whether the output files were created
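To know what the output files should contain, it can help to reproduce the same logic on a few records without Spark. The sketch below mirrors the application's map, reduceByKey, and map steps using plain Scala collections, with `groupBy` standing in for `reduceByKey`. The object name `RevenuePerOrderSketch` and the sample records are made up for illustration and are not taken from the real data set.

```scala
object RevenuePerOrderSketch {
  // Mirrors map -> reduceByKey -> map from the application on an in-memory Seq.
  def revenuePerOrder(lines: Seq[String]): Seq[String] =
    lines
      .map { line =>
        val fields = line.split(",")
        (fields(1).toInt, fields(4).toFloat) // (order id, item revenue)
      }
      .groupBy(_._1) // stand-in for reduceByKey: collect items per order id
      .map { case (orderId, items) => (orderId, items.map(_._2).sum) }
      .toSeq
      .sortBy(_._1) // sorted only to make the result stable for inspection
      .map { case (orderId, revenue) => s"$orderId,$revenue" }

  def main(args: Array[String]): Unit = {
    // Made-up sample records in the same comma-separated layout.
    val sample = Seq(
      "1,1,957,1,299.5,299.5",
      "2,2,1073,1,199.75,199.75",
      "3,2,502,5,250.0,50.0",
      "4,2,403,1,129.25,129.25"
    )
    revenuePerOrder(sample).foreach(println)
  }
}
```

For this sample the sketch yields the lines 1,299.5 and 2,579.0. In the real run, saveAsTextFile writes the lines into part files (such as part-00000) under the output path, and their order is not guaranteed to be sorted by order id.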
Build Jar file using sbt
In this topic, we will see how to build a jar file using sbt.
- We have to build the jar file and validate it on our local PC before running it on the cluster
- To build the jar file we can use sbt, and to validate it we use spark-submit.
- Copy the path by right-clicking the project in IntelliJ
- Go to command prompt and cd to the path
- Check the directory structure; you should see the project's build.sbt file and src directory
- Run
sbt package
- It will build the jar file and print its path
- The path will typically be <project_directory>/target/scala-2.11/spark2demo_2.11-0.1.jar
- We can also run the application using sbt “run-main” (the task is named runMain in sbt 1.x)
sbt "run-main retail_db.GetRevenuePerOrder local <input_path> <output_path>"
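The jar name that sbt package reports is derived from the build definition: sbt names the artifact <name>_<scala binary version>-<version>.jar and places it under target/scala-<scala binary version>. A build.sbt consistent with the path shown above might look like the sketch below. The project name, version, and Scala binary version follow from that path, but the Scala patch version and the Spark version are assumptions, so substitute the ones your project actually uses.

```scala
// build.sbt (sketch; exact patch versions below are assumptions)
name := "spark2demo"
version := "0.1"
scalaVersion := "2.11.12"

// Spark core dependency; 2.3.0 is a placeholder for whichever Spark 2.x release you use
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.3.0"
```

The `%%` operator appends the Scala binary version to the artifact name, so this resolves spark-core_2.11, which matches the scala-2.11 directory in the jar path.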