Creating a Dataset from a DataFrame

How to create a Dataset from a DataFrame?

  • Since Spark 2.0, SparkSession is the single entry point to a Spark application.
  • Read a CSV file using the SparkSession object; this creates a DataFrame:
    • val orders = spark.read.csv("/file/path")
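  • Create a case class Order whose field names and types match the DataFrame's columns. (Read without a header or schema, CSV columns are named _c0, _c1, ... and are all typed as strings, so the names must come from a header row or an explicit schema.)
  • Create a Dataset by mapping the DataFrame to the case class (this needs import spark.implicits._ in scope to supply the encoder):
    • val ordersDS = spark.read.option("header", "true").option("inferSchema", "true").csv("/file/path").as[Order]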
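The steps above can be put together in a small, self-contained sketch. The fields of Order (orderId, customer, amount), the sample rows, and the helper name loadOrders are assumptions made for illustration; the original does not say what the orders file contains. Instead of inferring types, this version derives the read schema from the case class with Encoders.product, which is one common way to keep the column types aligned with the case class fields.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{Dataset, Encoders, SparkSession}

// Hypothetical schema: the real Order fields depend on the CSV file.
// Defined at top level so Spark can derive an encoder for it.
case class Order(orderId: Int, customer: String, amount: Double)

object OrdersExample {
  // Reads a CSV file with a header row and returns a typed Dataset[Order].
  def loadOrders(spark: SparkSession, path: String): Dataset[Order] = {
    import spark.implicits._ // brings the implicit Encoder[Order] into scope

    spark.read
      .option("header", "true")              // first line holds column names
      .schema(Encoders.product[Order].schema) // column types taken from the case class
      .csv(path)                              // this step yields a DataFrame
      .as[Order]                              // map the DataFrame to Dataset[Order]
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("orders-example")
      .master("local[*]") // local mode, for the sketch only
      .getOrCreate()

    // Write a tiny sample file so the example runs without external data.
    val path = Files.createTempFile("orders", ".csv")
    Files.write(path, "orderId,customer,amount\n1,alice,10.5\n2,bob,20.0\n".getBytes("UTF-8"))

    val orders = loadOrders(spark, path.toString)

    // Typed access: fields are checked at compile time.
    orders.filter(_.amount > 15.0).show()

    spark.stop()
  }
}
```

Once the data is a Dataset[Order], lambdas such as _.amount > 15.0 are type-checked by the compiler, whereas the equivalent DataFrame filter on a column name is only validated at run time.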