Create DataFrame – Applying Schema programmatically using types

Let us see how to apply a schema programmatically.

  • Create the schema string:
    • val s = "order_id:Int order_date:String order_customer_id:Int order_status:String"
  • Build the schema from the schema string.
  • import org.apache.spark.sql.types._
  • Use StructField and StructType

val fields = s.split(" ").map(e => {
  val f = e.split(":")(0)                                              // field name
  val t = if (e.split(":")(1) == "Int") IntegerType else StringType    // field type
  StructField(f, t, nullable = false)
})
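The token-splitting logic itself is plain Scala and can be checked without Spark. A minimal, Spark-free sketch (the name `parsed` is hypothetical) that mirrors what the `map` above does to each `name:Type` token:

```scala
// Spark-free sketch of the token parsing used above.
// Each token "name:Type" becomes a (name, isInt) pair; in the real code the
// boolean would pick IntegerType vs StringType.
val s = "order_id:Int order_date:String order_customer_id:Int order_status:String"
val parsed = s.split(" ").map { e =>
  val parts = e.split(":")
  (parts(0), parts(1) == "Int")
}
parsed.foreach(println)
```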

  • val schema = StructType(fields)
  • import org.apache.spark.sql._
  • Create the Row RDD:

val ordersRDD = spark.sparkContext.
  textFile("/Users/itversity/Reasearch/data/retail_db/orders").
  map(e => {
    val a = e.split(",")
    Row(a(0).toInt, a(1), a(2).toInt, a(3))    // types must match the schema
  })

  • Create a DataFrame from the RDD of Rows:
    val ordersDF = spark.createDataFrame(ordersRDD, schema)
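Putting the steps together, here is a self-contained sketch of the whole flow. It assumes an active SparkSession named `spark` and the same input path as above; it is a sketch, not verified against a cluster:

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

// 1. Build the schema from the schema string.
val s = "order_id:Int order_date:String order_customer_id:Int order_status:String"
val fields = s.split(" ").map { e =>
  val name = e.split(":")(0)
  val tpe  = if (e.split(":")(1) == "Int") IntegerType else StringType
  StructField(name, tpe, nullable = false)
}
val schema = StructType(fields)

// 2. Parse each CSV line into a Row whose types match the schema.
val ordersRDD = spark.sparkContext.
  textFile("/Users/itversity/Reasearch/data/retail_db/orders").
  map { e =>
    val a = e.split(",")
    Row(a(0).toInt, a(1), a(2).toInt, a(3))
  }

// 3. Apply the schema to the Row RDD.
val ordersDF = spark.createDataFrame(ordersRDD, schema)
ordersDF.printSchema()
```

Note that `createDataFrame` does not validate each Row against the schema up front; a type mismatch only surfaces later, when an action runs.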