Create Data Frame – Infer Schema using Reflection

How to convert an RDD into a DataFrame:

An RDD of text lines carries no schema (no column names or types), so we need to apply a structure to it before it can be used as a DataFrame.

Method 1: Using a case class

  • Create an RDD:
    • val orders = spark.sparkContext.textFile("/Users/itversity/Research/data/retail_db/orders")
  • Preview the data:
    • orders.take(10).foreach(println)
  • Create a case class (the fields go in parentheses, not braces):
    • case class Order(order_id: Int, order_date: String, order_customer_id: Int, order_status: String)
  • Create a DataFrame (toDF needs import spark.implicits._, which spark-shell already does for you):
    • val ordersDF = orders.map(o => {
        val a = o.split(",")
        Order(a(0).toInt, a(1), a(2).toInt, a(3))
      }).toDF()
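Method 1 can be sketched as a standalone program. This is a minimal sketch, not the only way to set it up: it assumes a local Spark session, and it uses two made-up sample lines in place of the orders file so that it runs without the dataset. The object name CaseClassToDF and the helper parseOrder are hypothetical names introduced for illustration.

```scala
import org.apache.spark.sql.SparkSession

// Defined at top level (not inside a method) so Spark can derive
// the schema from its fields via reflection.
case class Order(order_id: Int, order_date: String, order_customer_id: Int, order_status: String)

object CaseClassToDF {
  // Hypothetical helper: turns one comma-separated line into an Order.
  def parseOrder(line: String): Order = {
    val a = line.split(",")
    Order(a(0).toInt, a(1), a(2).toInt, a(3))
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session; in spark-shell, `spark` already exists.
    val spark = SparkSession.builder()
      .appName("CaseClassToDF")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._ // brings toDF into scope for RDDs

    // Made-up sample lines standing in for the orders file.
    val orders = spark.sparkContext.parallelize(Seq(
      "1,2013-07-25 00:00:00.0,11599,CLOSED",
      "2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT"
    ))

    // Column names and types come from the case class fields.
    val ordersDF = orders.map(parseOrder).toDF()
    ordersDF.printSchema()
    ordersDF.show(false)

    spark.stop()
  }
}
```

Calling printSchema is a quick way to confirm that the column names and types match the fields of the case class.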

Method 2: Explicitly passing field names to the toDF method

  • val ordersDF = orders.map(o => {
      val a = o.split(",")
      (a(0).toInt, a(1), a(2).toInt, a(3))
    }).toDF("order_id", "order_date", "order_customer_id", "order_status")
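To see what the explicit names change, the sketch below converts the same tuples twice: once with no arguments to toDF and once with names. It is an illustration under assumptions: a local Spark session, two made-up sample lines instead of the orders file, and the hypothetical names TupleToDF and parseTuple.

```scala
import org.apache.spark.sql.SparkSession

object TupleToDF {
  // Hypothetical helper: turns one comma-separated line into a 4-tuple.
  def parseTuple(line: String): (Int, String, Int, String) = {
    val a = line.split(",")
    (a(0).toInt, a(1), a(2).toInt, a(3))
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session; in spark-shell, `spark` already exists.
    val spark = SparkSession.builder()
      .appName("TupleToDF")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._ // brings toDF into scope for RDDs

    // Made-up sample lines standing in for the orders file.
    val parsed = spark.sparkContext.parallelize(Seq(
      "1,2013-07-25 00:00:00.0,11599,CLOSED",
      "2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT"
    )).map(parseTuple)

    // With no arguments, tuple columns get the default names _1, _2, _3, _4.
    parsed.toDF().printSchema()

    // With arguments, the names are applied positionally; the types still
    // come from the tuple element types.
    val ordersDF = parsed.toDF("order_id", "order_date", "order_customer_id", "order_status")
    ordersDF.printSchema()

    spark.stop()
  }
}
```

The number of names passed to toDF has to match the number of tuple elements, otherwise Spark rejects the call at runtime.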