How to convert an RDD into a DataFrame-
An RDD carries no schema, so we need to apply a structure (column names and types) to it before it can be used as a DataFrame.
Method 1- Using a case class
- Create an RDD-
- val orders = spark.sparkContext.textFile("/Users/itversity/Research/data/retail_db/orders")
- Preview the data-
- orders.take(10).foreach(println)
- Create a case class-
- case class Order(order_id: Int, order_date: String, order_customer_id: Int, order_status: String)
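- Create a DataFrame-
- import spark.implicits._ // already in scope in spark-shell; required for toDF() in a standalone application
- val ordersDF = orders.map(o => { val a = o.split(",")
Order(a(0).toInt, a(1), a(2).toInt, a(3)) }).toDF()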
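The case class steps above can be put together as a small standalone program. This is only a sketch under a few assumptions: it runs Spark in local mode, the object name OrdersByCaseClass and the helper build are made up for illustration, and the two sample rows are illustrative stand-ins for the orders text file in the same comma-separated layout.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Defined at the top level (not inside a method) so Spark can derive
// an encoder, and therefore the schema, from its fields.
case class Order(order_id: Int, order_date: String, order_customer_id: Int, order_status: String)

object OrdersByCaseClass {
  // Hypothetical helper: turns comma-separated lines into a DataFrame of orders.
  def build(spark: SparkSession, lines: Seq[String]): DataFrame = {
    import spark.implicits._ // brings toDF() into scope for RDDs

    // parallelize stands in for spark.sparkContext.textFile(...)
    val orders = spark.sparkContext.parallelize(lines)

    orders.map { o =>
      val a = o.split(",")
      Order(a(0).toInt, a(1), a(2).toInt, a(3))
    }.toDF()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rdd-to-df-case-class")
      .master("local[*]") // assumption: local run, not a cluster
      .getOrCreate()

    // Illustrative sample rows, not read from the real data set.
    val sample = Seq(
      "1,2013-07-25 00:00:00.0,11599,CLOSED",
      "2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT"
    )

    val ordersDF = build(spark, sample)
    ordersDF.printSchema() // column names and types come from the case class
    ordersDF.show(false)

    spark.stop()
  }
}
```

The column names of the resulting DataFrame are taken from the case class field names, which is why no names are passed to toDF() here.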
Method 2- Explicitly passing the field names to the toDF method
- val ordersDF = orders.map(o => { val a = o.split(",")
(a(0).toInt, a(1), a(2).toInt, a(3)) }).toDF("order_id", "order_date", "order_customer_id", "order_status")
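The tuple-based variant can be sketched the same way as a standalone program. As before, this assumes a local Spark session; the object name OrdersByColumnNames, the helper build, and the sample row are hypothetical and only serve the illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object OrdersByColumnNames {
  // Hypothetical helper: maps each line to a tuple and names the columns in toDF.
  def build(spark: SparkSession, lines: Seq[String]): DataFrame = {
    import spark.implicits._ // brings toDF(colNames...) into scope for RDDs

    spark.sparkContext.parallelize(lines).map { o =>
      val a = o.split(",")
      (a(0).toInt, a(1), a(2).toInt, a(3)) // a Tuple4, no case class needed
    }.toDF("order_id", "order_date", "order_customer_id", "order_status")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rdd-to-df-column-names")
      .master("local[*]") // assumption: local run, not a cluster
      .getOrCreate()

    // Illustrative sample row, not read from the real data set.
    val ordersDF = build(spark, Seq("1,2013-07-25 00:00:00.0,11599,CLOSED"))
    ordersDF.printSchema()

    spark.stop()
  }
}
```

If toDF() is called on the tuples without any names, the columns default to _1, _2, _3 and _4, so passing the names explicitly is what gives the DataFrame a readable schema. The number of names must match the number of tuple elements.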