Data Frame Operations – Selection and Projection of Data

Selection and Projection of Data-

First go to spark-shell-

  • Create a base directory variable-
    • val inputBaseDir = "/Users/itversity/research/data/retail_db_json"
  • Create a DataFrame and preview the data. Also print the schema.
    • val ordersDF = spark.read.json(inputBaseDir + "/orders")
    • ordersDF.printSchema
    • ordersDF.show(100)   (shows top 100 records)
    • ordersDF.count    (counts the number of records)
  • Use select for projecting order_id and order_date-
    • The select function takes either column-type or string-type arguments
      • ordersDF.select("order_id", "order_date").show   //string type
      • ordersDF.select($"order_id", $"order_date").show  //column type
  • To apply length function-
  • To give an alias to column name-
  • To get unique elements-
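
The last three steps (length function, alias, unique elements) can be sketched as follows. This is a minimal illustration rather than the original commands: `sampleDF` and its rows are hypothetical stand-ins for the orders data, and the column `order_status` is assumed to exist. In spark-shell, `spark` and its implicits are already available; they are set up explicitly here only so the snippet can be pasted as a whole.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.length

// In spark-shell this returns the session that already exists
val spark = SparkSession.builder().master("local[*]").appName("select-sketch").getOrCreate()
import spark.implicits._

// Hypothetical sample rows standing in for the orders data
val sampleDF = Seq(
  (1, "2013-07-25 00:00:00.0", "CLOSED"),
  (2, "2013-07-25 00:00:00.0", "PENDING_PAYMENT"),
  (3, "2013-07-26 00:00:00.0", "CLOSED")
).toDF("order_id", "order_date", "order_status")

// Apply the length function to a column
sampleDF.select($"order_status", length($"order_status")).show

// Give the derived column an alias
val withLength = sampleDF.select($"order_status", length($"order_status").alias("status_length"))
withLength.show

// Get the unique values of a column
val uniqueStatuses = sampleDF.select("order_status").distinct
uniqueStatuses.show
```

The same calls should work on ordersDF by swapping it in for sampleDF, provided the column names match the schema printed earlier.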


Note – To see all available functions, type import org.apache.spark.sql.functions. in spark-shell and then press Tab.
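
As a small illustration of what that package offers, the sketch below uses two of its functions, `substring` and `lower`, inside a select. The one-row DataFrame `oneOrderDF` and the alias names are hypothetical and chosen only for the example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{lower, substring}

// In spark-shell this returns the session that already exists
val spark = SparkSession.builder().master("local[*]").appName("functions-sketch").getOrCreate()
import spark.implicits._

// Hypothetical single row standing in for the orders data
val oneOrderDF = Seq((1, "2013-07-25 00:00:00.0", "CLOSED"))
  .toDF("order_id", "order_date", "order_status")

val derived = oneOrderDF.select(
  $"order_id",
  substring($"order_date", 1, 10).alias("order_day"),  // positions are 1-based: first 10 characters
  lower($"order_status").alias("status_lower")         // lower-case the status
)
derived.show
```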