Using flatMap to process data read by wholeTextFiles

First go to NYSE data in the lab and check the total file size using du -sh .
Go to spark-shell and read this data-
- val nyse = sc.wholeTextFiles(“/public/nyse”,4) .
- This creates an RDD which contains tuple with one field as name of the file and other field as content of the file.
- nyse.first._2 gives the size of the file
Actual data should be typically represented as Strings in an RDD, so we need to use flatMap in this case-
- - val nyseRDD = nyse.flatMap( e=>e._2.split(“\\r?\\n”))
  - With flatMap array is flattened
  - We can apply transformation logic along with flatMap using scala collection APIs.-
    - val nyseRDD = nyse.flatMap( e=> { val a = e._2.split("\\r?\\n"))
val t= a.map(r =>{ val rec = r.split(",")((rec(1).toInt,rec(0)), rec(6).toLong) t })