Simple Application

Let us start with a simple application to understand the details of the Spark architecture using pyspark.

  • As we have multiple versions of Spark in our lab and we are exploring Spark 2, we need to export SPARK_MAJOR_VERSION with the value 2.
  • For this demo, we will disable dynamic allocation by setting spark.dynamicAllocation.enabled to false.
  • Launch pyspark on YARN with dynamic allocation disabled (also use spark.ui.port to specify a unique port).
  • Develop a simple word count program that reads data from /public/randomtextwriter/part-m-00000.
  • Save the output to /user/training.
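The launch settings listed above can be sketched as a small Python helper that assembles the pyspark command line. This is only an illustration under stated assumptions: the port 12345 and the helper name build_launch_command are hypothetical, and the script prints the command instead of starting the shell.

```python
import os

UI_PORT = 12345  # hypothetical port; pick one that is free on your gateway node


def build_launch_command(ui_port):
    """Return the pyspark launch command as a list of arguments (illustrative helper)."""
    return [
        "pyspark",
        "--master", "yarn",
        # Disable dynamic allocation for this demo
        "--conf", "spark.dynamicAllocation.enabled=false",
        # Use a unique port for the Spark UI
        "--conf", "spark.ui.port=" + str(ui_port),
    ]


# Copy the current environment and select Spark 2 through SPARK_MAJOR_VERSION
env = dict(os.environ, SPARK_MAJOR_VERSION="2")
command = build_launch_command(UI_PORT)

# Print the equivalent shell invocation rather than launching it
print("SPARK_MAJOR_VERSION=" + env["SPARK_MAJOR_VERSION"] + " " + " ".join(command))
```

To actually start the shell from Python, the same list and environment could be passed to subprocess.run(command, env=env). Typing the printed line directly in a terminal has the same effect.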
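data = sc.textFile('/public/randomtextwriter/part-m-00000')
# Split each line into words, pair each word with 1, then add up the 1s per word
wc = data. \
    flatMap(lambda line: line.split(' ')). \
    map(lambda word: (word, 1)). \
    reduceByKey(lambda x, y: x + y)
# Write each pair as a "word,count" line; replace ... with a directory under /user/training that does not exist yet
wc. \
    map(lambda rec: rec[0] + ',' + str(rec[1])). \
    saveAsTextFile('/user/training/...')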
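To see what the flatMap, map and reduceByKey steps compute, here is a minimal plain-Python sketch that mimics them on an in-memory list. It does not use Spark and ignores partitioning and shuffling. The function name word_count and the sample lines are made up for illustration.

```python
from collections import defaultdict


def word_count(lines):
    """Mimic flatMap -> map -> reduceByKey on a plain Python list (no Spark involved)."""
    # flatMap: split every line into words and flatten the result into one list
    words = [word for line in lines for word in line.split(' ')]
    # map: pair each word with the count 1
    pairs = [(word, 1) for word in words]
    # reduceByKey: combine the values of each key using x + y
    totals = defaultdict(int)
    for word, one in pairs:
        totals[word] = totals[word] + one
    return dict(totals)


sample = ["spark on yarn", "spark on hadoop"]  # hypothetical sample lines
counts = word_count(sample)

# Format each pair as "word,count", as in the final map step of the Spark job
for word, count in sorted(counts.items()):
    print(word + ',' + str(count))
```

In Spark the same logic runs in parallel: each executor applies flatMap and map to its own partitions, and reduceByKey merges the per-key counts across partitions.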
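# The same word count using the DataFrame API
from pyspark.sql.functions import split, explode
data = spark.read.text('/public/randomtextwriter/part-m-00000')
wc = data.select(explode(split(data.value, ' ')).alias('words')). \
    groupBy('words'). \
    count()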
