Let us start with a simple application, written in pyspark, to understand the details of the Spark execution architecture.
- As we have multiple versions of Spark in our lab and we are exploring Spark 2, we need to export SPARK_MAJOR_VERSION with the value 2.
- For this demo, we will disable dynamic allocation by setting spark.dynamicAllocation.enabled to false.
- Launch pyspark using YARN with dynamic allocation disabled (also set spark.ui.port to a unique port).
- Develop a simple word count program by reading data from /public/randomtextwriter/part-m-00000.
- Save output to /user/training.
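The launch settings listed above can be sketched as a single command line. The snippet below is a minimal illustration, not the lab's official launch script: the helper name `build_pyspark_command` and the port `12345` are assumptions made for this example, while `--master`, `--conf`, `spark.dynamicAllocation.enabled`, and `spark.ui.port` come from the steps above and the standard pyspark launcher.

```python
import shlex


def build_pyspark_command(conf, master="yarn"):
    """Assemble a pyspark launch command from a dict of Spark properties.

    Hypothetical helper for illustration only; it builds the command
    string and does not launch anything.
    """
    parts = ["pyspark", "--master", master]
    for key, value in conf.items():
        # Each Spark property is passed as a separate --conf key=value pair.
        parts += ["--conf", f"{key}={value}"]
    # Quote each argument so the result is safe to paste into a shell.
    return " ".join(shlex.quote(part) for part in parts)


conf = {
    "spark.dynamicAllocation.enabled": "false",  # disable dynamic allocation
    "spark.ui.port": "12345",  # assumed example port; pick any unused port
}

cmd = build_pyspark_command(conf)

# Select Spark 2 first, then launch the shell with the generated command.
print("export SPARK_MAJOR_VERSION=2")
print(cmd)
```

Running the two printed lines in a terminal, in that order, should start a pyspark shell on YARN with a fixed number of executors, provided the chosen port is free on the gateway node.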
Data = sc.textFile('/public/randomtextwriter/part-m-00000')
from pyspark.sql.functions import split, explode