Now let us understand the development life cycle using PyCharm, followed by the deployment life cycle. As part of the deployment life cycle, we will see how to control runtime behavior.
Development Life Cycle
Let us walk through the development life cycle of Spark applications using PyCharm, with word count and daily revenue as examples.
Create a new project
Make sure PyCharm is configured with a Python interpreter that has PySpark installed
Externalize Properties using ConfigParser
Create Spark Configuration object and Spark Context object
Develop the logic to read the input, process it, and save the output back
Externalize execution mode, input base directory and output path
Validate locally using PyCharm and, optionally, spark-submit in local mode
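Externalizing properties with ConfigParser might look like the sketch below. The section names (dev, prod), property keys, and values are hypothetical; in a real project the properties would live in a file and be loaded with props.read(path) rather than from an inline string, and the environment label would come from sys.argv so that execution mode, input base directory, and output path stay out of the code.

```python
import configparser

# Hypothetical properties, normally kept in a file such as
# application.properties and loaded with props.read(path)
SAMPLE = """
[dev]
executionMode = local
input.base.dir = /data/dev
output.dir = /out/dev

[prod]
executionMode = yarn-client
input.base.dir = /data/prod
output.dir = /out/prod
"""

props = configparser.ConfigParser()
props.read_string(SAMPLE)

env = 'dev'  # in a real job this would come from sys.argv[1]
execution_mode = props.get(env, 'executionMode')
input_base_dir = props.get(env, 'input.base.dir')
output_dir = props.get(env, 'output.dir')
```

Switching between local validation and cluster runs is then just a matter of passing a different environment label at launch time.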
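The core read-process-save steps can be sketched as a minimal word count. This is an illustration, not the course's exact solution: it hard-codes local mode and uses an in-memory collection via parallelize so it runs standalone; a real job would read with sc.textFile from the externalized input base directory and write with saveAsTextFile to the externalized output path.

```python
from operator import add
from pyspark import SparkConf, SparkContext

# Create the Spark configuration object and Spark context object;
# local mode here for validation, normally taken from the properties file
conf = SparkConf().setAppName('Word Count').setMaster('local[2]')
sc = SparkContext(conf=conf)

# Stand-in for sc.textFile(input_base_dir + '/wordcount')
lines = sc.parallelize(['hello world', 'hello spark'])

# Process: split lines into words, pair each word with 1, sum per word
counts = lines \
    .flatMap(lambda line: line.split(' ')) \
    .map(lambda word: (word, 1)) \
    .reduceByKey(add)

result = dict(counts.collect())
# In a real job: counts.saveAsTextFile(output_dir + '/wordcount')
sc.stop()
```

The same skeleton carries over to daily revenue: only the read paths and the map/reduce logic change.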
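Validation with spark-submit in local mode might look like the following, assuming the main script is named WordCount.py and that it reads the environment label (dev) as its first argument (both names are hypothetical):

```shell
# Run the job locally; --master local[2] overrides the execution mode,
# and 'dev' tells the script which properties section to use
spark-submit --master "local[2]" WordCount.py dev
```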
Deployment Life Cycle
Once the code is developed and validated, we can deploy it on the gateway node of the cluster.
Ship the folder which contains all the Python files to the gateway node
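The shipping step can be sketched as below, assuming the project folder is named wordcount, the gateway host is gw.example.com, and the main script is WordCount.py (all hypothetical names):

```shell
# Copy the project folder (Python files plus the properties file)
# to the gateway node
scp -r wordcount/ user@gw.example.com:~/wordcount

# Log in to the gateway node and run the job against the cluster,
# passing 'prod' so the script picks the production properties
ssh user@gw.example.com
cd ~/wordcount
spark-submit --master yarn WordCount.py prod
```

Running from the gateway node ensures the job picks up the cluster's Hadoop and YARN configuration, while the environment argument controls runtime behavior without any code change.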