Let us understand the execution modes and the main components of the Spark framework. We will also recap some important aspects of YARN.
Spark supports the following execution modes:
- Local (for development)
- Standalone (for development)
- YARN (the mode used on our cluster)
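Each execution mode is selected through the master URL passed to Spark. The sketch below maps a mode name to the master URL format Spark expects. The helper `master_url`, the host name `master-host`, and the default thread count are illustrative assumptions, not part of the original material; 7077 is the usual default port of a standalone master.

```python
def master_url(mode, host="master-host", port=7077, local_threads="*"):
    """Return the Spark master URL for a given execution mode.

    `host` is a hypothetical standalone master host; replace it with
    the real one on your cluster.
    """
    if mode == "local":
        # Driver and executors run in a single JVM on this machine;
        # "*" means one worker thread per available core.
        return f"local[{local_threads}]"
    if mode == "standalone":
        # Spark's own built-in cluster manager.
        return f"spark://{host}:{port}"
    if mode == "yarn":
        # The Resource Manager address is read from the Hadoop
        # configuration (HADOOP_CONF_DIR or YARN_CONF_DIR), not the URL.
        return "yarn"
    raise ValueError(f"unknown execution mode: {mode}")


for mode in ("local", "standalone", "yarn"):
    print(mode, "->", master_url(mode))
```

The returned value is what you would pass as `--master` to `spark-submit` or to `SparkConf().setMaster(...)`.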
As our cluster uses YARN, let us recap some important aspects of YARN.
- YARN uses a master (Resource Manager) and slave (Node Managers) architecture.
- YARN primarily takes care of resource management and scheduling the tasks.
- For each YARN application, there will be an Application Master and a set of containers created to process the data.
- We can plug different distributed frameworks into YARN, such as MapReduce, Spark, etc.
- Spark creates executors to process the data. These executors run in YARN containers and are managed by the Resource Manager and the per-application Application Master.
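To make the relationship between executors and YARN containers concrete, here is a minimal sketch that builds a `spark-submit` command for YARN and estimates how many containers the application would request. The flags used are standard `spark-submit` options. The helper names, the file name `wordcount.py`, and the resource values are hypothetical. The container estimate assumes dynamic allocation is disabled.

```python
def spark_submit_command(app, num_executors, executor_memory,
                         executor_cores, deploy_mode="client"):
    """Build a spark-submit command line for a YARN cluster."""
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", deploy_mode,            # "client" or "cluster"
        "--num-executors", str(num_executors),   # executor containers requested
        "--executor-memory", executor_memory,    # heap size per executor, e.g. "2g"
        "--executor-cores", str(executor_cores), # concurrent tasks per executor
        app,
    ]


def expected_yarn_containers(num_executors):
    """One container per executor plus one for the Application Master.

    Assumes a fixed number of executors (no dynamic allocation).
    """
    return num_executors + 1


cmd = spark_submit_command("wordcount.py", num_executors=4,
                           executor_memory="2g", executor_cores=2)
print(" ".join(cmd))
print("containers:", expected_yarn_containers(4))
```

In cluster deploy mode the driver runs inside the Application Master container; in client mode the driver runs on the machine that submitted the job and the Application Master only negotiates resources.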
Let us understand the Spark execution framework by running a word count program using RDDs. Its main components are:
- Driver Program
- Spark Context
- Executor Cache
- Executor Tasks
- Task (Executor Tasks)
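The components listed above can be seen in a small word count program. This is a minimal PySpark sketch, assuming the RDD API is what the walkthrough has in mind. The application name, the function names, and the command-line paths are illustrative assumptions. The comments point out where each component comes into play.

```python
import sys
from operator import add


def tokenize(line):
    """Split a line into lowercase words (runs inside executor tasks)."""
    return line.lower().split()


def word_count(sc, input_path):
    """Define the word count lineage; nothing executes until an action."""
    lines = sc.textFile(input_path)           # one partition per input split
    return (lines.flatMap(tokenize)           # line -> words
                 .map(lambda word: (word, 1)) # word -> (word, 1)
                 .reduceByKey(add))           # shuffle, then sum per word


def main(input_path, output_path):
    # Imported here so the helpers above can be used without Spark installed.
    from pyspark import SparkConf, SparkContext

    # Driver program: this process creates the SparkContext, which
    # connects to the cluster manager and requests executors.
    conf = SparkConf().setAppName("wordcount-sketch")
    sc = SparkContext(conf=conf)
    try:
        counts = word_count(sc, input_path)

        # Executor cache: keep computed partitions in executor memory
        # so the second action below does not recompute them.
        counts.cache()

        # Each action triggers a job. The job is split into stages at the
        # shuffle, and each stage runs one task per partition on executors.
        print(counts.take(5))
        counts.saveAsTextFile(output_path)
    finally:
        sc.stop()


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Saved as a script and submitted with `spark-submit --master yarn`, followed by the script name, an input path, and an output path, the driver would run `main` while the tokenizing and counting run as tasks inside the executors.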