Now let us talk about the YARN Application Life Cycle. YARN is Hadoop's resource management framework.
- We can use distributed data processing frameworks such as Map Reduce and Spark by plugging them into YARN.
- A YARN application can be a Map Reduce Job or a Spark Application.
- From YARN's perspective, data is processed by containers.
- Let us understand the life cycle of YARN Application.
- We use the client to submit a YARN Application (for example, a Map Reduce Job).
- The request goes to the Resource Manager. The Resource Manager has up-to-date information about the usage of all the servers, reported by the Node Managers that run on those servers and are registered with it.
- The Resource Manager picks a node on which to start the container that will manage the job or application, using criteria such as the current usage of the servers.
- This container is called the Application Master. It stays up and running until the application either completes or is killed.
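- The Application Master then requests the containers it needs to process the data from the Resource Manager, stating preferences such as Data Locality. The Resource Manager allocates containers on nodes using those preferences and Server Usage as criteria, and the Application Master talks to the Node Managers on those nodes directly to launch the containers.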
- These containers process the data. Depending on the underlying data processing framework, they may be released and cleaned up as soon as their work is done.
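
The life cycle above can be sketched as a small toy simulation. This is not the YARN API: the class names only mirror the YARN components, and every method (`allocate`, `launch`, `release`, `run_application`), the slot-based capacity model, and the block names are assumptions made for illustration. In this sketch, container placement is modelled as a request to the Resource Manager, which prefers a node that holds the data locally and otherwise the least-loaded node.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """Toy stand-in for a Node Manager: tracks free container slots on one server."""
    name: str
    free_slots: int
    local_blocks: set = field(default_factory=set)  # data blocks stored on this server
    running: list = field(default_factory=list)     # ids of containers currently running

    def launch(self, container_id):
        self.free_slots -= 1
        self.running.append(container_id)

    def release(self, container_id):
        self.running.remove(container_id)
        self.free_slots += 1


class ResourceManager:
    """Toy stand-in for the Resource Manager: knows the usage of every node."""

    def __init__(self, nodes):
        self.nodes = nodes
        self.next_id = 0

    def allocate(self, preferred_block=None):
        candidates = [n for n in self.nodes if n.free_slots > 0]
        if not candidates:
            raise RuntimeError("no free capacity in the cluster")
        # Prefer a node that stores the block (data locality), then the node
        # with the most free slots (server usage).
        node = max(
            candidates,
            key=lambda n: (preferred_block in n.local_blocks, n.free_slots),
        )
        self.next_id += 1
        container_id = f"container_{self.next_id}"
        node.launch(container_id)
        return container_id, node


def run_application(rm, blocks):
    """Walk one application through its life cycle and return the placement events."""
    events = []
    # 1. The first container is the Application Master.
    am_id, am_node = rm.allocate()
    events.append(("application_master", am_id, am_node.name))
    # 2. One worker container is requested per data block.
    workers = []
    for block in blocks:
        container_id, node = rm.allocate(preferred_block=block)
        workers.append((container_id, node))
        events.append(("task:" + block, container_id, node.name))
    # 3. Worker containers are released, then the Application Master exits.
    for container_id, node in workers:
        node.release(container_id)
    am_node.release(am_id)
    return events


if __name__ == "__main__":
    cluster = [
        NodeManager("node1", free_slots=2, local_blocks={"blk_a"}),
        NodeManager("node2", free_slots=3, local_blocks={"blk_b"}),
    ]
    for event in run_application(ResourceManager(cluster), ["blk_a", "blk_b"]):
        print(event)
```

In the demo, the Application Master lands on the node with the most free slots, and each worker container lands on the node that stores its block. Real YARN schedulers use queues, memory and vcore requests, and relaxed locality rules that this sketch leaves out.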