Now let us explore the components of YARN and MapReduce 2 (MRv2), and how they are used for resource management and data processing.
YARN stands for Yet Another Resource Negotiator. YARN is responsible only for resource management; the actual data processing is done by frameworks that run on top of it, such as MapReduce, Spark, and Tez.
- As part of YARN, we need to configure the Resource Manager, the Node Managers, the Application Timeline (history) Server, the MapReduce Job History Server, and so on. The Resource Manager acts as the master, whereas the Node Managers act as slaves (workers).
- The Application Timeline Server and the MapReduce Job History Server are used primarily to get details about completed applications and MapReduce jobs.
- The actual data is processed on the Node Managers, inside containers, while the Resource Manager manages resources at the cluster level. For MapReduce jobs, these containers run Map Tasks and Reduce Tasks; for Spark, they run Spark Executors.
- Node Managers send periodic heartbeats to the Resource Manager, which keeps track of resources at the cluster level.
- As part of each heartbeat, Node Managers also report their resource utilization, so the Resource Manager knows how much capacity each Node Manager is using. This helps the Resource Manager use the cluster effectively when placing containers for new jobs.
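To make the heartbeat-driven bookkeeping concrete, here is a minimal Python sketch. It is a toy model under stated assumptions, not YARN's actual API or scheduler: the names `Heartbeat` and `ResourceManagerModel`, the memory/vcore fields, and the "most free memory wins" placement rule are hypothetical simplifications. Real YARN scheduling (for example the Capacity or Fair Scheduler) is considerably more involved.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Heartbeat:
    """Hypothetical heartbeat payload: capacity and current usage of one Node Manager."""
    node_id: str
    total_mb: int
    total_vcores: int
    used_mb: int
    used_vcores: int


@dataclass
class ResourceManagerModel:
    """Toy model of cluster-level resource tracking (not the real YARN API)."""
    nodes: Dict[str, Heartbeat] = field(default_factory=dict)

    def on_heartbeat(self, hb: Heartbeat) -> None:
        # The latest report from a Node Manager replaces its previous one.
        self.nodes[hb.node_id] = hb

    def free(self, node_id: str) -> Tuple[int, int]:
        # Free (memory MB, vcores) on one node, based on its last report.
        hb = self.nodes[node_id]
        return hb.total_mb - hb.used_mb, hb.total_vcores - hb.used_vcores

    def cluster_free(self) -> Tuple[int, int]:
        # Aggregate free resources across all known Node Managers.
        mb = sum(self.free(n)[0] for n in self.nodes)
        vcores = sum(self.free(n)[1] for n in self.nodes)
        return mb, vcores

    def place_container(self, mb: int, vcores: int) -> Optional[str]:
        # Hypothetical placement rule: among nodes that can fit the request,
        # pick the one with the most free memory.
        candidates = [
            n for n in self.nodes
            if self.free(n)[0] >= mb and self.free(n)[1] >= vcores
        ]
        if not candidates:
            return None
        best = max(candidates, key=lambda n: self.free(n)[0])
        hb = self.nodes[best]
        # Record the allocation locally so later requests see less free capacity.
        self.nodes[best] = Heartbeat(
            best, hb.total_mb, hb.total_vcores,
            hb.used_mb + mb, hb.used_vcores + vcores,
        )
        return best


if __name__ == "__main__":
    rm = ResourceManagerModel()
    rm.on_heartbeat(Heartbeat("nm1", 8192, 8, 2048, 2))
    rm.on_heartbeat(Heartbeat("nm2", 8192, 8, 6144, 6))
    print(rm.cluster_free())              # (8192, 8)
    print(rm.place_container(4096, 2))    # nm1
    print(rm.place_container(4096, 1))    # None: no node has 4096 MB free
    # A newer heartbeat shows nm2's usage dropped, so capacity opens up.
    rm.on_heartbeat(Heartbeat("nm2", 8192, 8, 1024, 1))
    print(rm.place_container(4096, 1))    # nm2
```

The point of the sketch is the flow the list describes: Node Managers report capacity and usage, the Resource Manager keeps a cluster-wide view from those reports, and it uses that view to decide where new containers (Map Tasks, Reduce Tasks, Spark Executors) can run.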