Let us see a high-level overview of High Availability (HA) for HDFS as well as YARN.
- The Namenode is the master for HDFS and the Resource Manager is the master for YARN.
- Without HA, both the Namenode and the Resource Manager are single points of failure.
- If the Resource Manager is down, no data processing can be done in the cluster: none of the frameworks that run on YARN, such as MapReduce or Spark, can be used.
- If the Namenode is down, almost everything is unusable. Because the file system cannot be accessed, data processing jobs will not run either, even if all the data processing services themselves are healthy.
- ZooKeeper is needed to configure High Availability with automatic failover. First, let us see how recovery works when there is no High Availability for the Namenode:
  - We need to restart the Namenode, which comes up in safe mode (read-only) until recovery completes.
  - The FSImage has to be restored, and the edit logs written since the last checkpoint have to be replayed.
  - If the Namenode failed because of a hardware problem, we have to build a new server as the Namenode. If its IP address or DNS alias changes, we need to update the configuration files and deploy them on all the nodes.
- All these steps are tedious, time-consuming and error-prone.
- High Availability not only speeds up the restore and recovery process but also makes failover transparent to clients.
- To make failover transparent, clients cannot use a Namenode's IP address directly. Instead, they use a logical nameservice ID (configured with `dfs.nameservices`) that maps to the addresses of all the Namenodes.
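- As part of the HA configuration for the Namenode, we have Active and Standby (passive) Namenodes instead of a Secondary Namenode; the Standby also performs the checkpointing that the Secondary Namenode would otherwise do.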
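- With automatic failover configured, ZooKeeper (together with the ZKFailoverController process on each Namenode host) detects that the Active Namenode has failed and promotes the Standby, so traffic moves to the other node without manual intervention.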
- High Availability can be configured for other services as well, such as the Resource Manager, but the Namenode is the most critical.
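To make the recovery step "restore the FSImage and replay the edit logs since the last checkpoint" concrete, here is a toy Python sketch. It is not how Hadoop actually stores or loads FSImage and edit-log files: the classes `EditLogEntry` and `FSImage`, the function `recover_namespace`, and the two operations `mkdir`/`delete` are hypothetical simplifications. The only idea it shows is that a checkpoint records the last transaction ID it covers, so only newer edits need to be replayed.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class EditLogEntry:
    txid: int   # transaction ID, increasing over time
    op: str     # hypothetical simplified operation: "mkdir" or "delete"
    path: str


@dataclass
class FSImage:
    last_txid: int                                # last transaction included in this checkpoint
    paths: set[str] = field(default_factory=set)  # namespace as of that checkpoint


def recover_namespace(image: FSImage, edit_log: list[EditLogEntry]) -> set[str]:
    """Load the checkpoint, then replay only the edits newer than it, in txid order."""
    namespace = set(image.paths)
    for entry in sorted(edit_log, key=lambda e: e.txid):
        if entry.txid <= image.last_txid:
            continue  # already reflected in the FSImage
        if entry.op == "mkdir":
            namespace.add(entry.path)
        elif entry.op == "delete":
            namespace.discard(entry.path)
        else:
            raise ValueError(f"unknown operation: {entry.op}")
    return namespace


image = FSImage(last_txid=2, paths={"/user", "/tmp"})
edit_log = [
    EditLogEntry(1, "mkdir", "/user"),
    EditLogEntry(2, "mkdir", "/tmp"),
    EditLogEntry(3, "mkdir", "/data"),   # after the checkpoint
    EditLogEntry(4, "delete", "/tmp"),   # after the checkpoint
]
print(sorted(recover_namespace(image, edit_log)))  # ['/data', '/user']
```

The longer the time since the last checkpoint, the more edits have to be replayed, which is one reason manual recovery without HA is slow.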
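To illustrate how a logical nameservice and ZooKeeper-based election together make Namenode failover transparent, here is a toy in-memory simulation in Python. Everything in it is hypothetical and it uses no Hadoop or ZooKeeper API: `ToyCoordinator` only mimics an exclusive ZooKeeper lock, `health_check` loosely mimics what the ZKFailoverController on each Namenode host does, and `resolve_active` loosely mimics a client trying each Namenode behind the nameservice ID. The nameservice `mycluster` and the host names are made up for the example.

```python
from __future__ import annotations

from typing import Optional


class ToyCoordinator:
    """Stand-in for ZooKeeper: at most one Namenode may hold the 'active' lock."""

    def __init__(self) -> None:
        self.holder: Optional[str] = None

    def try_acquire(self, node_id: str) -> bool:
        if self.holder is None:
            self.holder = node_id
        return self.holder == node_id

    def session_expired(self, node_id: str) -> None:
        # Like an ephemeral znode, the lock goes away when its owner's session ends.
        if self.holder == node_id:
            self.holder = None


class NameNode:
    def __init__(self, node_id: str, address: str) -> None:
        self.node_id = node_id
        self.address = address
        self.alive = True
        self.state = "standby"


# Hypothetical logical nameservice mapped to two Namenode IDs.
NAMESERVICES = {"mycluster": ["nn1", "nn2"]}


def health_check(namenodes: dict[str, NameNode], coordinator: ToyCoordinator) -> None:
    """One monitoring round: release locks held by dead nodes, let a healthy node take over."""
    for nn in namenodes.values():
        if not nn.alive:
            nn.state = "standby"
            coordinator.session_expired(nn.node_id)
    for nn in namenodes.values():
        if nn.alive and coordinator.try_acquire(nn.node_id):
            nn.state = "active"


def resolve_active(nameservice: str, namenodes: dict[str, NameNode]) -> str:
    """Client side: try every Namenode behind the logical name until one is Active."""
    for node_id in NAMESERVICES[nameservice]:
        nn = namenodes[node_id]
        if nn.alive and nn.state == "active":
            return nn.address
    raise ConnectionError(f"no active Namenode for {nameservice}")


coordinator = ToyCoordinator()
namenodes = {
    "nn1": NameNode("nn1", "nn1.example.com"),
    "nn2": NameNode("nn2", "nn2.example.com"),
}
health_check(namenodes, coordinator)
print(resolve_active("mycluster", namenodes))  # nn1.example.com

namenodes["nn1"].alive = False  # simulate a crash of the Active Namenode
health_check(namenodes, coordinator)
print(resolve_active("mycluster", namenodes))  # nn2.example.com
```

The client only ever refers to `mycluster`, so the crash of `nn1` needs no configuration change on the client side. A real deployment also needs fencing of the old Active Namenode and health-check timeouts, which this sketch leaves out.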