Let us look at the different components involved in Namenode High Availability. You can refer to this nicely written article on Namenode HA as part of Hadoop 2.x.
- Zookeeper – We already have Zookeeper running on our cluster. It will be used as part of Namenode High Availability.
- Quorum Based Storage – The Active Namenode writes to it and the Standby Namenode reads from it. It is managed by the Quorum Journal Manager, and the directories are known as Journal Directories.
- Journal Directories
- Active and Standby Namenode (from this article)
- The Active Namenode is responsible for all client operations in the cluster.
- The Standby NameNode maintains enough state to provide a fast failover.
- In order for the Standby node to keep its state synchronized with the Active node, both nodes communicate through a group of separate daemons called JournalNodes.
- The file system journal logged by the Active Namenode at the JournalNodes is consumed by the Standby NameNode to keep its file system namespace in sync with the Active.
- In order to provide a fast failover, it is also necessary that the Standby node have up-to-date information about the location of blocks in your cluster.
- DataNodes are configured with the location of both Namenodes and send block location information and heartbeats to both Namenode machines.
- Zookeeper Failover Controller (one per Namenode)
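
To make the components above more concrete, here is a minimal sketch of how they typically show up as `hdfs-site.xml` properties. The property names are standard HDFS HA settings, but the nameservice `mycluster`, the Namenode ids `nn1` and `nn2`, every host name, and the journal directory path are hypothetical values chosen only for illustration. The helper functions are also illustrative and not part of Hadoop. The Zookeeper quorum itself is set separately through `ha.zookeeper.quorum` in `core-site.xml`.

```python
import xml.etree.ElementTree as ET


def qjournal_uri(journal_hosts, nameservice, port=8485):
    """Build the shared edits URI that points the Namenodes at the JournalNodes."""
    authority = ";".join(f"{host}:{port}" for host in journal_hosts)
    return f"qjournal://{authority}/{nameservice}"


def ha_properties(nameservice, namenodes, journal_hosts, journal_dir):
    """Return hdfs-site.xml properties for an Active/Standby Namenode pair.

    namenodes maps a logical Namenode id to its RPC address.
    """
    props = {
        "dfs.nameservices": nameservice,
        f"dfs.ha.namenodes.{nameservice}": ",".join(namenodes),
        # Quorum based storage: the Active writes edits here, the Standby reads them.
        "dfs.namenode.shared.edits.dir": qjournal_uri(journal_hosts, nameservice),
        # Local directory on each JournalNode (the "Journal Directory").
        "dfs.journalnode.edits.dir": journal_dir,
        # Lets the Zookeeper Failover Controllers trigger failover automatically.
        "dfs.ha.automatic-failover.enabled": "true",
    }
    for nn_id, rpc_address in namenodes.items():
        props[f"dfs.namenode.rpc-address.{nameservice}.{nn_id}"] = rpc_address
    return props


def to_site_xml(props):
    """Render a property dict in the <configuration> layout Hadoop expects."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical cluster layout, for illustration only.
example = ha_properties(
    nameservice="mycluster",
    namenodes={"nn1": "master1.example.com:8020", "nn2": "master2.example.com:8020"},
    journal_hosts=["jn1.example.com", "jn2.example.com", "jn3.example.com"],
    journal_dir="/data/hadoop/journal",
)
print(to_site_xml(example))
```

An odd number of JournalNodes (three here) is the usual choice, because the Quorum Journal Manager needs a majority of them to acknowledge each edit.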