HDFS Namenode HA – Components

Let us look at the components involved in Namenode High Availability. For background, you can refer to this well-written article on Namenode HA in Hadoop 2.x.

  • Zookeeper – We already have Zookeeper running on our cluster. For Namenode High Availability it is used to detect Namenode failure and to elect the Active Namenode during automatic failover.
  • Quorum Based Storage – The Active Namenode writes edits to it and the Standby Namenode reads from it. It is managed by the Quorum Journal Manager, and the directories where the JournalNodes store these edits are known as Journal Directories.
  • Journal Directories
  • Active and Standby Namenode (from this article)
    • The Active Namenode is responsible for all client operations in the cluster.
    • The Standby NameNode maintains enough state to provide a fast failover.
    • In order for the Standby node to keep its state synchronized with the Active node, both nodes communicate through a group of separate daemons called JournalNodes.
    • The file system journal logged by the Active Namenode at the JournalNodes is consumed by the Standby NameNode to keep its file system namespace in sync with the Active.
    • In order to provide a fast failover, it is also necessary that the Standby node have up-to-date information of the location of blocks in your cluster.
    • DataNodes are configured with the location of both Namenodes and send block location information and heartbeats to both Namenode machines.
  • Zookeeper Failover Controller (one per Namenode)
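
To make the quorum idea above more concrete, here is a minimal Python sketch of how an edit could be considered durable only when a majority of JournalNodes acknowledge it, and how a Standby could then pick up the committed edits. This is a toy model and not Hadoop code: the JournalNode class, write_edit, read_committed_edits, the node names and the sample operations are all hypothetical, and the real Quorum Journal Manager handles many more cases (epochs, recovery, segment finalization).

```python
from dataclasses import dataclass, field


@dataclass
class JournalNode:
    """Hypothetical stand-in for one JournalNode daemon (not a Hadoop class)."""
    name: str
    up: bool = True
    edits: list = field(default_factory=list)

    def append(self, txid, op):
        # A node that is down cannot acknowledge the write.
        if not self.up:
            return False
        self.edits.append((txid, op))
        return True


def quorum_size(journal_nodes):
    # Majority of the configured JournalNodes, e.g. 2 out of 3.
    return len(journal_nodes) // 2 + 1


def write_edit(journal_nodes, txid, op):
    """Active side: the edit counts as written only if a majority acknowledges it."""
    acks = sum(1 for jn in journal_nodes if jn.append(txid, op))
    return acks >= quorum_size(journal_nodes)


def read_committed_edits(journal_nodes):
    """Standby side: edits found on a majority of reachable JournalNodes, in txid order."""
    counts = {}
    for jn in journal_nodes:
        if not jn.up:
            continue  # unreachable nodes cannot be read
        for edit in jn.edits:
            counts[edit] = counts.get(edit, 0) + 1
    needed = quorum_size(journal_nodes)
    return sorted(e for e, n in counts.items() if n >= needed)


# Usage: three JournalNodes, so the quorum is two.
jns = [JournalNode("jn1"), JournalNode("jn2"), JournalNode("jn3")]
ok1 = write_edit(jns, 1, "mkdir /data")            # all three nodes up
jns[2].up = False
ok2 = write_edit(jns, 2, "create /data/a.txt")     # two of three up
jns[1].up = False
ok3 = write_edit(jns, 3, "delete /data/a.txt")     # only one up, no quorum

# Bring the nodes back before the Standby reads.
jns[1].up = True
jns[2].up = True
standby_edits = read_committed_edits(jns)
```

In this sketch the first two writes succeed and the third does not, so the Standby only replays the first two edits. It also hints at why an odd number of JournalNodes (typically three or more) is used: with three nodes the cluster can tolerate the loss of one.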
