HDFS Namenode HA – Quick Recap of a Typical HDFS Configuration

Let us now look at the key concepts behind HDFS Namenode High Availability (HA). Before getting into HA, let us recap the HDFS components.

  • HDFS has a Namenode, a Secondary Namenode and Datanodes.
  • When data is saved in the form of files in HDFS:
    • Files are divided into blocks, and the blocks are stored on the Datanodes.
    • The mapping between a file, its block ids or names, and the block locations is called metadata.
    • The Namenode keeps this metadata in memory and also records changes to it in its edit logs.
    • To control the size of the edit logs over time, snapshots of the metadata are taken periodically. Such a snapshot is called an FSImage and is built from the last FSImage and the edit logs.
    • This process of creating a new FSImage from the last FSImage and the edit logs is called checkpointing.
  • If the Namenode goes down, it takes several minutes to restore the FSImage and edit logs and rebuild the metadata in memory. During this recovery, the entire cluster is unusable.
  • Also, if there is a hardware failure, migrating the Namenode to another node is time consuming. It also involves changing the Namenode URI in multiple locations.
  • We can overcome these issues by configuring High Availability on the Namenode.
  • In an HA configuration, instead of a Namenode and a Secondary Namenode, we have an Active Namenode and a Passive Namenode (called the Standby Namenode in the Hadoop documentation). Hence, it is also called an Active-Passive configuration. Here are the issues HA addresses:
    • Manual intervention in case of an unplanned outage
    • Planned upgrades where the Namenode needs to be brought down
  • An HA configuration gives us transparent and fast failover of the Namenode.
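
To make the checkpointing and recovery steps from the recap more concrete, here is a minimal sketch in Python. It is only a toy model and not how HDFS is implemented: the class ToyNamenode, the function replay and the dictionary-based FSImage are hypothetical names chosen for illustration, and real HDFS uses its own on-disk formats. It shows the idea that a new FSImage is the last FSImage plus the edit logs, and that a restart rebuilds the in-memory metadata from both.

```python
from dataclasses import dataclass, field


def replay(fsimage, edit_log):
    """Apply the edits, in order, on top of a copy of the given FSImage."""
    namespace = {path: list(blocks) for path, blocks in fsimage.items()}
    for op, path, blocks in edit_log:
        if op == "add":
            namespace[path] = list(blocks)
        elif op == "delete":
            namespace.pop(path, None)
    return namespace


@dataclass
class ToyNamenode:
    """Hypothetical toy model: metadata is a dict of file path -> block ids."""
    fsimage: dict = field(default_factory=dict)    # last checkpointed snapshot
    edit_log: list = field(default_factory=list)   # changes since that snapshot
    namespace: dict = field(default_factory=dict)  # metadata held in memory

    def add_file(self, path, block_ids):
        # Record the change in the edit log and apply it in memory.
        self.edit_log.append(("add", path, list(block_ids)))
        self.namespace[path] = list(block_ids)

    def delete_file(self, path):
        self.edit_log.append(("delete", path, None))
        self.namespace.pop(path, None)

    def checkpoint(self):
        # New FSImage = last FSImage + edit log; the edit log can then start empty.
        self.fsimage = replay(self.fsimage, self.edit_log)
        self.edit_log = []

    def restart(self):
        # Recovery: rebuild the in-memory metadata from the FSImage and edit log.
        self.namespace = replay(self.fsimage, self.edit_log)


nn = ToyNamenode()
nn.add_file("/data/a.txt", ["blk_1", "blk_2"])
nn.add_file("/data/b.txt", ["blk_3"])
nn.checkpoint()                 # both files are now part of the FSImage
nn.delete_file("/data/a.txt")   # recorded only in the edit log
nn.namespace = {}               # simulate losing the in-memory state
nn.restart()
print(nn.namespace)             # {'/data/b.txt': ['blk_3']}
```

In this sketch the work done by restart grows with the size of the FSImage and the edit log, which mirrors why recovery of a single Namenode takes time and why a second Namenode that is ready to take over is attractive.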
