Checkpointing and Namenode Recovery

At times the Namenode might be down due to planned maintenance or unplanned outages. As administrators, it is very important for us to understand the recovery process of the Namenode.

  • Let us start with Namenode metadata (information about files and blocks).
    • There is metadata related to blocks as well as files.
    • At block level we have the mapping between file, block id, block location etc.
    • At file level we have file path, file permissions etc.
    • Metadata is held in memory on the server where the Namenode is running (bigdataserver-2 in our case). This memory is also called the Namenode heap.
    • Changes to metadata are also logged to edit logs, and periodic snapshots of the metadata are used to create the fsimage.
    • dfs.namenode.name.dir (older name: dfs.name.dir) and dfs.namenode.checkpoint.dir are the properties which control the location of the fsimage and edit logs on the Namenode and Secondary Namenode respectively.
  • edit logs – They contain the sequence of changes made to the filesystem by all client applications since the last checkpoint. As part of block metadata, only files and their associated block ids are captured (not block locations, which the Namenode rebuilds in memory from the block reports sent by Datanodes).
  • fsimage – It is a periodic point-in-time snapshot of the filesystem metadata. At regular intervals a new fsimage is created by merging the edit logs written since the last fsimage into that fsimage.
  • Only a few edit logs and a couple of fsimages are retained; older ones are deleted.
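
The merge of edit logs into an fsimage described above can be sketched with a small toy model in Python. This is only an illustration of the idea, not how HDFS actually stores or processes its metadata: the Edit and FsImage classes, the operation names ("create", "add_block", "delete"), the file paths and the keep=2 retention count are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    """One toy edit log entry (hypothetical structure, not the HDFS format)."""
    txid: int          # transaction id, increasing over time
    op: str            # "create", "add_block" or "delete" (toy operation names)
    path: str
    block_id: int = 0  # only used by "add_block"


@dataclass
class FsImage:
    """Toy snapshot of the namespace (hypothetical structure)."""
    last_txid: int     # last transaction already folded into this snapshot
    files: dict        # file path -> list of block ids (no block locations)


def checkpoint(image, edits):
    """Return a new FsImage: the old snapshot plus every edit newer than it."""
    # Copy the old snapshot so the previous fsimage stays untouched.
    files = {path: list(blocks) for path, blocks in image.files.items()}
    last_txid = image.last_txid
    for edit in sorted(edits, key=lambda e: e.txid):
        if edit.txid <= image.last_txid:
            continue  # already part of the previous fsimage
        if edit.op == "create":
            files[edit.path] = []
        elif edit.op == "add_block":
            files[edit.path].append(edit.block_id)
        elif edit.op == "delete":
            files.pop(edit.path, None)
        else:
            raise ValueError(f"unknown op: {edit.op}")
        last_txid = edit.txid
    return FsImage(last_txid, files)


def retain(images, keep=2):
    """Keep only the newest `keep` snapshots (keep=2 is an assumed value)."""
    ordered = sorted(images, key=lambda img: img.last_txid)
    return ordered[max(len(ordered) - keep, 0):]


old = FsImage(last_txid=2, files={"/data/a.txt": [101]})
edits = [
    Edit(2, "add_block", "/data/a.txt", 101),  # already in `old`, skipped
    Edit(3, "create", "/data/b.txt"),
    Edit(4, "add_block", "/data/b.txt", 102),
    Edit(5, "delete", "/data/a.txt"),
]
new = checkpoint(old, edits)
print(new.last_txid, new.files)  # 5 {'/data/b.txt': [102]}
```

Note that the toy fsimage holds only file paths and block ids, in line with the point above that block locations are not captured in the edit logs. After a restart, a Namenode in this model would load the latest FsImage and replay any newer edits in the same way checkpoint does.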
