Checkpointing (contd.)

Checkpointing is the process of creating a new fsimage at regular intervals.

  • A new fsimage is created on the Namenode when it is formatted and added.
  • Changes to metadata are logged in the edit logs on the Namenode.
  • The Secondary Namenode takes care of checkpointing. It takes the last fsimage (from the Namenode or the Secondary Namenode) and the edit logs written on the Namenode since the last checkpoint, and merges them into a new fsimage.
  • Once checkpointing is done, the new fsimage is copied back to the Namenode, into the directory controlled by dfs.name.dir (named dfs.namenode.name.dir in Hadoop 2 and later).
  • dfs.namenode.checkpoint.period (default: 1 hour, i.e. 3600 seconds): maximum delay between two consecutive checkpoints.
  • dfs.namenode.checkpoint.txns (default: 1 million): number of transactions in the edit logs that forces a checkpoint, even if the checkpoint period has not elapsed.
  • Whichever of the two events occurs first triggers checkpointing.
  • You can also perform checkpointing manually:
    • On the Namenode, first save the namespace so that a new fsimage is created; this has to be done in safe mode. Use hdfs dfsadmin to enter safe mode (hdfs dfsadmin -safemode enter), create the fsimage (hdfs dfsadmin -saveNamespace) and then leave safe mode (hdfs dfsadmin -safemode leave).
    • On the Secondary Namenode, run this command to force a checkpoint: hadoop secondarynamenode -checkpoint force (on Hadoop 2 and later the equivalent is hdfs secondarynamenode -checkpoint force).
    • We can also enter safe mode and create the fsimage using Cloudera Manager. Click HDFS on the Dashboard/Home page, then Namenode on the summary page. The options are listed under Actions.
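
The automatic checkpoint cycle described above can be sketched as a toy model in Python. This is only an illustration under stated assumptions, not Hadoop code: the function names, the dict standing in for the fsimage and the (op, path, metadata) tuples standing in for edit-log entries are all hypothetical. Only the two property names and their defaults come from the text.

```python
# Toy model of checkpointing. Nothing here is a Hadoop API; the names
# and data shapes are hypothetical and chosen for illustration only.

CHECKPOINT_PERIOD_SECS = 3600   # default of dfs.namenode.checkpoint.period (1 hour)
CHECKPOINT_TXNS = 1_000_000     # default of dfs.namenode.checkpoint.txns


def should_checkpoint(secs_since_last, txns_since_last,
                      period=CHECKPOINT_PERIOD_SECS, txns=CHECKPOINT_TXNS):
    """Return True as soon as either trigger is reached.

    Assumption: a trigger counts as reached when the value is equal to
    or greater than its threshold.
    """
    return secs_since_last >= period or txns_since_last >= txns


def apply_edits(fsimage, edits):
    """Replay edit-log entries on a copy of the last fsimage.

    fsimage: dict mapping path -> metadata (stand-in for the namespace).
    edits:   list of (op, path, metadata) tuples, oldest first.
    """
    new_image = dict(fsimage)          # the old fsimage is left untouched
    for op, path, meta in edits:
        if op == "create":
            new_image[path] = meta
        elif op == "delete":
            new_image.pop(path, None)
        else:
            raise ValueError(f"unknown op: {op}")
    return new_image


if __name__ == "__main__":
    last_image = {"/a": {"replication": 3}}
    edits = [("create", "/b", {"replication": 2}), ("delete", "/a", None)]

    # Only 2 minutes since the last checkpoint, but the transaction
    # threshold has been reached, so a checkpoint is due.
    if should_checkpoint(secs_since_last=120, txns_since_last=1_000_000):
        print(apply_edits(last_image, edits))
```

The point of the model is that the new fsimage depends only on the previous fsimage plus the edits logged since then, which is why the Secondary Namenode can do the merge away from the Namenode and copy the result back.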

It is good practice to configure the Namenode to store multiple copies of the fsimage and edit logs. These copies should be persisted on different hard drives, and one of the drives should be external to the server. That way, even if the Namenode goes down completely, we can still recover the metadata onto a different server.
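
As a rough aid for the practice above, here is a minimal Python sketch that checks whether the directories in a comma-separated list (the form used for the Namenode metadata directories) sit on different devices. The helper names are hypothetical. The sketch assumes plain local paths rather than file:// URIs, and it compares the st_dev value reported by the operating system, which distinguishes file systems rather than physical drives, so treat the result as a hint and not as proof.

```python
import os
from collections import defaultdict


def split_dirs(value):
    """Split a comma-separated directory list, ignoring blanks and whitespace."""
    return [d.strip() for d in value.split(",") if d.strip()]


def group_by_device(dirs):
    """Map device id (st_dev) -> directories that live on that device."""
    groups = defaultdict(list)
    for d in dirs:
        groups[os.stat(d).st_dev].append(d)
    return dict(groups)


def shared_devices(dirs):
    """Return only the groups where two or more directories share a device."""
    return {dev: ds for dev, ds in group_by_device(dirs).items() if len(ds) > 1}


if __name__ == "__main__":
    import tempfile

    # Two directories under the same temporary directory: they are
    # expected to share a device, so the check should flag them.
    with tempfile.TemporaryDirectory() as tmp:
        a = os.path.join(tmp, "nn1")
        b = os.path.join(tmp, "nn2")
        os.mkdir(a)
        os.mkdir(b)
        for dev, ds in shared_devices(split_dirs(f"{a}, {b}")).items():
            print(f"device {dev} holds more than one copy: {ds}")
```

An empty result from shared_devices means every configured directory reported a different device id.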
