Components of HDFS (contd.)

Anatomy of File Write in HDFS

Anatomy of File Read in HDFS

Features of HDFS

  • Fault tolerant – HDFS replicates data at the block level, and dfs.replication controls how many copies of each block are kept.
    • Traditionally, network storage relies on RAID to tolerate hard drive failures.
    • In Hadoop, replication covers not only hard drive failures but also any other hardware failure that crashes a server, as well as planned maintenance.
    • If the cluster has m nodes and a replication factor of n, where m > n, every block survives the failure of up to n-1 nodes.
  • Logical Distributed File System – Files are divided into blocks of size dfs.blocksize, and the blocks are stored on the servers designated as data nodes.
  • Rack Awareness
    • A replication factor of n covers the failure of up to n-1 individual servers, whatever the cause.
    • Redundant network connections cover network cable related issues.
    • However, if a network switch fails, or the complete rack hosting the servers fails, there will be an outage at the cluster level.
    • To overcome this, we can configure a rack awareness script. For this, the servers need to be spread across multiple racks behind at least 2 network switches.
    • We need to plan the rack layout, and then write the rack awareness script, so that in a multi-rack Hadoop cluster behind multiple network switches (typically 2 or more) the copies of each block are spread across racks and a single rack or switch failure still leaves a copy available.
    • We will see how to configure the rack awareness script at a later point in time.
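
To make the numbers behind dfs.blocksize and dfs.replication concrete, here is a small sketch of the arithmetic. It assumes the defaults shipped with Hadoop 2.x and later (128 MB blocks, 3 replicas); the function names are ours and are not part of any Hadoop API.

```python
MB = 1024 * 1024

# Assumed defaults (Hadoop 2.x and later); override to match your cluster.
DEFAULT_BLOCK_SIZE = 128 * MB   # dfs.blocksize
DEFAULT_REPLICATION = 3         # dfs.replication


def block_count(file_size, block_size=DEFAULT_BLOCK_SIZE):
    """Number of blocks a file of file_size bytes is split into."""
    # Ceiling division; an empty file has no blocks.
    return (file_size + block_size - 1) // block_size


def raw_storage(file_size, replication=DEFAULT_REPLICATION):
    """Bytes consumed across the cluster, counting every replica."""
    # The last block is not padded, so usage is simply size * replicas.
    return file_size * replication


def tolerated_node_failures(replication=DEFAULT_REPLICATION):
    """Node failures a block survives: n - 1 for replication factor n."""
    return replication - 1


if __name__ == "__main__":
    size = 300 * MB
    print(block_count(size), "blocks")
    print(raw_storage(size) // MB, "MB of raw storage")
    print(tolerated_node_failures(), "node failures tolerated")
```

For a 300 MB file this gives two full 128 MB blocks plus one 44 MB block, each stored three times.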
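
Ahead of the detailed configuration, here is a minimal sketch of what a rack awareness script can look like. Hadoop runs the executable named by the net.topology.script.file.name property, passes it one or more IP addresses or host names as arguments, and reads back one rack path per argument. The IP addresses and rack names below are hypothetical placeholders, not values from a real cluster.

```python
#!/usr/bin/env python3
import sys

# Hypothetical inventory: two racks, each behind its own network switch.
RACKS = {
    "10.0.1.11": "/rack1",
    "10.0.1.12": "/rack1",
    "10.0.2.11": "/rack2",
    "10.0.2.12": "/rack2",
}

# Rack reported for hosts missing from the inventory.
DEFAULT_RACK = "/default-rack"


def resolve(hosts):
    """Return the rack path for each host, in the order given."""
    return [RACKS.get(host, DEFAULT_RACK) for host in hosts]


if __name__ == "__main__":
    hosts = sys.argv[1:]
    if hosts:
        # One rack path per argument, one per line.
        print("\n".join(resolve(hosts)))
```

Keeping the mapping in a plain table makes the script easy to check by hand. On a larger cluster the same table would more likely be read from a file maintained alongside the server inventory.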
