Copy Data into HDFS contd

  • A file will be divided into blocks (128 MB each by default), and those blocks will be physically stored on the servers where the Datanode process is running.
  • For example, a 1 GB file will be divided into 8 blocks of 128 MB each, while a 200 MB file will be divided into 2 blocks of 128 MB and 72 MB respectively.
  • If the file size is less than the block size, the file will be stored as a single block of its actual size.
  • There will be multiple copies of each block for fault tolerance; the number of copies is controlled by the replication factor.
  • Metadata – details about files and blocks.
    • There is block-level metadata and file-level metadata.
    • At the block level, we have the mapping between the file, block id, block location, etc.
    • At the file level, we have the file path, file permissions, etc.
    • Metadata is managed in memory on the server where the Namenode is running; this memory is also called the Namenode heap (bigdataserver-2 in our case).
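The block-splitting rule above can be sketched in a few lines of Python. This is only an illustration of the arithmetic HDFS applies (not actual HDFS code), assuming the 128 MB default block size described above:

```python
# Illustrative sketch of how HDFS sizes blocks for a file.
# Assumption: 128 MB default block size, as stated in the text above.
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB in bytes
MB = 1024 * 1024


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the list of block sizes (in bytes) a file of file_size bytes occupies."""
    blocks = []
    remaining = file_size
    while remaining > 0:
        # Each block is full-sized except possibly the last one.
        blocks.append(min(block_size, remaining))
        remaining -= block_size
    return blocks


# 1 GB file -> eight 128 MB blocks
print([b // MB for b in split_into_blocks(1024 * MB)])
# 200 MB file -> one 128 MB block plus one 72 MB block
print([b // MB for b in split_into_blocks(200 * MB)])
# 50 MB file (smaller than the block size) -> a single 50 MB block
print([b // MB for b in split_into_blocks(50 * MB)])
```

This matches the examples in the bullets: 1 GB yields 8 full blocks, 200 MB yields blocks of 128 MB and 72 MB, and a file smaller than the block size occupies a single block of its own size.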
