- A file is divided into blocks (128 MB by default), and those blocks are physically stored on the servers where the DataNode process is running.
- For example, a 1 GB file is divided into 8 blocks of 128 MB each, and a 200 MB file is divided into 2 blocks of 128 MB and 72 MB respectively.
- If a file is smaller than the block size, it is stored in a single block that is only as large as the file (a 50 MB file occupies one 50 MB block, not a full 128 MB).
- Each block is stored as multiple copies (replicas) on different DataNodes for fault tolerance. The number of copies is controlled by the replication factor, which is 3 by default.
- Metadata: details about files and blocks.
- There are two kinds of metadata: block metadata and file metadata.
- At the block level, there is a mapping between the file, its block IDs, the block locations, etc.
- At the file level, there are the file path, the file permissions, etc.
- Metadata is managed in memory on the server where the NameNode is running (bigdataserver-2 in our case). This memory is also called the NameNode heap.
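The block-splitting arithmetic from the bullets above can be checked with a small sketch. This is only an illustration, assuming sizes are expressed in whole MB, the default 128 MB block size, and the default replication factor of 3; the function names `split_into_blocks` and `block_replicas` are hypothetical and are not part of any Hadoop API.

```python
# Sketch of HDFS-style block splitting (illustrative only, not Hadoop code).
# Assumptions: sizes in whole MB, 128 MB block size, replication factor 3.
# Function names are hypothetical.


def split_into_blocks(file_size_mb: int, block_size_mb: int = 128) -> list[int]:
    """Return the size of each block that a file of file_size_mb is split into."""
    if file_size_mb <= 0:
        return []
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    blocks = [block_size_mb] * full_blocks
    if remainder:
        # The last block holds only the leftover data; it is not padded to a full block.
        blocks.append(remainder)
    return blocks


def block_replicas(file_size_mb: int, block_size_mb: int = 128,
                   replication_factor: int = 3) -> int:
    """Number of physical block copies stored across the DataNodes."""
    return len(split_into_blocks(file_size_mb, block_size_mb)) * replication_factor


if __name__ == "__main__":
    print(split_into_blocks(1024))  # 1 GB file: 8 blocks of 128 MB
    print(split_into_blocks(200))   # 200 MB file: [128, 72]
    print(split_into_blocks(50))    # smaller than a block: [50]
    print(block_replicas(200))      # 2 blocks x 3 replicas = 6
```

Using `divmod` keeps the full-block count and the size of the last partial block in one step, which mirrors how a 200 MB file ends up as one 128 MB block plus one 72 MB block.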
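To make the split between file-level and block-level metadata concrete, here is a toy in-memory model of what the NameNode keeps in its heap. It is a sketch under stated assumptions, not how the NameNode is actually implemented: the classes `FileMetadata`, `BlockMetadata` and `ToyNamespace`, the DataNode host names, and the simple round-robin replica placement are all hypothetical (real HDFS uses a rack-aware placement policy).

```python
# Toy model of NameNode metadata (hypothetical classes, not Hadoop's implementation).
from dataclasses import dataclass, field


@dataclass
class FileMetadata:
    # File-level metadata: path, permissions, and the ordered list of block IDs.
    path: str
    permissions: str
    block_ids: list[int] = field(default_factory=list)


@dataclass
class BlockMetadata:
    # Block-level metadata: owning file, block size, and replica locations.
    block_id: int
    file_path: str
    size_mb: int
    locations: list[str] = field(default_factory=list)


class ToyNamespace:
    """Hypothetical stand-in for the metadata held in the NameNode heap."""

    def __init__(self) -> None:
        self.files: dict[str, FileMetadata] = {}
        self.blocks: dict[int, BlockMetadata] = {}
        self._next_block_id = 1

    def add_file(self, path: str, permissions: str, block_sizes_mb: list[int],
                 datanodes: list[str], replication_factor: int = 3) -> FileMetadata:
        meta = FileMetadata(path, permissions)
        copies = min(replication_factor, len(datanodes))
        for size in block_sizes_mb:
            block_id = self._next_block_id
            self._next_block_id += 1
            # Assumed round-robin placement so each replica lands on a distinct DataNode.
            locations = [datanodes[(block_id + r) % len(datanodes)] for r in range(copies)]
            self.blocks[block_id] = BlockMetadata(block_id, path, size, locations)
            meta.block_ids.append(block_id)
        self.files[path] = meta
        return meta

    def block_locations(self, path: str) -> list[tuple[int, list[str]]]:
        """Resolve a file path to (block ID, replica locations) pairs."""
        return [(bid, self.blocks[bid].locations) for bid in self.files[path].block_ids]


if __name__ == "__main__":
    ns = ToyNamespace()
    # Hypothetical path and DataNode host names, for illustration only.
    ns.add_file("/user/demo/file_200mb", "rw-r--r--", [128, 72],
                ["worker-1", "worker-2", "worker-3"])
    for block_id, locations in ns.block_locations("/user/demo/file_200mb"):
        print(block_id, locations)
```

The two dictionaries reflect the two levels described above: a lookup by path gives the file's permissions and block IDs, and a lookup by block ID gives where that block's replicas are stored.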
Durga Gadiraju
Copy Data into HDFS contd
- February 9, 2023, 12:43 pm, Uncategorized