We will follow the same standard process that we use whenever we add a software-based service:
- Downloading and Installing – already taken care of as part of adding hosts to the cluster.
- Configuration – we need to understand the architecture and plan the configuration.
- Architecture – Master, Helper, and Slaves
- Components – NameNode, Secondary NameNode, and DataNodes
- Configuration Files – /etc/hadoop/conf
- With Cloudera, the location is a bit different; we will look at it after setting up the service.
- Service logs – /var/log/hadoop-hdfs
- Service Data – each HDFS component stores its data in a different location, controlled by properties such as dfs.namenode.name.dir and dfs.datanode.data.dir (named dfs.name.dir and dfs.data.dir in older Hadoop releases)
- hdfs-site.xml will have parameters that are used by HDFS
  - dfs.blocksize
  - dfs.replication
  - dfs.client.read.shortcircuit
  - dfs.namenode.http-address
  - dfs.datanode.http.address
  - dfs.datanode.data.dir
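
To make the shape of hdfs-site.xml concrete, here is a minimal sketch in Python that parses a small sample file and lists the properties named above. The sample content, the host name `nn01.example.com`, the port numbers, the data directory path, and the helper `parse_hdfs_site` are all assumptions made for illustration; the values on a real cluster will differ and should be read from its own configuration.

```python
import xml.etree.ElementTree as ET

# Hypothetical hdfs-site.xml content. Every value here is illustrative only,
# not a recommendation and not taken from a real cluster.
SAMPLE_HDFS_SITE = """<configuration>
  <property>
    <name>dfs.blocksize</name>
    <value>134217728</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.client.read.shortcircuit</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.namenode.http-address</name>
    <value>nn01.example.com:9870</value>
  </property>
  <property>
    <name>dfs.datanode.http.address</name>
    <value>0.0.0.0:9864</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/data/1/dfs/dn</value>
  </property>
</configuration>
"""


def parse_hdfs_site(xml_text):
    """Return a dict mapping property names to values (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    props = {}
    # Each setting is a <property> element holding a <name> and a <value>.
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


props = parse_hdfs_site(SAMPLE_HDFS_SITE)
for name in sorted(props):
    print(f"{name} = {props[name]}")

# Block size is given in bytes; convert to MiB for readability.
block_size_mib = int(props["dfs.blocksize"]) // (1024 * 1024)
print(f"block size in MiB: {block_size_mib}")
```

To inspect an actual cluster, the same function could be given the text of the hdfs-site.xml found in the configuration directory (for example /etc/hadoop/conf), assuming the file is readable by the current user.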