In this section we will set up HDFS components such as the Namenode, Secondary Namenode, and Datanodes, while exploring some of the key concepts behind this important service.
- Setup HDFS
- Copy Data into HDFS (see the command sketch after this list)
- Components of HDFS
- Configuration Files and Important Properties
- Review Web UIs and log files
- Checkpointing and Namenode Recovery
- Configure Rack Awareness
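Before diving into each topic, it helps to see the basic command pattern for copying data into HDFS, which we will use throughout the section. This is a minimal sketch; the paths are placeholders for illustration, not values from our cluster.

```bash
hdfs dfs -mkdir -p /user/bigdata/data          # create a target directory in HDFS
hdfs dfs -put /data/crime /user/bigdata/data   # copy a local file or directory into HDFS
hdfs dfs -ls /user/bigdata/data                # confirm the data landed
```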
Cluster Topology
We are setting up the cluster on 7+1 nodes. We start with 7 nodes and will add one more node later.
- Gateway(s) and Management Service
- bigdataserver-1
- Masters
- bigdataserver-2 – Zookeeper, Namenode
- bigdataserver-3 – Zookeeper, Secondary Namenode
- bigdataserver-4 – Zookeeper
- Slaves or Worker Nodes
- bigdataserver-5 – Datanode
- bigdataserver-6 – Datanode
- bigdataserver-7 – Datanode
- We will create a host group called hdfs so that we can run commands with Ansible on all nodes where HDFS is running, as sketched below.
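Here is a minimal sketch of what that inventory entry might look like, along with ad hoc commands to verify connectivity. The group membership is an assumption derived from the topology above, and the inventory path may differ in your setup.

```bash
# Append the hdfs group to the Ansible inventory (path is an assumption;
# yours may live in a project-specific inventory file instead)
cat >> /etc/ansible/hosts <<'EOF'
[hdfs]
bigdataserver-2
bigdataserver-3
bigdataserver-5
bigdataserver-6
bigdataserver-7
EOF

# Ping every node in the group to confirm Ansible can reach them
ansible hdfs -m ping

# Example ad hoc command: check disk usage on all HDFS nodes
ansible hdfs -m shell -a 'df -h /'
```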
Learning Process
We will follow the same standard learning process that we use when adding any software-based service.
- Downloading and Installing – already taken care of as part of adding hosts to the cluster.
- Configuration – we need to understand the architecture and plan the configuration.
- Architecture – Master, Helper, and Slaves
- Components – Namenode, Secondary Namenode, and Datanodes
- Configuration Files – /etc/hadoop/conf
- With Cloudera the location is a bit different; we will see it after setting up the service.
- Service logs – /var/log/hadoop-hdfs
- Service Data – different locations for different components of HDFS, controlled by properties such as dfs.name.dir and dfs.data.dir (in Hadoop 2.x these are known as dfs.namenode.name.dir and dfs.datanode.data.dir).
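Once the service is up, we can confirm where each component stores its data directly from the command line. Here is a minimal sketch using the standard hdfs getconf tool; the property names assume a Hadoop 2.x deployment.

```bash
# Ask the active configuration where the Namenode keeps its metadata
hdfs getconf -confKey dfs.namenode.name.dir

# Ask where the Datanodes store block data
hdfs getconf -confKey dfs.datanode.data.dir
```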