Durga Gadiraju

Introduction

In this section we will set up the HDFS components (Namenode, Secondary Namenode, and Datanodes) and explore some of the key concepts of this very important service.

Setup HDFS
Copy Data into HDFS
Components of HDFS
Configuration Files and Important Properties
Review Web UIs and log files
Checkpointing and Namenode Recovery
Configure Rack Awareness

Cluster Topology

We are setting up the cluster on 7+1 nodes: we start with 7 nodes and will add one more node later.

Gateway(s) and Management Service
- bigdataserver-1
Masters
- bigdataserver-2 – Zookeeper, Namenode
- bigdataserver-3 – Zookeeper, Secondary Namenode
- bigdataserver-4 – Zookeeper
Slaves or Worker Nodes
- bigdataserver-5 – Datanode
- bigdataserver-6 – Datanode
- bigdataserver-7 – Datanode
We will create a host group named hdfs so that we can run commands with Ansible on all nodes where HDFS is running.
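
As a rough illustration of which hosts would land in the hdfs group, here is a minimal Python sketch that derives the group from the topology listed above and renders it as an INI-style Ansible inventory section. The role-to-group rule (any host running a Namenode, Secondary Namenode, or Datanode) and the helper names hdfs_hosts and inventory_group are assumptions made for this example, not something prescribed by the setup itself.

```python
# Roles per host, copied from the cluster topology above.
TOPOLOGY = {
    "bigdataserver-1": ["Gateway", "Management Service"],
    "bigdataserver-2": ["Zookeeper", "Namenode"],
    "bigdataserver-3": ["Zookeeper", "Secondary Namenode"],
    "bigdataserver-4": ["Zookeeper"],
    "bigdataserver-5": ["Datanode"],
    "bigdataserver-6": ["Datanode"],
    "bigdataserver-7": ["Datanode"],
}

# Assumption: these three roles are what "HDFS is running" means here.
HDFS_ROLES = {"Namenode", "Secondary Namenode", "Datanode"}


def hdfs_hosts(topology):
    """Return the hosts that run at least one HDFS role, sorted by name."""
    return sorted(
        host for host, roles in topology.items() if HDFS_ROLES & set(roles)
    )


def inventory_group(name, hosts):
    """Render one INI-style Ansible inventory group as text."""
    return "\n".join([f"[{name}]", *hosts]) + "\n"


hdfs_group = inventory_group("hdfs", hdfs_hosts(TOPOLOGY))
print(hdfs_group)
```

With a group like this saved in an inventory file, an ad hoc command such as `ansible hdfs -i <inventory file> -m ping` would target only the HDFS nodes. Note that bigdataserver-4 runs only Zookeeper, so under the assumption above it is left out of the group.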

Learning Process

We follow the same standard learning process whenever we add a software-based service.

Downloading and Installing – already taken care of as part of adding hosts to the cluster.
Configuration – we need to understand the architecture and plan the configuration.
- Architecture – Master, Helper, and Slaves
- Components – Namenode, Secondary Namenode, and Datanodes
- Configuration Files – /etc/hadoop/conf
- With Cloudera the location is a bit different; we will see it after setting up the service.
Service logs – /var/log/hadoop-hdfs
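Service Data – Different locations for different components of HDFS, controlled by properties such as dfs.data.dir and dfs.name.dir (renamed dfs.datanode.data.dir and dfs.namenode.name.dir in Hadoop 2 and later; the old names are deprecated).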
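
To make the link between properties and data locations concrete, the following Python sketch reads storage directories out of an hdfs-site.xml style document. The sample XML and its paths (/data/nn, /data/dn1, /data/dn2) are made up for illustration, and read_properties and storage_dirs are hypothetical helper names; on a real cluster the file would come from the configuration directory mentioned above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the usual Hadoop <configuration>/<property> layout.
SAMPLE_HDFS_SITE = """<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/data/nn</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/data/dn1,/data/dn2</value>
  </property>
</configuration>
"""


def read_properties(xml_text):
    """Return a dict mapping each property name to its raw value."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value", default="")
        if name:
            props[name.strip()] = value.strip()
    return props


def storage_dirs(props, key):
    """Split a comma-separated directory property into a list of paths."""
    return [d.strip() for d in props.get(key, "").split(",") if d.strip()]


props = read_properties(SAMPLE_HDFS_SITE)
print("Namenode metadata dirs:", storage_dirs(props, "dfs.name.dir"))
print("Datanode block dirs:", storage_dirs(props, "dfs.data.dir"))
```

The comma-separated value is the reason storage_dirs returns a list: a Datanode is typically given several directories, one per disk.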
