Let us see how to configure High Availability (HA) for the HDFS Namenode. HA can be set up using either shared edits or Journal Nodes. However, Cloudera Manager supports only Journal Nodes.
- Prerequisite – Make sure ZooKeeper is installed and running, along with the HDFS components such as Namenode, Secondary Namenode and Datanodes.
- Let us review what has been added to ZooKeeper up to this point using zkCli.sh –
/usr/lib/zookeeper/bin/zkCli.sh -server bigdataserver-2:2181,bigdataserver-3:2181,bigdataserver-4:2181
- Go to the Dashboard and click on HDFS.
- Click on Actions, then on Enable High Availability, and follow these steps to configure HA.
- Specify the name for the nameservice – either go with the default or enter a custom name (e.g., nameservice1).
- Selecting Active Namenode – CM automatically selects a host, or you can click on Select a host and choose the Active Namenode host manually (e.g., bigdataserver-2).
- Selecting Standby Namenode – Next, select the Standby Namenode, which must be on a different host from the Active Namenode. Both Active and Standby Namenode hosts should have the same hardware configuration.
- Next, select the Journal Nodes by clicking on Select hosts.
- The number of Journal Nodes should be odd, with a minimum of three, so that a majority of them remains available if one fails. We will choose bigdataserver-2, bigdataserver-3 and bigdataserver-4.
- For the Journal Node directory, you can enter the same path on every node or a different path per node; the directory must be empty and have appropriate permissions. We will use /data1/jn on all 3 nodes.
- CM then executes a set of commands that stop the dependent services, delete the Secondary Namenode, reconfigure roles and directories as specified, create the nameservice and failover controllers, restart the dependent services, and deploy the new client configuration. We will review these steps using the Cloudera Manager web interface.
- Let us also review the output of zkCli.sh again to see what has changed in ZooKeeper now that HA is enabled.
- Click on the HDFS service and then on Namenode. You will see one Namenode listed as Active and the other as Standby.
- If other services are running on the cluster, you might have to reconfigure them as well.
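The Active and Standby roles shown in Cloudera Manager can also be cross-checked from a shell on any node that has the HDFS client configuration deployed. The following is a minimal sketch, assuming a single nameservice. The function name show_ha_state is made up for illustration, while hdfs getconf and hdfs haadmin -getServiceState are standard HDFS commands. The Namenode IDs are generated by CM, so the sketch reads them from the configuration instead of guessing them.

```shell
# Sketch: print the HA state of every Namenode in a nameservice.
# show_ha_state is a hypothetical helper name; nameservice1 is the
# nameservice name used in this walkthrough.
show_ha_state() {
    ns="${1:-nameservice1}"
    # Comma-separated Namenode IDs configured for the nameservice
    ids=$(hdfs getconf -confKey "dfs.ha.namenodes.${ns}") || return 1
    for id in $(printf '%s' "$ids" | tr ',' ' '); do
        # haadmin prints "active" or "standby"; a failing call is reported
        # as "unreachable" (for example when that Namenode is stopped)
        state=$(hdfs haadmin -getServiceState "$id" 2>/dev/null) || state="unreachable"
        printf '%s: %s\n' "$id" "$state"
    done
}
```

After sourcing the function, running show_ha_state nameservice1 should print one line per Namenode ID with its current state.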
Task: Stop the Active Namenode and check that the Standby becomes Active. Also validate that there is no downtime for the application. You will also see the stopped Namenode return as Standby a few moments after it is started again.
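One way to approach the "no downtime" part of the task is to run a small probe against HDFS while the Active Namenode is being stopped from Cloudera Manager. The sketch below is only an assumption about how one might measure this and is not part of the CM workflow. probe_hdfs is a hypothetical helper that lists / through the deployed client configuration a fixed number of times and counts the failed calls.

```shell
# Sketch: call HDFS repeatedly and count failed requests during a failover.
# probe_hdfs is a hypothetical helper; its arguments are the number of
# attempts and the pause between attempts in seconds.
probe_hdfs() {
    attempts="${1:-60}"
    pause="${2:-1}"
    failures=0
    i=1
    while [ "$i" -le "$attempts" ]; do
        # Any HDFS client call would work here; listing / is cheap
        if hdfs dfs -ls / >/dev/null 2>&1; then
            echo "attempt $i: ok"
        else
            echo "attempt $i: FAILED"
            failures=$((failures + 1))
        fi
        i=$((i + 1))
        sleep "$pause"
    done
    echo "failures: $failures of $attempts"
}
```

For example, start probe_hdfs 120 1 in one terminal, stop the Active Namenode in CM, and check the final failure count. Calls made during the failover may be noticeably slower than usual even when they succeed.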