In the previous blog, we discussed the HDFS high availability architecture. This blog describes the configuration of HDFS high availability in a Hadoop cluster.

Pre-requisites

Before configuring HDFS high availability, make sure that your Hadoop cluster has the following pre-requisites:

a) You must have at least two nodes to enable HDFS high availability.

b) If you want to configure automatic failover for HDFS, then you need to have a running Zookeeper ensemble. To install Zookeeper in clustered mode, you can refer to the blog here.

 

HDFS High Availability Configuration Steps

The configuration of HDFS high availability is broadly divided into 5 simple steps.

 


Step 1 # Define Nodes

High availability clusters use a NameService ID to identify a single HDFS instance that may in fact consist of multiple NameNodes. Each NameNode in a NameService has a distinct NameNode ID. The first step is to define the NameService ID and the NameNodes associated with that NameService.

Add the following properties to the hdfs-site.xml file of your cluster:

<property>
    <name>dfs.nameservices</name>
    <value>testHadoop</value>
</property>

<property>
    <name>dfs.ha.namenodes.testHadoop</name>
    <value>nn1,nn2</value>
</property>

<property>
    <name>dfs.namenode.rpc-address.testHadoop.nn1</name>
    <value><HostName_NameNode_1>:8020</value>
</property>

<property>
    <name>dfs.namenode.rpc-address.testHadoop.nn2</name>
    <value><HostName_NameNode_2>:8020</value>
</property>

<property>
    <name>dfs.namenode.http-address.testHadoop.nn1</name>
    <value><HostName_NameNode_1>:50070</value>
</property>

<property>
    <name>dfs.namenode.http-address.testHadoop.nn2</name>
    <value><HostName_NameNode_2>:50070</value>
</property>

Update or add the fs.defaultFS property in core-site.xml to use the NameService ID as the HDFS URI.

<property>
    <name>fs.defaultFS</name>
    <value>hdfs://testHadoop/</value>
</property>

NOTE: If the deprecated fs.default.name property is present in core-site.xml, remove it.
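
Once both NameNodes are up and running (covered in the next blog), you can quickly verify that clients resolve the logical NameService ID instead of a single NameNode host. A minimal check, assuming the cluster is already started:

hdfs dfs -ls hdfs://testHadoop/    # address HDFS through the NameService ID

hdfs dfs -ls /                     # with fs.defaultFS set as above, this resolves to the same NameService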

Step 2 # Storage for Edits Files

HDFS high availability requires the edits files to be shared so that the StandBy NameNode can maintain a consistent copy of the FS-image. Hadoop offers two mechanisms for sharing edits files: the Quorum Journal Manager and conventional shared storage.


The Quorum Journal Manager mechanism is preferred over conventional shared storage.

a)  Quorum Journal Manager – This mechanism uses a group of separate daemons called JournalNodes (JNs) that provide the storage for edits logs. You need to configure the JournalNode directory and the JournalNode quorum for the NameService ID. The JournalNode daemons allow only the Active NameNode to write the log files, while the StandBy NameNode reads from them.

To avoid a single point of failure, it is recommended to run the JournalNode daemon on at least three nodes.

You need to specify the URI for the group of JNs where the NameNodes will write/read edits files. It also requires a Journal ID, a unique identifier for the NameService ID that enables sharing of JournalNode daemons across multiple NameServices. It is recommended to use the same Journal ID as the NameService ID. The default port used by the JournalNode process is 8485.

Add the following properties to the hdfs-site.xml file of your cluster:

<property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://host1:port1;host2:port2;host3:port3/testHadoop</value>
</property>

You also need to specify the directory on the local file system where the JournalNode will store the edits log files.

<property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/home/hduser/hadoop_data/dfs/jndata</value>
</property>
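
The JournalNode daemon must be running on each of the (at least three) journal hosts before the NameNodes are started with this configuration. A minimal sketch, assuming a standard Hadoop installation (the launcher script differs between Hadoop versions):

# Hadoop 2.x
$HADOOP_HOME/sbin/hadoop-daemon.sh start journalnode

# Hadoop 3.x
hdfs --daemon start journalnode

# Verify that the JournalNode JVM is running
jps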

b)  Conventional Shared Storage – The shared storage directory should be located on a Network File System (NFS) and mounted with read/write permissions on both NameNode machines. The value of the property should be the absolute path of the directory, identical on both NameNode machines. Add the following property to the hdfs-site.xml file of your cluster:

<property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>file:///mnt/hadoopHA/dfs/jndata</value>
</property>
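
For illustration only, the shared directory could be an NFS export mounted at the same path on both NameNode machines. The server name and export path below are placeholders:

# Run on both NameNode machines; nfs-server:/export/hadoopHA is a placeholder export
sudo mkdir -p /mnt/hadoopHA
sudo mount -t nfs nfs-server:/export/hadoopHA /mnt/hadoopHA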

Step 3 # Client Failover

With HDFS high availability configured, a Hadoop client must always connect to the Active NameNode. During a failover, the StandBy NameNode transitions to the Active state. To determine the current Active NameNode, Hadoop provides a Java class that HDFS clients use to contact the NameNode currently serving client requests. Add the following property to the hdfs-site.xml file of your cluster:

<property>
    <name>dfs.client.failover.proxy.provider.testHadoop</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
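
Once the cluster is running, you can check which NameNode is currently Active with the haadmin utility, using the NameNode IDs defined in Step 1:

hdfs haadmin -getServiceState nn1    # prints "active" or "standby"
hdfs haadmin -getServiceState nn2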

Step 4 # Fencing

With high availability configured for HDFS, one NameNode serves as Active while the other serves as StandBy. If the Active NameNode goes down, the StandBy takes over and the configured failover proxy directs requests to the new Active NameNode, so that only one NameNode is in the Active state. If the first NameNode later rejoins the cluster, it is not aware of the current cluster state and may try to acquire a lock on the metadata files, and the proxy would no longer be able to direct requests correctly. Since only one (Active) NameNode is allowed to change the metadata, two concurrently Active NameNodes would cause a split-brain scenario. To prevent this, HDFS provides a fencing mechanism with the following two options:

sshfence: This method SSHes to the previously Active NameNode host and kills the NameNode process. It uses fuser to kill the process listening on the NameNode's TCP port, and therefore requires passwordless SSH access to the target host. To enable this fencing mechanism, set/add the following properties in hdfs-site.xml:

<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>

<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/exampleuser/.ssh/id_rsa</value>
</property>
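
sshfence assumes that the user running the NameNode process can SSH from each NameNode machine to the other without a password, using the private key configured above. A minimal sketch for setting this up (run as the HDFS user on each NameNode; the target hostname is a placeholder):

ssh-keygen -t rsa -f ~/.ssh/id_rsa    # generate the key pair if it does not exist yet
ssh-copy-id <HostName_NameNode_2>     # copy the public key to the other NameNode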

shell: This method runs an arbitrary custom script or shell command. It can be configured as:

<property>
  <name>dfs.ha.fencing.methods</name>
  <value>shell(/path/to/my/script.sh arg1 arg2 ...)</value>
</property>

This would run script.sh with the specified arguments. If the shell command returns an exit code of 0, the fencing is determined to be successful. If it returns any other exit code, the fencing was not successful.
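
As a rough illustration of what such a custom fencing script might look like: it typically tries to stop any NameNode process still running on the host being fenced and exits with 0 only if it succeeds. The script name and the $target_host variable used below are assumptions; verify the environment variables exported by the shell fencer for your Hadoop version before relying on this sketch.

#!/bin/bash
# fence-namenode.sh -- hypothetical fencing script (illustration only)
# The shell fencer is documented to export the host being fenced as $target_host;
# fall back to the first argument so the script can also be tested by hand.
TARGET_HOST="${target_host:-$1}"

# Kill any NameNode JVM on the target host (pkill returns 1 if nothing matched, which is fine)
ssh "$TARGET_HOST" "pkill -9 -f org.apache.hadoop.hdfs.server.namenode.NameNode"

# Fencing succeeds only if no NameNode process is left on the target host
if ssh "$TARGET_HOST" "pgrep -f org.apache.hadoop.hdfs.server.namenode.NameNode" > /dev/null; then
    exit 1    # a NameNode is still running: fencing failed
fi
exit 0        # fencing successful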

If you are using the QJM mechanism, then you can simply specify the property as:

<property>
  <name>dfs.ha.fencing.methods</name>
  <value>shell(/bin/true)</value>
</property>

Because the JournalNodes allow only one NameNode to act as the writer at a time, QJM itself prevents the metadata from being corrupted in a split-brain scenario, so a fencing method that always succeeds, such as shell(/bin/true), is acceptable here.

Step 5 # Automatic Failover

To provide a hot backup solution during failover, Hadoop offers an automatic failover mechanism. It requires two extra components: a ZooKeeper quorum and a ZooKeeper Failover Controller (ZKFC) daemon.

A Zookeeper Quorum is used for failure detection and Active NameNode election. Each NameNode in the cluster maintains a persistent session in ZooKeeper. If the NameNode process crashes, the session will expire, notifying the other NameNode that a failover should be triggered. If the current active NameNode crashes, another node is elected to become the next active.

A ZooKeeper Failover Controller (ZKFC) daemon runs on each NameNode machine to maintain a session with the ZooKeeper quorum and is responsible for health monitoring of its local NameNode. When the local NameNode is healthy, the ZKFC holds the session open in ZooKeeper and, if that NameNode is Active, a special “lock” znode. If the session expires, the lock znode is automatically deleted. The ZKFC also takes part in the Active NameNode election during failover.

You need to enable automatic failover for the NameService and specify the ZooKeeper quorum to use. Add the first property below to the hdfs-site.xml file and the ha.zookeeper.quorum property to the core-site.xml file of your cluster:

<property>
    <name>dfs.ha.automatic-failover.enabled.testHadoop</name>
    <value>true</value>
</property>

<property>
    <name>ha.zookeeper.quorum</name>
    <value>zk1:2181,zk2:2181,zk3:2181</value>
</property>

The value of ha.zookeeper.quorum is a comma-separated list of the running ZooKeeper servers and their client ports.
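
Before automatic failover is used for the first time, the HA state must be initialised in ZooKeeper and the ZKFC daemon must be started on both NameNode machines. A minimal sketch (again, the launcher script depends on your Hadoop version):

# Run once, from one of the NameNodes, to create the required znode in ZooKeeper
hdfs zkfc -formatZK

# Start the ZKFC daemon on each NameNode (Hadoop 2.x)
$HADOOP_HOME/sbin/hadoop-daemon.sh start zkfc

# Hadoop 3.x equivalent
hdfs --daemon start zkfc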

You have now completed the configuration for HDFS high availability. You can refer to the next blog to start your HDFS high availability cluster.

Reference: Apache Hadoop HDFS High Availability

Related Blogs
a) HDFS High Availability Overview
b) HDFS High Availability Architecture
c) Start HDFS High Availability Cluster

3 thoughts on “HDFS High Availability Configuration”

  1. Thanks,

    Where is this path available: on the NN, DN, or JN? /path/to/my/script.sh – nothing explains this script, its location, the name of the node, etc.

    Please give me full details on how to run this script.

    dfs.ha.fencing.methods
    shell(/path/to/my/script.sh arg1 arg2 …)

    1. Hi Narayana,

      The fencing mechanism in HA is used to avoid split-brain scenario, i.e. a cluster with two Active NameNodes.
      During a transition from the StandBy state to the Active state, a NameNode executes the fencing script to ensure that no other Active NameNode is alive.
      In the script-based fencing mechanism, an administrator defines a shell script that kills any running Active NameNode service. For a successful transition, the script should return true (exit status 0); otherwise the transition from StandBy to Active will fail.

      The script is placed on both NameNodes with execute permissions for the user who starts the NameNode service. You can place the script within the Hadoop folder itself (e.g. $HADOOP_HOME/sbin/).

      The NameNode runs this script itself; an administrator does not need to execute it explicitly. If an administrator wants to test the script, he/she can simply run it using the sh command of Linux.
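
      For example, assuming a placeholder script name fence-namenode.sh placed under $HADOOP_HOME/sbin/:

      chmod +x $HADOOP_HOME/sbin/fence-namenode.sh        # same path on both NameNodes
      sh $HADOOP_HOME/sbin/fence-namenode.sh arg1 arg2    # manual test run with the configured arguments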

      For more information, you can refer to the Hadoop documentation @ http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html#Deployment

      Hope it gives you a better understanding of the concept. Let us know if there’s any other concern.

      Thank you for following our blogs.
      ProTechSkills
