Background

Single Point of Failure (SPOF) in HDFS: Each cluster had a single NameNode, and if that machine or process became unavailable, the cluster as a whole would be unavailable until the NameNode was either restarted or brought up on a separate machine.

Ecosystem Dependency: Hadoop ecosystem components such as MapReduce, Pig, Hive, and HBase depend directly on HDFS, so a NameNode outage affects them as well.

To mitigate cluster unavailability, high availability of the NameNode daemon became one of the major concerns for the Hadoop community.

Availability can be defined as the ratio of the expected uptime of a system to the sum of its expected uptime and downtime. Two metrics are used to define availability:

  • Mean time between failures (MTBF) – the expected time between failures of a system during operation.
  • Mean time to recovery (MTTR) – the average time the system will take to recover from any failure.

HDFS availability is therefore calculated as MTBF / (MTBF + MTTR). The goal for HDFS is to maximize availability by minimizing the NameNode recovery time.
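
For instance (with hypothetical numbers), if the NameNode fails on average once every 720 hours and a manual recovery takes 4 hours:

    Availability = MTBF / (MTBF + MTTR)
                 = 720 / (720 + 4)
                 ≈ 99.45%

Reducing the recovery time to about 5 minutes would raise availability to roughly 99.99%, which is why minimizing MTTR is the key lever.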

NameNode downtime generally results from hardware or software failures, human error, or planned system maintenance.

Metadata Recovery Methods before High Availability

NameNode metadata contains all the information about the files stored in HDFS. It is persisted as an fsimage (file system image) file, along with an edit log of recent changes, on the local disk of the machine running the NameNode. Before the High Availability feature was introduced in Hadoop, NameNode metadata was recovered using either of two methods:

a)    NFS Mounted Directory: Set the NameNode's dfs.name.dir property to a list of more than one directory, at least one of which is typically an NFS mount exported from a remote machine. The NameNode daemon writes its metadata to every listed location, keeping the copies in sync.

    <property>
        <name>dfs.name.dir</name>
        <value>/home/hadoop/namedir,/mnt/hadoop/namedir</value>
    </property>
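
(In Hadoop 2.x and later the equivalent property is named dfs.namenode.name.dir; the multi-directory behaviour is the same.)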

b)    Secondary / Checkpoint / Backup Node: A Hadoop administrator can configure a Secondary NameNode, a Checkpoint node, or a Backup node (depending on the Hadoop version used), which keeps an older copy of the NameNode metadata. The file system image (fsimage) is periodically checkpointed to this other node within the cluster, and if the NameNode goes down, the copy on that node is used to restore the HDFS state.
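
For example, on Hadoop 1.x the directory where the secondary/checkpoint node stores its copy of the fsimage is controlled by the fs.checkpoint.dir property; a minimal sketch (the path is hypothetical) looks like this:

    <property>
        <name>fs.checkpoint.dir</name>
        <value>/home/hadoop/checkpointdir</value>
    </property>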

Metadata Recovery Steps

The above methods keep a backup copy of the fsimage metadata. When the NameNode goes down, a Hadoop administrator needs to perform a few steps to restart the NameNode service using that backup:

a)    Choose a new machine on the same network to act as the NameNode.

b)    Change the hostname of that machine to match the old NameNode’s hostname, or update the client configuration files to point to the new NameNode’s hostname.

c)     Install Hadoop on it in the same way as the original NameNode installation.

d)    For the NFS approach, mount the remote NFS directory at the same location and start the NameNode normally. For the secondary or checkpoint node approach, start the NameNode with the -importCheckpoint option. A rough command sketch is shown below.
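
The exact commands depend on the Hadoop version and installation layout; a minimal sketch for a Hadoop 1.x-style installation (hostnames and paths are hypothetical) could look like this:

    # Option (a): the metadata was also written to an NFS mount (dfs.name.dir)
    mount -t nfs nfs-server:/export/hadoop/namedir /mnt/hadoop/namedir
    hadoop-daemon.sh start namenode

    # Option (b): restore from the checkpoint kept by the secondary/checkpoint node
    # (dfs.name.dir must be empty; the checkpoint is read from fs.checkpoint.dir)
    hadoop namenode -importCheckpoint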

Both options are manual processes that increase the NameNode downtime; hence, neither is considered a hot backup solution.

High Availability Design Goals

As discussed above, high availability of the NameNode was one of the most critical goals for the Hadoop community, since a production cluster cannot afford to lose access to HDFS for any length of time. The HDFS high availability feature was designed with a few goals in mind:

a)   Minimize unplanned NameNode downtime and the resulting IT service disruption.

b)   Provide a hot standby that can take over when the NameNode service fails.

c)   Provide automatic recovery – to reduce cluster downtime during a NameNode failure, if one NameNode goes down, the other NameNode should start handling client requests automatically.

d)   Provide manual recovery – to allow administrators to gracefully shift client requests to the other NameNode during system maintenance, a manual failover is also required (see the sketch below).
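
As a preview of what the HA feature provides (the details are covered in the configuration blog listed below, and the NameNode IDs nn1/nn2 here are hypothetical), a graceful manual failover typically comes down to a single admin command:

    hdfs haadmin -getServiceState nn1    # check which NameNode is currently active
    hdfs haadmin -failover nn1 nn2       # gracefully transfer the active role to nn2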

To read about the architecture of HDFS High Availability, refer to the High Availability Architecture blog listed below.

Related Blogs:
a) High Availability Architecture
b) High Availability Configuration
c) Start HDFS High Availability Cluster
