In the previous blog, we discussed about the need and design goals of HDFS High Availability. In this blog, we will talk about the architecture of HDFS high availability.

HDFS High Availability Architecture

In order to provide a HOT back-up and consistent solution for NameNode failure, a concept of using two NameNodes (one Active and one StandBy) was introduced. The below diagram describes the architecture of HDFS high availability.

HDFS HA Architecture
HDFS High Availability Architecture

In a cluster, two nodes can be configured as NameNodes. Each NameNode is assigned a role, either Active or StandBy. The Active NameNode handles the client requests in the cluster, and the Standby NameNode acts as a back-up node and maintains enough state to provide a consistent FS-Image during failure of Active NameNode.

In order of sync the state of the NameNodes, the edit logs from the Active NameNode needs to be shared to the StandBy NameNode. There are two state synchronization methods available with Hadoop, Quorum Journal Manager or using a Network File System.

The DataNodes send block location information and heartbeats to both the NameNodes.  At any point in time, exactly one of the NameNodes should be in Active state, or if both the NameNodes are in Active state, then it’ll result in “split-brain scenario“. To avoid this scenario, an administrator should configure a fencing method.

If Active NameNode failure occurs, the StandBy NameNode state is changed to Active. This state transition from StandBy to Active is either manual or automatic. After successful transition, the client requests will be redirected to the new Active NameNode.

Related Blogs
a) HDFS High Availability Overview
b) HDFS High Availability Configuration
c) Start HDFS High Availability Cluster

Share this:

One thought on “HDFS High Availability Architecture

Leave a Reply

Your email address will not be published. Required fields are marked *