HDFS High Availability Architecture

Posted on August 6, 2014 (updated September 22, 2014) by ProTechSkills

In the previous blog, we discussed the need for and the design goals of HDFS High Availability. In this blog, we will talk about the architecture of HDFS high availability.


To provide a hot standby and a consistent recovery path for NameNode failure, HDFS introduced the concept of running two NameNodes, one Active and one Standby. The diagram below describes the architecture of HDFS high availability.

[Figure: HDFS High Availability Architecture]

In an HA cluster, two nodes are configured as NameNodes. Each NameNode is assigned a role, either Active or Standby. The Active NameNode handles the client requests in the cluster, while the Standby NameNode acts as a backup and maintains enough state to take over with a consistent file system image if the Active NameNode fails.

To keep the state of the two NameNodes in sync, the edit logs from the Active NameNode need to be shared with the Standby NameNode. Hadoop provides two state synchronization methods: the Quorum Journal Manager (QJM), or a shared directory on a Network File System (NFS).
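To make the quorum idea concrete, here is a minimal sketch in Python. It is a toy model, not Hadoop code: the `JournalNode` class and the `log_edit` and `standby_tail` functions are hypothetical names made up for this illustration. It assumes an edit counts as durable once a strict majority of journal nodes acknowledge it, and that the Standby only applies edits held by a majority.

```python
from dataclasses import dataclass, field


@dataclass
class JournalNode:
    """Toy stand-in for one journal node; keeps edits in memory (hypothetical)."""
    name: str
    available: bool = True
    edits: list = field(default_factory=list)

    def write(self, txid: int, op: str) -> bool:
        # An unavailable node simply fails to acknowledge the write.
        if not self.available:
            return False
        self.edits.append((txid, op))
        return True


def log_edit(journals, txid, op):
    """Active side: True if a strict majority of journal nodes acknowledged."""
    acks = sum(1 for jn in journals if jn.write(txid, op))
    return acks > len(journals) // 2


def standby_tail(journals, last_applied_txid):
    """Standby side: return ops newer than last_applied_txid, in txid order,
    keeping only edits that are stored on a strict majority of journal nodes."""
    copies = {}
    ops = {}
    for jn in journals:
        if not jn.available:
            continue
        for txid, op in jn.edits:
            if txid > last_applied_txid:
                copies[txid] = copies.get(txid, 0) + 1
                ops[txid] = op
    majority = len(journals) // 2
    return [ops[t] for t in sorted(ops) if copies[t] > majority]
```

With three journal nodes, this sketch keeps accepting edits while one node is down, and refuses them once two are down. That is the usual reason for running an odd number of journal nodes.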

The DataNodes send block location information and heartbeats to both NameNodes, so the Standby always has an up-to-date view of where blocks live. At any point in time, exactly one of the NameNodes should be in the Active state. If both NameNodes are Active at the same time, the result is a “split-brain” scenario. To avoid this, an administrator should configure a fencing method.
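The role of fencing can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `NameNode`, `fence_node`, and `failover` are hypothetical names, and each fencing method is modelled as a plain function that returns True when it has confirmed the old Active can no longer act. The methods are tried in order until one succeeds, and the Standby is promoted only after that.

```python
class NameNode:
    """Toy NameNode that only tracks its HA state (hypothetical)."""

    def __init__(self, name, state="standby"):
        self.name = name
        self.state = state


def fence_node(node, methods):
    """Try each fencing method in order; stop at the first one that succeeds."""
    for method in methods:
        if method(node):
            node.state = "fenced"
            return True
    return False


def failover(old_active, new_active, methods):
    """Promote new_active only when old_active has been fenced.

    If no fencing method succeeds, nothing changes, so the cluster can
    never end up with two Active NameNodes in this model.
    """
    if not fence_node(old_active, methods):
        return False
    new_active.state = "active"
    return True


def active_count(nodes):
    """Number of nodes currently in the Active state."""
    return sum(1 for n in nodes if n.state == "active")
```

The design choice to notice is the order of operations: fence first, promote second. If every fencing method fails, this sketch refuses to promote the Standby rather than risk a split-brain.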

If the Active NameNode fails, the Standby NameNode's state is changed to Active. This transition from Standby to Active is either manual or automatic. After a successful transition, client requests are redirected to the new Active NameNode.
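From the client's point of view, the redirection can be pictured roughly as follows. This is a simplified sketch, not the real HDFS client: `ToyNameNode`, `StandbyError`, and `FailoverClient` are hypothetical names. It assumes the client knows the addresses of both NameNodes and moves on to the next one whenever the current one reports that it is not Active.

```python
class StandbyError(Exception):
    """Raised by the toy NameNode when it is not Active."""


class ToyNameNode:
    """Toy NameNode that serves requests only while Active (hypothetical)."""

    def __init__(self, name, state):
        self.name = name
        self.state = state

    def handle(self, request):
        if self.state != "active":
            raise StandbyError(self.name)
        return f"{self.name} served {request}"


class FailoverClient:
    """Tries the last known Active first, then the other NameNodes in turn."""

    def __init__(self, namenodes):
        self.namenodes = list(namenodes)
        self.current = 0  # index of the NameNode believed to be Active

    def call(self, request):
        # Give every configured NameNode at most one attempt per call.
        for _ in range(len(self.namenodes)):
            node = self.namenodes[self.current]
            try:
                return node.handle(request)
            except StandbyError:
                # Remember to try the next NameNode from now on.
                self.current = (self.current + 1) % len(self.namenodes)
        raise RuntimeError("no Active NameNode reachable")
```

After a failover, the first call from such a client fails over once, and later calls go straight to the new Active NameNode because the client remembers which one answered last.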
