Use hdfs haadmin commands to monitor and manage NameNode high availability (HA) in E-MapReduce (EMR) clusters.
Background
Before Hadoop 2.0.0, each cluster had a single NameNode, which made it a single point of failure. A node failure or an operations and maintenance (O&M) activity, such as a software update or hardware upgrade, made the entire cluster unavailable until the NameNode was back online.
Hadoop Distributed File System (HDFS) HA resolves this by running redundant NameNodes in active-standby mode: one NameNode is active and the others are on standby. If the active NameNode fails or goes offline for O&M, a failover promotes a standby NameNode to active and restores cluster services.
EMR HA clusters implement HDFS HA using Quorum Journal Manager (QJM). For architecture details, see Architecture. For the HDFS deployment topology in EMR, see Deployment topology of HDFS.
Prerequisites
Before you begin, ensure that you have:
- An EMR cluster with High Service Availability enabled (set at cluster creation time)
hdfs haadmin command reference
All commands must run as the hdfs user. Switch to the hdfs user before running any command:
su - hdfs
The following table lists the available hdfs haadmin subcommands.
| Subcommand | Description |
|---|---|
| -getAllServiceState | Return the state of all NameNodes |
| -getServiceState <serviceId> | Return the state (active or standby) of the specified NameNode |
| -failover <currentActive> <newActive> | Initiate a failover from one NameNode to another |
View the status of all NameNodes
Run -getAllServiceState to see which NameNode is active and which are on standby:
su - hdfs
hdfs haadmin -getAllServiceState
Example output:
master-1-1.c-dadaf2f2bea8****.cn-hangzhou.emr.aliyuncs.com:8021 standby
master-1-2.c-dadaf2f2bea8****.cn-hangzhou.emr.aliyuncs.com:8021 active
master-1-3.c-dadaf2f2bea8****.cn-hangzhou.emr.aliyuncs.com:8021 standby
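To use this output in a script, the state column can be filtered with awk. The following is a minimal sketch, assuming each line has the form `host:port state` as in the example above. The function names `active_namenodes` and `count_state` are hypothetical helpers, not part of the hdfs CLI.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that read `hdfs haadmin -getAllServiceState`
# output on stdin. They assume each line looks like "host:port state".

# Print the host:port of every NameNode whose state is "active".
active_namenodes() {
  awk '$2 == "active" { print $1 }'
}

# Print how many NameNodes are in the state given as the first argument.
count_state() {
  awk -v want="$1" '$2 == want { n++ } END { print n + 0 }'
}

# Usage on a cluster node, as the hdfs user:
#   hdfs haadmin -getAllServiceState | active_namenodes
#   hdfs haadmin -getAllServiceState | count_state standby
```

A line for an unreachable NameNode is unlikely to carry a plain state in the second column, so both helpers would skip it.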
Check the state of a specific NameNode
Run -getServiceState <serviceId> to check whether a specific NameNode is active or standby. Use this command in monitoring scripts or cron jobs that need to behave differently based on NameNode state.
Service IDs for the NameNodes in an EMR cluster are nn1, nn2, and nn3.
su - hdfs
hdfs haadmin -getServiceState nn1
Replace nn1 with nn2 or nn3 to check those NameNodes.
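A monitoring script or cron job could branch on the reported state along the following lines. This is a sketch only: it assumes the command prints exactly `active` or `standby` on standard output, and `nn_is_active` is a hypothetical function name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check the state of one NameNode by service ID.
# Returns 0 if the NameNode reports "active", 1 if it reports "standby",
# and 2 if the command fails or prints anything else.
nn_is_active() {
  local state
  state="$(hdfs haadmin -getServiceState "$1" 2>/dev/null)" || return 2
  case "$state" in
    active)  return 0 ;;
    standby) return 1 ;;
    *)       return 2 ;;
  esac
}

# Usage, as the hdfs user:
#   if nn_is_active nn1; then
#     echo "nn1 is active"
#   fi
```

Keeping a separate return code for the unknown case lets a caller distinguish "standby" from "could not reach the NameNode".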
Perform a failover
Run -failover <currentActive> <newActive> to promote a standby NameNode to active:
su - hdfs
hdfs haadmin -failover nn1 nn2
This promotes nn2 to active and transitions nn1 to standby. If nn2 is already active, the command completes successfully without making any changes.
Example output:
Failover to NameNode at master-1-2.c-dadaf2f2bea8****.cn-hangzhou.emr.aliyuncs.com successful
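For scripted maintenance, it can help to confirm which NameNode is active before requesting a failover. The sketch below combines -getServiceState and -failover for that purpose. The wrapper name `safe_failover` is hypothetical, and the check assumes -getServiceState prints exactly `active` for the active NameNode.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: fail over from $1 to $2 only if $1 is
# currently the active NameNode.
# Returns 2 if the state cannot be read, 1 if $1 is not active,
# otherwise the exit status of the failover command.
safe_failover() {
  local from="$1" to="$2" state
  state="$(hdfs haadmin -getServiceState "$from" 2>/dev/null)" || return 2
  if [ "$state" != "active" ]; then
    echo "skip: $from is '$state', not active" >&2
    return 1
  fi
  hdfs haadmin -failover "$from" "$to"
}

# Usage, as the hdfs user:
#   safe_failover nn1 nn2
```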