This topic describes the deployment topology of Hadoop Distributed File System (HDFS) components in a non-high-availability (non-HA) cluster and an HA cluster in E-MapReduce (EMR).
Non-HA cluster
In a non-HA cluster, a single NameNode handles all read and write requests. The Secondary NameNode runs alongside the NameNode on the same master node but does not provide failover. Its sole job is to periodically merge EditLog files into FsImage files, which keeps NameNode restarts fast.
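The merge performed by the Secondary NameNode can be pictured with a toy model. The following Python sketch is illustrative only: the real FsImage and EditLog are binary on-disk formats, and the function name `apply_edits` and the operation names `create`, `rename`, and `delete` are assumptions made for this example.

```python
def apply_edits(fsimage, edit_log):
    """Return a new namespace snapshot with every logged edit applied in order."""
    namespace = dict(fsimage)  # copy so the previous checkpoint stays untouched
    for op, path, *args in edit_log:
        if op == "create":
            namespace[path] = args[0]  # args[0] is the file's block list
        elif op == "delete":
            namespace.pop(path, None)
        elif op == "rename":
            namespace[args[0]] = namespace.pop(path)  # args[0] is the new path
        else:
            raise ValueError(f"unknown operation: {op}")
    return namespace


# Hypothetical starting checkpoint and the edits recorded since it was taken.
fsimage = {"/data/a.txt": ["blk_1"]}
edit_log = [
    ("create", "/data/b.txt", ["blk_2", "blk_3"]),
    ("rename", "/data/a.txt", "/data/c.txt"),
    ("delete", "/data/b.txt"),
]

checkpoint = apply_edits(fsimage, edit_log)
print(checkpoint)  # {'/data/c.txt': ['blk_1']}
```

After a merge like this, a restarting NameNode only has to load the new checkpoint and replay the few edits written after it, instead of replaying the whole log.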
| Node | Component | Description |
|---|---|---|
| master-1-1 (emr-header-1 in some versions) | NameNode | Provides external read and write services. |
| master-1-1 (emr-header-1 in some versions) | Secondary NameNode | Merges EditLog files into FsImage files to accelerate NameNode restarts. Does not provide failover capability. |
| core-1-1 or emr-worker-x | DataNode | Manages and stores HDFS data blocks on the node's data disks. |
HA cluster
In an HA cluster, two or more NameNodes run in an Active/Standby configuration. Only the NameNode in Active state provides read and write services. Each NameNode is paired with a ZKFailoverController (ZKFC) that runs health checks against it. When the active NameNode becomes unavailable, its ZKFC detects the failure and gives up the exclusive lock in ZooKeeper (the lock is also released if the ZooKeeper session expires). The ZKFC of a standby NameNode then acquires the lock and promotes that NameNode to Active.
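The lock-based election can be sketched as a small in-memory simulation. This is a simplified model, not the actual ZKFC implementation: the classes `LockService` and `Zkfc` and the names `nn1` and `nn2` are hypothetical, and real failover also involves steps such as fencing the previous active NameNode, which are omitted here.

```python
class LockService:
    """Stand-in for the ZooKeeper lock: at most one holder at a time."""

    def __init__(self):
        self.holder = None

    def try_acquire(self, name):
        if self.holder is None:
            self.holder = name
        return self.holder == name

    def release(self, name):
        if self.holder == name:
            self.holder = None


class Zkfc:
    """Hypothetical failover controller paired with one NameNode."""

    def __init__(self, name, lock):
        self.name = name
        self.lock = lock
        self.healthy = True  # result of the latest health check
        self.state = "standby"

    def tick(self):
        if not self.healthy:
            # Health check failed: give up the lock so a peer can take over.
            self.lock.release(self.name)
            self.state = "unavailable"
        elif self.lock.try_acquire(self.name):
            self.state = "active"
        else:
            self.state = "standby"


def run_round(controllers):
    """Let every controller run one health-check-and-election step."""
    for controller in controllers:
        controller.tick()
    return {c.name: c.state for c in controllers}


lock = LockService()
controllers = [Zkfc("nn1", lock), Zkfc("nn2", lock)]
print(run_round(controllers))   # {'nn1': 'active', 'nn2': 'standby'}

controllers[0].healthy = False  # simulate a failed health check on nn1
print(run_round(controllers))   # {'nn1': 'unavailable', 'nn2': 'active'}
```

In this model, a NameNode that recovers rejoins as a standby because the lock is already held by its peer.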
JournalNodes keep the standby NameNodes in sync: the active NameNode writes every namespace change to a quorum of JournalNodes, and the standby NameNodes continuously read those changes. A group of three JournalNodes can tolerate one failure: the active NameNode can keep serving requests as long as at least two JournalNodes are healthy and writable.
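The "two out of three" rule is an instance of majority quorum arithmetic: a group of N JournalNodes needs a majority to accept each write, so it tolerates (N - 1) / 2 failures, rounded down. A minimal Python sketch of that arithmetic follows; the function names are chosen for this example only.

```python
def quorum_size(journal_nodes):
    """Smallest number of JournalNodes that forms a majority."""
    return journal_nodes // 2 + 1


def tolerated_failures(journal_nodes):
    """Number of JournalNodes that can fail while a majority remains."""
    return (journal_nodes - 1) // 2


def can_write(journal_nodes, healthy):
    """True if enough JournalNodes are healthy to accept a write."""
    return healthy >= quorum_size(journal_nodes)


for n in (3, 5):
    print(n, quorum_size(n), tolerated_failures(n))
# 3 2 1
# 5 3 2

print(can_write(3, 2))  # True
print(can_write(3, 1))  # False
```

The arithmetic also shows why an odd group size is the usual choice: going from three to four JournalNodes raises the quorum to three without raising the number of tolerated failures.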
| Node | Component | Description |
|---|---|---|
| master-1-1 (emr-header-1 in some versions) | ZKFailoverController (ZKFC) | Monitors the health of the local NameNode and maintains a ZooKeeper session to take part in active/standby election and failover. |
| master-1-1 (emr-header-1 in some versions) | NameNode | Only the NameNode in Active state provides external read and write services. NameNodes in Standby state stay synchronized with the active NameNode and are ready for failover. |
| master-1-1 (emr-header-1 in some versions) | JournalNode | Stores EditLog files written by the active NameNode. Three JournalNodes are typically deployed as a group; at least two must be healthy for the NameNode to serve requests. |
| master-1-1 (emr-header-1 in some versions) | ZooKeeper | Provides the distributed coordination service used by ZKFC for elections and by other EMR components for HA state management. |
| master-1-2 (emr-header-2 in some versions) | ZKFC | Same role as on master-1-1. |
| master-1-2 (emr-header-2 in some versions) | NameNode | Same role as on master-1-1. |
| master-1-2 (emr-header-2 in some versions) | JournalNode | Same role as on master-1-1. |
| master-1-2 (emr-header-2 in some versions) | ZooKeeper | Same role as on master-1-1. |
| master-1-3 (emr-header-3 or emr-worker-1 in some versions) | \*ZKFC | Same role as on master-1-1. Note: By default, three ZKFC and NameNode pairs are deployed for HA clusters that use Hadoop 3.x in EMR V5.8.0 or later. One pair is deployed on master-1-3. |
| master-1-3 (emr-header-3 or emr-worker-1 in some versions) | \*NameNode | Same role as on master-1-1. Note: By default, three ZKFC and NameNode pairs are deployed for HA clusters that use Hadoop 3.x in EMR V5.8.0 or later. One pair is deployed on master-1-3. |
| master-1-3 (emr-header-3 or emr-worker-1 in some versions) | JournalNode | Same role as on master-1-1. |
| master-1-3 (emr-header-3 or emr-worker-1 in some versions) | ZooKeeper | Same role as on master-1-1. |
| core-1-1 or emr-worker-x | DataNode | Manages and stores HDFS data blocks on the node's data disks. |
\* Deployed by default for Hadoop 3.x in EMR V5.8.0 or later.