E-MapReduce: HDFS deployment topologies

Last Updated: Mar 26, 2026

This topic describes the deployment topology of Hadoop Distributed File System (HDFS) components in a non-high-availability (non-HA) cluster and an HA cluster in E-MapReduce (EMR).

Non-HA cluster

In a non-HA cluster, a single NameNode handles all read and write requests. The Secondary NameNode runs alongside the NameNode on the same master node but does not provide failover. Its sole job is to periodically merge EditLog files into FsImage files (checkpointing), which keeps NameNode restarts fast.

| Node | Component | Description |
| --- | --- | --- |
| master-1-1 (emr-header-1 in some versions) | NameNode | Provides external read and write services. |
| master-1-1 (emr-header-1 in some versions) | Secondary NameNode | Merges EditLog files into FsImage files to accelerate NameNode restarts. Does not provide failover capability. |
| core-1-1 or emr-worker-x | DataNode | Manages and stores HDFS data blocks on the node's data disks. |
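
The Secondary NameNode's periodic merge can be pictured as a simple trigger check. The sketch below is illustrative only: it assumes the stock Apache Hadoop defaults for `dfs.namenode.checkpoint.period` (3600 seconds) and `dfs.namenode.checkpoint.txns` (1,000,000 transactions), which an EMR cluster may override, and the function name `should_checkpoint` is made up for this example.

```python
# Stock Apache Hadoop defaults (assumption: your EMR cluster may override them
# in hdfs-site.xml).
CHECKPOINT_PERIOD_SECONDS = 3600   # dfs.namenode.checkpoint.period
CHECKPOINT_TXNS = 1_000_000        # dfs.namenode.checkpoint.txns


def should_checkpoint(seconds_since_last, uncheckpointed_txns,
                      period=CHECKPOINT_PERIOD_SECONDS, txns=CHECKPOINT_TXNS):
    """Hypothetical helper: True when either checkpoint trigger is met.

    A checkpoint merges the accumulated EditLog into a new FsImage, so a
    restarting NameNode has fewer edits to replay.
    """
    return seconds_since_last >= period or uncheckpointed_txns >= txns


print(should_checkpoint(seconds_since_last=120, uncheckpointed_txns=2_500_000))
```

Either condition alone is enough, so a busy cluster checkpoints on transaction count well before the time period elapses.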

HA cluster

In an HA cluster, two or more NameNodes run in an Active/Standby configuration, and only the NameNode in Active state provides read and write services. Each NameNode is paired with a local ZKFailoverController (ZKFC) that health-checks it and holds a session in ZooKeeper. When the active NameNode becomes unavailable, its ZKFC gives up the exclusive lock in ZooKeeper (or the lock expires together with the session). The ZKFC of a standby NameNode then acquires the lock and promotes its local NameNode to Active.
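
The election described above can be sketched as a toy, in-memory model. This is not how ZKFC is implemented: `LockService` is a hypothetical stand-in for the ZooKeeper lock, `Controller` is a hypothetical stand-in for a ZKFC, and fencing of the old active NameNode is left out entirely.

```python
class LockService:
    """Hypothetical in-memory stand-in for the exclusive lock in ZooKeeper."""

    def __init__(self):
        self.holder = None

    def try_acquire(self, name):
        # Take the lock if it is free; report whether `name` holds it.
        if self.holder is None:
            self.holder = name
        return self.holder == name

    def release(self, name):
        if self.holder == name:
            self.holder = None


class Controller:
    """Toy ZKFC: watches one local NameNode and competes for the lock."""

    def __init__(self, node, lock):
        self.node = node
        self.lock = lock
        self.healthy = True
        self.state = "standby"

    def tick(self):
        if not self.healthy:
            # A failed health check gives up the lock so a peer can take over.
            self.lock.release(self.node)
            self.state = "down"
        elif self.lock.try_acquire(self.node):
            self.state = "active"
        else:
            self.state = "standby"


def run_round(controllers):
    """Let every controller run one health-check/election step, in order."""
    for controller in controllers:
        controller.tick()
    return {c.node: c.state for c in controllers}


lock = LockService()
zkfcs = [Controller("master-1-1", lock), Controller("master-1-2", lock)]
before = run_round(zkfcs)   # master-1-1 wins the lock first
zkfcs[0].healthy = False    # simulate the active NameNode failing
after = run_round(zkfcs)    # master-1-2 picks up the released lock
print(before, after)
```

In this run the first round leaves master-1-1 active and master-1-2 standby; after the simulated failure, master-1-2 becomes active.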

JournalNodes keep the standby NameNodes in sync: the active NameNode writes every namespace change to a quorum of JournalNodes, and the standby NameNodes continuously read those changes. A write succeeds only when a majority of JournalNodes acknowledge it, so a group of three JournalNodes can tolerate one failure: the NameNode can serve requests as long as at least two JournalNodes are healthy and writable.
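
The quorum arithmetic behind "three JournalNodes tolerate one failure" is short enough to write out. The helper names below (`journal_quorum`, `can_serve`) are invented for this sketch and are not part of any Hadoop API.

```python
def journal_quorum(journal_nodes):
    """Return (majority needed for a write, failures tolerated)."""
    if journal_nodes < 1:
        raise ValueError("at least one JournalNode is required")
    majority = journal_nodes // 2 + 1
    return majority, journal_nodes - majority


def can_serve(journal_nodes, healthy):
    """True when enough JournalNodes are healthy to form a majority."""
    majority, _ = journal_quorum(journal_nodes)
    return healthy >= majority


print(journal_quorum(3))   # (2, 1)
print(journal_quorum(5))   # (3, 2)
print(can_serve(3, 1))     # False
```

Note that four JournalNodes tolerate no more failures than three, because the majority rises to three. This is why an odd-sized group is the usual choice.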

| Node | Component | Description |
| --- | --- | --- |
| master-1-1 (emr-header-1 in some versions) | ZKFailoverController (ZKFC) | Monitors the local NameNode's health and manages ZooKeeper sessions to perform primary/secondary election and switchover. |
| master-1-1 (emr-header-1 in some versions) | NameNode | Provides external read and write services when in Active state. NameNodes in Standby state stay synchronized and are ready for failover. |
| master-1-1 (emr-header-1 in some versions) | JournalNode | Stores EditLog files written by the active NameNode. Three JournalNodes are typically deployed as a group; at least two must be healthy for the NameNode to serve requests. |
| master-1-1 (emr-header-1 in some versions) | ZooKeeper | Provides the distributed coordination service used by ZKFC for elections and by other EMR components for HA state management. |
| master-1-2 (emr-header-2 in some versions) | ZKFC | Same role as on master-1-1. |
| master-1-2 (emr-header-2 in some versions) | NameNode | Same role as on master-1-1. |
| master-1-2 (emr-header-2 in some versions) | JournalNode | Same role as on master-1-1. |
| master-1-2 (emr-header-2 in some versions) | ZooKeeper | Same role as on master-1-1. |
| master-1-3 (emr-header-3 or emr-worker-1 in some versions)\* | ZKFC | Same role as on master-1-1. |
| master-1-3 (emr-header-3 or emr-worker-1 in some versions)\* | NameNode | Same role as on master-1-1. |
| master-1-3 (emr-header-3 or emr-worker-1 in some versions) | JournalNode | Same role as on master-1-1. |
| master-1-3 (emr-header-3 or emr-worker-1 in some versions) | ZooKeeper | Same role as on master-1-1. |
| core-1-1 or emr-worker-x | DataNode | Manages and stores HDFS data blocks on the node's data disks. |

Note

By default, three ZKFC and NameNode pairs are deployed for HA clusters that use Hadoop 3.x in EMR V5.8.0 or later. One pair is deployed on master-1-3.

\* Deployed by default for Hadoop 3.x in EMR V5.8.0 or later.
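
For scripting or sanity checks, the HA layout in the table can be written down as plain data. The following is a minimal sketch: `HA_TOPOLOGY`, `nodes_running`, and `check_topology` are hypothetical names, the node names use the newer `master-1-x` form, and the two checks encode only what the table implies (each ZKFC sits next to its NameNode, and JournalNodes form an odd-sized group).

```python
# HA layout from the table above, using the newer node names.
HA_TOPOLOGY = {
    "master-1-1": {"ZKFC", "NameNode", "JournalNode", "ZooKeeper"},
    "master-1-2": {"ZKFC", "NameNode", "JournalNode", "ZooKeeper"},
    # ZKFC and NameNode here apply to Hadoop 3.x in EMR V5.8.0 or later.
    "master-1-3": {"ZKFC", "NameNode", "JournalNode", "ZooKeeper"},
    "core-1-1": {"DataNode"},
}


def nodes_running(component, topology=HA_TOPOLOGY):
    """Return the sorted node names that host the given component."""
    return sorted(node for node, comps in topology.items() if component in comps)


def check_topology(topology=HA_TOPOLOGY):
    """Return a list of human-readable problems; empty means none found."""
    problems = []
    for node, comps in topology.items():
        # ZKFC monitors the local NameNode, so the two must appear together.
        if ("NameNode" in comps) != ("ZKFC" in comps):
            problems.append(f"{node}: NameNode and ZKFC must be co-located")
    if len(nodes_running("JournalNode", topology)) % 2 == 0:
        problems.append("JournalNode count should be odd")
    return problems


print(nodes_running("NameNode"))
print(check_topology())
```

Removing the `ZKFC` and `NameNode` entries from `master-1-3` models the two-NameNode layout used by earlier versions, and the checks still pass.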