YARN High Availability Architecture for Fault-Tolerant Clusters - E-MapReduce

Hadoop YARN keeps applications and containers running through three complementary high availability (HA) mechanisms: ResourceManager High Availability (RM HA), ResourceManager restarts, and NodeManager restarts. Together, they remove the ResourceManager as a single point of failure (SPOF) and keep workloads running through ResourceManager failures and through ResourceManager or NodeManager upgrades and restarts.

How it works

YARN is a distributed cluster resource management system built on a master-slave architecture. The ResourceManager (master) schedules jobs and manages cluster resources. The NodeManager (slave) manages and monitors jobs on individual nodes.

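RM HA and ResourceManager restarts depend on ZooKeeper: RM HA uses it for distributed leader election, and both mechanisms use it to persist application state. NodeManager restarts do not use ZooKeeper. Each NodeManager persists container state to local storage on its own node. The ZooKeeper endpoints are configured as follows: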

Configuration file	Configuration item	Recommended value	Description
core-site.xml	`hadoop.zk.address`	`<zk1-host>:<zk1-port>,<zk2-host>:<zk2-port>,<zk3-host>:<zk3-port>`	ZooKeeper endpoints used for leader election and application state storage. Separate multiple endpoints with commas (,).
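
As an illustration, the entry might look like the following fragment inside the `<configuration>` element of core-site.xml. This is a minimal sketch: the three hostnames are hypothetical placeholders, and port 2181 is assumed only because it is the conventional ZooKeeper client port.

```xml
<!-- core-site.xml: ZooKeeper endpoints used for leader election and application state storage. -->
<!-- The hostnames are hypothetical placeholders; replace them with your own ZooKeeper nodes. -->
<property>
  <name>hadoop.zk.address</name>
  <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
</property>
```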

RM HA

RM HA runs multiple ResourceManager processes on different nodes. At any time, only one active ResourceManager records and synchronizes application state to ZooKeeper. If the active ResourceManager or its host node fails, the standby ResourceManagers hold a new election using ZooKeeper's distributed locking mechanism. The newly elected active ResourceManager restores all application state from ZooKeeper and resumes resource management and scheduling.

Configuration

All RM HA settings go in yarn-site.xml.

Configuration item	Recommended value	Default	Description
`yarn.resourcemanager.ha.enabled`	`true`	`false`	Enables RM HA.
`yarn.resourcemanager.ha.automatic-failover.enabled`	`true` or leave blank	`true`	Enables automatic failover.
`yarn.resourcemanager.ha.automatic-failover.embedded`	`true` or leave blank	`true`	Uses the embedded leader elector to elect the active ResourceManager.
`yarn.resourcemanager.ha.curator-leader-elector.enabled`	`true`	`false`	Uses the Curator-based leader elector for leader election.
`yarn.resourcemanager.ha.automatic-failover.zk-base-path`	Leave blank	`/yarn-leader-election`	Root directory in ZooKeeper where the leader elector state is stored.
`yarn.resourcemanager.ha.rm-ids`	`rm1,rm2,rm3`	—	IDs of all ResourceManagers. Separate multiple IDs with commas (,).
`yarn.resourcemanager.cluster-id`	`<cluster-id>`	—	Cluster ID. The RM HA storage path in ZooKeeper is derived from this value.
`yarn.resourcemanager.hostname.<rm-id>`	Leave blank	—	Hostname of a ResourceManager instance. Configure one entry per ResourceManager.
`yarn.resourcemanager.address.<rm-id>`	Leave blank	—	Remote procedure call (RPC) address for YARN clients to submit jobs. Configure one entry per ResourceManager.
`yarn.resourcemanager.scheduler.address.<rm-id>`	Leave blank	—	RPC address for ApplicationMasters to request resources. Configure one entry per ResourceManager.
`yarn.resourcemanager.resource-tracker.address.<rm-id>`	Leave blank	—	RPC address for NodeManagers to report resource and container status. Configure one entry per ResourceManager.
`yarn.resourcemanager.admin.address.<rm-id>`	Leave blank	—	RPC address for admin commands. Configure one entry per ResourceManager.
`yarn.resourcemanager.webapp.address.<rm-id>`	Leave blank	—	HTTP address for the ResourceManager web UI. Configure one entry per ResourceManager.
`yarn.resourcemanager.webapp.https.address.<rm-id>`	Leave blank	—	HTTPS address for the ResourceManager web UI. Required only when `yarn.http.policy` is set to `HTTPS_ONLY`. Configure one entry per ResourceManager.
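
The settings above could be combined as in the following yarn-site.xml fragment. It is a sketch under stated assumptions, not a definitive configuration: the cluster ID and hostnames are hypothetical placeholders, the items recommended as "leave blank" are omitted so that their defaults apply, and the per-ID address items are left out on the assumption that they are populated by the platform or derived from the hostname entries.

```xml
<!-- yarn-site.xml: RM HA sketch. The cluster ID and hostnames are hypothetical placeholders. -->
<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.curator-leader-elector.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2,rm3</value>
</property>
<property>
  <name>yarn.resourcemanager.cluster-id</name>
  <value>example-yarn-cluster</value>
</property>
<!-- One hostname entry per ID listed in yarn.resourcemanager.ha.rm-ids. -->
<property>
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>master-1.example.com</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>master-2.example.com</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm3</name>
  <value>master-3.example.com</value>
</property>
```

Each `<rm-id>` suffix must match an ID listed in `yarn.resourcemanager.ha.rm-ids`. An entry with an unlisted suffix does not correspond to any configured ResourceManager.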

ResourceManager restarts

With the ResourceManager restart feature enabled, the ResourceManager continuously synchronizes application state to ZooKeeper. When the ResourceManager restarts, it reloads that state so applications can resume automatically after an EMR cluster upgrade or restart.

Two restart modes are available:

Restart mode	Behavior	Impact on running applications
Work-preserving	The ResourceManager takes over all running applications from restored state.	Minimal. Applications continue without interruption.
Non-work-preserving	All running applications are stopped and resubmitted based on restored state.	Significant. All running applications are interrupted.

Configuration

All ResourceManager restart settings go in yarn-site.xml.

Configuration item	Recommended value	Default	Description
`yarn.resourcemanager.recovery.enabled`	`true`	`false`	Enables the ResourceManager restart feature.
`yarn.resourcemanager.work-preserving-recovery.enabled`	`true` or leave blank	`true`	Enables work-preserving restart.
`yarn.resourcemanager.store.class`	`org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore`	—	State store implementation class. Only the ZooKeeper-based state store supports RM HA.
`yarn.resourcemanager.zk-state-store.parent-path`	Leave blank	`/rmstore`	ZooKeeper path where application state is stored.
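
A minimal yarn-site.xml sketch of these settings follows. It assumes that the ZooKeeper endpoints are already set through `hadoop.zk.address` in core-site.xml, and it omits `yarn.resourcemanager.zk-state-store.parent-path` so that the `/rmstore` default applies.

```xml
<!-- yarn-site.xml: ResourceManager restart sketch. -->
<property>
  <name>yarn.resourcemanager.recovery.enabled</name>
  <value>true</value>
</property>
<!-- Already true by default; set explicitly here only to make the restart mode visible. -->
<property>
  <name>yarn.resourcemanager.work-preserving-recovery.enabled</name>
  <value>true</value>
</property>
<!-- Store application state in ZooKeeper. -->
<property>
  <name>yarn.resourcemanager.store.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
```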

NodeManager restarts

With the NodeManager restart feature enabled, the NodeManager continuously synchronizes the state of running containers to local storage (such as LevelDB). When the NodeManager restarts, it reloads that state, so running containers are not affected and applications are not interrupted during node upgrades or restarts.

Configuration

All NodeManager restart settings go in yarn-site.xml.

Configuration item	Recommended value	Default	Description
`yarn.nodemanager.recovery.enabled`	`true`	`false`	Enables the NodeManager restart feature.
`yarn.nodemanager.recovery.dir`	`/home/hadoop/yarn-nm-recovery`	`${hadoop.tmp.dir}/yarn-nm-recovery`	Local directory for container state storage. Use a system disk path other than `/tmp`, and make sure the `hadoop` user has read and write permissions. The `/home/hadoop/yarn-nm-recovery` path is recommended to avoid data loss from `/tmp` cleanup or disk failures.
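`yarn.nodemanager.recovery.supervised`	`true`	`false`	Indicates that the NodeManager is expected to be restarted after it exits. When set to `true`, the NodeManager keeps its local state and running containers on exit so that they can be recovered after the restart.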
`yarn.nodemanager.address`	`${yarn.nodemanager.hostname}:8041` or `0.0.0.0:8041`	—	RPC address of the NodeManager, also used as its ID. Use a fixed port. If the port is set to `0`, a random port is assigned on each restart, changing the NodeManager's ID and making recovery invalid.
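
The NodeManager settings could be written as the following yarn-site.xml fragment. This is a sketch that uses the recommended values from the table. It assumes that `/home/hadoop/yarn-nm-recovery` already exists and is readable and writable by the `hadoop` user.

```xml
<!-- yarn-site.xml: NodeManager restart sketch using the recommended values. -->
<property>
  <name>yarn.nodemanager.recovery.enabled</name>
  <value>true</value>
</property>
<!-- Keep recovery state outside /tmp so that periodic cleanup does not delete it. -->
<property>
  <name>yarn.nodemanager.recovery.dir</name>
  <value>/home/hadoop/yarn-nm-recovery</value>
</property>
<property>
  <name>yarn.nodemanager.recovery.supervised</name>
  <value>true</value>
</property>
<!-- Fixed port 8041: a port of 0 would change the NodeManager ID on every restart. -->
<property>
  <name>yarn.nodemanager.address</name>
  <value>0.0.0.0:8041</value>
</property>
```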
