All Products
Search
Document Center

E-MapReduce:ESS (For existing users only)

Last Updated:Mar 26, 2026

EMR Remote Shuffle Service (ESS) is an extension for E-MapReduce (EMR) that replaces the default shuffle mechanism with a dedicated remote service, reducing connection overhead, disk pressure, and memory usage for large-scale Apache Spark jobs.

Limitations

ESS applies only to the following EMR versions:

  • EMR V4.X.X

  • Versions earlier than EMR V3.39.1

  • Versions earlier than EMR V5.5.0

For EMR V3.39.1 and later, or EMR V5.5.0 and later, use RSS instead.

How it works

Traditional shuffle has several structural problems at scale:

ProblemImpact
Large shuffle write tasks overflow to diskWrite amplification
Small network packets accumulate during shuffle readConnection resets
Many small I/O requests with random readsHigh disk and CPU load
M mappers x N reducers produce M x N connectionsJobs fail to run at scale
Spark shuffle service runs on NodeManager; extreme data volumes cause NodeManager restartsYARN scheduling instability

ESS addresses each of these:

SolutionHow it helps
Push-based shuffle (instead of pull-based)Reduces mapper memory pressure
I/O aggregationCuts shuffle read connections from M x N to N; converts random reads to sequential reads
Two-replica mechanismReduces fetch failure probability
Compute-storage separationShuffle service runs on dedicated hardware, independent of compute nodes
No local disk dependencyEnables Spark on Kubernetes without local disk requirements

The following figure shows the ESS architecture.

ESS

Create a cluster with ESS

The following steps use EMR V4.5.0 as an example. Create an EMR cluster with ESS using either of these cluster types:

  • EMR Shuffle Service cluster — a cluster type dedicated to shuffle offloading.

    SHUFFLE

  • EMR Hadoop cluster — a general-purpose cluster that also supports ESS.

    ESS

For cluster creation steps, see Create a cluster.

Submit a Spark job with ESS

When submitting a Spark job, add the required parameters below. For how to configure job parameters in EMR, see Edit jobs. For the full list of Spark configuration options, see Spark Configuration.

Required parameters

These parameters must be set for every Spark job that uses ESS.

ParameterValueNotes
spark.shuffle.managerorg.apache.spark.shuffle.ess.EssShuffleManagerSwitches Spark to use the ESS shuffle manager
spark.ess.master.address<ess-master-ip>:9097Public IP address of the master node; port is always 9097
spark.shuffle.service.enabledfalseDisables the built-in external shuffle service, which conflicts with ESS

Compatibility parameters

Set these parameters to ensure compatibility.

ParameterValueNotes
spark.shuffle.useOldFetchProtocoltrueESS uses the original shuffle fetch protocol
spark.sql.adaptive.enabledfalseESS does not support Adaptive Query Execution (AQE)
Note

Disabling AQE (spark.sql.adaptive.enabled=false) also disables spark.sql.adaptive.skewJoin.enabled.

ESS service parameters

View and edit these parameters on the ESS service configuration page.

Memory parameters

ParameterDescriptionDefault
ess_master_memoryHeap memory of the master node4g
ess_worker_memoryHeap memory of each core node4g
ess_worker_offheap_memoryOff-heap memory of each core node4g

Flush buffer parameters

ParameterDescriptionDefault
ess.worker.flush.buffer.sizeSize of each flush buffer. Flushing triggers when this size is exceeded. Unit: KB256k
ess.worker.flush.queue.capacityNumber of flush buffers per directory512

Heap memory estimation: Heap memory used per directory is calculated as:

heap memory per directory = ess.worker.flush.buffer.size x ess.worker.flush.queue.capacity

Example with default values and 28 directories:

per directory: 256 KB x 512 = 128 MB
total memory:  128 MB x 28 = 3.5 GB
total slots:   512 x 28 / 2 = 7,168

Each directory provides slots equal to half of ess.worker.flush.queue.capacity. To improve read/write throughput, configure a maximum of two directories per disk.

Timeout and reliability parameters

ParameterDescriptionDefault
ess.flush.timeoutTimeout for flushing data to the storage layer. Unit: seconds240s
ess.application.timeoutHeartbeat timeout. After this period, application resources are released. Unit: seconds240s
ess.push.data.replicateEnables the two-replica mechanism (true/false). Enable in production to reduce fetch failures.true

Monitoring parameters

ParameterDescriptionDefault
ess.metrics.system.enableEnables monitoring (true/false)false

What's next

  • RSS — the Remote Shuffle Service available for EMR V3.39.1 and later, or EMR V5.5.0 and later

  • Edit jobs — configure Spark job parameters in EMR

  • Spark Configuration — full reference for Spark configuration options