E-MapReduce: Associate a Spark cluster with RSS

Last Updated: Mar 26, 2026

Remote Shuffle Service (RSS) is an E-MapReduce (EMR) extension that improves the stability and performance of Spark Shuffle and enables dynamic resource allocation in Container Service for Kubernetes (ACK) clusters. This topic describes how to associate a Spark cluster with a Shuffle Service cluster on the EMR on ACK page.

Why RSS

Spark Shuffle in ACK clusters has the following limitations:

  • Storage dependency: Spark Shuffle requires local storage. On compute-storage-separated nodes or elastic container instances without local disks, you must purchase and attach disks, increasing cost and reducing efficiency.

  • Dynamic allocation gaps: On Kubernetes, Spark 2 does not support dynamic allocation. Spark 3 supports it through shuffle tracking, but executors that still hold shuffle data are released slowly, so resources are reclaimed inefficiently.

  • Write amplification: When shuffle write tasks spill data to disk, the same data is written more than once.

  • Connection reset: Shuffle read tasks exchange a large number of small network packets, which can cause connection resets.

  • High disk and CPU load: Shuffle read tasks generate many small I/O requests and random reads.

  • Connection scaling: Each reducer fetches data from each mapper, so a job with M mappers and N reducers needs on the order of M x N connections. With thousands of mappers and reducers, jobs become difficult to run.
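To give a sense of scale for the connection scaling item, the following minimal sketch computes the M x N figure for a few job sizes. It takes the M x N count at face value as a worst-case number of logical mapper-to-reducer fetches. The function name and the sample job sizes are illustrative assumptions, not values from EMR.

```python
def shuffle_connections(mappers: int, reducers: int) -> int:
    """Worst-case count of logical mapper-to-reducer fetches (M x N)."""
    if mappers < 0 or reducers < 0:
        raise ValueError("task counts must be non-negative")
    return mappers * reducers


if __name__ == "__main__":
    # Hypothetical job sizes, chosen only to show how quickly M x N grows.
    for m, n in [(100, 100), (1_000, 1_000), (5_000, 2_000)]:
        total = shuffle_connections(m, n)
        print(f"M={m:>5}  N={n:>5}  connections={total:,}")
```

Growing both M and N by a factor of 10 multiplies the count by 100, which is why the problem appears mainly in large jobs.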

RSS eliminates these limitations and provides native dynamic allocation support in ACK clusters.
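For comparison, the Spark 3 shuffle tracking approach described in the list is enabled through standard open source Spark properties. The sketch below builds those properties and renders them as spark-submit arguments. The helper names and the executor bounds are illustrative assumptions. The sketch does not show any RSS-specific setting, because the association made on the EMR on ACK page is what connects the Spark cluster to RSS.

```python
from typing import Dict, List


def dynamic_allocation_conf(min_executors: int, max_executors: int) -> Dict[str, str]:
    """Spark 3 properties for dynamic allocation with shuffle tracking."""
    if not 0 <= min_executors <= max_executors:
        raise ValueError("require 0 <= min_executors <= max_executors")
    return {
        "spark.dynamicAllocation.enabled": "true",
        # Without an external shuffle service, executors that hold shuffle
        # data are tracked and kept alive until that data is no longer needed.
        "spark.dynamicAllocation.shuffleTracking.enabled": "true",
        "spark.dynamicAllocation.minExecutors": str(min_executors),
        "spark.dynamicAllocation.maxExecutors": str(max_executors),
    }


def to_submit_args(conf: Dict[str, str]) -> List[str]:
    """Render properties as a flat list of `--conf key=value` arguments."""
    args: List[str] = []
    for key, value in conf.items():
        args.extend(["--conf", f"{key}={value}"])
    return args


if __name__ == "__main__":
    # Hypothetical bounds; choose values that fit your workload.
    print(" ".join(to_submit_args(dynamic_allocation_conf(1, 20))))
```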

Prerequisites

Before you begin, make sure that you have:

  • A Spark cluster created on the EMR on ACK page. For more information, see Step 2: Create a cluster.

  • A Shuffle Service cluster created on the EMR on ACK page. For more information, see Step 2: Create a cluster.

  • Both clusters in the same ACK cluster. Cross-ACK-cluster associations are not supported.

  • Matching major EMR versions on both clusters. A version mismatch causes compatibility issues that can prevent jobs from running. You can check the version on the Cluster Details tab for each cluster.
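The last two prerequisites can be checked with a short script before you open the console. The following is a minimal sketch, assuming that you have copied each cluster's ACK cluster ID and EMR version from its Cluster Details tab. The ClusterInfo type, the sample IDs, and the "EMR-5.6.0-ack" version format are assumptions made for illustration. The sketch does not call any EMR API.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ClusterInfo:
    """Values copied by hand from the Cluster Details tab (hypothetical type)."""
    name: str
    ack_cluster_id: str
    emr_version: str  # assumed format, for example "EMR-5.6.0-ack"


def major_version(emr_version: str) -> int:
    """Return the first number of the first dotted version in the string."""
    match = re.search(r"(\d+)\.\d+", emr_version)
    if match is None:
        raise ValueError(f"no version number found in {emr_version!r}")
    return int(match.group(1))


def association_problems(spark: ClusterInfo, rss: ClusterInfo) -> List[str]:
    """List the prerequisites that the two clusters fail to meet."""
    problems: List[str] = []
    if spark.ack_cluster_id != rss.ack_cluster_id:
        problems.append("clusters are not in the same ACK cluster")
    if major_version(spark.emr_version) != major_version(rss.emr_version):
        problems.append("major EMR versions do not match")
    return problems


if __name__ == "__main__":
    # Hypothetical sample values.
    spark = ClusterInfo("spark-demo", "ack-example-1", "EMR-5.6.0-ack")
    rss = ClusterInfo("rss-demo", "ack-example-1", "EMR-5.8.0-ack")
    print(association_problems(spark, rss) or "prerequisites met")
```

An empty list means that both checks passed. The script does not verify that the clusters exist or that they were created on the EMR on ACK page.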

Associate a Spark cluster with a Shuffle Service cluster

  1. Log on to the EMR console. In the left-side navigation pane, click EMR on ACK.

  2. On the EMR on ACK page, find the Spark cluster and click its name in the Cluster ID/Name column.

  3. On the Cluster Details tab, go to the Basic Information section and click Associate Now to the right of Associate RSS Cluster.

  4. On the Service Details tab, go to the Associated Cluster section and click Add.

  5. In the Associated Cluster dialog box, select your Shuffle Service cluster from the Cluster drop-down list and click Associate.

What's next