E-MapReduce: Create a cluster

Last Updated: Mar 26, 2026

This topic describes how to create an EMR on ACK cluster in the EMR console. EMR on ACK runs big data workloads (Spark, Presto, Flink, and Shuffle Service) on Container Service for Kubernetes (ACK) clusters, so you can separate compute from storage and use Kubernetes for resource scheduling.

Prerequisites

Complete the following steps before creating a cluster. If you have already completed a step, skip it.

  1. Attach the AliyunOSSFullAccess and AliyunDLFFullAccess policies to a RAM role. For more information, see Attach policies to a RAM role.

  2. Create an ACK cluster. For more information, see Create an ACK dedicated cluster or Create an ACK managed cluster.

  3. Create a node pool. For more information, see Create a node pool.

  4. Activate Object Storage Service (OSS). For more information, see Activate OSS.

Choose a cluster type

EMR on ACK supports four cluster types. The cluster type cannot be changed after creation, so select the type that matches your workload before proceeding.

| Cluster type | Best for | Key characteristics |
| --- | --- | --- |
| Shuffle Service | Spark clusters on ACK nodes without local disks | Provides a remote shuffle service using Celeborn. Requires nodes from big data instance families or instance families with local SSDs. Supports dynamic resource allocation. |
| Presto | Interactive queries on large datasets | An in-memory distributed SQL engine that supports various data sources, suitable for complex analysis of petabytes of data and cross-source queries. |
| Spark | ETL, batch processing, and data modeling | A general-purpose distributed big data processing engine. To associate a Spark cluster with a Shuffle Service cluster, both clusters must use the same EMR version, for example, EMR-5.x-ack. |
| Flink | Stateful processing on bounded or unbounded data streams | Developed based on EMR on ACK and Flink Kubernetes Operator 1.0.1. Uses the Flink Enterprise Edition kernel by default, requiring no additional configuration. |
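
The version rule for associating a Spark cluster with a Shuffle Service cluster can be checked before you create either cluster. The following is a minimal sketch in Python. The function name is hypothetical, and comparing whole version strings (ignoring case and surrounding whitespace) is an assumption, because this topic only states that the versions must be the same.

```python
def same_emr_version(spark_version: str, shuffle_version: str) -> bool:
    """Return True if both clusters use the same EMR version string.

    Assumption: versions are compared as whole strings after trimming
    whitespace and lowercasing. This is not an EMR API call.
    """
    return spark_version.strip().lower() == shuffle_version.strip().lower()


# Example: both clusters were created with the version shown in the table.
if not same_emr_version("EMR-5.x-ack", "EMR-5.x-ack"):
    raise SystemExit("Spark and Shuffle Service clusters use different EMR versions")
```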

Shuffle Service requirements:

  • Nodes in the dedicated node pool or the associated ACK cluster must belong to big data instance families or instance families with local SSDs. Otherwise, the remote shuffle service fails to deploy.

  • Shuffle Service clusters include a built-in cleanup task named rss-pvc-clean that automatically removes unused PersistentVolumeClaim (PVC) resources, preventing storage consumption by stale data.
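
The node requirement above can be checked against a list of node instance types before you create a Shuffle Service cluster. The following is a minimal sketch, not an official tool. The function names are hypothetical, and the two family names in `EXAMPLE_ELIGIBLE_FAMILIES` are illustrative placeholders only; take the authoritative list of big data and local SSD instance families from the ECS documentation.

```python
# Illustrative family names only (assumption). Replace with the families
# listed in the ECS instance family documentation.
EXAMPLE_ELIGIBLE_FAMILIES = frozenset({"d2s", "i2"})


def instance_family(instance_type: str) -> str:
    """Extract the family from an instance type such as 'ecs.d2s.5xlarge'."""
    parts = instance_type.split(".")
    if len(parts) != 3 or parts[0] != "ecs":
        raise ValueError(f"unexpected instance type format: {instance_type!r}")
    return parts[1]


def ineligible_nodes(node_types, eligible=EXAMPLE_ELIGIBLE_FAMILIES):
    """Return the sorted names of nodes whose family is not in `eligible`.

    `node_types` maps a node name to its instance type string.
    """
    return sorted(
        name
        for name, itype in node_types.items()
        if instance_family(itype) not in eligible
    )


# Hypothetical node inventory for illustration.
nodes = {"node-a": "ecs.d2s.5xlarge", "node-b": "ecs.g6.large"}
print(ineligible_nodes(nodes))  # prints ['node-b']
```

An empty result means every listed node belongs to one of the families you supplied.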

Create a cluster

  1. Log on to the EMR console. In the left-side navigation pane, click EMR on ACK.

  2. On the EMR on ACK page, click Create Cluster.

  3. On the E-MapReduce on ACK page, configure the parameters described in the following table.

    Important: The region cannot be changed after the cluster is created.

    | Parameter | Description |
    | --- | --- |
    | Region | The region where the cluster is created. |
    | Cluster type | The type of the cluster. For details, see Choose a cluster type. |
    | Product version | The version of EMR. The latest version is selected by default. Keep the default unless you need a specific version. |
    | Component version | The type and version of the component deployed in the cluster of the specified type. |
    | ACK cluster | Select an existing ACK cluster or create one in the ACK console. To dedicate nodes to EMR workloads, click Configure Dedicated Nodes to add taints and labels to a node pool. If no node pool is available, create a node pool. |
    | OSS bucket | Select an existing bucket or create one in the OSS console. |
    | Cluster name | A name for the cluster. Must be 1–64 characters and can contain only letters, digits, hyphens (-), and underscores (_). Example: my-emr-cluster. |

  4. Click Create.
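
The cluster name rule in the parameter table can be checked locally before you fill in the form. The following is a minimal sketch in Python. The function name is hypothetical, and treating "letters" as ASCII letters only is an assumption, because this topic does not say whether other alphabets are accepted.

```python
import re

# 1-64 characters drawn from letters, digits, hyphens, and underscores.
# Assumption: "letters" means ASCII A-Z and a-z.
_CLUSTER_NAME_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")


def is_valid_cluster_name(name: str) -> bool:
    """Return True if `name` satisfies the documented cluster name rule."""
    # fullmatch requires the whole string to match, so no anchors are needed.
    return _CLUSTER_NAME_RE.fullmatch(name) is not None


print(is_valid_cluster_name("my-emr-cluster"))  # prints True
print(is_valid_cluster_name("my emr cluster"))  # prints False (contains spaces)
```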

Verify the cluster

After you click Create, the cluster enters a provisioning state.

  1. In the left-side navigation pane, click EMR on ACK.

  2. Locate the cluster in the cluster list.

  3. Wait until the cluster status changes to Running. The cluster is then ready to use.
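
If you automate this check instead of watching the console, the wait can be expressed as a polling loop with a timeout. The following is a minimal sketch under stated assumptions: `get_status` is a hypothetical function that you supply to return the current cluster status as a string, and this topic does not describe any API for retrieving it. Only the `Running` status comes from this topic; the timeout and interval defaults are arbitrary.

```python
import time
from typing import Callable


def wait_until_running(
    get_status: Callable[[], str],
    timeout_s: float = 600.0,
    interval_s: float = 10.0,
) -> bool:
    """Poll `get_status` until it returns 'Running' or the timeout elapses.

    `get_status` is a hypothetical caller-supplied function (assumption).
    Returns True if the cluster reached Running, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if get_status() == "Running":
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Passing the status lookup as a parameter keeps the loop independent of how the status is obtained, so the same function works with any client you choose.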