
E-MapReduce: Get started with EMR on ACK

Last Updated: Mar 26, 2026

Run your first Spark job on E-MapReduce (EMR) on Container Service for Kubernetes (ACK). This guide walks you through assigning the required role, creating a Spark cluster, and submitting a Spark job using a custom resource definition (CRD).

This guide uses a JAR file pre-packaged in the EMR image. To use your own JAR file, upload it to Object Storage Service (OSS) and replace local:///opt/spark/examples/spark-examples.jar in the job manifest with your OSS path: oss://<yourBucketName>/<path>.jar. For upload instructions, see Simple upload.
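For example, after uploading a JAR file to OSS, the corresponding manifest entry might look like the following. The bucket name and object path below are placeholders, not values from your account:

```yaml
spec:
  # Placeholder OSS path; replace the bucket name and object key with your own
  mainApplicationFile: "oss://my-bucket/jars/my-spark-job.jar"
```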

Prerequisites

Before you begin, ensure that you have:

  • An ACK cluster (dedicated or managed). See Create an ACK dedicated cluster or Create an ACK managed cluster

  • The AliyunOSSFullAccess and AliyunDLFFullAccess policies attached to your Alibaba Cloud account. AliyunOSSFullAccess allows EMR to read and write job artifacts in OSS; AliyunDLFFullAccess allows EMR to interact with Data Lake Formation metadata. See Attach policies to a RAM role

  • kubectl configured to connect to your ACK cluster. Note your cluster namespace — you need it when submitting the job

  • (Optional) OSS activated, if you plan to store JAR files in OSS. See Activate OSS
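After configuring kubectl, a quick sanity check confirms connectivity. This is a sketch that assumes your kubeconfig already points at the ACK cluster:

```shell
# List cluster endpoints to confirm kubectl can reach the ACK cluster
kubectl cluster-info

# List namespaces; the EMR cluster namespace should appear here after Step 2
kubectl get namespaces
```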

Before you start, collect the following values:

Value | Where to find it
ACK cluster name | ACK console, cluster list
Cluster namespace | EMR console > Cluster Details tab, after cluster creation
OSS bucket name | OSS console, bucket list

Step 1: Assign a role

Assign the system default role AliyunEMROnACKDefaultRole to your Alibaba Cloud account. This role grants EMR on ACK the permissions it needs to manage compute resources in your ACK cluster. For instructions, see Assign a role to an Alibaba Cloud account.

Step 2: Create a cluster

Create a Spark cluster on the EMR on ACK page. For full parameter reference, see Create a cluster.

  1. Log on to the EMR console. In the left-side navigation pane, click EMR on ACK.

  2. Click Create Cluster.

  3. On the E-MapReduce on ACK page, configure the following parameters.

    To associate a Spark cluster with a Shuffle Service cluster, both must share the same major EMR version (for example, EMR-5.x-ack with EMR-5.x-ack).
    To configure dedicated nodes for EMR workloads, click Configure Dedicated Nodes. Configure taints and labels on a node pool rather than individual nodes. If no node pool exists, create one first. See Create a node pool and Node pool overview.
    Parameter | Example | Description
    Region | China (Hangzhou) | The region where the cluster is created. Cannot be changed after creation.
    Cluster Type | Spark | The compute framework. Spark supports extract, transform, and load (ETL), batch processing, and data modeling.
    Product Version | EMR-5.6.0-ack | The EMR version. Defaults to the latest version.
    Component Version | SPARK (3.2.1) | The component type and version deployed in the cluster.
    ACK Cluster | Emr-ack | Select an existing ACK cluster. The same ACK cluster cannot be associated with multiple clusters of the same type.
    OSS Bucket | oss-spark-test | Select an existing bucket or create one in the OSS console.
    Cluster Name | Emr-Spark | 1–64 characters. Allowed: letters, digits, hyphens (-), and underscores (_).
  4. Click Create. The cluster is ready when its status changes to Running.

Step 3: Submit a job

Submit a Spark job to the cluster using a SparkApplication CRD manifest. For other job types, see Submit a Spark job, Use the CLI to submit a Presto job, and Submit a Flink job.

  1. Connect to your ACK cluster using kubectl. See Obtain the kubeconfig file of a cluster and use kubectl to connect to the cluster.

  2. Create a file named spark-pi.yaml with the following content.

    apiVersion: "sparkoperator.k8s.io/v1beta2"
    kind: SparkApplication
    metadata:
      name: spark-pi-simple
    spec:
      type: Scala
      sparkVersion: 3.2.1
      mainClass: org.apache.spark.examples.SparkPi
      mainApplicationFile: "local:///opt/spark/examples/spark-examples.jar"
      arguments:
        - "1000"
      driver:
        cores: 1
        coreLimit: 1000m
        memory: 4g
      executor:
        cores: 1
        coreLimit: 1000m
        memory: 8g
        memoryOverhead: 1g
        instances: 1

    This example uses Spark 3.2.1 for EMR V5.6.0. Adjust sparkVersion if you use a different version. For a full field reference, see spark-on-k8s-operator API docs.

  3. Submit the job.

    kubectl apply -f spark-pi.yaml --namespace <namespace>

    Replace <namespace> with the namespace of your EMR cluster. To find it, log on to the EMR console and go to the Cluster Details tab. A successful submission returns:

    sparkapplication.sparkoperator.k8s.io/spark-pi-simple created

    spark-pi-simple is the name of the submitted Spark job.
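    You can track the job after submission by querying the SparkApplication object with kubectl. This is a sketch: the driver pod name follows the operator's <name>-driver convention, and <namespace> is the namespace of your EMR cluster.

    ```shell
    # Check the state of the SparkApplication (SUBMITTED, RUNNING, COMPLETED, ...)
    kubectl get sparkapplication spark-pi-simple --namespace <namespace>

    # Inspect events and status details if the job does not start
    kubectl describe sparkapplication spark-pi-simple --namespace <namespace>

    # Follow the driver log; the SparkPi result appears here when the job finishes
    kubectl logs -f spark-pi-simple-driver --namespace <namespace>
    ```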

  4. Optional: View details of the submitted Spark job on the Job Details tab in the EMR console.
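When you no longer need the job object, deleting the SparkApplication removes the driver and executor pods it created. The cluster itself is released separately, as described in Step 4:

```shell
# Remove the finished job object and its pods from the EMR cluster namespace
kubectl delete sparkapplication spark-pi-simple --namespace <namespace>
```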

Step 4: (Optional) Release the cluster

Release the cluster when you no longer need it to avoid unnecessary charges.

  1. On the EMR on ACK page, find the cluster and click Release in the Actions column.

  2. In the Release Cluster dialog, click OK.

What's next

  • View cluster information — check all clusters in your account.

  • View jobs — monitor and manage jobs running in your cluster.

  • Quick start — learn more about Spark self-contained applications (select your language and Spark version).