Container Service for Kubernetes:Use elastic container instances to run Spark jobs

Last Updated: Mar 26, 2026

Run Apache Spark jobs on Elastic Container Instance (ECI) resources in a Container Service for Kubernetes (ACK) cluster to eliminate idle node costs. By configuring scheduling policies on your SparkApplication, you control whether the driver and executor pods run on Elastic Compute Service (ECS) nodes, ECI nodes, or both—paying only for the CPU and memory each pod consumes while it runs.

Prerequisites

Before you begin, ensure that you have:

  • An ACK cluster with at least one ECS node pool.

  • The ack-virtual-node component deployed in the cluster, which registers ECI virtual nodes.

  • The Spark Operator installed, which provides the SparkApplication CRD and the spark-operator-spark service account used in the examples in this topic.

How it works

When ack-virtual-node is deployed, each ECI node carries a virtual-kubelet.io/provider=alibabacloud:NoSchedule taint. Without a matching toleration, the Kubernetes scheduler skips ECI nodes entirely and places all pods on ECS nodes. To route Spark pods to ECI, add tolerations and node affinity rules to the SparkApplication spec.

ECI nodes run each container in a lightweight virtual sandbox, so pods are fully isolated from each other. Because ECI uses pay-as-you-go billing, you're charged only for the CPU and memory each pod consumes while running—not for idle node capacity.
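
Put together, routing a pod to ECI takes two pieces: a toleration for the taint above and a node affinity rule that matches the type=virtual-kubelet label on ECI nodes. A minimal fragment, applied per pod spec in the full examples that follow:

```yaml
tolerations:
# Tolerate the default taint on ECI nodes.
- key: virtual-kubelet.io/provider
  operator: Equal
  value: alibabacloud
  effect: NoSchedule
affinity:
  nodeAffinity:
    # Match the label that identifies ECI virtual nodes.
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: type
          operator: In
          values:
          - virtual-kubelet
```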

Driver vs. executor placement

The driver and executor pods have different failure characteristics that affect where you should place them:

  • If an executor fails, Spark automatically starts a replacement. Executors are stateless and fault-tolerant.

  • If the driver fails, the entire job fails and must restart from the beginning. The driver must be stable.

Because of this asymmetry, keep the driver on reliable ECS nodes and route executors to lower-cost ECI or preemptible instances when cost is a priority.
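
This split placement can be sketched by combining the settings from the full examples in this topic: give the driver no ECI toleration, so the default taint keeps it on ECS, and pin the executors to ECI. A partial spec, assuming the same base SparkApplication fields used throughout this topic:

```yaml
driver:
  # No toleration: the default ECI taint keeps the driver on ECS nodes.
  cores: 1
  coreLimit: 1200m
  memory: 512m
  serviceAccount: spark-operator-spark
executor:
  instances: 2
  cores: 2
  memory: 4g
  affinity:
    nodeAffinity:
      # Pin executors to ECI nodes only.
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: type
            operator: In
            values:
            - virtual-kubelet
  tolerations:
  # Tolerate the default ECI taint.
  - key: virtual-kubelet.io/provider
    operator: Equal
    value: alibabacloud
    effect: NoSchedule
```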

ECI characteristics for Spark workloads:

Characteristic       Value
Scale                50,000+ pods in an ACK Serverless cluster; no extra configuration needed
Provisioning speed   Thousands of pods within seconds
Billing              Pay-as-you-go; supports preemptible instances for further cost reduction

Choose a scheduling strategy

Three strategies cover the most common deployment scenarios:

Strategy                     When to use                                          Driver placement   Executor placement
ECS only                     Predictable, steady workloads                        ECS                ECS
ECI only                     Batch or burst jobs that need full elasticity        ECI                ECI
ECS-first with ECI fallback  Normal load on ECS, auto-expand to ECI during peaks  ECS (preferred)    ECS (preferred), ECI (overflow)

For fine-grained control—such as capping the number of pods per resource type—use a ResourcePolicy.

Schedule Spark jobs on ECS and ECI nodes

All examples use the SparkApplication CRD (apiVersion sparkoperator.k8s.io/v1beta2) with the following base settings:

Field                    Value
Image                    registry-cn-hangzhou.ack.aliyuncs.com/ack-demo/spark:3.5.2
mainClass                org.apache.spark.examples.SparkPi
sparkVersion             3.5.2
driver.serviceAccount    spark-operator-spark

ECS only

No toleration or affinity is required. The default taint on ECI nodes prevents the scheduler from placing pods there.

apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi-ecs-only
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: registry-cn-hangzhou.ack.aliyuncs.com/ack-demo/spark:3.5.2
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.2.jar
  arguments:
  - "5000"
  sparkVersion: 3.5.2
  driver:
    cores: 1
    coreLimit: 1200m
    memory: 512m
    serviceAccount: spark-operator-spark
  executor:
    instances: 2
    cores: 2
    memory: 4g

Expected behavior: All driver and executor pods are scheduled to ECS nodes. If ECS capacity is insufficient, pods remain in Pending until capacity becomes available.

ECI only

Add a toleration for the default ECI taint and a requiredDuringSchedulingIgnoredDuringExecution node affinity rule to pin pods exclusively to ECI nodes. Apply both to the driver and executor specs.

apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi-eci-only
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: registry-cn-hangzhou.ack.aliyuncs.com/ack-demo/spark:3.5.2
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.2.jar
  arguments:
  - "5000"
  sparkVersion: 3.5.2
  driver:
    cores: 1
    coreLimit: 1200m
    memory: 512m
    serviceAccount: spark-operator-spark
    affinity:
      nodeAffinity:
        # Pin driver to ECI nodes only.
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: type
              operator: In
              values:
              - virtual-kubelet
    tolerations:
    # Tolerate the default ECI taint.
    - key: virtual-kubelet.io/provider
      operator: Equal
      value: alibabacloud
      effect: NoSchedule
  executor:
    instances: 2
    cores: 2
    memory: 4g
    affinity:
      nodeAffinity:
        # Pin executors to ECI nodes only.
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: type
              operator: In
              values:
              - virtual-kubelet
    tolerations:
    # Tolerate the default ECI taint.
    - key: virtual-kubelet.io/provider
      operator: Equal
      value: alibabacloud
      effect: NoSchedule

Expected behavior: All pods are scheduled to ECI nodes and the job runs in fully serverless mode. If ECI capacity in the region is temporarily limited, pods remain in Pending until ECI resources are available.

ECS-first with ECI fallback

Use preferredDuringSchedulingIgnoredDuringExecution affinity to favor ECS nodes while still allowing ECI as overflow. Add the ECI toleration so the scheduler can place pods on ECI when ECS is full.

apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi-ecs-first
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: registry-cn-hangzhou.ack.aliyuncs.com/ack-demo/spark:3.5.2
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.2.jar
  arguments:
  - "5000"
  sparkVersion: 3.5.2
  driver:
    cores: 1
    coreLimit: 1200m
    memory: 512m
    serviceAccount: spark-operator-spark
    affinity:
      nodeAffinity:
        # Prefer ECS nodes; fall back to ECI if ECS capacity is exhausted.
        preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 1
          preference:
            matchExpressions:
            - key: type
              operator: NotIn
              values:
              - virtual-kubelet
    tolerations:
    # Allow scheduling to ECI nodes when needed.
    - key: virtual-kubelet.io/provider
      operator: Equal
      value: alibabacloud
      effect: NoSchedule
  executor:
    instances: 2
    cores: 2
    memory: 4g
    affinity:
      nodeAffinity:
        # Prefer ECS nodes; fall back to ECI if ECS capacity is exhausted.
        preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 1
          preference:
            matchExpressions:
            - key: type
              operator: NotIn
              values:
              - virtual-kubelet
    tolerations:
    # Allow scheduling to ECI nodes when needed.
    - key: virtual-kubelet.io/provider
      operator: Equal
      value: alibabacloud
      effect: NoSchedule

Expected behavior: Pods land on ECS nodes when capacity is available. When ECS is full, the scheduler automatically places overflow pods on ECI. This prevents job queuing during peak hours without permanently reserving ECI resources.

For more information about taints, tolerations, and node affinity configuration, see Configure resource allocation based on ECS instances and elastic container instances.

Configure priority-based resource scheduling

Use a ResourcePolicy to define explicit scheduling units with per-unit pod caps. The ACK scheduler fills units in priority order during scale-out and removes pods in reverse order during scale-in. For more information, see Configure priority-based resource scheduling.

  1. Create resourcepolicy.yaml with the following content. This ResourcePolicy applies to all pods launched by Spark Operator in the default namespace and defines two scheduling units: up to 2 pods on AMD64 ECS nodes, then up to 3 pods on ECI.

    apiVersion: scheduling.alibabacloud.com/v1alpha1
    kind: ResourcePolicy
    metadata:
      name: sparkapplication-resource-policy
      namespace: default                      # Applies only to pods in this namespace.
    spec:
      ignorePreviousPod: true
      ignoreTerminatingPod: false
      matchLabelKeys:
      - sparkoperator.k8s.io/submission-id    # Group pods by Spark job submission ID.
      preemptPolicy: AfterAllUnits            # Attempt preemption only after all units are exhausted.
      selector:
        sparkoperator.k8s.io/launched-by-spark-operator: "true"
      strategy: prefer
      units:
      - max: 2                               # Up to 2 pods on AMD64 ECS nodes (first priority).
        resource: ecs
        nodeSelector:
          kubernetes.io/arch: amd64
      - max: 3                               # Up to 3 pods on ECI (second priority).
        resource: eci
  2. Apply the ResourcePolicy:

    kubectl apply -f resourcepolicy.yaml
  3. Create spark-pi.yaml. This SparkApplication requests 1 driver pod and 5 executor pods, all tolerating the ECI taint so the ResourcePolicy can distribute them across both unit types.

    apiVersion: sparkoperator.k8s.io/v1beta2
    kind: SparkApplication
    metadata:
      name: spark-pi
      namespace: default
    spec:
      type: Scala
      mode: cluster
      image: registry-cn-hangzhou.ack.aliyuncs.com/ack-demo/spark:3.5.2
      mainClass: org.apache.spark.examples.SparkPi
      mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.2.jar
      arguments:
      - "5000"
      sparkVersion: 3.5.2
      driver:
        cores: 1
        coreLimit: 1200m
        memory: 512m
        serviceAccount: spark-operator-spark
        tolerations:
        - key: virtual-kubelet.io/provider    # Allow the driver to be scheduled to ECI if needed.
          operator: Equal
          value: alibabacloud
          effect: NoSchedule
      executor:
        instances: 5
        cores: 1
        coreLimit: 1200m
        memory: 512m
        tolerations:
        - key: virtual-kubelet.io/provider    # Allow executors to be scheduled to ECI.
          operator: Equal
          value: alibabacloud
          effect: NoSchedule
  4. Submit the Spark job:

    kubectl apply -f spark-pi.yaml
  5. Verify the scheduling results:

    kubectl get pods -o wide -l sparkoperator.k8s.io/app-name=spark-pi

    Expected output:

    NAME                                        READY   STATUS      RESTARTS   AGE       IP                  NODE
    spark-pi-34c0998f9f832e61-exec-1            1/1     Running     0          28s       192.XXX.XX.34       cn-beijing.192.XXX.XX.250
    spark-pi-34c0998f9f832e61-exec-2            1/1     Running     0          28s       192.XXX.XX.87       virtual-kubelet-cn-beijing-i
    spark-pi-34c0998f9f832e61-exec-3            1/1     Running     0          28s       192.XXX.XX.88       virtual-kubelet-cn-beijing-i
    spark-pi-34c0998f9f832e61-exec-4            1/1     Running     0          28s       192.XXX.XX.86       virtual-kubelet-cn-beijing-i
    spark-pi-34c0998f9f832e61-exec-5            0/1     Pending     0          28s       <none>              <none>
    spark-pi-driver                             1/1     Running     0          34s       192.XXX.XX.37       cn-beijing.192.XXX.XX.250

    Result: The driver pod and exec-1 are placed on the AMD64 ECS unit (max 2 pods). exec-2, exec-3, and exec-4 are placed on the ECI unit (max 3 pods). exec-5 stays in Pending because both units have reached their pod cap.

Accelerate image pulling with ImageCache

Pulling a large Spark container image on each pod startup adds significant latency. ECI's ImageCache feature pre-caches images on the underlying infrastructure, reducing pod startup time from ~100 seconds to near-instant on a cache hit. For details, see Use ImageCache to accelerate the creation of elastic container instances.

Compare startup with and without an image cache

Without a cache, submitting a SparkApplication and running kubectl describe pod spark-pi-driver shows:

Events:
  ...
  Warning  ImageCacheMissed       24m   EciService         [eci.imagecache]Missed image cache.
  Normal   ImageCacheAutoCreated  24m   EciService         [eci.imagecache]Image cache imc-2zeXXXXXXXXXXXXXXXXX is auto created
  Normal   Pulling                24m   kubelet            Pulling image "registry-cn-hangzhou.ack.aliyuncs.com/ack-demo/spark:3.5.2"
  Normal   Pulled                 23m   kubelet            Successfully pulled image "registry-cn-hangzhou.ack.aliyuncs.com/ack-demo/spark:3.5.2" in 1m41.289s (1m41.289s including waiting)
  ...

The image took about 100 seconds to pull. ECI automatically created a cache for future use.

With a cache hit, the events show:

Events:
  ...
  Normal  SuccessfulHitImageCache  23s   EciService         [eci.imagecache]Successfully hit image cache imc-2zeXXXXXXXXXXXXXXXXX, eci will be scheduled with this image cache.
  Normal  Pulled                   4s    kubelet            Container image "registry-cn-hangzhou.ack.aliyuncs.com/ack-demo/spark:3.5.2" already present on machine
  ...

No image pull was required.

Specify an image cache ID

Add the k8s.aliyun.com/eci-image-snapshot-id annotation to both the driver and executor specs to pin a specific image cache:

apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi-eci-only
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: registry-cn-hangzhou.ack.aliyuncs.com/ack-demo/spark:3.5.2
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.2.jar
  arguments:
  - "5000"
  sparkVersion: 3.5.2
  driver:
    annotations:
      k8s.aliyun.com/eci-image-snapshot-id: imc-2zeXXXXXXXXXXXXXXXXX  # Image cache ID.
    cores: 1
    coreLimit: 1200m
    memory: 512m
    serviceAccount: spark-operator-spark
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: type
              operator: In
              values:
              - virtual-kubelet
    tolerations:
    - key: virtual-kubelet.io/provider
      operator: Equal
      value: alibabacloud
      effect: NoSchedule
  executor:
    annotations:
      k8s.aliyun.com/eci-image-snapshot-id: imc-2zeXXXXXXXXXXXXXXXXX  # Image cache ID.
    instances: 2
    cores: 2
    memory: 4g
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: type
              operator: In
              values:
              - virtual-kubelet
    tolerations:
    - key: virtual-kubelet.io/provider
      operator: Equal
      value: alibabacloud
      effect: NoSchedule

Enable automatic image cache creation and matching

To let ECI manage cache creation and matching automatically—without specifying a cache ID—set the k8s.aliyun.com/eci-image-cache annotation to "true" on both the driver and executor:

apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi-eci-only
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: registry-cn-hangzhou.ack.aliyuncs.com/ack-demo/spark:3.5.2
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.2.jar
  arguments:
  - "5000"
  sparkVersion: 3.5.2
  driver:
    annotations:
      k8s.aliyun.com/eci-image-cache: "true"  # Enable automatic image cache creation and matching.
    cores: 1
    coreLimit: 1200m
    memory: 512m
    serviceAccount: spark-operator-spark
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: type
              operator: In
              values:
              - virtual-kubelet
    tolerations:
    - key: virtual-kubelet.io/provider
      operator: Equal
      value: alibabacloud
      effect: NoSchedule
  executor:
    annotations:
      k8s.aliyun.com/eci-image-cache: "true"  # Enable automatic image cache creation and matching.
    instances: 2
    cores: 2
    memory: 4g
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: type
              operator: In
              values:
              - virtual-kubelet
    tolerations:
    - key: virtual-kubelet.io/provider
      operator: Equal
      value: alibabacloud
      effect: NoSchedule

On first run, ECI creates a cache automatically. Subsequent runs match the cache and skip the image pull.

Best practices

  • Use ECS-first with ECI fallback for production jobs. This keeps latency-sensitive jobs on persistent ECS nodes while letting ECI absorb burst traffic without provisioning extra nodes in advance.

  • Run the driver on ECS, executors on ECI. If the driver is interrupted, the entire job fails and must restart. Keep the driver on stable ECS nodes and route the stateless executors to lower-cost ECI or preemptible instances.

  • Enable automatic image caching for jobs that run repeatedly. The first run creates the cache; all subsequent runs skip the image pull, cutting pod startup time from ~100 seconds to near-instant.

  • Use ResourcePolicy when pod distribution across node types must be precise. The max field per unit gives you a hard cap, which is useful when you want to limit ECI spend per job submission.

What's next