Run Apache Spark jobs on Elastic Container Instance (ECI) resources in a Container Service for Kubernetes (ACK) cluster to eliminate idle node costs. By configuring scheduling policies on your SparkApplication, you control whether the driver and executor pods run on Elastic Compute Service (ECS) nodes, ECI nodes, or both—paying only for the CPU and memory each pod consumes while it runs.
Prerequisites
Before you begin, ensure that you have:
- ack-spark-operator installed. For details, see Install the ack-spark-operator component.
- ack-virtual-node deployed in the cluster. This component exposes ECI capacity as virtual nodes. For details, see Deploy ack-virtual-node in the cluster.
- kubectl connected to the ACK cluster. For details, see Obtain the kubeconfig file of a cluster and use kubectl to connect to the cluster.
How it works
When ack-virtual-node is deployed, each ECI node carries a virtual-kubelet.io/provider=alibabacloud:NoSchedule taint. Without a matching toleration, the Kubernetes scheduler skips ECI nodes entirely and places all pods on ECS nodes. To route Spark pods to ECI, add tolerations and node affinity rules to the SparkApplication spec.
ECI nodes run each container in a lightweight virtual sandbox, so pods are fully isolated from each other. Because ECI uses pay-as-you-go billing, you're charged only for the CPU and memory each pod consumes while running—not for idle node capacity.
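The mechanism boils down to two small spec fragments. The sketch below shows the minimal toleration-and-affinity pair that routes a pod to ECI; the `type: virtual-kubelet` node label is the one the examples in this topic match on:

```yaml
# Fragment of a driver or executor spec (not a complete manifest).
tolerations:
- key: virtual-kubelet.io/provider   # Tolerate the taint that ack-virtual-node places on ECI nodes.
  operator: Equal
  value: alibabacloud
  effect: NoSchedule
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: type                  # ECI virtual nodes carry the label type=virtual-kubelet.
          operator: In
          values:
          - virtual-kubelet
```

Omit the affinity block to merely allow ECI as one option; keep it to require ECI.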
Driver vs. executor placement
The driver and executor pods have different failure characteristics that affect where you should place them:
- If an executor fails, Spark automatically starts a replacement. Executors are stateless and fault-tolerant.
- If the driver fails, the entire job fails and must restart from the beginning. The driver must be stable.
Because of this asymmetry, keep the driver on reliable ECS nodes and route executors to lower-cost ECI or preemptible instances when cost is a priority.
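As a sketch, this split combines the patterns shown in the sections below: the driver spec gets no ECI toleration, so the default taint keeps it on ECS, while the executor spec tolerates the taint and pins to virtual nodes:

```yaml
# Sketch: driver on ECS, executors on ECI (fragment of a SparkApplication spec).
driver:
  cores: 1
  memory: 512m
  serviceAccount: spark-operator-spark
  # No toleration here: the ECI taint keeps the driver on stable ECS nodes.
executor:
  instances: 2
  cores: 2
  memory: 4g
  tolerations:
  - key: virtual-kubelet.io/provider   # Tolerate the default ECI taint.
    operator: Equal
    value: alibabacloud
    effect: NoSchedule
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: type                  # Pin executors to ECI virtual nodes.
            operator: In
            values:
            - virtual-kubelet
```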
ECI characteristics for Spark workloads:
| Characteristic | Value |
|---|---|
| Scale | 50,000+ pods in an ACK Serverless cluster, no extra configuration needed |
| Provisioning speed | Thousands of pods within seconds |
| Billing | Pay-as-you-go; supports preemptible instances for further cost reduction |
Choose a scheduling strategy
Three strategies cover the most common deployment scenarios:
| Strategy | When to use | Driver placement | Executor placement |
|---|---|---|---|
| ECS only | Predictable, steady workloads | ECS | ECS |
| ECI only | Batch or burst jobs where full elasticity is needed | ECI | ECI |
| ECS-first with ECI fallback | Normal load on ECS, auto-expand to ECI during peaks | ECS (preferred) | ECS (preferred), ECI (overflow) |
For fine-grained control—such as capping the number of pods per resource type—use a ResourcePolicy.
Schedule Spark jobs on ECS and ECI nodes
All examples use SparkApplication (CRD version sparkoperator.k8s.io/v1beta2) with the following base settings:
| Field | Value |
|---|---|
| Image | registry-cn-hangzhou.ack.aliyuncs.com/ack-demo/spark:3.5.2 |
| mainClass | org.apache.spark.examples.SparkPi |
| sparkVersion | 3.5.2 |
| driver.serviceAccount | spark-operator-spark |
ECS only
No toleration or affinity is required. The default taint on ECI nodes prevents the scheduler from placing pods there.
```yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi-ecs-only
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: registry-cn-hangzhou.ack.aliyuncs.com/ack-demo/spark:3.5.2
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.2.jar
  arguments:
  - "5000"
  sparkVersion: 3.5.2
  driver:
    cores: 1
    coreLimit: 1200m
    memory: 512m
    serviceAccount: spark-operator-spark
  executor:
    instances: 2
    cores: 2
    memory: 4g
```
Expected behavior: All driver and executor pods are scheduled to ECS nodes. If ECS capacity is insufficient, pods remain in Pending until capacity becomes available.
ECI only
Add a toleration for the default ECI taint and a requiredDuringSchedulingIgnoredDuringExecution node affinity rule to pin pods exclusively to ECI nodes. Apply both to the driver and executor specs.
```yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi-eci-only
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: registry-cn-hangzhou.ack.aliyuncs.com/ack-demo/spark:3.5.2
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.2.jar
  arguments:
  - "5000"
  sparkVersion: 3.5.2
  driver:
    cores: 1
    coreLimit: 1200m
    memory: 512m
    serviceAccount: spark-operator-spark
    affinity:
      nodeAffinity:
        # Pin the driver to ECI nodes only.
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: type
              operator: In
              values:
              - virtual-kubelet
    tolerations:
    # Tolerate the default ECI taint.
    - key: virtual-kubelet.io/provider
      operator: Equal
      value: alibabacloud
      effect: NoSchedule
  executor:
    instances: 2
    cores: 2
    memory: 4g
    affinity:
      nodeAffinity:
        # Pin executors to ECI nodes only.
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: type
              operator: In
              values:
              - virtual-kubelet
    tolerations:
    # Tolerate the default ECI taint.
    - key: virtual-kubelet.io/provider
      operator: Equal
      value: alibabacloud
      effect: NoSchedule
```
Expected behavior: All pods are scheduled to ECI nodes and the job runs in fully serverless mode. If ECI capacity in the region is temporarily limited, pods remain in Pending until ECI resources are available.
ECS-first with ECI fallback
Use preferredDuringSchedulingIgnoredDuringExecution affinity to favor ECS nodes while still allowing ECI as overflow. Add the ECI toleration so the scheduler can place pods on ECI when ECS is full.
```yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi-ecs-first
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: registry-cn-hangzhou.ack.aliyuncs.com/ack-demo/spark:3.5.2
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.2.jar
  arguments:
  - "5000"
  sparkVersion: 3.5.2
  driver:
    cores: 1
    coreLimit: 1200m
    memory: 512m
    serviceAccount: spark-operator-spark
    affinity:
      nodeAffinity:
        # Prefer ECS nodes; fall back to ECI if ECS capacity is exhausted.
        preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 1
          preference:
            matchExpressions:
            - key: type
              operator: NotIn
              values:
              - virtual-kubelet
    tolerations:
    # Allow scheduling to ECI nodes when needed.
    - key: virtual-kubelet.io/provider
      operator: Equal
      value: alibabacloud
      effect: NoSchedule
  executor:
    instances: 2
    cores: 2
    memory: 4g
    affinity:
      nodeAffinity:
        # Prefer ECS nodes; fall back to ECI if ECS capacity is exhausted.
        preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 1
          preference:
            matchExpressions:
            - key: type
              operator: NotIn
              values:
              - virtual-kubelet
    tolerations:
    # Allow scheduling to ECI nodes when needed.
    - key: virtual-kubelet.io/provider
      operator: Equal
      value: alibabacloud
      effect: NoSchedule
```
Expected behavior: Pods land on ECS nodes when capacity is available. When ECS is full, the scheduler automatically places overflow pods on ECI. This prevents job queuing during peak hours without permanently reserving ECI resources.
For more information about taints, tolerations, and node affinity configuration, see Configure resource allocation based on ECS instances and elastic container instances.
Configure priority-based resource scheduling
Use a ResourcePolicy to define explicit scheduling units with per-unit pod caps. The ACK scheduler fills units in priority order during scale-out and removes pods in reverse order during scale-in. For more information, see Configure priority-based resource scheduling.
1. Create `resourcepolicy.yaml` with the following content. This ResourcePolicy applies to all pods launched by Spark Operator in the `default` namespace and defines two scheduling units: up to 2 pods on AMD64 ECS nodes, then up to 3 pods on ECI.

   ```yaml
   apiVersion: scheduling.alibabacloud.com/v1alpha1
   kind: ResourcePolicy
   metadata:
     name: sparkapplication-resource-policy
     namespace: default # Applies only to pods in this namespace.
   spec:
     ignorePreviousPod: true
     ignoreTerminatingPod: false
     matchLabelKeys:
     - sparkoperator.k8s.io/submission-id # Group pods by Spark job submission ID.
     preemptPolicy: AfterAllUnits # Attempt preemption only after all units are exhausted.
     selector:
       sparkoperator.k8s.io/launched-by-spark-operator: "true"
     strategy: prefer
     units:
     - max: 2 # Up to 2 pods on AMD64 ECS nodes (first priority).
       resource: ecs
       nodeSelector:
         kubernetes.io/arch: amd64
     - max: 3 # Up to 3 pods on ECI (second priority).
       resource: eci
   ```

2. Apply the ResourcePolicy:

   ```shell
   kubectl apply -f resourcepolicy.yaml
   ```

3. Create `spark-pi.yaml`. This SparkApplication requests 1 driver pod and 5 executor pods, all tolerating the ECI taint so the ResourcePolicy can distribute them across both unit types.

   ```yaml
   apiVersion: sparkoperator.k8s.io/v1beta2
   kind: SparkApplication
   metadata:
     name: spark-pi
     namespace: default
   spec:
     type: Scala
     mode: cluster
     image: registry-cn-hangzhou.ack.aliyuncs.com/ack-demo/spark:3.5.2
     mainClass: org.apache.spark.examples.SparkPi
     mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.2.jar
     arguments:
     - "5000"
     sparkVersion: 3.5.2
     driver:
       cores: 1
       coreLimit: 1200m
       memory: 512m
       serviceAccount: spark-operator-spark
       tolerations:
       - key: virtual-kubelet.io/provider # Allow the driver to be scheduled to ECI if needed.
         operator: Equal
         value: alibabacloud
         effect: NoSchedule
     executor:
       instances: 5
       cores: 1
       coreLimit: 1200m
       memory: 512m
       tolerations:
       - key: virtual-kubelet.io/provider # Allow executors to be scheduled to ECI.
         operator: Equal
         value: alibabacloud
         effect: NoSchedule
   ```

4. Submit the Spark job:

   ```shell
   kubectl apply -f spark-pi.yaml
   ```

5. Verify the scheduling results:

   ```shell
   kubectl get pods -o wide -l sparkoperator.k8s.io/app-name=spark-pi
   ```

   Expected output:

   ```
   NAME                               READY   STATUS    RESTARTS   AGE   IP              NODE
   spark-pi-34c0998f9f832e61-exec-1   1/1     Running   0          28s   192.XXX.XX.34   cn-beijing.192.XXX.XX.250
   spark-pi-34c0998f9f832e61-exec-2   1/1     Running   0          28s   192.XXX.XX.87   virtual-kubelet-cn-beijing-i
   spark-pi-34c0998f9f832e61-exec-3   1/1     Running   0          28s   192.XXX.XX.88   virtual-kubelet-cn-beijing-i
   spark-pi-34c0998f9f832e61-exec-4   1/1     Running   0          28s   192.XXX.XX.86   virtual-kubelet-cn-beijing-i
   spark-pi-34c0998f9f832e61-exec-5   0/1     Pending   0          28s   <none>          <none>
   spark-pi-driver                    1/1     Running   0          34s   192.XXX.XX.37   cn-beijing.192.XXX.XXX.250
   ```

   Result: The driver pod and exec-1 are placed on the AMD64 ECS unit (max 2 pods). exec-2, exec-3, and exec-4 are placed on the ECI unit (max 3 pods). exec-5 stays in Pending because both units have reached their pod cap.
Accelerate image pulling with ImageCache
Pulling a large Spark container image on each pod startup adds significant latency. ECI's ImageCache feature pre-caches images on the underlying infrastructure, reducing pod startup time from ~100 seconds to near-instant on a cache hit. For details, see Use ImageCache to accelerate the creation of elastic container instances.
Compare startup with and without an image cache
Without a cache, submit a SparkApplication and run `kubectl describe pod spark-pi-driver`. The events show:
```
Events:
  ...
  Warning  ImageCacheMissed        24m  EciService  [eci.imagecache]Missed image cache.
  Normal   ImageCacheAutoCreated   24m  EciService  [eci.imagecache]Image cache imc-2zeXXXXXXXXXXXXXXXXX is auto created
  Normal   Pulling                 24m  kubelet     Pulling image "registry-cn-hangzhou.ack.aliyuncs.com/ack-demo/spark:3.5.2"
  Normal   Pulled                  23m  kubelet     Successfully pulled image "registry-cn-hangzhou.ack.aliyuncs.com/ack-demo/spark:3.5.2" in 1m41.289s (1m41.289s including waiting)
  ...
```
The image took about 100 seconds to pull. ECI automatically created a cache for future use.
With a cache hit, the events show:
```
Events:
  ...
  Normal  SuccessfulHitImageCache  23s  EciService  [eci.imagecache]Successfully hit image cache imc-2zeXXXXXXXXXXXXXXXXX, eci will be scheduled with this image cache.
  Normal  Pulled                   4s   kubelet     Container image "registry-cn-hangzhou.ack.aliyuncs.com/ack-demo/spark:3.5.2" already present on machine
  ...
```
No image pull was required.
Specify an image cache ID
Add the k8s.aliyun.com/eci-image-snapshot-id annotation to both the driver and executor specs to pin a specific image cache:
```yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi-eci-only
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: registry-cn-hangzhou.ack.aliyuncs.com/ack-demo/spark:3.5.2
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.2.jar
  arguments:
  - "5000"
  sparkVersion: 3.5.2
  driver:
    annotations:
      k8s.aliyun.com/eci-image-snapshot-id: imc-2zeXXXXXXXXXXXXXXXXX # Image cache ID.
    cores: 1
    coreLimit: 1200m
    memory: 512m
    serviceAccount: spark-operator-spark
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: type
              operator: In
              values:
              - virtual-kubelet
    tolerations:
    - key: virtual-kubelet.io/provider
      operator: Equal
      value: alibabacloud
      effect: NoSchedule
  executor:
    annotations:
      k8s.aliyun.com/eci-image-snapshot-id: imc-2zeXXXXXXXXXXXXXXXXX # Image cache ID.
    instances: 2
    cores: 2
    memory: 4g
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: type
              operator: In
              values:
              - virtual-kubelet
    tolerations:
    - key: virtual-kubelet.io/provider
      operator: Equal
      value: alibabacloud
      effect: NoSchedule
```
Enable automatic image cache creation and matching
To let ECI manage cache creation and matching automatically—without specifying a cache ID—set the k8s.aliyun.com/eci-image-cache annotation to "true" on both the driver and executor:
```yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi-eci-only
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: registry-cn-hangzhou.ack.aliyuncs.com/ack-demo/spark:3.5.2
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.2.jar
  arguments:
  - "5000"
  sparkVersion: 3.5.2
  driver:
    annotations:
      k8s.aliyun.com/eci-image-cache: "true" # Enable automatic image cache creation and matching.
    cores: 1
    coreLimit: 1200m
    memory: 512m
    serviceAccount: spark-operator-spark
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: type
              operator: In
              values:
              - virtual-kubelet
    tolerations:
    - key: virtual-kubelet.io/provider
      operator: Equal
      value: alibabacloud
      effect: NoSchedule
  executor:
    annotations:
      k8s.aliyun.com/eci-image-cache: "true" # Enable automatic image cache creation and matching.
    instances: 2
    cores: 2
    memory: 4g
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: type
              operator: In
              values:
              - virtual-kubelet
    tolerations:
    - key: virtual-kubelet.io/provider
      operator: Equal
      value: alibabacloud
      effect: NoSchedule
```
On first run, ECI creates a cache automatically. Subsequent runs match the cache and skip the image pull.
Best practices
- Use ECS-first with ECI fallback for production jobs. This keeps latency-sensitive jobs on persistent ECS nodes while letting ECI absorb burst traffic without provisioning extra nodes in advance.
- Run the driver on ECS and executors on ECI. If the driver is interrupted, the entire job fails and must restart. Keep the driver on stable ECS nodes and route the stateless executors to lower-cost ECI or preemptible instances.
- Enable automatic image caching for jobs that run repeatedly. The first run creates the cache; all subsequent runs skip the image pull, cutting pod startup time from ~100 seconds to near-instant.
- Use ResourcePolicy when pod distribution across node types must be precise. The `max` field per unit gives you a hard cap, which is useful when you want to limit ECI spend per job submission.
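For the preemptible-instance option, the sketch below requests spot ECI capacity for executors via pod annotations. The annotation names (`k8s.aliyun.com/eci-spot-strategy`, `k8s.aliyun.com/eci-spot-price-limit`) are assumptions based on the ECI annotation reference, not something shown earlier in this topic; verify them against the current ECI documentation before use:

```yaml
# Sketch: stateless executors on preemptible (spot) ECI capacity.
# The spot annotations below are assumptions; check the ECI annotation reference.
executor:
  annotations:
    k8s.aliyun.com/eci-spot-strategy: SpotAsPriceGo # Bid at the current market price.
    # Alternatively, cap the hourly price:
    # k8s.aliyun.com/eci-spot-strategy: SpotWithPriceLimit
    # k8s.aliyun.com/eci-spot-price-limit: "0.5"
  instances: 2
  cores: 2
  memory: 4g
  tolerations:
  - key: virtual-kubelet.io/provider # Tolerate the default ECI taint.
    operator: Equal
    value: alibabacloud
    effect: NoSchedule
```

Keep the driver off spot capacity: if a spot executor is reclaimed, Spark replaces it, but a reclaimed driver fails the whole job.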