Run Apache Spark jobs on Elastic Container Instance (ECI) resources in a Container Service for Kubernetes (ACK) cluster to eliminate idle node costs. By configuring scheduling policies on your SparkApplication, you control whether the driver and executor pods run on Elastic Compute Service (ECS) nodes, ECI nodes, or both—paying only for the CPU and memory each pod consumes while it runs.
Prerequisites
Before you begin, ensure that you have:
- ack-spark-operator installed. For details, see Install the ack-spark-operator component.
- ack-virtual-node deployed in the cluster. This component exposes ECI capacity as virtual nodes. For details, see Deploy ack-virtual-node in the cluster.
- kubectl connected to the ACK cluster. For details, see Obtain the kubeconfig file of a cluster and use kubectl to connect to the cluster.
How it works
When ack-virtual-node is deployed, each ECI node carries a virtual-kubelet.io/provider=alibabacloud:NoSchedule taint. Without a matching toleration, the Kubernetes scheduler skips ECI nodes entirely and places all pods on ECS nodes. To route Spark pods to ECI, add tolerations and node affinity rules to the SparkApplication spec.
ECI nodes run each container in a lightweight virtual sandbox, so pods are fully isolated from each other. Because ECI uses pay-as-you-go billing, you're charged only for the CPU and memory each pod consumes while running—not for idle node capacity.
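The mechanism boils down to two small spec fragments. The sketch below shows the minimal toleration-and-affinity pair that routes a pod to ECI; the `type: virtual-kubelet` node label is the one the examples in this topic match on:

```yaml
# Fragment of a driver or executor spec (not a complete manifest).
tolerations:
- key: virtual-kubelet.io/provider   # Tolerate the taint that ack-virtual-node places on ECI nodes.
  operator: Equal
  value: alibabacloud
  effect: NoSchedule
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: type                  # ECI virtual nodes carry the label type=virtual-kubelet.
          operator: In
          values:
          - virtual-kubelet
```

Omit the affinity block to merely allow ECI as one option; keep it to require ECI.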
Driver vs. executor placement
The driver and executor pods have different failure characteristics that affect where you should place them:
- If an executor fails, Spark automatically starts a replacement. Executors are stateless and fault-tolerant.
- If the driver fails, the entire job fails and must restart from the beginning. The driver must be stable.
Because of this asymmetry, keep the driver on reliable ECS nodes and route executors to lower-cost ECI or preemptible instances when cost is a priority.
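As a sketch, this split combines the patterns shown in the sections below: the driver spec gets no ECI toleration, so the default taint keeps it on ECS, while the executor spec tolerates the taint and pins to virtual nodes:

```yaml
# Sketch: driver on ECS, executors on ECI (fragment of a SparkApplication spec).
driver:
  cores: 1
  memory: 512m
  serviceAccount: spark-operator-spark
  # No toleration here: the ECI taint keeps the driver on stable ECS nodes.
executor:
  instances: 2
  cores: 2
  memory: 4g
  tolerations:
  - key: virtual-kubelet.io/provider   # Tolerate the default ECI taint.
    operator: Equal
    value: alibabacloud
    effect: NoSchedule
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: type                  # Pin executors to ECI virtual nodes.
            operator: In
            values:
            - virtual-kubelet
```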
ECI characteristics for Spark workloads:
| Characteristic | Value |
|---|---|
| Scale | 50,000+ pods in an ACK Serverless cluster, no extra configuration needed |
| Provisioning speed | Thousands of pods within seconds |
| Billing | Pay-as-you-go; supports preemptible instances for further cost reduction |
Choose a scheduling strategy
Three strategies cover the most common deployment scenarios:
| Strategy | When to use | Driver placement | Executor placement |
|---|---|---|---|
| ECS only | Predictable, steady workloads | ECS | ECS |
| ECI only | Batch or burst jobs where full elasticity is needed | ECI | ECI |
| ECS-first with ECI fallback | Normal load on ECS, auto-expand to ECI during peaks | ECS (preferred) | ECS (preferred), ECI (overflow) |
For fine-grained control—such as capping the number of pods per resource type—use a ResourcePolicy.
Schedule Spark jobs on ECS and ECI nodes
All examples use SparkApplication (CRD version sparkoperator.k8s.io/v1beta2) with the following base settings:
| Field | Value |
|---|---|
| Image | registry-cn-hangzhou.ack.aliyuncs.com/ack-demo/spark:3.5.2 |
| mainClass | org.apache.spark.examples.SparkPi |
| sparkVersion | 3.5.2 |
| driver.serviceAccount | spark-operator-spark |
ECS only
No toleration or affinity is required. The default taint on ECI nodes prevents the scheduler from placing pods there.
```yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi-ecs-only
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: registry-cn-hangzhou.ack.aliyuncs.com/ack-demo/spark:3.5.2
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.2.jar
  arguments:
  - "5000"
  sparkVersion: 3.5.2
  driver:
    cores: 1
    coreLimit: 1200m
    memory: 512m
    serviceAccount: spark-operator-spark
  executor:
    instances: 2
    cores: 2
    memory: 4g
```
Expected behavior: All driver and executor pods are scheduled to ECS nodes. If ECS capacity is insufficient, pods remain in Pending until capacity becomes available.
ECI only
Add a toleration for the default ECI taint and a requiredDuringSchedulingIgnoredDuringExecution node affinity rule to pin pods exclusively to ECI nodes. Apply both to the driver and executor specs.
```yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi-eci-only
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: registry-cn-hangzhou.ack.aliyuncs.com/ack-demo/spark:3.5.2
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.2.jar
  arguments:
  - "5000"
  sparkVersion: 3.5.2
  driver:
    cores: 1
    coreLimit: 1200m
    memory: 512m
    serviceAccount: spark-operator-spark
    affinity:
      nodeAffinity:
        # Pin the driver to ECI nodes only.
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: type
              operator: In
              values:
              - virtual-kubelet
    tolerations:
    # Tolerate the default ECI taint.
    - key: virtual-kubelet.io/provider
      operator: Equal
      value: alibabacloud
      effect: NoSchedule
  executor:
    instances: 2
    cores: 2
    memory: 4g
    affinity:
      nodeAffinity:
        # Pin executors to ECI nodes only.
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: type
              operator: In
              values:
              - virtual-kubelet
    tolerations:
    # Tolerate the default ECI taint.
    - key: virtual-kubelet.io/provider
      operator: Equal
      value: alibabacloud
      effect: NoSchedule
```
Expected behavior: All pods are scheduled to ECI nodes and the job runs in fully serverless mode. If ECI capacity in the region is temporarily limited, pods remain in Pending until ECI resources are available.
ECS-first with ECI fallback
Use preferredDuringSchedulingIgnoredDuringExecution affinity to favor ECS nodes while still allowing ECI as overflow. Add the ECI toleration so the scheduler can place pods on ECI when ECS is full.
```yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi-ecs-first
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: registry-cn-hangzhou.ack.aliyuncs.com/ack-demo/spark:3.5.2
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.2.jar
  arguments:
  - "5000"
  sparkVersion: 3.5.2
  driver:
    cores: 1
    coreLimit: 1200m
    memory: 512m
    serviceAccount: spark-operator-spark
    affinity:
      nodeAffinity:
        # Prefer ECS nodes; fall back to ECI if ECS capacity is exhausted.
        preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 1
          preference:
            matchExpressions:
            - key: type
              operator: NotIn
              values:
              - virtual-kubelet
    tolerations:
    # Allow scheduling to ECI nodes when needed.
    - key: virtual-kubelet.io/provider
      operator: Equal
      value: alibabacloud
      effect: NoSchedule
  executor:
    instances: 2
    cores: 2
    memory: 4g
    affinity:
      nodeAffinity:
        # Prefer ECS nodes; fall back to ECI if ECS capacity is exhausted.
        preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 1
          preference:
            matchExpressions:
            - key: type
              operator: NotIn
              values:
              - virtual-kubelet
    tolerations:
    # Allow scheduling to ECI nodes when needed.
    - key: virtual-kubelet.io/provider
      operator: Equal
      value: alibabacloud
      effect: NoSchedule
```
Expected behavior: Pods land on ECS nodes when capacity is available. When ECS is full, the scheduler automatically places overflow pods on ECI. This prevents job queuing during peak hours without permanently reserving ECI resources.
For more information about taints, tolerations, and node affinity configuration, see Configure resource allocation based on ECS instances and elastic container instances.
Configure priority-based resource scheduling
Use a ResourcePolicy to define explicit scheduling units with per-unit pod caps. The ACK scheduler fills units in priority order during scale-out and removes pods in reverse order during scale-in. For more information, see Configure priority-based resource scheduling.
1. Create `resourcepolicy.yaml` with the following content. This ResourcePolicy applies to all pods launched by Spark Operator in the `default` namespace and defines two scheduling units: up to 2 pods on AMD64 ECS nodes, then up to 3 pods on ECI.

   ```yaml
   apiVersion: scheduling.alibabacloud.com/v1alpha1
   kind: ResourcePolicy
   metadata:
     name: sparkapplication-resource-policy
     namespace: default # Applies only to pods in this namespace.
   spec:
     ignorePreviousPod: true
     ignoreTerminatingPod: false
     matchLabelKeys:
     - sparkoperator.k8s.io/submission-id # Group pods by Spark job submission ID.
     preemptPolicy: AfterAllUnits # Attempt preemption only after all units are exhausted.
     selector:
       sparkoperator.k8s.io/launched-by-spark-operator: "true"
     strategy: prefer
     units:
     - max: 2 # Up to 2 pods on AMD64 ECS nodes (first priority).
       resource: ecs
       nodeSelector:
         kubernetes.io/arch: amd64
     - max: 3 # Up to 3 pods on ECI (second priority).
       resource: eci
   ```

2. Apply the ResourcePolicy:

   ```shell
   kubectl apply -f resourcepolicy.yaml
   ```

3. Create `spark-pi.yaml`. This SparkApplication requests 1 driver pod and 5 executor pods, all tolerating the ECI taint so the ResourcePolicy can distribute them across both unit types.

   ```yaml
   apiVersion: sparkoperator.k8s.io/v1beta2
   kind: SparkApplication
   metadata:
     name: spark-pi
     namespace: default
   spec:
     type: Scala
     mode: cluster
     image: registry-cn-hangzhou.ack.aliyuncs.com/ack-demo/spark:3.5.2
     mainClass: org.apache.spark.examples.SparkPi
     mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.2.jar
     arguments:
     - "5000"
     sparkVersion: 3.5.2
     driver:
       cores: 1
       coreLimit: 1200m
       memory: 512m
       serviceAccount: spark-operator-spark
       tolerations:
       - key: virtual-kubelet.io/provider # Allow the driver to be scheduled to ECI if needed.
         operator: Equal
         value: alibabacloud
         effect: NoSchedule
     executor:
       instances: 5
       cores: 1
       coreLimit: 1200m
       memory: 512m
       tolerations:
       - key: virtual-kubelet.io/provider # Allow executors to be scheduled to ECI.
         operator: Equal
         value: alibabacloud
         effect: NoSchedule
   ```

4. Submit the Spark job:

   ```shell
   kubectl apply -f spark-pi.yaml
   ```

5. Verify the scheduling results:

   ```shell
   kubectl get pods -o wide -l sparkoperator.k8s.io/app-name=spark-pi
   ```

   Expected output:

   ```
   NAME                               READY   STATUS    RESTARTS   AGE   IP              NODE
   spark-pi-34c0998f9f832e61-exec-1   1/1     Running   0          28s   192.XXX.XX.34   cn-beijing.192.XXX.XX.250
   spark-pi-34c0998f9f832e61-exec-2   1/1     Running   0          28s   192.XXX.XX.87   virtual-kubelet-cn-beijing-i
   spark-pi-34c0998f9f832e61-exec-3   1/1     Running   0          28s   192.XXX.XX.88   virtual-kubelet-cn-beijing-i
   spark-pi-34c0998f9f832e61-exec-4   1/1     Running   0          28s   192.XXX.XX.86   virtual-kubelet-cn-beijing-i
   spark-pi-34c0998f9f832e61-exec-5   0/1     Pending   0          28s   <none>          <none>
   spark-pi-driver                    1/1     Running   0          34s   192.XXX.XX.37   cn-beijing.192.XXX.XXX.250
   ```

   Result: The driver pod and exec-1 are placed on the AMD64 ECS unit (max 2 pods). exec-2, exec-3, and exec-4 are placed on the ECI unit (max 3 pods). exec-5 stays in Pending because both units have reached their pod cap.
Accelerate image pulling with ImageCache
Pulling a large Spark container image on each pod startup adds significant latency. ECI's ImageCache feature pre-caches images on the underlying infrastructure, reducing pod startup time from ~100 seconds to near-instant on a cache hit. For details, see Use ImageCache to accelerate the creation of elastic container instances.
Compare startup with and without an image cache
Without a cache, submit a SparkApplication and run `kubectl describe pod spark-pi-driver`. The events show:
```
Events:
  ...
  Warning  ImageCacheMissed        24m  EciService  [eci.imagecache]Missed image cache.
  Normal   ImageCacheAutoCreated   24m  EciService  [eci.imagecache]Image cache imc-2zeXXXXXXXXXXXXXXXXX is auto created
  Normal   Pulling                 24m  kubelet     Pulling image "registry-cn-hangzhou.ack.aliyuncs.com/ack-demo/spark:3.5.2"
  Normal   Pulled                  23m  kubelet     Successfully pulled image "registry-cn-hangzhou.ack.aliyuncs.com/ack-demo/spark:3.5.2" in 1m41.289s (1m41.289s including waiting)
  ...
```
The image took about 100 seconds to pull. ECI automatically created a cache for future use.
With a cache hit, the events show:
```
Events:
  ...
  Normal  SuccessfulHitImageCache  23s  EciService  [eci.imagecache]Successfully hit image cache imc-2zeXXXXXXXXXXXXXXXXX, eci will be scheduled with this image cache.
  Normal  Pulled                   4s   kubelet     Container image "registry-cn-hangzhou.ack.aliyuncs.com/ack-demo/spark:3.5.2" already present on machine
  ...
```
No image pull was required.
Specify an image cache ID
Add the k8s.aliyun.com/eci-image-snapshot-id annotation to both the driver and executor specs to pin a specific image cache:
```yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi-eci-only
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: registry-cn-hangzhou.ack.aliyuncs.com/ack-demo/spark:3.5.2
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.2.jar
  arguments:
  - "5000"
  sparkVersion: 3.5.2
  driver:
    annotations:
      k8s.aliyun.com/eci-image-snapshot-id: imc-2zeXXXXXXXXXXXXXXXXX # Image cache ID.
    cores: 1
    coreLimit: 1200m
    memory: 512m
    serviceAccount: spark-operator-spark
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: type
              operator: In
              values:
              - virtual-kubelet
    tolerations:
    - key: virtual-kubelet.io/provider
      operator: Equal
      value: alibabacloud
      effect: NoSchedule
  executor:
    annotations:
      k8s.aliyun.com/eci-image-snapshot-id: imc-2zeXXXXXXXXXXXXXXXXX # Image cache ID.
    instances: 2
    cores: 2
    memory: 4g
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: type
              operator: In
              values:
              - virtual-kubelet
    tolerations:
    - key: virtual-kubelet.io/provider
      operator: Equal
      value: alibabacloud
      effect: NoSchedule
```
Enable automatic image cache creation and matching
To let ECI manage cache creation and matching automatically—without specifying a cache ID—set the k8s.aliyun.com/eci-image-cache annotation to "true" on both the driver and executor:
```yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi-eci-only
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: registry-cn-hangzhou.ack.aliyuncs.com/ack-demo/spark:3.5.2
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.2.jar
  arguments:
  - "5000"
  sparkVersion: 3.5.2
  driver:
    annotations:
      k8s.aliyun.com/eci-image-cache: "true" # Enable automatic image cache creation and matching.
    cores: 1
    coreLimit: 1200m
    memory: 512m
    serviceAccount: spark-operator-spark
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: type
              operator: In
              values:
              - virtual-kubelet
    tolerations:
    - key: virtual-kubelet.io/provider
      operator: Equal
      value: alibabacloud
      effect: NoSchedule
  executor:
    annotations:
      k8s.aliyun.com/eci-image-cache: "true" # Enable automatic image cache creation and matching.
    instances: 2
    cores: 2
    memory: 4g
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: type
              operator: In
              values:
              - virtual-kubelet
    tolerations:
    - key: virtual-kubelet.io/provider
      operator: Equal
      value: alibabacloud
      effect: NoSchedule
```
On first run, ECI creates a cache automatically. Subsequent runs match the cache and skip the image pull.
Best practices
- Use ECS-first with ECI fallback for production jobs. This keeps latency-sensitive jobs on persistent ECS nodes while letting ECI absorb burst traffic without provisioning extra nodes in advance.
- Run the driver on ECS and executors on ECI. If the driver is interrupted, the entire job fails and must restart. Keep the driver on stable ECS nodes and route the stateless executors to lower-cost ECI or preemptible instances.
- Enable automatic image caching for jobs that run repeatedly. The first run creates the cache; all subsequent runs skip the image pull, cutting pod startup time from ~100 seconds to near-instant.
- Use ResourcePolicy when pod distribution across node types must be precise. The `max` field per unit gives you a hard cap, which is useful when you want to limit ECI spend per job submission.
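For the preemptible-instance option, the sketch below requests spot ECI capacity for executors via pod annotations. The annotation names (`k8s.aliyun.com/eci-spot-strategy`, `k8s.aliyun.com/eci-spot-price-limit`) are assumptions based on the ECI annotation reference, not something shown earlier in this topic; verify them against the current ECI documentation before use:

```yaml
# Sketch: stateless executors on preemptible (spot) ECI capacity.
# The spot annotations below are assumptions; check the ECI annotation reference.
executor:
  annotations:
    k8s.aliyun.com/eci-spot-strategy: SpotAsPriceGo # Bid at the current market price.
    # Alternatively, cap the hourly price:
    # k8s.aliyun.com/eci-spot-strategy: SpotWithPriceLimit
    # k8s.aliyun.com/eci-spot-price-limit: "0.5"
  instances: 2
  cores: 2
  memory: 4g
  tolerations:
  - key: virtual-kubelet.io/provider # Tolerate the default ECI taint.
    operator: Equal
    value: alibabacloud
    effect: NoSchedule
```

Keep the driver off spot capacity: if a spot executor is reclaimed, Spark replaces it, but a reclaimed driver fails the whole job.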