Container Service for Kubernetes: Multi-cluster Spark job scheduling and distribution based on actual available resources

Last Updated: Mar 26, 2026

If your ACK clusters run online services and you want to use their idle resources for batch Spark workloads without affecting those services, this guide shows you how. Using an ACK One Fleet instance with ACK Koordinator colocation and ACK Spark Operator, you can schedule and distribute a Spark job across multiple clusters, tapping idle capacity while keeping online services protected through priority isolation.

How it works

Multi-cluster Spark job scheduling relies on three components working together:

  • ACK One Fleet instance — provides the Global Scheduler, which distributes SparkApplication resources to member clusters based on available idle capacity.

  • ACK Koordinator — enables colocation in each member cluster, letting Spark pods use resources that online workloads are not currently consuming.

  • ACK Spark Operator — runs the Spark driver and executor pods inside each member cluster.

Why priority isolation matters

Online and offline workloads have fundamentally different resource profiles:

| Characteristic | Online workload | Spark batch workload |
|---|---|---|
| Typical apps | Microservices, APIs, recommendation systems | Data processing, analytics, AI training |
| Latency sensitivity | High | Low |
| SLO | Strict | Flexible |
| Fault tolerance | Low (high availability required) | Allows failure and retry |

Giving Spark jobs a negative PriorityClass value, below that of any online service, makes the scheduler favor online pods whenever resources are constrained.
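To make the effect of a negative priority concrete, the following toy model admits pending pods in descending priority order until the free CPU runs out. It is only a sketch: the pod names, the priority value 1000 for the online service, and the `admit` function are hypothetical, and the real Kubernetes scheduler also weighs preemption, memory, and many other constraints.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    priority: int   # value of the pod's PriorityClass
    cpu: int        # requested CPU cores (whole cores, for simplicity)


def admit(pods, free_cpu):
    """Admit pods in descending priority order while CPU remains."""
    admitted = []
    for pod in sorted(pods, key=lambda p: p.priority, reverse=True):
        if pod.cpu <= free_cpu:
            admitted.append(pod.name)
            free_cpu -= pod.cpu
    return admitted


pending = [
    Pod("spark-executor", priority=-1000, cpu=2),   # offline batch pod
    Pod("online-api", priority=1000, cpu=3),        # hypothetical online pod
]

# Only 4 cores are free: the online pod is placed first and Spark has to wait.
print(admit(pending, free_cpu=4))   # ['online-api']
# With 5 free cores both pods fit.
print(admit(pending, free_cpu=5))   # ['online-api', 'spark-executor']
```

The Spark pod is listed first in `pending`, yet it is only admitted once the online pod has been served, which is the behavior the negative priority value is meant to achieve.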

Scheduling flow

  1. Associate multiple ACK clusters with a Fleet instance; deploy ACK Koordinator and ACK Spark Operator in each.

  2. Create a SparkApplication and a PropagationPolicy on the Fleet instance.

  3. The Global Scheduler matches Spark job resource requests against the remaining capacity of each member cluster.

    For member clusters running Kubernetes 1.28 or later, the Fleet instance supports resource preoccupation to improve scheduling success rates.
  4. The Fleet instance distributes the SparkApplication to a matching member cluster.

  5. ACK Spark Operator runs the driver and executor pods. The Fleet instance monitors pod status. If the driver cannot start due to insufficient resources, the Fleet instance reclaims the SparkApplication after a timeout and reschedules it to another member cluster with enough capacity.
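The match, distribute, and reschedule loop in steps 3 to 5 can be pictured with a small simulation. This is a hedged sketch, not the Global Scheduler's actual algorithm: the function names, the "most free CPU wins" rule, and the decision never to retry a cluster that already failed are all assumptions made for illustration.

```python
def pick_cluster(free_cpu, needed_cpu, excluded=frozenset()):
    """Return the non-excluded cluster with the most free CPU that fits the job, or None."""
    candidates = {
        cluster: cpu
        for cluster, cpu in free_cpu.items()
        if cluster not in excluded and cpu >= needed_cpu
    }
    return max(candidates, key=candidates.get) if candidates else None


def schedule(free_cpu, needed_cpu, driver_starts):
    """Try clusters until the driver starts; return (cluster, attempts)."""
    tried = set()
    while True:
        cluster = pick_cluster(free_cpu, needed_cpu, excluded=tried)
        if cluster is None:
            return None, len(tried)      # no remaining cluster has enough capacity
        tried.add(cluster)
        if driver_starts(cluster):
            return cluster, len(tried)
        # The driver did not start before the timeout: reclaim and try elsewhere.


free = {"cluster-a": 8, "cluster-b": 6, "cluster-c": 1}
# Pretend the driver fails to start in cluster-a, for example because
# its idle capacity was taken back by online services in the meantime.
placed, attempts = schedule(free, needed_cpu=2, driver_starts=lambda c: c != "cluster-a")
print(placed, attempts)   # cluster-b 2
```

Here `driver_starts` stands in for the pod status monitoring that the Fleet instance performs; in this model a failed start simply moves the job to the next candidate cluster.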

Prerequisites

Before you begin, make sure you have:

  • A Fleet instance associated with multiple clusters running Kubernetes 1.18 or later (Manage associated clusters)

  • The AliyunAdcpFullAccess permission granted to your Resource Access Management (RAM) user (Grant permissions to a RAM user)

  • The AMC command-line tool installed

  • ack-koordinator (formerly ack-slo-manager) installed in each member cluster

Important

Install ack-spark-operator version 2.1.2 or later in each member cluster where you want to run Spark jobs (see Step 3).

Step 1: Configure ack-koordinator in each member cluster

Enable colocation in each member cluster by creating the ack-slo-config ConfigMap in the kube-system namespace.

  1. Log in to the ACK console. In the left navigation pane, click Clusters.

  2. On the Clusters page, click the name of the cluster you want to configure. In the left navigation pane, choose Configurations > ConfigMaps.

  3. On the ConfigMap page, click Create from YAML. Copy the following template into the editor. For details on each setting, see Get started with colocation.

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: ack-slo-config
      namespace: kube-system
    data:
      colocation-config: |-
        {
          "enable": true
        }
      resource-qos-config: |-
        {
          "clusterStrategy": {
            "lsClass": {
              "cpuQOS": {
                "enable": true
              },
              "memoryQOS": {
                "enable": true
              },
              "resctrlQOS": {
                "enable": true
              }
            },
            "beClass": {
              "cpuQOS": {
                "enable": true
              },
              "memoryQOS": {
                "enable": true
              },
              "resctrlQOS": {
                "enable": true
              }
            }
          }
        }
      resource-threshold-config: |-
        {
          "clusterStrategy": {
            "enable": true
          }
        }

    The ConfigMap configures two Quality of Service (QoS) classes:

    • `lsClass` (Latency-Sensitive) — for online services. CPU, memory, and hardware cache (resctrl) QoS controls protect latency-sensitive workloads.

    • `beClass` (Best Effort) — for Spark batch jobs. The same controls throttle Spark pods when online services need resources back.

    Repeat this step for each member cluster.
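Because the ConfigMap values are JSON strings embedded in YAML, a stray comma or a switch left at `false` is easy to miss. The sketch below parses the three payloads from the template and lists every `enable` switch that is not `true`. The helper names `disabled_switches` and `check` are hypothetical, and the check only mirrors the template above; it says nothing about other settings that ack-koordinator may support.

```python
import json

# The three JSON payloads from the ack-slo-config template, keyed by ConfigMap data key.
payloads = {
    "colocation-config": '{"enable": true}',
    "resource-qos-config": """
    {
      "clusterStrategy": {
        "lsClass": {"cpuQOS": {"enable": true}, "memoryQOS": {"enable": true}, "resctrlQOS": {"enable": true}},
        "beClass": {"cpuQOS": {"enable": true}, "memoryQOS": {"enable": true}, "resctrlQOS": {"enable": true}}
      }
    }
    """,
    "resource-threshold-config": '{"clusterStrategy": {"enable": true}}',
}


def disabled_switches(node, path=""):
    """Yield the dotted path of every "enable" key that is not set to true."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            if key == "enable":
                if value is not True:
                    yield child
            else:
                yield from disabled_switches(value, child)


def check(payloads):
    """Parse each payload (json.JSONDecodeError on bad JSON) and list disabled switches."""
    problems = []
    for name, text in payloads.items():
        problems += [f"{name}: {p}" for p in disabled_switches(json.loads(text))]
    return problems


print(check(payloads))   # []
```

An empty list means every payload is valid JSON and all switches shown in the template are enabled.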

Step 2: (Optional) Create and distribute a namespace to member clusters

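The namespace that Spark jobs run in must exist in each member cluster before you install ACK Spark Operator. If it does not exist yet, create it on the Fleet instance and distribute it using a ClusterPropagationPolicy. Skip this step if the namespace is already present in every target member cluster.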

  1. Connect to the Fleet instance using its kubeconfig file and create the spark namespace:

    kubectl create ns spark
  2. Create a ClusterPropagationPolicy to distribute the namespace to specific member clusters. To distribute to all member clusters, remove the clusterAffinity block.

    apiVersion: policy.one.alibabacloud.com/v1alpha1
    kind: ClusterPropagationPolicy
    metadata:
      name: ns-policy
    spec:
      resourceSelectors:
      - apiVersion: v1
        kind: Namespace
        name: spark
      placement:
        clusterAffinity:
          clusterNames:
          - <cluster1-id>   # Replace with the ID of a member cluster
          - <cluster2-id>   # Replace with the ID of a member cluster
        replicaScheduling:
          replicaSchedulingType: Duplicated

    replicaSchedulingType: Duplicated copies the namespace to every cluster in clusterNames, as opposed to Divided, which splits replica counts across clusters.
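If you manage many clusters, you may prefer to generate the policy rather than edit cluster IDs by hand. The following sketch builds the same ClusterPropagationPolicy as a Python dictionary and prints it as JSON. The function name `namespace_policy` is hypothetical; the field names are copied from the manifest above, and the `clusterAffinity` block is left out when no cluster IDs are passed, matching the "all member clusters" case.

```python
import json


def namespace_policy(name, namespace, cluster_ids=None):
    """Build a ClusterPropagationPolicy that duplicates a Namespace to member clusters."""
    placement = {"replicaScheduling": {"replicaSchedulingType": "Duplicated"}}
    if cluster_ids:
        # Omitting clusterAffinity targets all member clusters.
        placement["clusterAffinity"] = {"clusterNames": list(cluster_ids)}
    return {
        "apiVersion": "policy.one.alibabacloud.com/v1alpha1",
        "kind": "ClusterPropagationPolicy",
        "metadata": {"name": name},
        "spec": {
            "resourceSelectors": [
                {"apiVersion": "v1", "kind": "Namespace", "name": namespace}
            ],
            "placement": placement,
        },
    }


# The cluster IDs are placeholders; replace them with real member cluster IDs.
policy = namespace_policy("ns-policy", "spark", ["<cluster1-id>", "<cluster2-id>"])
print(json.dumps(policy, indent=2))
```

kubectl accepts JSON manifests as well as YAML, so the printed document can be saved to a file and applied to the Fleet instance with `kubectl apply -f`.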

Step 3: Install ack-spark-operator in member clusters

Important

Install ack-spark-operator version 2.1.2 or later in each member cluster where you want to run Spark jobs.

  1. Log in to the ACK console. In the left navigation pane, choose Marketplace > Marketplace.

  2. On the Marketplace page, click the App Catalog tab. Find and click ack-spark-operator.

  3. On the ack-spark-operator page, click Deploy.

  4. In the Deploy panel, select the target cluster and namespace, then click Next.

  5. In the Parameters step, select version 2.1.2 or later from the Chart Version drop-down list. In the Parameters editor, add the spark namespace to spark.jobNamespaces, then click OK.

    Important

    Set spark.jobNamespaces to include the namespace where you plan to create SparkApplication resources. If left at the default ["default"], Spark jobs submitted to the spark namespace will not be picked up by the operator.

    Key parameters:

    | Parameter | Description | Default |
    |---|---|---|
    | controller.replicas | Number of controller replicas | 1 |
    | webhook.replicas | Number of webhook replicas | 1 |
    | spark.jobNamespaces | Namespaces where Spark jobs can run. Use [""] for all namespaces, or list specific namespaces: ["ns1","ns2"] | ["default"] |
    | spark.serviceAccount.name | Name of the ServiceAccount (with RBAC) auto-created in each job namespace | spark-operator-spark |

    Repeat for each member cluster where you want Spark jobs to run.

Step 4: Create and distribute a PriorityClass

Assign Spark jobs a lower priority than online services so they only use resources that online workloads are not consuming.

  1. Connect to the Fleet instance and create a low-priority PriorityClass with a negative value:

    apiVersion: scheduling.k8s.io/v1
    kind: PriorityClass
    metadata:
      name: low-priority
    value: -1000
    globalDefault: false
    description: "Low priority for Spark applications"
  2. Create a ClusterPropagationPolicy to distribute the PriorityClass to member clusters. To distribute to all member clusters, remove the clusterAffinity block.

    apiVersion: policy.one.alibabacloud.com/v1alpha1
    kind: ClusterPropagationPolicy
    metadata:
      name: priority-policy
    spec:
      preserveResourcesOnDeletion: false
      resourceSelectors:
      - apiVersion: scheduling.k8s.io/v1
        kind: PriorityClass
      placement:
        clusterAffinity:
          clusterNames:
          - <cluster1-id>   # Replace with the ID of a member cluster
          - <cluster2-id>   # Replace with the ID of a member cluster
        replicaScheduling:
          replicaSchedulingType: Duplicated

Step 5: Submit a SparkApplication in a colocation architecture

Submitting a Spark job requires two resources on the Fleet instance: a PropagationPolicy that controls how SparkApplication resources are distributed, and the SparkApplication itself with colocation annotations.

Create a PropagationPolicy

The PropagationPolicy tells the Fleet instance which clusters to target and how to schedule replicas across them.

apiVersion: policy.one.alibabacloud.com/v1alpha1
kind: PropagationPolicy
metadata:
  name: sparkapp-policy
  namespace: spark
spec:
  preserveResourcesOnDeletion: false
  propagateDeps: true                  # Also propagate dependent resources (e.g., ServiceAccount)
  placement:
    clusterAffinity:
      clusterNames:
      - <cluster1-id>                  # Replace with the ID of a member cluster
      - <cluster2-id>                  # Replace with the ID of a member cluster
    replicaScheduling:
      replicaSchedulingType: Divided   # Split the job across clusters based on available capacity
      customSchedulingType: Gang       # All pods (driver + executors) must be scheduled together
  resourceSelectors:
    - apiVersion: sparkoperator.k8s.io/v1beta2
      kind: SparkApplication

Key fields:

| Field | Value | Effect |
|---|---|---|
| replicaSchedulingType | Divided | Distributes replicas across member clusters proportionally, instead of duplicating the full resource to each cluster |
| customSchedulingType | Gang | Ensures the driver and all executor pods are scheduled atomically: the job does not start unless all pods can be placed |
| propagateDeps | true | Automatically propagates resources that SparkApplication depends on, such as ServiceAccounts |
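The all-or-nothing behavior of Gang scheduling is easiest to see next to pod-by-pod placement. The sketch below is a simplified model that looks at CPU in a single cluster only; the function names and the three-pod job are hypothetical and do not reflect how the Fleet instance implements Gang scheduling internally.

```python
def place_without_gang(free_cpu, pods):
    """Place pods one by one; return the names of the pods that got a slot."""
    placed = []
    for name, cpu in pods:
        if cpu <= free_cpu:
            placed.append(name)
            free_cpu -= cpu
    return placed


def place_with_gang(free_cpu, pods):
    """Place every pod of the job, or none of them."""
    total = sum(cpu for _, cpu in pods)
    return [name for name, _ in pods] if total <= free_cpu else []


# A hypothetical job: one driver and two executors, one CPU core each.
job = [("driver", 1), ("executor-1", 1), ("executor-2", 1)]

print(place_without_gang(2, job))   # ['driver', 'executor-1']
print(place_with_gang(2, job))      # []
print(place_with_gang(3, job))      # ['driver', 'executor-1', 'executor-2']
```

With only two free cores, pod-by-pod placement starts a job that holds resources while one executor can never be scheduled, whereas the gang variant places nothing and leaves the capacity untouched.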

Submit the SparkApplication

Add the sparkoperator.k8s.io/koordinator-colocation: "true" annotation to both the driver and executor pod templates. This tells ACK Spark Operator to schedule those pods using idle (Best Effort) resources via ACK Koordinator colocation.

apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: spark
spec:
  arguments:
  - "50000"
  driver:
    coreLimit: 1000m
    cores: 1
    memory: 512m
    priorityClassName: low-priority           # Uses the PriorityClass created in Step 4
    template:
      metadata:
        annotations:
          sparkoperator.k8s.io/koordinator-colocation: "true"   # Schedule on idle resources
      spec:
        containers:
        - name: spark-kubernetes-driver
    serviceAccount: spark-operator-spark
  executor:
    coreLimit: 1000m
    cores: 1
    instances: 1
    memory: 1g
    priorityClassName: low-priority
    template:
      metadata:
        annotations:
          sparkoperator.k8s.io/koordinator-colocation: "true"   # Schedule on idle resources
      spec:
        containers:
        - name: spark-kubernetes-executor
  image: registry-cn-hangzhou.ack.aliyuncs.com/ack-demo/spark:3.5.4
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.4.jar
  mainClass: org.apache.spark.examples.SparkPi
  mode: cluster
  restartPolicy:
    type: Never
  sparkVersion: 3.5.4
  type: Scala
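To estimate how much capacity a member cluster needs for this job, you can add up the declared driver and executor requests. The sketch below does that for the values in the manifest above. The helper names are hypothetical, only the `m` and `g` suffixes are handled, and the totals exclude the memory overhead that Spark normally adds to each pod on Kubernetes, so treat the result as a lower bound rather than the exact figure the scheduler sees.

```python
def spark_memory_to_mib(text):
    """Convert a Spark memory string such as "512m" or "1g" to MiB."""
    units = {"m": 1, "g": 1024}
    value, unit = text[:-1], text[-1].lower()
    if unit not in units:
        raise ValueError(f"unsupported memory unit in {text!r}")
    return int(value) * units[unit]


def total_request(spec):
    """Sum the declared CPU cores and memory of the driver and all executors."""
    driver, executor = spec["driver"], spec["executor"]
    instances = executor["instances"]
    return {
        "cpu": driver["cores"] + instances * executor["cores"],
        "memory_mib": spark_memory_to_mib(driver["memory"])
        + instances * spark_memory_to_mib(executor["memory"]),
    }


# Values copied from the spark-pi SparkApplication.
spec = {
    "driver": {"cores": 1, "memory": "512m"},
    "executor": {"cores": 1, "instances": 1, "memory": "1g"},
}
print(total_request(spec))   # {'cpu': 2, 'memory_mib': 1536}
```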

Step 6: Check the status of the Spark job

All verification commands run on the Fleet instance. Use the Fleet instance's kubeconfig file to connect.

  1. Check the overall job status:

    kubectl get sparkapp -nspark

    Expected output:

    NAME       STATUS    ATTEMPTS   START                  FINISH       AGE
    spark-pi   RUNNING   1          2025-03-05T11:19:43Z   <no value>   48s
  2. Identify which member cluster the job was scheduled to:

    kubectl describe sparkapp spark-pi -nspark

    Look for a line similar to:

    Normal   ScheduleBindingSucceed  2m29s   default-scheduler   Binding has been scheduled successfully. Result: {c6xxxxx:0,[{driver 1} {executor 1}]}
  3. Confirm that the SparkApplication was fully propagated to the member cluster:

    kubectl get rb spark-pi-sparkapplication -nspark

    Expected output:

    NAME                        SCHEDULED   FULLYAPPLIED   OVERRIDDEN   ALLAVAILABLE   AGE
    spark-pi-sparkapplication   True        True           True         True
  4. Check the job status in the member cluster:

    kubectl amc get sparkapp -M -nspark

    Expected output:

    NAME       CLUSTER     STATUS      ATTEMPTS   START                  FINISH                 AGE   ADOPTION
    spark-pi   c6xxxxxxx   COMPLETED   1          2025-02-24T12:10:34Z   2025-02-24T12:11:20Z   61s   Y
  5. Check pod status across member clusters:

    kubectl amc get pod -M -nspark

    Expected output:

    NAME                               CLUSTER     READY   STATUS    RESTARTS   AGE
    spark-pi-3c0565956608ad6d-exec-1   c6xxxxxxx   1/1     Running   0          2m35s
    spark-pi-driver                    c6xxxxxxx   1/1     Running   0          2m50s
  6. View the full details of the SparkApplication in a specific member cluster:

    kubectl amc get sparkapp spark-pi -m <member-cluster-id> -oyaml -nspark
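If you script these checks, the ScheduleBindingSucceed event from step 2 can be parsed to recover the target cluster ID and the pod counts. The sketch below is based only on the single sample line shown in that step, so the regular expressions are an assumption about the format; the number after the cluster ID is ignored because its meaning is not documented here.

```python
import re

# Matches "Result: {<cluster-id>:<n>,[{role count} ...]}" as seen in the sample event.
RESULT = re.compile(r"Result: \{(?P<cluster>[^:{}]+):\d+,\[(?P<pods>[^\]]*)\]\}")
POD = re.compile(r"\{(\w+) (\d+)\}")


def parse_binding(line):
    """Return (cluster_id, {role: count}) from a binding event line, or None."""
    match = RESULT.search(line)
    if match is None:
        return None
    pods = {role: int(count) for role, count in POD.findall(match.group("pods"))}
    return match.group("cluster"), pods


event = (
    "Normal   ScheduleBindingSucceed  2m29s   default-scheduler   "
    "Binding has been scheduled successfully. "
    "Result: {c6xxxxx:0,[{driver 1} {executor 1}]}"
)
print(parse_binding(event))   # ('c6xxxxx', {'driver': 1, 'executor': 1})
```

The returned cluster ID can then be passed to the `-m` option used in step 6.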

What's next

  • Learn about single-cluster colocation to understand how ACK Koordinator manages resource isolation within a cluster.

  • Configure additional PropagationPolicy rules to target clusters by label selectors instead of explicit cluster IDs.