Use idle resources to schedule and distribute Spark jobs in multiple clusters

If you run online services in multiple Container Service for Kubernetes (ACK) clusters and want to use the idle resources in those clusters to run Spark jobs without interrupting the online services, you can use the multi-cluster Spark job scheduling and distribution feature of Distributed Cloud Container Platform for Kubernetes (ACK One). This topic describes how to use an ACK One Fleet instance together with the ACK Koordinator component to schedule and distribute a Spark job across the clusters associated with the Fleet instance by using their idle resources. Job priorities and the colocation feature keep the Spark job from affecting the online services.

Background information

The following features are required when you use idle resources to schedule and distribute a Spark job in multiple clusters:

Multi-cluster Spark job scheduling and distribution provided by ACK One Fleet instances, including idle resource-aware scheduling.
Koordinator colocation, as supported by ACK Spark Operator.
Single-cluster colocation of ACK Koordinator.

Procedure:

Associate multiple ACK clusters with an ACK One Fleet instance, and deploy ACK Koordinator and ACK Spark Operator in each associated cluster.
Create a SparkApplication and a PropagationPolicy on the Fleet instance.
The multi-cluster scheduling component (Global Scheduler) of the Fleet instance matches Spark job resource requests based on the remaining resources of each associated sub-cluster.
For sub-clusters whose Kubernetes version is 1.28 or later, the Fleet instance supports resource preoccupation to improve the success rate of Spark job scheduling.
After the Fleet instance makes a scheduling decision, the SparkApplication is distributed to the selected associated cluster.
In the associated cluster, ACK Spark Operator runs the driver and executors of the Spark job. Meanwhile, the Fleet instance watches the status of the Spark job in the sub-cluster. If the driver cannot run due to insufficient resources, the Fleet instance reclaims the SparkApplication after a specific period of time and reschedules it to another associated cluster that has sufficient resources.
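
The resource matching in the scheduling step can be pictured with a toy model. The sketch below is not the Global Scheduler's implementation. It assumes, for illustration only, that each cluster reports its idle CPU in millicores and its idle memory in MiB, and that the whole job (driver plus executors) must fit in a single cluster. The `pick_cluster` function, the cluster names, and the capacity figures are hypothetical.

```shell
# Toy model only, not the Fleet scheduler: pick the first cluster whose idle
# CPU (millicores) and idle memory (MiB) can hold the whole job at once.
# Input on stdin: one line per cluster, "<name> <idle_cpu_m> <idle_mem_mi>".
# $1: CPU the job needs (millicores), $2: memory the job needs (MiB).
pick_cluster() {
  awk -v cpu="$1" -v mem="$2" '
    $2 >= cpu && $3 >= mem { print $1; found = 1; exit }
    END { if (!found) exit 1 }   # non-zero status: no cluster fits
  '
}

# Hypothetical idle capacity of two associated clusters.
# Job: driver (1000m, 512 MiB) plus one executor (1000m, 1024 MiB).
printf '%s\n' 'cluster-a 1500 4096' 'cluster-b 4000 8192' | pick_cluster 2000 1536
# prints: cluster-b
```

In this model, cluster-a is skipped because its idle CPU cannot hold the driver and the executor together, which mirrors the idea of placing a job only where all of its pods can run.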

Prerequisites

The Fleet instance is associated with multiple clusters whose Kubernetes versions are 1.18 or later. For more information, see Manage associated clusters.
You have granted the AliyunAdcpFullAccess permission to a Resource Access Management (RAM) user. For more information, see Grant permissions to a RAM user.
The AMC command-line tool is installed.
ack-koordinator (formerly known as ack-slo-manager) is installed in each associated cluster.

Step 1: Deploy ack-koordinator in each associated cluster

Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, click the name of the cluster that you want to manage. In the left-side navigation pane, choose Configurations > ConfigMaps.

On the ConfigMap page, click Create from YAML. Copy the following YAML template to the Template code editor. For more information, see Get started with colocation.

# Example of the ack-slo-config ConfigMap. 
apiVersion: v1
kind: ConfigMap
metadata:
  name: ack-slo-config
  namespace: kube-system
data:
  colocation-config: |-
    {
      "enable": true
    }
  resource-qos-config: |-
    {
      "clusterStrategy": {
        "lsClass": {
          "cpuQOS": {
            "enable": true
          },
          "memoryQOS": {
            "enable": true
          },
          "resctrlQOS": {
            "enable": true
          }
        },
        "beClass": {
          "cpuQOS": {
            "enable": true
          },
          "memoryQOS": {
            "enable": true
          },
          "resctrlQOS": {
            "enable": true
          }
        }
      }
    }
  resource-threshold-config: |-
    {
      "clusterStrategy": {
        "enable": true
      }
    }

Step 2: (Optional) Create a namespace on the Fleet instance and distribute the namespace to the associated clusters

Before you install ack-spark-operator in an associated cluster, make sure that the cluster has a namespace dedicated to the Spark job. Otherwise, ack-spark-operator cannot be installed properly. You can create the namespace on the Fleet instance and then create a ClusterPropagationPolicy to distribute it to each associated cluster. In this example, a namespace named spark is created and distributed.

Use the kubeconfig file of the Fleet instance to connect to the Fleet instance and run the following command to create a namespace named spark:
```
kubectl create ns spark
```

Create a ClusterPropagationPolicy to distribute the namespace to the associated clusters that match specific rules. If you want to distribute the namespace to all associated clusters, leave the clusterAffinity parameter empty.

apiVersion: policy.one.alibabacloud.com/v1alpha1
kind: ClusterPropagationPolicy
metadata:
  name: ns-policy
spec:
  resourceSelectors:
  - apiVersion: v1
    kind: Namespace
    name: spark
  placement:
    clusterAffinity:
      clusterNames:
      - ${cluster1-id} # The ID of an associated cluster. 
      - ${cluster2-id} # The ID of an associated cluster. 
    replicaScheduling:
      replicaSchedulingType: Duplicated
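
The `${cluster1-id}` and `${cluster2-id}` placeholders must be replaced with real cluster IDs before the policy is applied. The following is a minimal sketch, assuming that the template above is saved as `ns-policy.yaml` (a hypothetical file name) and that the cluster IDs contain no characters that are special to `sed`. The `fill_cluster_ids` helper is illustrative and not part of any Alibaba Cloud tool.

```shell
# Replace the two placeholders in a saved template and print the result.
# $1: template file, $2: ID of the first cluster, $3: ID of the second cluster.
fill_cluster_ids() {
  sed -e "s/\${cluster1-id}/$2/" -e "s/\${cluster2-id}/$3/" "$1"
}

# Usage (run with the kubeconfig file of the Fleet instance):
#   fill_cluster_ids ns-policy.yaml <cluster1-id> <cluster2-id> | kubectl apply -f -
```

The helper writes to standard output rather than editing the file in place, so the saved template can be reused for other templates in this topic that use the same placeholder names.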

Step 3: Install ack-spark-operator in the associated clusters

Install ack-spark-operator 2.1.2 or later in the associated cluster in which you want to run the Spark job.

Log on to the ACK console. In the left-side navigation pane, choose Marketplace > Marketplace.
On the Marketplace page, click the App Catalog tab. Find and click ack-spark-operator.
On the ack-spark-operator page, click Deploy.
In the Deploy panel, select a cluster and namespace, and then click Next.

In the Parameters step, select 2.1.2 from the Chart Version drop-down list, add the spark namespace to the spark.jobNamespaces parameter in the Parameters code editor, and then click OK.

Important

You must specify the namespace of the SparkApplication you want to create in the spark.jobNamespaces parameter.

The following table describes some parameters. You can find the parameter configurations in the Parameters section on the ack-spark-operator page.

Parameter	Description	Default value and examples
`controller.replicas`	The number of controller replicas.	Default value: 1.
`webhook.replicas`	The number of webhook replicas.	Default value: 1.
`spark.jobNamespaces`	The namespaces in which Spark jobs can run. If this parameter is left empty (`[""]`), Spark jobs can run in all namespaces. Separate multiple namespaces with commas (,).	Default value: `["default"]`. `[""]`: All namespaces. `["ns1","ns2","ns3"]`: Specify one or more namespaces.
`spark.serviceAccount.name`	ack-spark-operator automatically creates a ServiceAccount named `spark-operator-spark` and the corresponding role-based access control (RBAC) resources in each namespace specified by `spark.jobNamespaces`. You can specify a custom name for the ServiceAccount and then use that name when you submit a Spark job.	Default value: `spark-operator-spark`.

Step 4: Create a PriorityClass on the Fleet instance and distribute the PriorityClass to the associated clusters

To ensure that the submitted Spark job does not occupy the resources used by the online service or affect the online service, we recommend that you assign the Spark job a lower priority than the online service.

Use the kubeconfig file of the Fleet instance to create a low-priority PriorityClass and set its value to a negative number.

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: low-priority
value: -1000
globalDefault: false
description: "Low priority for Spark applications"

Create a ClusterPropagationPolicy on the Fleet instance to distribute the PriorityClass to the specified clusters. If you want to distribute the PriorityClass to all associated clusters, delete the clusterAffinity parameter.

apiVersion: policy.one.alibabacloud.com/v1alpha1
kind: ClusterPropagationPolicy
metadata:
  name: priority-policy
spec:
  preserveResourcesOnDeletion: false
  resourceSelectors:
  - apiVersion: scheduling.k8s.io/v1
    kind: PriorityClass
  placement:
    clusterAffinity:
      clusterNames:
      - ${cluster1-id} # The ID of your cluster. 
      - ${cluster2-id} # The ID of your cluster. 
#      labelSelector:
#        matchLabels:
#          key: value
    replicaScheduling:
      replicaSchedulingType: Duplicated

Step 5: Submit a SparkApplication in a colocation architecture on the Fleet instance

Create a PropagationPolicy by using the following YAML template. The PropagationPolicy is used to distribute all SparkApplications that use the sparkoperator.k8s.io/v1beta2 API version to the associated clusters that match specific rules. If you want to distribute the SparkApplications to all associated clusters, leave the clusterAffinity parameter empty.

apiVersion: policy.one.alibabacloud.com/v1alpha1
kind: PropagationPolicy
metadata:
  name: sparkapp-policy 
  namespace: spark
spec:
  preserveResourcesOnDeletion: false
  propagateDeps: true
  placement:
    clusterAffinity:
      clusterNames:
      - ${cluster1-id} # The ID of an associated cluster. 
      - ${cluster2-id} # The ID of an associated cluster. 
#      labelSelector:
#        matchLabels:
#          key: value
    replicaScheduling:
      replicaSchedulingType: Divided
      customSchedulingType: Gang
  resourceSelectors:
    - apiVersion: sparkoperator.k8s.io/v1beta2
      kind: SparkApplication

Create a Spark job on the Fleet instance. To schedule the driver pod and the executor pods on idle resources, add the sparkoperator.k8s.io/koordinator-colocation: "true" annotation to the driver and executor pod templates of the SparkApplication, as in the following template.

apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: spark
spec:
  arguments:
  - "50000"
  driver:
    coreLimit: 1000m
    cores: 1
    memory: 512m
    priorityClassName: low-priority
    template:
      metadata:
        annotations:
          sparkoperator.k8s.io/koordinator-colocation: "true"
      spec:
        containers:
        - name: spark-kubernetes-driver
        serviceAccount: spark-operator-spark
  executor:
    coreLimit: 1000m
    cores: 1
    instances: 1
    memory: 1g
    priorityClassName: low-priority
    template:
      metadata:
        annotations:
          sparkoperator.k8s.io/koordinator-colocation: "true"
      spec:
        containers:
        - name: spark-kubernetes-executor
  image: registry-cn-hangzhou.ack.aliyuncs.com/ack-demo/spark:3.5.4
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.4.jar
  mainClass: org.apache.spark.examples.SparkPi
  mode: cluster
  restartPolicy:
    type: Never
  sparkVersion: 3.5.4
  type: Scala

Step 6: Check the status of the Spark job

Run the following command on the Fleet instance to view the status of the Spark job:

kubectl get sparkapp -nspark

Expected output:

NAME       STATUS    ATTEMPTS   START                  FINISH       AGE
spark-pi   RUNNING   1          2025-03-05T11:19:43Z   <no value>   48s
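
To wait for the job to finish instead of rerunning the command by hand, you can poll the state field. The sketch below assumes that the SparkApplication on the Fleet instance exposes its state at `.status.applicationState.state`, as the Spark Operator `v1beta2` API does, and that the terminal states reported are `COMPLETED` and `FAILED`. The `wait_for_sparkapp` helper and its 10-second default interval are illustrative choices.

```shell
# Poll a SparkApplication until it reaches a terminal state.
# $1: name, $2: namespace, $3: seconds between polls (default 10).
# Returns 0 on COMPLETED and 1 on FAILED. There is no timeout.
wait_for_sparkapp() {
  while :; do
    state=$(kubectl get sparkapp "$1" -n "$2" \
      -o jsonpath='{.status.applicationState.state}')
    case "$state" in
      COMPLETED) echo "$1: $state"; return 0 ;;
      FAILED)    echo "$1: $state"; return 1 ;;
    esac
    sleep "${3:-10}"
  done
}

# Usage: wait_for_sparkapp spark-pi spark
```

Because the loop has no timeout, a job that never leaves a non-terminal state keeps the function running; add an attempt limit if you use it in automation.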

Run the following command on the Fleet instance to query the associated cluster to which the Spark job is scheduled:

kubectl describe sparkapp spark-pi -nspark

Expected output:

Normal   ScheduleBindingSucceed  2m29s                  default-scheduler                   Binding has been scheduled successfully. Result: {c6xxxxx:0,[{driver 1} {executor 1}]}
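
The cluster ID in the event message can also be extracted in a script, for example to pass it to later per-cluster commands. The following is a minimal sketch, assuming that the event message keeps the `Result: {<cluster-id>:...}` shape shown above. The `scheduled_cluster` helper is hypothetical.

```shell
# Read `kubectl describe sparkapp` output on stdin and print the cluster ID
# found in the ScheduleBindingSucceed event message.
scheduled_cluster() {
  sed -n 's/.*Result: {\([^:]*\):.*/\1/p'
}

# Usage: kubectl describe sparkapp spark-pi -nspark | scheduled_cluster
```

For the sample event above, the helper prints `c6xxxxx`. Lines that contain no `Result:` field produce no output.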

Run the following command on the Fleet instance to query the status of resource distribution:

kubectl get rb spark-pi-sparkapplication -nspark

Expected output:

NAME                        SCHEDULED   FULLYAPPLIED   OVERRIDDEN   ALLAVAILABLE   AGE
spark-pi-sparkapplication   True        True           True         True

Run the following command on the Fleet instance to check the status of the Spark job in the associated cluster:

kubectl amc get sparkapp -M -nspark

Expected output:

NAME       CLUSTER     STATUS      ATTEMPTS   START                  FINISH                 AGE   ADOPTION
spark-pi   c6xxxxxxx   COMPLETED   1          2025-02-24T12:10:34Z   2025-02-24T12:11:20Z   61s   Y

Run the following command on the Fleet instance to query the status of the pods:

kubectl amc get pod -M -nspark

Expected output:

NAME                               CLUSTER     READY   STATUS    RESTARTS   AGE
spark-pi-3c0565956608ad6d-exec-1   c6xxxxxxx   1/1     Running   0          2m35s
spark-pi-driver                    c6xxxxxxx   1/1     Running   0          2m50s

Run the following command on the Fleet instance to view the details of the Spark job in the associated cluster:
```
kubectl amc get sparkapp spark-pi -m ${member-cluster-id} -oyaml -nspark
```