Container Service for Kubernetes: Multi-cluster scheduling and distribution of Spark jobs

Last Updated: Mar 26, 2026

Use Fleet instances of ACK One (Distributed Cloud Container Platform for Kubernetes) to schedule and distribute Apache Spark jobs across multiple clusters. This lets you run batch Spark workloads on idle cluster capacity without competing with online service workloads for resources.

How it works

  1. Install the ack-spark-operator component on each sub-cluster where Spark jobs will run.

  2. Create a SparkApplication and a PropagationPolicy on the Fleet instance.

  3. The Global Scheduler (the multi-cluster scheduling component of the Fleet instance) compares the Spark job's resource requests against the remaining capacity of each associated sub-cluster and selects the best target.

    For sub-clusters running Kubernetes 1.28 or later, the Fleet instance supports resource preoccupation to improve scheduling success rates.

  4. The Fleet instance distributes the SparkApplication to the selected sub-cluster.

  5. ACK Spark Operator in the sub-cluster starts the Spark driver and executor Pods. The Fleet instance watches the job's running status. If the driver fails to start due to insufficient resources, the Fleet instance reclaims the SparkApplication after a timeout and reschedules it to another sub-cluster with sufficient capacity.

Prerequisites

Before you begin, ensure that you have:

  - An ACK One Fleet instance, with the sub-clusters that will run Spark jobs associated with it.
  - The kubeconfig file of the Fleet instance, which the following steps use to run kubectl commands against the Fleet instance.
  - The AMC command-line plugin (kubectl amc) installed, which the verification and troubleshooting steps use to query resources in the sub-clusters.

Step 1: Install ack-spark-operator on sub-clusters

Install the ack-spark-operator component on each sub-cluster where you want to run Spark jobs.

  1. Log on to the ACK console. In the left-side navigation pane, choose Marketplace > Marketplace.

  2. On the Marketplace page, click the App Catalog tab, then find and click ack-spark-operator.

  3. On the ack-spark-operator page, click Deploy.

  4. In the Deploy panel, select a cluster and namespace, then click Next.

  5. In the Parameters step, configure the parameters and click OK.

The following table describes the key parameters. You can find all parameter configurations in the Parameters section on the ack-spark-operator page.

| Parameter | Description | Default |
| --- | --- | --- |
| controller.replicas | Number of controller replicas. | 1 |
| webhook.replicas | Number of webhook replicas. | 1 |
| spark.jobNamespaces | Namespaces where Spark jobs can run. Set to [""] for all namespaces, or list specific namespaces separated by commas, for example, ["ns1","ns2","ns3"]. | ["default"] |
| spark.serviceAccount.name | Name of the ServiceAccount automatically created in each namespace listed in spark.jobNamespaces. The operator also creates the corresponding role-based access control (RBAC) resources. Specify a custom name here if you plan to reference it when submitting Spark jobs. | spark-operator-spark |
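
The parameters above are the component's configuration values. As a sketch, overriding only the keys from the table in the Parameters step might look like the following; the additional spark-jobs namespace is a hypothetical example, and all other values keep their defaults.

controller:
  replicas: 1
webhook:
  replicas: 1
spark:
  jobNamespaces:
  - default
  - spark-jobs                     # Hypothetical extra namespace; list only the namespaces you plan to use.
  serviceAccount:
    name: spark-operator-spark     # Reference this name in the driver spec when you submit Spark jobs.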

Step 2: Create a PriorityClass and distribute it to sub-clusters

Assign a low priority to Spark jobs so that they do not compete with online service workloads for resources. Pods that do not specify a PriorityClass default to priority 0, so a PriorityClass with a negative value keeps Spark Pods below them.

  1. Using the kubeconfig file of the Fleet instance, create a low-priority PriorityClass with a negative value:

    apiVersion: scheduling.k8s.io/v1
    kind: PriorityClass
    metadata:
      name: low-priority
    value: -1000
    globalDefault: false
    description: "Low priority for Spark applications"
  2. Create a ClusterPropagationPolicy on the Fleet instance to distribute the PriorityClass to the target sub-clusters. To distribute to all associated clusters, remove the clusterAffinity field.

    apiVersion: policy.one.alibabacloud.com/v1alpha1
    kind: ClusterPropagationPolicy
    metadata:
      name: priority-policy
    spec:
      preserveResourcesOnDeletion: false
      resourceSelectors:
      - apiVersion: scheduling.k8s.io/v1
        kind: PriorityClass
      placement:
        clusterAffinity:
          clusterNames:
          - ${cluster1-id} # The ID of your cluster.
          - ${cluster2-id} # The ID of your cluster.
    #      labelSelector:
    #        matchLabels:
    #          key: value
        replicaScheduling:
          replicaSchedulingType: Duplicated
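
If you saved the two manifests above as files, you can apply them using the kubeconfig file of the Fleet instance and confirm that the PriorityClass was created. The file names and kubeconfig path below are illustrative.

# Run against the Fleet instance (kubeconfig path is illustrative).
kubectl --kubeconfig ~/.kube/fleet-kubeconfig apply -f priorityclass.yaml
kubectl --kubeconfig ~/.kube/fleet-kubeconfig apply -f priority-policy.yaml

# Confirm that the PriorityClass exists on the Fleet instance.
kubectl --kubeconfig ~/.kube/fleet-kubeconfig get priorityclass low-priority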

Step 3: Submit a Spark job on the Fleet instance

(Optional) Create and distribute namespaces to sub-clusters

If the namespace where the SparkApplication will run does not already exist on the Fleet instance, create it first. The namespace must also be listed in the spark.jobNamespaces parameter configured in Step 1.

  1. Create the namespace on the Fleet instance:

    kubectl create ns xxx
  2. Create a ClusterPropagationPolicy to distribute the namespace to each sub-cluster:

    apiVersion: policy.one.alibabacloud.com/v1alpha1
    kind: ClusterPropagationPolicy
    metadata:
      name: ns-policy
    spec:
      resourceSelectors:
      - apiVersion: v1
        kind: Namespace
        name: xxx
      placement:
        clusterAffinity:
          clusterNames:
          - ${cluster1-id} # The ID of your cluster.
          - ${cluster2-id} # The ID of your cluster.
        replicaScheduling:
          replicaSchedulingType: Duplicated
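
To confirm that the namespace was distributed, you can query it on the Fleet instance and in the sub-clusters. A sketch, using the placeholder namespace name from the commands above and the kubectl amc plugin that the verification steps in Step 4 rely on:

# On the Fleet instance, the namespace should exist.
kubectl get ns xxx

# Check that it reached the associated sub-clusters.
kubectl amc get ns xxx -M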

Create a PropagationPolicy for SparkApplication

Create a PropagationPolicy on the Fleet instance. This policy tells the Global Scheduler to distribute all SparkApplication resources of the sparkoperator.k8s.io/v1beta2 API version to the target sub-clusters using Gang scheduling.

apiVersion: policy.one.alibabacloud.com/v1alpha1
kind: PropagationPolicy
metadata:
  name: sparkapp-policy
  namespace: default
spec:
  preserveResourcesOnDeletion: false
  propagateDeps: true   # Automatically propagates dependent resources (such as the ServiceAccount referenced in the driver spec) to the target sub-clusters.
  placement:
    clusterAffinity:
      clusterNames:
      - ${cluster1-id} # The ID of your cluster.
      - ${cluster2-id} # The ID of your cluster.
#      labelSelector:
#        matchLabels:
#          key: value
    replicaScheduling:
      replicaSchedulingType: Divided
      customSchedulingType: Gang
  resourceSelectors:
    - apiVersion: sparkoperator.k8s.io/v1beta2
      kind: SparkApplication

Submit a SparkApplication

Create a SparkApplication on the Fleet instance. Set priorityClassName to the PriorityClass created in Step 2 for both the driver and executor.

The Global Scheduler uses the memory and cores values in the driver and executor specs as Kubernetes resource requests. It compares the total requested resources against each sub-cluster's remaining capacity to select the target cluster. Size these values to reflect the actual resources your job needs — undersized values may cause scheduling to succeed but the job to fail at runtime; oversized values may prevent scheduling if no cluster has sufficient capacity.

After you create the SparkApplication, the PropagationPolicy from the previous step distributes it to the selected sub-cluster.

apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default     # Make sure this namespace is listed in the spark.jobNamespaces parameter.
spec:
  type: Scala
  mode: cluster
  image: registry-cn-hangzhou.ack.aliyuncs.com/ack-demo/spark:3.5.4
  imagePullPolicy: IfNotPresent
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.4.jar
  arguments:
  - "1000"
  sparkVersion: 3.5.4
  driver:
    cores: 1
    memory: 512m
    priorityClassName: low-priority
    serviceAccount: spark-operator-spark   # Replace with the custom name you specified in Step 1.
  executor:
    instances: 1
    cores: 1
    memory: 512m
    priorityClassName: low-priority
  restartPolicy:
    type: Never
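
To submit the job, apply the manifest on the Fleet instance. A sketch, assuming you saved the manifest as spark-pi.yaml; the file name and kubeconfig path are illustrative.

# Create the SparkApplication on the Fleet instance. The PropagationPolicy
# created earlier selects a sub-cluster and distributes the job to it.
kubectl --kubeconfig ~/.kube/fleet-kubeconfig apply -f spark-pi.yaml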

Step 4: Verify the Spark job

Check job status and scheduling result

  1. Run the following command on the Fleet instance to view the Spark job status:

    kubectl get sparkapp

    Expected output:

    NAME       STATUS    ATTEMPTS   START                  FINISH       AGE
    spark-pi   RUNNING   1          2025-02-24T12:10:34Z   <no value>   11s
  2. Run the following command to confirm which sub-cluster the job was scheduled to:

    kubectl describe sparkapp spark-pi

    Expected output:

    Normal   ScheduleBindingSucceed  2m29s                  default-scheduler                   Binding has been scheduled successfully. Result: {c6xxxxx:0,[{driver 1} {executor 1}]}
  3. Run the following command to view the job status in the associated sub-cluster:

    kubectl amc get sparkapp -M

    Expected output:

    NAME       CLUSTER     STATUS      ATTEMPTS   START                  FINISH                 AGE   ADOPTION
    spark-pi   c6xxxxxxx   COMPLETED   1          2025-02-24T12:10:34Z   2025-02-24T12:11:20Z   61s   Y
  4. Run the following command to query the Pod status:

    kubectl amc get pod -M

    Expected output:

    NAME              CLUSTER     READY   STATUS      RESTARTS   AGE
    spark-pi-driver   c6xxxxxxx   0/1     Completed   0          68s
  5. Run the following command to view the full details of the Spark job in the sub-cluster:

    kubectl amc get sparkapp spark-pi -m ${member clusterid} -oyaml
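
To see the job output, you can read the driver Pod's log in the sub-cluster that ran the job. A sketch, assuming you have the kubeconfig file of that sub-cluster; the path is illustrative.

# Run against the sub-cluster the job was scheduled to.
kubectl --kubeconfig ~/.kube/member-kubeconfig logs spark-pi-driver -n default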

Troubleshooting

If the job does not reach COMPLETED, use the following commands to diagnose the issue.

Job stuck in `PENDING` or not scheduled:

Check the events on the SparkApplication to see if the Global Scheduler reported a scheduling failure:

kubectl describe sparkapp spark-pi

Look for events with reason ScheduleBindingFailed or similar. Common causes include insufficient resources across all sub-clusters or a missing PriorityClass on the target cluster.
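
To rule out the second cause, you can check whether the PriorityClass reached the target sub-cluster. A sketch, assuming the kubectl amc plugin and the low-priority name from Step 2:

kubectl amc get priorityclass low-priority -m ${member clusterid}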

Driver Pod not starting:

Check the driver Pod's events in the sub-cluster:

kubectl amc get pod -M
kubectl amc describe pod spark-pi-driver -m ${member clusterid}

Operator not running in sub-cluster:

Confirm that ack-spark-operator is running in the sub-cluster:

kubectl amc get pod -n spark-operator -M

What's next