Apache Spark is a computing engine for large-scale data processing that is widely used in big data analytics and machine learning workloads. This topic describes how to use the Fleet instances of Distributed Cloud Container Platform for Kubernetes (ACK One) to schedule and distribute Spark jobs across multiple clusters. This helps improve the utilization of idle resources in those clusters.
How it works
1. Install the ack-spark-operator component in the associated clusters.
2. Create a SparkApplication and a PropagationPolicy on the Fleet instance. The multi-cluster scheduling component (Global Scheduler) of the Fleet instance matches the resource requests of the Spark job against the remaining resources of each associated sub-cluster. For sub-clusters whose Kubernetes version is 1.28 or later, the Fleet instance supports resource preoccupation to improve the success rate of Spark job scheduling.
3. After scheduling, the Fleet instance distributes the SparkApplication to the selected associated cluster.
4. In the associated cluster, ACK Spark Operator runs the driver and executor pods of the Spark job. At the same time, the Fleet instance watches the running status of the Spark job in the sub-clusters. If the driver cannot run due to insufficient resources, the Fleet instance reclaims the SparkApplication after a specific period of time and reschedules it to another associated cluster that has sufficient resources.
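You can observe this scheduling and rescheduling behavior from the Fleet instance. The following is a minimal sketch that uses the commands introduced in Step 4 and assumes a SparkApplication named spark-pi in the default namespace:

```bash
# Watch the SparkApplication status while the Global Scheduler binds it
# to an associated cluster (and rebinds it if the driver cannot start).
kubectl get sparkapp spark-pi -w

# Inspect scheduling events; a ScheduleBindingSucceed event names the
# cluster that the job was bound to.
kubectl describe sparkapp spark-pi
```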
Prerequisites
The Fleet instance is associated with multiple clusters whose Kubernetes versions are 1.18 or later. For more information, see Manage associated clusters.
The Resource Access Management (RAM) policy AliyunAdcpFullAccess is attached to a RAM user. For more information, see Grant permissions to a RAM user.
The AMC command-line tool is installed. For more information, see Use AMC command line.
Step 1: Install the ack-spark-operator component in the associated clusters
Install the ack-spark-operator component in the sub-clusters in which you want to run Spark jobs.
Log on to the ACK console. In the left-side navigation pane, choose Marketplace.
On the Marketplace page, click the App Catalog tab. Find and click ack-spark-operator.
On the ack-spark-operator page, click Deploy.
In the Deploy panel, select a cluster and namespace, and then click Next.
In the Parameters step, configure the parameters and click OK.
The following table describes some parameters. You can find the parameter configurations in the Parameters section on the ack-spark-operator page.
| Parameter | Description | Example |
| --- | --- | --- |
| controller.replicas | The number of controller replicas. | Default value: 1. |
| webhook.replicas | The number of webhook replicas. | Default value: 1. |
| spark.jobNamespaces | The namespaces in which Spark jobs can run. If this parameter is left empty, Spark jobs can run in all namespaces. Separate multiple namespaces with commas (,). | Default value: ["default"]. [""]: all namespaces. ["ns1","ns2","ns3"]: one or more specific namespaces. |
| spark.serviceAccount.name | A Spark job automatically creates a ServiceAccount named spark-operator-spark and the corresponding role-based access control (RBAC) resources in each namespace specified by spark.jobNamespaces. You can specify a custom name for the ServiceAccount and then specify the custom name when you submit a Spark job. | Default value: spark-operator-spark. |
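For reference, the parameters in the table map onto the component's configuration values roughly as follows. This is a sketch; the exact structure of the values file may differ across chart versions, and the spark-jobs namespace is a hypothetical example:

```yaml
controller:
  replicas: 1                  # number of controller replicas
webhook:
  replicas: 1                  # number of webhook replicas
spark:
  jobNamespaces:               # namespaces that are allowed to run Spark jobs
    - default
    - spark-jobs               # hypothetical additional namespace
  serviceAccount:
    name: spark-operator-spark # custom ServiceAccount name, if you use one
```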
Step 2: Create a PriorityClass on the Fleet instance and distribute the PriorityClass to sub-clusters
To ensure that the submitted Spark jobs do not occupy the resources of online services and affect their normal operation, we recommend that you assign a low priority to the submitted Spark jobs.
Use the kubeconfig file of the Fleet instance to create a low-priority PriorityClass and set the value to a negative number.

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: low-priority
value: -1000
globalDefault: false
description: "Low priority for Spark applications"
```

Create a ClusterPropagationPolicy on the Fleet instance to distribute the PriorityClass to the specified clusters. If you want to distribute the PriorityClass to all associated clusters, delete the clusterAffinity parameter.

```yaml
apiVersion: policy.one.alibabacloud.com/v1alpha1
kind: ClusterPropagationPolicy
metadata:
  name: priority-policy
spec:
  preserveResourcesOnDeletion: false
  resourceSelectors:
    - apiVersion: scheduling.k8s.io/v1
      kind: PriorityClass
  placement:
    clusterAffinity:
      clusterNames:
        - ${cluster1-id} # The ID of your cluster.
        - ${cluster2-id} # The ID of your cluster.
      # labelSelector:
      #   matchLabels:
      #     key: value
    replicaScheduling:
      replicaSchedulingType: Duplicated
```
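To confirm that the PriorityClass reached the sub-clusters, you can query it with the AMC command-line tool. This is a sketch that assumes kubectl amc get works for cluster-scoped resources in the same way as the queries shown in Step 4:

```bash
# List the PriorityClass in every associated cluster (-M queries all member clusters).
kubectl amc get priorityclass low-priority -M
```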
Step 3: Submit a Spark job on the Fleet instance and schedule the Spark job to sub-clusters
(Optional) Create namespaces on the Fleet instance and distribute the namespaces to sub-clusters.
If the namespace in which the application to be distributed resides does not exist on the Fleet instance, you must first create the namespace on the Fleet instance and make sure that it is included in the spark.jobNamespaces parameter of the component installed in Step 1. If the namespace already exists, skip this step.

Run the following command by using the kubeconfig file of the Fleet instance to create a namespace:

```bash
kubectl create ns xxx
```

The namespace is also required in the sub-clusters. If it does not exist there, you can use a ClusterPropagationPolicy to distribute the namespace from the Fleet instance to each sub-cluster.

```yaml
apiVersion: policy.one.alibabacloud.com/v1alpha1
kind: ClusterPropagationPolicy
metadata:
  name: ns-policy
spec:
  resourceSelectors:
    - apiVersion: v1
      kind: Namespace
      name: xxx
  placement:
    clusterAffinity:
      clusterNames:
        - ${cluster1-id} # The ID of your cluster.
        - ${cluster2-id} # The ID of your cluster.
    replicaScheduling:
      replicaSchedulingType: Duplicated
```
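Similarly, you can check that the namespace now exists in each sub-cluster. A sketch that assumes the same AMC query pattern and the placeholder namespace xxx:

```bash
# Confirm that the namespace was distributed to all associated clusters.
kubectl amc get ns xxx -M
```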
Create the following PropagationPolicy on the Fleet instance to distribute all SparkApplication resources of the sparkoperator.k8s.io/v1beta2 API group to the corresponding clusters:

```yaml
apiVersion: policy.one.alibabacloud.com/v1alpha1
kind: PropagationPolicy
metadata:
  name: sparkapp-policy
  namespace: default
spec:
  preserveResourcesOnDeletion: false
  propagateDeps: true
  placement:
    clusterAffinity:
      clusterNames:
        - ${cluster1-id} # The ID of your cluster.
        - ${cluster2-id} # The ID of your cluster.
      # labelSelector:
      #   matchLabels:
      #     key: value
    replicaScheduling:
      replicaSchedulingType: Divided
      customSchedulingType: Gang
  resourceSelectors:
    - apiVersion: sparkoperator.k8s.io/v1beta2
      kind: SparkApplication
```
Create a SparkApplication on the Fleet instance and set the priorityClassName parameter for both the driver and the executor. After creation, the application is distributed to the clusters selected by the PropagationPolicy in the previous step.

```yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default # Make sure that the namespace is in the namespace list specified by the spark.jobNamespaces parameter.
spec:
  type: Scala
  mode: cluster
  image: registry-cn-hangzhou.ack.aliyuncs.com/ack-demo/spark:3.5.4
  imagePullPolicy: IfNotPresent
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.4.jar
  arguments:
    - "1000"
  sparkVersion: 3.5.4
  driver:
    cores: 1
    memory: 512m
    priorityClassName: low-priority
    serviceAccount: spark-operator-spark # Replace spark-operator-spark with the custom name that you specified.
  executor:
    instances: 1
    cores: 1
    memory: 512m
    priorityClassName: low-priority
  restartPolicy:
    type: Never
```
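To submit the job, apply both manifests by using the kubeconfig file of the Fleet instance. The file names below are illustrative:

```bash
# Apply the PropagationPolicy first so that the SparkApplication is
# distributed as soon as it is created.
kubectl apply -f sparkapp-policy.yaml
kubectl apply -f spark-pi.yaml
```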
Step 4: View the Spark job
Run the following command on the Fleet instance to view the status of the Spark job:
```bash
kubectl get sparkapp
```

Expected output:

```
NAME       STATUS    ATTEMPTS   START                  FINISH       AGE
spark-pi   RUNNING   1          2025-02-24T12:10:34Z   <no value>   11s
```

Run the following command on the Fleet instance to check which associated cluster the Spark job is scheduled to:

```bash
kubectl describe sparkapp spark-pi
```

Expected output:

```
Normal  ScheduleBindingSucceed  2m29s  default-scheduler  Binding has been scheduled successfully. Result: {c6xxxxx:0,[{driver 1} {executor 1}]}
```

Run the following command on the Fleet instance to check the status of the Spark job in the associated cluster:

```bash
kubectl amc get sparkapp -M
```

Expected output:

```
NAME       CLUSTER     STATUS      ATTEMPTS   START                  FINISH                 AGE   ADOPTION
spark-pi   c6xxxxxxx   COMPLETED   1          2025-02-24T12:10:34Z   2025-02-24T12:11:20Z   61s   Y
```

Run the following command on the Fleet instance to query the status of the pods:

```bash
kubectl amc get pod -M
```

Expected output:

```
NAME              CLUSTER     READY   STATUS      RESTARTS   AGE
spark-pi-driver   c6xxxxxxx   0/1     Completed   0          68s
```

Run the following command on the Fleet instance to view the details of the Spark job in the associated cluster:

```bash
kubectl amc get sparkapp spark-pi -m ${member clusterid} -o yaml
```
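When you no longer need the job, delete it on the Fleet instance. Because the PropagationPolicy in Step 3 sets preserveResourcesOnDeletion to false, the copies distributed to the sub-clusters are expected to be removed as well:

```bash
# Delete the SparkApplication on the Fleet instance; the distributed copies
# in the sub-clusters are cleaned up along with it.
kubectl delete sparkapp spark-pi
```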