Alibaba Cloud Container Compute Service (ACS) provides serverless computing capabilities. For big data computing jobs, you can use pods whose computing power quality of service (QoS) class is BestEffort to meet elastic computing requirements and reduce computing costs. This topic describes how to run Spark applications by using the BestEffort pods provided by ACS.
Background information
Apache Spark and Spark Operator
Apache Spark provides powerful capabilities in data science and machine learning scenarios and can be used to handle various complex data processing and analysis tasks. Spark provides efficient solutions for both offline batch processing and real-time stream processing. Spark Operator runs in Kubernetes and uses custom resources to manage Spark applications. It allows you to define Spark applications in YAML files, which simplifies deployment and increases efficiency in cloud-native environments.
BestEffort pods
You can use Spark Operator to manage and schedule Spark applications in Kubernetes, which significantly improves the efficiency of data processing and analysis. ACS supports the creation of pods whose computing power QoS class is BestEffort (BestEffort pods). BestEffort pods provide an economical and efficient option for short-running jobs and for stateless applications that have high scalability and fault tolerance. This helps reduce computing costs while ensuring that jobs run efficiently.
If you want to use Apache Spark and Spark Operator in a production environment, we recommend that you configure them based on the official recommendations from Spark.
Prerequisites
ACS is activated. For more information, see Step 1: Activate ACS.
An ACS cluster is created and CoreDNS is installed. For more information, see Create an ACS cluster.
The ack-spark-operator 3.0 component is installed by using Helm. For more information, see Use Helm to manage applications in ACS.
A kubectl client is connected to the ACS cluster. For more information, see Obtain the kubeconfig file of a cluster and use kubectl to connect to the cluster and Manage Kubernetes clusters with kubectl on CloudShell.
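Before you start the procedure, you can optionally run a few checks to confirm that the prerequisites are met. The following commands are a minimal sketch; the grep pattern and the CRD name are assumptions based on a typical Spark Operator installation and may differ in your environment.

```bash
# Confirm that kubectl is connected to the ACS cluster.
kubectl cluster-info

# Confirm that the Spark Operator component is running.
# The namespace and pod name depend on how ack-spark-operator was installed.
kubectl get pods -A | grep -i spark-operator

# Confirm that the SparkApplication CRD is installed.
# The CRD name is assumed from the sparkoperator.k8s.io/v1beta2 API used later in this topic.
kubectl get crd sparkapplications.sparkoperator.k8s.io
```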
Procedure
Step 1: Configure permissions for the Spark application
Create a file named spark-sa.yaml and copy the following content to the file. The file creates a namespace named spark-demo and grants the ServiceAccount named spark the permissions to modify cluster resources.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: spark-demo
---
apiVersion: v1
kind: ServiceAccount
metadata:
  namespace: spark-demo
  name: spark
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: spark-role-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: edit
subjects:
  - kind: ServiceAccount
    name: spark
    namespace: spark-demo
```

Run the following command to create the resources:
```bash
kubectl apply -f spark-sa.yaml
```
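Optionally, you can confirm that the namespace, ServiceAccount, and ClusterRoleBinding defined in spark-sa.yaml were created. The following commands are a sketch that uses the resource names from the YAML file above.

```bash
# Verify the resources created by spark-sa.yaml.
kubectl get namespace spark-demo
kubectl get serviceaccount spark -n spark-demo
kubectl get clusterrolebinding spark-role-binding
```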
Step 2: Use a BestEffort pod to run a Spark application
Create a file named spark-pi.yaml and copy the following content to the file. The alibabacloud.com/compute-qos: best-effort label is specified in the .spec.executor.labels parameter. This indicates that the executors of the computing job in the Spark application use BestEffort pods.

```yaml
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  namespace: spark-demo
  name: spark-pi
spec:
  type: Scala
  mode: cluster
  image: "registry.cn-hangzhou.aliyuncs.com/koordinator-sh/spark-test:v3.4.1-0.1"
  imagePullPolicy: IfNotPresent
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.12-3.4.1.jar"
  sparkVersion: "3.4.1"
  restartPolicy:
    type: Never
  driver:
    cores: 1
    coreLimit: "1"
    memory: "512m"
    labels:
      version: 3.4.1
    serviceAccount: spark
  executor:
    cores: 1
    coreLimit: "1"
    instances: 1
    memory: "512m"
    deleteOnTermination: false
    labels:
      version: 3.4.1
      alibabacloud.com/compute-qos: best-effort # best-effort indicates that the executor uses BestEffort pods.
```

Run the following command to deploy the Spark application:
```bash
kubectl apply -f spark-pi.yaml
```
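After the deployment, Spark Operator creates the driver pod and then the executor pods based on the SparkApplication resource. As a sketch, you can watch the SparkApplication object to follow its state transitions; the exact columns shown depend on the installed CRD version.

```bash
# Watch the SparkApplication object managed by Spark Operator. Press Ctrl+C to stop watching.
kubectl get sparkapplication spark-pi -n spark-demo -w

# View detailed status and events of the SparkApplication.
kubectl describe sparkapplication spark-pi -n spark-demo
```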
Step 3: Check the status of the Spark application
Run the following command to query the pods that run the Spark application:
```bash
kubectl get pod -n spark-demo -o wide
```

Expected output:

```
NAME                               READY   STATUS    RESTARTS   AGE   IP              NODE                           NOMINATED NODE   READINESS GATES
spark-pi-xxxxx591db3xxxxx-exec-1   1/1     Running   0          15s   192.168.x.xxx   virtual-kubelet-cn-xxxxxxx-x   <none>           <none>
spark-pi-driver                    1/1     Running   0          39s   192.168.x.xxx   virtual-kubelet-cn-xxxxxxx-x   <none>           <none>
```

The output shows that the pods are in the Running state. This indicates that the Spark application runs as expected in the ACS cluster.
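If you prefer to block until the driver pod finishes instead of polling its status, you can use kubectl wait. This is a sketch that assumes a kubectl version that supports the --for=jsonpath condition (v1.23 or later).

```bash
# Wait until the driver pod reaches the Succeeded phase, or time out after 10 minutes.
kubectl wait pod/spark-pi-driver -n spark-demo \
  --for=jsonpath='{.status.phase}'=Succeeded --timeout=10m
```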
Verify the result
After the status of the Spark application changes to Completed, run the following command to view the running result of the Spark application:
```bash
kubectl logs -n spark-demo spark-pi-driver
```

Expected output:

```
......
24/09/10 07:21:30 INFO DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 1.346414 s
Pi is roughly 3.1402757013785068
......
```

The output shows that the Spark computing job took 1.346414 s to execute and that the value of Pi is 3.1402757013785068.

Important: The execution time and Pi value provided in this step are reference values. The actual values depend on your operating environment.
Run the following command to check the computing power QoS class of the pods by using the alibabacloud.com/compute-qos label:

```bash
kubectl get pod -n spark-demo -L alibabacloud.com/compute-qos -o wide
```

Expected output:

```
NAME                               READY   STATUS      RESTARTS   AGE     IP              NODE                           NOMINATED NODE   READINESS GATES   COMPUTE-QOS
spark-pi-xxxxx591db3xxxxx-exec-1   0/1     Completed   0          6m35s   192.168.x.xxx   virtual-kubelet-cn-xxxxxxx-x   <none>           <none>            best-effort
spark-pi-driver                    0/1     Completed   0          8m11s   192.168.x.xxx   virtual-kubelet-cn-xxxxxxx-x   <none>           <none>            default
```

The COMPUTE-QOS column in the output indicates that the computing power QoS class of the executor pod that runs the Spark computing job is best-effort, while the driver pod uses the default class.
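If your namespace contains many pods, you can also filter for the pods that use the BestEffort computing power QoS class by using the same label as a selector. This is a minimal sketch based on the label shown above.

```bash
# List only the pods whose computing power QoS class is best-effort.
kubectl get pod -n spark-demo -l alibabacloud.com/compute-qos=best-effort
```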