
Container Compute Service: Use BestEffort pods to run Spark applications

Last Updated: Dec 10, 2024

Alibaba Cloud Container Compute Service (ACS) provides serverless computing. For big data computing jobs, you can use pods whose computing power quality of service (QoS) class is BestEffort to meet elastic computing requirements and reduce the computing costs of the jobs. This topic describes how to run Spark applications on the BestEffort pods provided by ACS.

Background information

Apache Spark and Spark Operator

Apache Spark provides powerful capabilities for data science and machine learning and can handle a wide range of complex data processing and analysis tasks. Spark provides efficient solutions for both offline batch processing and real-time stream processing. Spark Operator runs in Kubernetes and manages Spark applications through custom resources. It allows you to define Spark applications in YAML files, which simplifies deployment and improves efficiency in cloud-native environments.

BestEffort pods

You can use Spark Operator to manage and schedule Spark applications in Kubernetes, which significantly improves the efficiency of data processing and analysis. ACS supports the creation of pods whose computing power QoS class is BestEffort (BestEffort pods). BestEffort pods provide an economical and efficient solution for short-running jobs and stateless applications that have high scalability and fault tolerance. This helps reduce computing costs while ensuring that jobs run efficiently.
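
For reference, the following is a minimal sketch of a standalone pod that requests the BestEffort computing power QoS class by using the alibabacloud.com/compute-qos label, which is the same label applied to the Spark executors later in this topic. The pod name and image are placeholders, and your cluster may require additional ACS-specific labels (for example, a compute class label); adjust the example to your environment.

    apiVersion: v1
    kind: Pod
    metadata:
      name: besteffort-demo                           # placeholder name
      labels:
        alibabacloud.com/compute-qos: best-effort     # request the BestEffort computing power QoS class
    spec:
      containers:
        - name: app
          image: busybox                              # placeholder image; replace with an image accessible from your cluster
          command: ["sleep", "3600"]
          resources:
            requests:
              cpu: "1"
              memory: 1Gi
            limits:
              cpu: "1"
              memory: 1Gi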

Note

This topic describes how to run a Spark application by using the BestEffort pods provided by ACS. If you want to use Apache Spark and Spark Operator in a production environment, we recommend that you configure them based on the official recommendations from Spark.
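
If Spark Operator is not yet installed in your cluster, one common approach is to install it from the community Helm chart. The following commands are only a sketch that assumes the Kubeflow spark-operator chart; the repository URL, chart name, release name, and namespace are examples and may differ in your environment.

    # Add the community Spark Operator chart repository (example repository URL).
    helm repo add spark-operator https://kubeflow.github.io/spark-operator
    helm repo update
    # Install Spark Operator into its own namespace (example release and namespace names).
    helm install spark-operator spark-operator/spark-operator \
      --namespace spark-operator --create-namespace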

Prerequisites

Procedure

Step 1: Create the Spark application and configure related parameters

  1. Create a file named spark-sa.yaml and copy the following content to the file. The file creates a namespace named spark-demo and a ServiceAccount named spark, and grants the ServiceAccount permissions to modify cluster resources.

    apiVersion: v1
    kind: Namespace
    metadata:
      name: spark-demo
    ---
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      namespace: spark-demo
      name: spark
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: spark-role-binding
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: edit
    subjects:
      - kind: ServiceAccount
        name: spark
        namespace: spark-demo
  2. Run the following command to create the namespace, ServiceAccount, and ClusterRoleBinding:

    kubectl apply -f spark-sa.yaml
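
    To confirm that the resources are created, you can optionally run the following commands. Both commands should return the corresponding objects if spark-sa.yaml was applied successfully.

    kubectl get serviceaccount spark -n spark-demo
    kubectl get clusterrolebinding spark-role-binding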

Step 2: Use a BestEffort pod to run a Spark application

  1. Create a file named spark-pi.yaml and copy the following content to the file. The alibabacloud.com/compute-qos: best-effort label is specified in the .spec.executor.labels field, which indicates that the executors that run the computing job of the Spark application use BestEffort pods.

    apiVersion: "sparkoperator.k8s.io/v1beta2"
    kind: SparkApplication
    metadata:
      namespace: spark-demo
      name: spark-pi
    spec:
      type: Scala
      mode: cluster
      image: "registry.cn-hangzhou.aliyuncs.com/koordinator-sh/spark-test:v3.4.1-0.1"
      imagePullPolicy: IfNotPresent
      mainClass: org.apache.spark.examples.SparkPi
      mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.12-3.4.1.jar"
      sparkVersion: "3.4.1"
      restartPolicy:
        type: Never
      driver:
        cores: 1
        coreLimit: "1"
        memory: "512m"
        labels:
          version: 3.4.1
        serviceAccount: spark
      executor:
        cores: 1
        coreLimit: "1"
        instances: 1
        memory: "512m"
        deleteOnTermination: false  
        labels:
          version: 3.4.1
          alibabacloud.com/compute-qos: best-effort     # best-effort indicates that the executor uses BestEffort pods.
  2. Run the following command to deploy the Spark application:

    kubectl apply -f spark-pi.yaml
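
    You can also check the status of the SparkApplication custom resource that Spark Operator manages. The STATUS value typically progresses through states such as SUBMITTED, RUNNING, and COMPLETED; the exact columns depend on your Spark Operator version.

    kubectl get sparkapplication spark-pi -n spark-demo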

Step 3: Check the status of the Spark application

  1. Run the following command to query the name of the pod that runs the Spark application:

    kubectl get pod -n spark-demo -o wide

    Expected output:

    NAME                               READY   STATUS    RESTARTS   AGE   IP              NODE                           NOMINATED NODE   READINESS GATES
    spark-pi-xxxxx591db3xxxxx-exec-1   1/1     Running   0          15s   192.168.x.xxx   virtual-kubelet-cn-xxxxxxx-x   <none>           <none>
    spark-pi-driver                    1/1     Running   0          39s   192.168.x.xxx   virtual-kubelet-cn-xxxxxxx-x   <none>           <none>

    The output shows that the pods are in the Running state. This indicates that the Spark application runs as expected in the ACS cluster.
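
    If the pods have not reached the Running state yet, you can watch them until the job finishes:

    kubectl get pod -n spark-demo -w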

Verify the result

  1. After the status of the Spark application changes to Completed, run the following command to view the running result of the Spark application:

    kubectl logs -n spark-demo spark-pi-driver

    Expected output:

    ......
    24/09/10 07:21:30 INFO DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 1.346414 s
    Pi is roughly 3.1402757013785068
    ......

    The output shows that the time consumed to execute the Spark computing job is 1.346414s, and the value of Pi is 3.1402757013785068.

    Important

    The execution time and Pi value provided in this step are reference values. The actual values depend on your operating environment.

  2. Run the following command to check the computing power QoS class of the pods by using the alibabacloud.com/compute-qos label:

    kubectl get pod -n spark-demo -Lalibabacloud.com/compute-qos -o wide

    Expected output:

    NAME                               READY   STATUS      RESTARTS   AGE     IP              NODE                           NOMINATED NODE   READINESS GATES   COMPUTE-QOS
    spark-pi-xxxxx591db3xxxxx-exec-1   0/1     Completed   0          6m35s   192.168.x.xxx   virtual-kubelet-cn-xxxxxxx-x   <none>           <none>            best-effort
    spark-pi-driver                    0/1     Completed   0          8m11s   192.168.x.xxx   virtual-kubelet-cn-xxxxxxx-x   <none>           <none>            default

    The COMPUTE-QOS column in the output shows that the computing power QoS class of the executor pod that runs the Spark computing job is best-effort, whereas the driver pod uses the default class.
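
    If you only want to list the pods that run on BestEffort computing power, you can also filter by the label directly:

    kubectl get pod -n spark-demo -l alibabacloud.com/compute-qos=best-effort

    In this example, only the executor pod is expected to be returned, because the driver pod does not carry the best-effort label.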