In a serverless Kubernetes (ASK) cluster, you can create pods on demand to run Spark tasks. Billing for a pod stops when the pod terminates, so you do not need to reserve computing resources for Spark tasks. This avoids resource shortages and eliminates the need to scale out the cluster. You can further reduce computing costs by using preemptible instances. This topic describes how to run Spark tasks in an ASK cluster.

Prerequisites

Procedure

  1. Deploy the ack-spark-operator chart by using one of the following methods:
    • Log on to the Container Service for Kubernetes (ACK) console. In the left-side navigation pane, choose Marketplace > App Catalog and select ack-spark-operator to deploy the chart.
    • Run the helm command to manually deploy the chart.
      Note The Helm version must be V3 or later.
      # Create a service account. 
      kubectl create serviceaccount spark
      # Grant permissions. 
      kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=default:spark
      # Install the operator. 
      helm repo add incubator http://storage.googleapis.com/kubernetes-charts-incubator
      helm install incubator/sparkoperator --namespace default  --set operatorImageName=registry.cn-hangzhou.aliyuncs.com/acs/spark-operator  --set operatorVersion=ack-2.4.5-latest  --generate-name
    After you deploy the chart, run the following command to check whether spark-operator is started. Query the namespace into which the chart was installed (the helm command above installs it into the default namespace):
    kubectl get pod -n default

    Expected output:

    NAME                                  READY   STATUS      RESTARTS   AGE
    ack-spark-operator-7698586d7b-pvwln   1/1     Running     0          5m9s
    ack-spark-operator-init-26tvh         0/1     Completed   0          5m9s
  2. Create a file named spark-pi.yaml and copy the following content into the file:
    apiVersion: "sparkoperator.k8s.io/v1beta2"
    kind: SparkApplication
    metadata:
      name: spark-pi
      namespace: default
    spec:
      arguments:
      - "1000"
      sparkConf:
        "spark.scheduler.maxRegisteredResourcesWaitingTime": "3000s"
        "spark.kubernetes.allocation.batch.size": "1"
        "spark.rpc.askTimeout": "36000s"
        "spark.network.timeout": "36000s"
        "spark.rpc.lookupTimeout": "36000s"
        "spark.core.connection.ack.wait.timeout": "36000s"
        "spark.executor.heartbeatInterval": "10000s"
      type: Scala
      mode: cluster
      image: "registry.cn-shenzhen.aliyuncs.com/ringtail/spark-pi:0.4"
      imagePullPolicy: Always
      mainClass: org.apache.spark.examples.SparkPi
      mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.11-2.4.5.jar"
      sparkVersion: "2.4.5"
      restartPolicy:
        type: Never
      driver:
        cores: 4
        coreLimit: "4"
        annotations:
          k8s.aliyun.com/eci-image-cache: "true"
        memory: "6g"
        memoryOverhead: "2g"
        labels:
          version: 2.4.5
        serviceAccount: spark
      executor:
        annotations:
          k8s.aliyun.com/eci-image-cache: "true"
        cores: 2
        instances: 1
        memory: "3g"
        memoryOverhead: "1g"
        labels:
          version: 2.4.5
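    The driver and executor settings above are only one working size. For a heavier job you might, for example, run more executor pods in parallel. A hypothetical variant of the executor block (the values below are illustrative and not part of the original manifest):

    ```yaml
      executor:
        annotations:
          k8s.aliyun.com/eci-image-cache: "true"
        cores: 2
        instances: 4         # run four executor pods instead of one
        memory: "3g"
        memoryOverhead: "1g"
        labels:
          version: 2.4.5
    ```

    Because ASK provisions a pod per executor on demand, raising instances does not require any node capacity planning in advance.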
  3. Deploy a Spark task.
    1. Run the following command to deploy a Spark task:
      kubectl apply -f spark-pi.yaml

      Expected output:

      sparkapplication.sparkoperator.k8s.io/spark-pi created
    2. Run the following command to view the deployment status of the Spark task:
      kubectl get pod

      Expected output:

      NAME              READY   STATUS    RESTARTS   AGE
      spark-pi-driver   1/1     Running   0          2m12s

      The output shows that the pod is in the Running state, which indicates that the Spark task is running.

    3. Run the following command to view the deployment status of the Spark task again:
      kubectl get pod

      Expected output:

      NAME              READY   STATUS      RESTARTS   AGE
      spark-pi-driver   0/1     Completed   0          2m54s

      The output shows that the pod is in the Completed state, which indicates that the Spark task has finished.
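    In addition to watching the pods, the Spark Operator records status on the SparkApplication object itself. As a sketch, you could inspect it as follows (the command assumes the default namespace used in this topic):

    ```shell
    # Inspect the SparkApplication object managed by the operator.
    # The Status section reports the application state (for example,
    # RUNNING or COMPLETED) and the name of the driver pod.
    kubectl describe sparkapplication spark-pi -n default
    ```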

  4. Run the following command to view the computing result of the Spark task:
    kubectl logs spark-pi-driver | grep Pi

    Expected output:

    20/04/30 07:27:51 INFO DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:38) finished in 11.031 s
    20/04/30 07:27:51 INFO DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 11.137920 s
    Pi is roughly 3.1414371514143715
  5. Optional: To use a preemptible instance, add the annotations for preemptible instances to the pod.
    For more information about how to add annotations for preemptible instances, see Use preemptible instances.
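    As a sketch, such annotations could be added next to the existing image-cache annotation in spark-pi.yaml. The annotation names and values below are assumptions based on the Elastic Container Instance (ECI) spot documentation; confirm them against Use preemptible instances before relying on them:

    ```yaml
      executor:
        annotations:
          k8s.aliyun.com/eci-image-cache: "true"
          # Request spot capacity with a price ceiling per hour
          # (illustrative strategy and value, not from the original).
          k8s.aliyun.com/eci-spot-strategy: "SpotWithPriceLimit"
          k8s.aliyun.com/eci-spot-price-limit: "0.25"
    ```

    Applying the annotations only to the executor block keeps the driver on regular capacity, so the task can survive reclamation of individual executors.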