Container Service for Kubernetes: Create Spark jobs

Last Updated: Apr 12, 2024

You can create Spark jobs on Fleet instances of Distributed Cloud Container Platform for Kubernetes (ACK One) in the same way you create Spark jobs in individual Kubernetes clusters. After you create a Spark job on a Fleet instance, the Fleet instance dynamically schedules the job to an associated cluster that has sufficient resources to meet the resource request of the job. This topic describes how to create a Spark job and query its status.

Prerequisites

  • By default, the Spark Application CustomResourceDefinition (CRD) is created when Spark Operator is installed. The Spark Application CRD supports the sparkoperator.k8s.io/v1beta2 API version.

  • Run the following command to query the Spark Application CRD:

    kubectl get crd sparkapplications.sparkoperator.k8s.io
  • To customize the Spark Application CRD, modify the sparkoperator.k8s.io_sparkapplications.yaml file and then run the following command to apply your changes:

    kubectl apply -f manifest/crds/sparkoperator.k8s.io_sparkapplications.yaml
  • Spark Operator is installed in all the clusters that are associated with the Fleet instance. For more information, see Step 1: Install Spark Operator.

  • The kubeconfig file of the Fleet instance is obtained from the Distributed Cloud Container Platform for Kubernetes (ACK One) console, and kubectl is connected to the Fleet instance. A connectivity check is sketched after this list.

  • The AMC command-line tool is installed. For more information, see Use AMC.
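
Before you create Spark jobs, you can confirm that kubectl reaches the Fleet instance and that the Spark Application CRD is available. The following commands are a minimal check; the kubeconfig path is only an example and must point to the kubeconfig file that you obtained for the Fleet instance.

    # Point kubectl at the Fleet instance. The path is an example.
    export KUBECONFIG=/path/to/fleet-kubeconfig
    # Confirm connectivity and that the Spark Application CRD is registered.
    kubectl get crd sparkapplications.sparkoperator.k8s.io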

Step 1: Install Spark Operator

  1. Log on to the Container Service for Kubernetes (ACK) console.

  2. In the left-side navigation pane of the ACK console, choose Marketplace > Marketplace.

  3. On the Marketplace page, click the App Catalog tab. Find and click ack-spark-operator.

  4. On the ack-spark-operator page, click Deploy.

  5. In the Deploy wizard, select a cluster and a namespace, and then click Next.

  6. On the Parameters wizard page, set the sparkJobNamespace parameter to "" so that Spark Operator manages Spark jobs in all namespaces. Then, click OK. You can verify the installation as described below.
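
After the deployment is complete, the following commands are a minimal verification sketch. They assume that ack-spark-operator was deployed to a namespace named spark-operator; replace it with the namespace that you selected in the Deploy wizard, and run the commands with kubectl connected to the cluster in which you deployed the chart.

    # List the Spark Operator pods. The namespace is an assumption for this example.
    kubectl get pods -n spark-operator
    # Confirm that the Spark Application CRD is registered in the cluster.
    kubectl get crd sparkapplications.sparkoperator.k8s.io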

Step 2: Create a Spark job and check its status

  1. Use the following YAML template to create a Spark job on the Fleet instance.

    In this example, the job is named pi and is created in the demo namespace. A sketch of how to submit the template follows the YAML.

    apiVersion: "sparkoperator.k8s.io/v1beta2"
    kind: SparkApplication
    metadata:
      name: pi
      namespace: demo
    spec:
      type: Scala
      mode: cluster
      image: "acr-multiple-clusters-registry.cn-hangzhou.cr.aliyuncs.com/ack-multiple-clusters/spark:v3.1.1"
      imagePullPolicy: Always
      mainClass: org.apache.spark.examples.SparkPi
      mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar"
      sparkVersion: "3.1.1"
      restartPolicy:
        type: Never
      volumes:
        - name: "test-volume"
          hostPath:
            path: "/tmp"
            type: Directory
      driver:
        cores: 1
        coreLimit: "1200m"
        memory: "512m"
        labels:
          version: 3.1.1
        serviceAccount: spark
        volumeMounts:
          - name: "test-volume"
            mountPath: "/tmp"
      executor:
        cores: 1
        instances: 3
        memory: "512m"
        labels:
          version: 3.1.1
        volumeMounts:
          - name: "test-volume"
            mountPath: "/tmp"
  2. Run the following command on the Fleet instance to query the scheduling result of the Spark job.

    If no output is returned, the job failed to be scheduled. Check whether the specified namespace exists and whether your namespace quota is sufficient: if the namespace does not exist or the quota is exhausted, the job remains in the Pending state. A polling sketch follows the command.

    kubectl get sparkapplication pi -n demo -o jsonpath='{.metadata.annotations.scheduling\.x-k8s\.io/placement}'
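
    Scheduling is asynchronous, so the annotation may take a short time to appear after you create the job. The following loop is a minimal polling sketch that reuses the command above and waits until the placement annotation is populated.

    # Poll until the placement annotation is non-empty.
    until kubectl get sparkapplication pi -n demo \
      -o jsonpath='{.metadata.annotations.scheduling\.x-k8s\.io/placement}' | grep -q .; do
      sleep 5
    done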
  3. Check the status of the Spark job.

    • Run the following command on the Fleet instance to query the status of the job:

      kubectl get sparkapplication pi -n demo

      Expected output:

      NAME   STATUS      ATTEMPTS   START    FINISH    AGE
      pi     COMPLETED   1          ***      ***       ***
    • Run the following command on the Fleet instance to query the status of the pod that runs the job:

      kubectl amc get pod -j sparkapplication/pi -n demo

      Expected output:

      Run on ManagedCluster managedcluster-c1***e5
      NAME        READY   STATUS      RESTARTS   AGE
      pi-driver   0/1     Completed   0          ***
    • Run the following command to print the logs of the pod:

      kubectl amc logs pi-driver -j sparkapplication/pi -n demo

      Expected output:

      Run on ManagedCluster managedcluster-c1***e5
      ...
      Pi is roughly 3.144875724378622
      ...
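
When you no longer need the example job, you can delete it from the Fleet instance. The following command is a minimal cleanup step that assumes the job name and namespace used in this example.

    kubectl delete sparkapplication pi -n demo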