In a serverless Kubernetes (ASK) cluster, you can create pods on demand. Billing stops after a pod is stopped, so you do not need to reserve computing resources for Spark tasks. This resolves the issue of insufficient computing resources and eliminates the need to scale out the cluster. In addition, you can reduce computing costs by using preemptible instances. This topic describes how to run Spark tasks in an ASK cluster to meet your business requirements.
Procedure
- Deploy the ack-spark-operator chart from the App Catalog in the ACK console or by using Helm, as shown in the sketch below.
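The console method uses the App Catalog page in the ACK console. As a CLI alternative, the following Helm commands are a minimal sketch; the aliyun-incubator repository URL and the chart name are assumptions based on the Alibaba Cloud incubator chart repository, so verify them for your region and Helm version:
helm repo add aliyun-incubator https://aliacs-app-catalog.oss-cn-hangzhou.aliyuncs.com/charts-incubator/
helm repo update
# Helm 3 syntax; installs the operator into the spark-operator namespace
helm install ack-spark-operator aliyun-incubator/ack-spark-operator \
  --namespace spark-operator --create-namespace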
After you deploy the chart, run the following command to check whether spark-operator
is started:
kubectl -n spark-operator get pod
NAME                                  READY   STATUS      RESTARTS   AGE
ack-spark-operator-7698586d7b-pvwln   1/1     Running     0          5m9s
ack-spark-operator-init-26tvh         0/1     Completed   0          5m9s
- Create the spark-pi.yaml file and copy the following content into the file:
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
name: spark-pi
namespace: default
spec:
arguments:
- "1000"
sparkConf:
"spark.scheduler.maxRegisteredResourcesWaitingTime": "3000s"
"spark.kubernetes.allocation.batch.size": "1"
"spark.rpc.askTimeout": "36000s"
"spark.network.timeout": "36000s"
"spark.rpc.lookupTimeout": "36000s"
"spark.core.connection.ack.wait.timeout": "36000s"
"spark.executor.heartbeatInterval": "10000s"
type: Scala
mode: cluster
image: "registry.cn-shenzhen.aliyuncs.com/ringtail/spark-pi:0.4"
imagePullPolicy: Always
mainClass: org.apache.spark.examples.SparkPi
mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.11-2.4.5.jar"
sparkVersion: "2.4.5"
restartPolicy:
type: Never
args:
driver:
cores: 4
coreLimit: "4"
annotations:
k8s.aliyun.com/eci-image-cache: "true"
memory: "6g"
memoryOverhead: "2g"
labels:
version: 2.4.5
serviceAccount: spark
executor:
annotations:
k8s.aliyun.com/eci-image-cache: "true"
cores: 2
instances: 1
memory: "3g"
memoryOverhead: "1g"
labels:
version: 2.4.5
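The manifest references the spark service account. If the ack-spark-operator chart did not already create this account in the default namespace, the following commands are a minimal sketch for creating it; binding the built-in edit cluster role is an assumption made for brevity, and production clusters should use a narrower Role:
# create the service account the driver pod runs as
kubectl create serviceaccount spark -n default
# grant it permission to manage executor pods (broad for brevity)
kubectl create clusterrolebinding spark-edit --clusterrole=edit --serviceaccount=default:spark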
- Run the following command to deploy the Spark task:
kubectl apply -f spark-pi.yaml
sparkapplication.sparkoperator.k8s.io/spark-pi created
- Run the following command to check the state of the pod:
kubectl get pod
NAME              READY   STATUS    RESTARTS   AGE
spark-pi-driver   1/1     Running   0          2m12s
After the task is complete, run the command again. The driver pod changes to the Completed state:
kubectl get pod
NAME              READY   STATUS      RESTARTS   AGE
spark-pi-driver   0/1     Completed   0          2m54s
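You can also track the task through the SparkApplication custom resource itself. These are standard kubectl operations against the resource created above:
# show the application state reported by spark-operator
kubectl get sparkapplication spark-pi
# show detailed status, including driver and executor state and events
kubectl describe sparkapplication spark-pi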
- Run the following command to view the computing result of the Spark task:
kubectl logs spark-pi-driver | grep Pi
20/04/30 07:27:51 INFO DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:38) finished in 11.031 s
20/04/30 07:27:51 INFO DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 11.137920 s
Pi is roughly 3.1414371514143715
- Optional: To use preemptible instances, add the preemptible instance annotations to the pod, as shown in the sketch below.
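For example, the following executor annotations request preemptible (spot) ECI instances. The annotation keys are the ECI spot annotations; SpotWithPriceLimit and the price limit of 0.5 are illustrative assumptions, and you can use SpotAsPriceGo to follow the market price instead:
executor:
  annotations:
    k8s.aliyun.com/eci-image-cache: "true"
    # bid with an upper price limit; SpotAsPriceGo follows the market price instead
    k8s.aliyun.com/eci-spot-strategy: "SpotWithPriceLimit"
    # maximum hourly price; 0.5 is an illustrative value
    k8s.aliyun.com/eci-spot-price-limit: "0.5"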