ACK Serverless runs Spark tasks as on-demand pods. Billing stops when the pod lifecycle ends, so you do not need to reserve computing resources or scale out the cluster. To cut costs further, use preemptible instances.
Prerequisites
Before you begin, ensure that you have:
- An ACK Serverless cluster. See Create an ACK Serverless cluster.
- A kubectl client connected to the cluster. See Connect to an ACK cluster by using kubectl.
Deploy the spark-operator
Deploy the ack-spark-operator Helm chart using one of the following methods.
Option 1: ACK console
- Log on to the Container Service for Kubernetes (ACK) console.
- In the left-side navigation pane, choose Marketplace > Marketplace.
- Search for and select ack-spark-operator, then deploy the chart.
Option 2: Helm CLI (Helm V3 or later required)
Run the following commands:
# Create a service account
kubectl create serviceaccount spark
# Grant the service account edit permissions.
# ClusterRoleBindings are cluster-scoped; the service account's namespace
# is specified in the --serviceaccount flag (namespace:name).
kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=default:spark
# Add the Helm repository and install the operator
helm repo add aliyunhub https://aliacs-app-catalog.oss-cn-hangzhou.aliyuncs.com/charts-incubator/
helm install ack-spark-operator aliyunhub/ack-spark-operator
After deployment, verify that the spark-operator is running:
kubectl -n spark-operator get pod
Expected output:
NAME READY STATUS RESTARTS AGE
ack-spark-operator-7698586d7b-pvwln 1/1 Running 0 5m9s
ack-spark-operator-init-26tvh 0/1 Completed 0 5m9s
Run a Spark task
This section walks through deploying the built-in SparkPi example, which estimates the value of pi using Monte Carlo sampling.
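To make the example easier to follow, here is a minimal single-process Python sketch of the same Monte Carlo idea (an illustration only, not the Spark code itself): sample random points in the unit square and use the fraction that lands inside the quarter circle to estimate pi.

```python
import random

def estimate_pi(samples: int, seed: int = 42) -> float:
    """Estimate pi by sampling points in the unit square and
    counting how many fall inside the quarter circle of radius 1."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    # The quarter circle has area pi/4; the square has area 1,
    # so inside/samples approximates pi/4.
    return 4.0 * inside / samples

print(estimate_pi(100_000))  # prints a value close to 3.14
```

The SparkPi job does the same thing, but distributes the sampling across executors; the `arguments` value in the manifest below (1000) controls the number of partitions, and hence the sample count.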
Step 1: Create the SparkApplication manifest
Create a file named spark-pi.yaml with the following content:
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default
spec:
  arguments:
    - "1000"
  sparkConf:
    "spark.scheduler.maxRegisteredResourcesWaitingTime": "3000s"
    "spark.kubernetes.allocation.batch.size": "1"
    "spark.rpc.askTimeout": "36000s"
    "spark.network.timeout": "36000s"
    "spark.rpc.lookupTimeout": "36000s"
    "spark.core.connection.ack.wait.timeout": "36000s"
    "spark.executor.heartbeatInterval": "10000s"
  type: Scala
  mode: cluster
  image: "registry.aliyuncs.com/acs/spark:ack-2.4.5-latest"
  imagePullPolicy: Always
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.11-2.4.5.jar"
  sparkVersion: "2.4.5"
  restartPolicy:
    type: Never
  driver:
    cores: 4
    coreLimit: "4"
    annotations:
      k8s.aliyun.com/eci-image-cache: "true"
    memory: "6g"
    memoryOverhead: "2g"
    labels:
      version: 2.4.5
    serviceAccount: spark
  executor:
    annotations:
      k8s.aliyun.com/eci-image-cache: "true"
    cores: 2
    instances: 1
    memory: "3g"
    memoryOverhead: "1g"
    labels:
      version: 2.4.5
Step 2: Submit the task
kubectl apply -f spark-pi.yaml
Expected output:
sparkapplication.sparkoperator.k8s.io/spark-pi created
Step 3: Check task status
Run the following command to view the deployment status of the Spark task:
kubectl get pod
Expected output when the task is in progress:
NAME READY STATUS RESTARTS AGE
spark-pi-driver 1/1 Running 0 2m12s
The pod is in the Running state, which indicates that the Spark task is in progress.
Run the command again to check the final status:
kubectl get pod
Expected output when the task completes:
NAME READY STATUS RESTARTS AGE
spark-pi-driver 0/1 Completed 0 2m54s
The pod is in the Completed state, which indicates that the Spark task has finished.
Step 4: View the result
kubectl logs spark-pi-driver | grep Pi
Expected output:
20/04/30 07:27:51 INFO DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:38) finished in 11.031 s
20/04/30 07:27:51 INFO DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 11.137920 s
Pi is roughly 3.1414371514143715
(Optional) Use preemptible instances
Add annotations for preemptible instances to the pod to reduce computing costs. For details, see Use preemptible instances.
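As a sketch of what this looks like in the manifest above, the executor annotations might be extended as follows. The annotation names, values, and the price-cap format below are assumptions based on the ECI spot feature; confirm them against the Use preemptible instances topic before relying on them.

```yaml
executor:
  annotations:
    k8s.aliyun.com/eci-image-cache: "true"
    # Assumed ECI spot annotations (verify in "Use preemptible instances"):
    # bid for spot capacity with an upper price cap per hour.
    # An alternative strategy value is "SpotAsPriceGo" (follow market price).
    k8s.aliyun.com/eci-spot-strategy: "SpotWithPriceLimit"
    k8s.aliyun.com/eci-spot-price-limit: "0.25"
```

Applying spot annotations to executors only (not the driver) is a common trade-off: executor pods can tolerate reclamation, while losing the driver fails the whole application.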