When a Spark application is running, it provides a web UI that can be used to visualize Spark job information, including detailed stage and task information and memory usage. To view the execution of a Spark job after the job is completed, you can persist the logs of the Spark job to a backend storage system. Then, use the ack-spark-history-server component to parse and render the logs as a publicly accessible web UI. This topic describes how to use Spark History Server to view the information about Spark jobs in a Container Service for Kubernetes (ACK) cluster.
Prerequisites
The ack-spark-operator component is installed. For more information, see Step 1: Install the ack-spark-operator component.
A kubectl client is connected to the ACK cluster. For more information, see Obtain the kubeconfig file of a cluster and use kubectl to connect to the cluster.
Step 1: Install the ack-spark-history-server component
Log on to the ACK console. In the left-side navigation pane, choose Marketplace.
On the Marketplace page, click the App Catalog tab. Find and click ack-spark-history-server.
On the ack-spark-history-server page, click Deploy.
In the Deploy panel, select a cluster and namespace, and then click Next.
In the Parameters step, set the parameters and click OK.
The following section describes the log storage configuration, environment variables, and Service configuration. You can customize these configurations as needed. For the full parameter descriptions, see the Parameters section on the ack-spark-history-server page.
Note: When you install the ack-spark-history-server component, you must specify a backend storage system to store logs, such as Object Storage Service (OSS), a persistent volume claim (PVC), or HDFS.
(Required) Configure a backend log storage system
You can use OSS, PVC, or HDFS to store logs. The following section describes the relevant configurations.
OSS
If you use OSS to store logs, you must configure the following parameters.
Important: You must wait for a Spark job to complete before its log is uploaded. This means that you cannot view the log of a running Spark job in real time. You can view only the logs of completed Spark jobs.
| Parameter | Description | Example |
| --- | --- | --- |
| spark.history.fs.logDirectory | The URL of the log directory. Make sure that the log directory (such as spark/spark-events) already exists before you install the component. For more information about how to create a directory, see Manage directories. | oss://<Bucket name>/spark/spark-events |
| storage.oss.enable | Specifies whether to use OSS or OSS-HDFS to store logs. | true |
| storage.oss.endpoint | The OSS endpoint. | oss-cn-beijing-internal.aliyuncs.com |
| storage.oss.existingSecret | The name of the Secret that stores the credentials used to read data from OSS. We recommend that you specify an existing Secret. For more information about how to create a Secret that stores credentials used to access OSS, see YAML template of Secrets that store OSS access credentials. | spark-oss-secret |
| storage.oss.createSecret | If no existing Secret is specified, the system automatically creates a Secret to store the credentials used to access OSS. | None |
| storage.oss.createSecret.accessKeyId | The AccessKey ID of your Alibaba Cloud account. | yourAccessKeyID |
| storage.oss.createSecret.accessKeySecret | The AccessKey secret of your Alibaba Cloud account. | yourAccessKeySecret |
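For reference, the following is a minimal sketch of how you might create the OSS log directory before you install the component, assuming that the ossutil CLI is installed and configured with credentials that can access the bucket. The bucket name is a placeholder.

# Create the log directory in the OSS bucket. Replace <Bucket name> with the name of your bucket.
ossutil mkdir oss://<Bucket name>/spark/spark-events
# Verify that the directory exists.
ossutil ls oss://<Bucket name>/spark/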
PVC
If you use a PVC to store logs, you must configure the following parameters.
| Parameter | Description | Example |
| --- | --- | --- |
| spark.history.fs.logDirectory | The URL of the log directory. | file:///mnt/spark/spark-events |
| storage.pvc.enable | Specifies whether to use a PVC to store logs. | true |
| storage.pvc.name | The name of an existing PVC. Make sure that the PVC and persistent volume (PV) are already created in the namespace specified for ack-spark-history-server and that the PVC is bound to the PV. For more information about how to create a PVC and bind the PVC to a PV, see Storage - CSI. | "<PVC name>" |
| storage.pvc.mountPath | The container path to which the PVC is mounted. | "/mnt" |
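Before you install the component with PVC storage, you can confirm that the PVC exists and is bound to a PV. The following is a quick check, assuming the component is deployed in the spark-operator namespace; replace the namespace and PVC name with your own values.

# Confirm that the PVC is bound to a PV. The STATUS column should show Bound.
kubectl get pvc <PVC name> --namespace spark-operator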
"/mnt"HDFS
If you use HDFS to store logs, you must configure the following parameters.
| Parameter | Description | Example |
| --- | --- | --- |
| spark.history.fs.logDirectory | The URL of the log directory. | hdfs://namenode:port/spark/spark-events |
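As with OSS, the HDFS log directory must exist before logs can be written to it. The following is a minimal sketch, assuming you have an HDFS client configured for your cluster; the namenode address and port are placeholders.

# Create the event log directory in HDFS if it does not exist yet.
hdfs dfs -mkdir -p hdfs://namenode:port/spark/spark-events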
(Optional) Configure the Service type and port
By default, the system creates a Service to expose the web UI of Spark History Server. You can configure the service.type and service.port parameters to specify the Service type and port.

| Parameter | Description | Default value |
| --- | --- | --- |
| service.type | The Service type. Valid values: ClusterIP, NodePort, and LoadBalancer. | ClusterIP |
| service.port | The port used to access the web UI. | 18080 |
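If you set service.type to NodePort or LoadBalancer instead of the default ClusterIP, you can look up the access address with kubectl. The following is a sketch, assuming the release is named spark-history-server and deployed in the spark-operator namespace; adjust the Service name and namespace to your deployment.

# For a NodePort Service, get the node port and access the UI at http://<node IP>:<node port>.
kubectl get service spark-history-server-service --namespace spark-operator -o jsonpath="{.spec.ports[0].nodePort}"

# For a LoadBalancer Service, get the external IP address and access the UI at http://<external IP>:18080.
kubectl get service spark-history-server-service --namespace spark-operator -o jsonpath="{.status.loadBalancer.ingress[0].ip}"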
(Optional) Configure environment variables
You can add environment variables to the env parameter to configure Spark History Server.

| Environment variable | Description | Default value |
| --- | --- | --- |
| SPARK_DAEMON_MEMORY | The amount of memory allocated to Spark History Server. | 1g |
| SPARK_DAEMON_JAVA_OPTS | The JVM configuration of Spark History Server. | "" |
| SPARK_DAEMON_CLASSPATH | The classpath of Spark History Server. | "" |
| SPARK_PUBLIC_DNS | The external endpoint of Spark History Server. If the external endpoint is not specified, links to application history use the internal endpoint, and the links may become invalid. | "" |
| SPARK_HISTORY_OPTS | A group of spark.history.* configuration items. | "" |

You can add attributes to the sparkConf parameter to configure Spark History Server. The following table describes the commonly used attributes.

| Attribute | Description | Default value |
| --- | --- | --- |
| spark.history.fs.update.interval | The interval at which log update scans are performed. | 10s |
| spark.history.fs.retainedApplications | The maximum number of application UIs to cache. | 50 |
| spark.history.ui.port | The port of Spark History Server. | 18080 |
| spark.history.fs.cleaner.enabled | Specifies whether to periodically delete event logs. | false |
| spark.history.fs.cleaner.interval | When spark.history.fs.cleaner.enabled=true is set, event logs are deleted at the interval specified by this attribute. | 1d |
| spark.history.fs.cleaner.maxAge | When spark.history.fs.cleaner.enabled=true is set, event logs older than the age specified by this attribute are deleted. | 7d |
| spark.history.fs.cleaner.maxNum | When spark.history.fs.cleaner.enabled=true is set, this attribute specifies the maximum number of event logs to retain. When the number of event logs exceeds this threshold, Spark deletes the excess event logs. | Int.MaxValue |
| spark.history.fs.driverlog.cleaner.enabled | Specifies whether to periodically delete driver event logs. | spark.history.fs.cleaner.enabled |
| spark.history.fs.driverlog.cleaner.interval | When spark.history.fs.driverlog.cleaner.enabled=true is set, driver event logs are deleted at the interval specified by this attribute. | spark.history.fs.cleaner.interval |
| spark.history.fs.driverlog.cleaner.maxAge | When spark.history.fs.driverlog.cleaner.enabled=true is set, driver event logs older than the age specified by this attribute are deleted. | spark.history.fs.cleaner.maxAge |
| spark.history.fs.numReplayThreads | The number of threads created to handle log files. | 25% of the available vCPUs |
| spark.history.store.maxDiskUsage | The maximum amount of disk space used to cache application history. | 10g |
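As the env table above notes, spark.history.* attributes can also be passed as JVM system properties through the SPARK_HISTORY_OPTS environment variable. The following sketch shows the form that the value takes; if you set it through the env parameter of the chart, use the same -D string as the variable value. The cleaner settings shown are illustrative.

# Pass history server attributes as -D system properties through SPARK_HISTORY_OPTS.
export SPARK_HISTORY_OPTS="-Dspark.history.fs.cleaner.enabled=true -Dspark.history.fs.cleaner.interval=1d -Dspark.history.fs.cleaner.maxAge=7d"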
Step 2: Access the web UI of Spark History Server
The default Service type is ClusterIP. To access the web UI of Spark History Server, you must map the Service port to local port 18080. To do this, perform the following steps. To access the web UI through an existing Server Load Balancer (SLB) instance, refer to Use an existing SLB instance to expose an application.
Port forwarding configured by using the kubectl port-forward command is suitable only for testing environments but not suitable for production environments. Exercise caution when you use this command.
Run the following command to configure port forwarding:

RELEASE_NAME=spark-history-server
RELEASE_NAMESPACE=spark-operator
SERVICE_NAME=${RELEASE_NAME}-service
SERVICE_PORT=$(kubectl get service ${SERVICE_NAME} --namespace ${RELEASE_NAMESPACE} -o jsonpath="{.spec.ports[0].port}")
echo "Now you can go to http://127.0.0.1:18080 to visit spark history server."
kubectl port-forward --namespace ${RELEASE_NAMESPACE} services/${SERVICE_NAME} 18080:${SERVICE_PORT}

Expected output:

Now you can go to http://127.0.0.1:18080 to visit spark history server.
Forwarding from 127.0.0.1:18080 -> 18080
Forwarding from [::1]:18080 -> 18080

Use a web browser to access the web UI of Spark History Server through http://127.0.0.1:18080.
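Optionally, before you open the browser, you can check from the command line that Spark History Server is reachable through the forwarded port. The history server exposes a REST API under /api/v1. The following call returns a JSON list of the applications whose event logs have been parsed, or an empty list if no logs exist yet.

# Verify that Spark History Server responds on the forwarded port.
curl -s http://127.0.0.1:18080/api/v1/applications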

Step 3: Enable event logging for a Spark job
After ack-spark-history-server is installed, you need to enable event logging for your Spark job. Configure the following parameters to enable event logging and store event logs.
| Parameter | Description | Example |
| --- | --- | --- |
| spark.eventLog.enabled | Specifies whether to enable event logging. Valid values: true and false. | "true" |
| spark.eventLog.dir | The path of event logs. The path must point to the backend storage system that you configured, such as an OSS path (oss://), a local path to a mounted PVC (file://), or an HDFS path (hdfs://). | oss://<Bucket name>/spark/spark-events |
Use scenario: Store the logs of a Spark job in OSS
The following example shows how to use OSS to store the logs of a Spark job.
Build a Spark container image
Container images provided by the open source Spark project do not contain the JAR packages required for accessing OSS. Therefore, you must build a custom Spark container image that includes the JAR packages of the Hadoop OSS SDK. The following code block shows a sample Dockerfile. Make sure that the JAR packages match the Hadoop version required by your Spark version. Build an image from the Dockerfile and push it to your image repository, as shown in the sample commands after the Dockerfile. For more information about how to use Container Registry to build a container image, see Use a Container Registry Enterprise Edition instance to build an image.
ARG SPARK_IMAGE=registry-cn-hangzhou.ack.aliyuncs.com/dev/spark:3.5.2
FROM ${SPARK_IMAGE}

# Add dependencies for Hadoop Aliyun OSS support
ADD --chmod=644 https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aliyun/3.3.4/hadoop-aliyun-3.3.4.jar ${SPARK_HOME}/jars
ADD --chmod=644 https://repo1.maven.org/maven2/com/aliyun/oss/aliyun-sdk-oss/3.17.4/aliyun-sdk-oss-3.17.4.jar ${SPARK_HOME}/jars
ADD --chmod=644 https://repo1.maven.org/maven2/org/jdom/jdom2/2.0.6.1/jdom2-2.0.6.1.jar ${SPARK_HOME}/jars
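For example, you can build and push the image with Docker as shown in the following sketch. The registry address, namespace, and tag are placeholders; replace them with your own image repository, and log in to the repository first if it requires authentication.

# Build the image from the Dockerfile in the current directory and push it to your repository.
docker build -t <your-registry>/<namespace>/spark:3.5.2-oss .
docker push <your-registry>/<namespace>/spark:3.5.2-oss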
Create a Secret
Create a Secret in the namespace of the Spark job to store credentials used to access OSS.
Create a Secret YAML file named spark-oss-secret.yaml to store credentials used to access OSS.

apiVersion: v1
kind: Secret
metadata:
  name: spark-oss-secret
  namespace: default
stringData:
  # The AccessKey ID of your Alibaba Cloud account.
  OSS_ACCESS_KEY_ID: ""
  # The AccessKey secret of your Alibaba Cloud account.
  OSS_ACCESS_KEY_SECRET: ""

Run the following command to create the Secret:

kubectl apply -f spark-oss-secret.yaml

Expected output:

secret/spark-oss-secret created
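Alternatively, you can create the same Secret directly from the command line instead of writing a YAML file. The following is a sketch with the AccessKey pair passed as literal values; be aware that the values may be recorded in your shell history.

# Create the Secret from literal values instead of a YAML file.
kubectl create secret generic spark-oss-secret --namespace default \
  --from-literal=OSS_ACCESS_KEY_ID=<yourAccessKeyID> \
  --from-literal=OSS_ACCESS_KEY_SECRET=<yourAccessKeySecret>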
Submit a Spark job
The following code block shows a sample SparkApplication that has event logging enabled. You can modify the parameters on demand.
| Parameter | Description | Example |
| --- | --- | --- |
| image | The address of the Spark container image. | registry-cn-hangzhou.ack.aliyuncs.com/ack-demo/spark:3.5.2-oss |
| fs.oss.endpoint | The OSS endpoint. | oss-cn-beijing-internal.aliyuncs.com |
| spark.eventLog.dir | The path of event logs. Make sure that the log path already exists. Otherwise, the job throws an error when it runs. | oss://<Bucket name>/spark/spark-events |

Create a SparkApplication YAML file named spark-pi.yaml.

apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi-oss
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: registry-cn-hangzhou.ack.aliyuncs.com/ack-demo/spark:3.5.2-oss
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.2.jar
  mainClass: org.apache.spark.examples.SparkPi
  sparkVersion: 3.5.2
  hadoopConf:
    fs.AbstractFileSystem.oss.impl: org.apache.hadoop.fs.aliyun.oss.OSS
    fs.oss.impl: org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem
    # The OSS endpoint.
    fs.oss.endpoint: oss-cn-beijing-internal.aliyuncs.com
    fs.oss.credentials.provider: com.aliyun.oss.common.auth.EnvironmentVariableCredentialsProvider
  sparkConf:
    spark.eventLog.enabled: "true"
    # The path of event logs.
    spark.eventLog.dir: oss://<Bucket name>/spark/spark-events
  driver:
    cores: 1
    coreLimit: 1200m
    memory: 512m
    serviceAccount: spark-operator-spark
    envFrom:
      - secretRef:
          name: spark-oss-secret
  executor:
    instances: 1
    cores: 1
    coreLimit: 1200m
    memory: 512m
    envFrom:
      - secretRef:
          name: spark-oss-secret
  restartPolicy:
    type: Never

After the image and Secret are created, run the following command to submit the Spark job. After the job is completed, you can enter http://127.0.0.1:18080 into the address bar of a web browser to view information about the Spark job.
kubectl apply -f spark-pi.yaml

Expected output:

sparkapplication.sparkoperator.k8s.io/spark-pi-oss created
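After you submit the job, you can track its progress from the command line. The following is a sketch, assuming the default driver pod naming of the Spark operator (<application name>-driver).

# Check the state of the Spark application; it should eventually reach COMPLETED.
kubectl get sparkapplication spark-pi-oss --namespace default

# Optionally inspect the driver log after the driver pod starts.
kubectl logs spark-pi-oss-driver --namespace default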