
Container Service for Kubernetes:Use Spark History Server to view information about Spark jobs

Last Updated:Mar 27, 2025

When a Spark application is running, it provides a web UI that can be used to visualize Spark job information, including detailed stage and task information and memory usage. To view the execution of a Spark job after the job is completed, you can persist the logs of the Spark job to a backend storage system. Then, use the ack-spark-history-server component to parse and render the logs as a publicly accessible web UI. This topic describes how to use Spark History Server to view the information about Spark jobs in a Container Service for Kubernetes (ACK) cluster.

Prerequisites

Step 1: Install the ack-spark-history-server component

  1. Log on to the ACK console. In the left-side navigation pane, choose Marketplace > Marketplace.

  2. On the Marketplace page, click the App Catalog tab. Find and click ack-spark-history-server.

  3. On the ack-spark-history-server page, click Deploy.

  4. In the Deploy panel, select a cluster and namespace, and then click Next.

  5. In the Parameters step, set the parameters and click OK.

    The following section describes the log storage configuration, environment variables, and Service configuration. You can customize these configurations as needed. For detailed descriptions of the parameters, see the Parameters section on the ack-spark-history-server page.

    Note

    When you install the ack-spark-history-server component, you must specify a backend storage system to store logs, such as Object Storage Service (OSS), persistent volume claim (PVC), or HDFS.

    • (Required) Configure a backend log storage system

      You can use OSS, PVC, or HDFS to store logs. The following section describes the relevant configurations.

      OSS

      If you use OSS to store logs, you must configure the following parameters.

      Important

      The logs of a Spark job are uploaded to OSS only after the job is completed. This means that you cannot view the logs of a running Spark job in real time. You can view only the logs of completed Spark jobs.

      • spark.history.fs.logDirectory
        Description: The URL of the log directory. Make sure that the log directory (such as spark/spark-events) already exists before you install the component. For more information about how to create a directory, see Manage directories. An ossutil example is also shown after this table.
        Example: oss://<Bucket name>/spark/spark-events

      • storage.oss.enable
        Description: Specifies whether to use OSS or OSS-HDFS to store logs.
        Example: true

      • storage.oss.endpoint
        Description: The OSS endpoint.
        Example: oss-cn-beijing-internal.aliyuncs.com

      • storage.oss.existingSecret
        Description: The name of the Secret that stores the credentials used to read data from OSS. We recommend that you specify an existing Secret. For more information about how to create a Secret to store credentials used to access OSS, see YAML template of Secrets that store OSS access credentials.
        Example: spark-oss-secret

      • storage.oss.createSecret
        Description: If no Secret is specified, the system automatically creates a Secret to store credentials used to access OSS.
        Example: None

      • storage.oss.createSecret.accessKeyId
        Description: The AccessKey ID of your Alibaba Cloud account.
        Example: yourAccessKeyID

      • storage.oss.createSecret.accessKeySecret
        Description: The AccessKey secret of your Alibaba Cloud account.
        Example: yourAccessKeySecret
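      If the log directory does not exist yet and you have the ossutil CLI installed and configured with credentials for your bucket, you can create the directory with a command similar to the following. The bucket name is a placeholder, and this is only one way to create the directory.

        # Create the spark/spark-events directory in the OSS bucket.
        ossutil mkdir oss://<Bucket name>/spark/spark-events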

      YAML template of Secrets that store OSS access credentials

      Make sure that the Secret YAML file contains the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET fields.

      1. Create a YAML file named spark-oss-secret.yaml and add the following content to the file:

        apiVersion: v1
        kind: Secret
        metadata:
          name: spark-oss-secret
          namespace: spark-operator
        stringData:
          OSS_ACCESS_KEY_ID: ""       # The AccessKey ID of your Alibaba Cloud account. 
          OSS_ACCESS_KEY_SECRET: ""   # The AccessKey secret of your Alibaba Cloud account.

      2. Run the following command to create a Secret to store credentials used to access OSS:

        kubectl apply -f spark-oss-secret.yaml

      PVC

      If you use a PVC to store logs, you must configure the following parameters.

      • spark.history.fs.logDirectory
        Description: The URL of the log directory.
        Example: file:///mnt/spark/spark-events

      • storage.pvc.enable
        Description: Specifies whether to use a PVC to store logs.
        Example: true

      • storage.pvc.name
        Description: The name of an existing PVC. Make sure that the PVC and persistent volume (PV) are already created in the namespace specified for ack-spark-history-server and that the PVC is bound to the PV. For more information about how to create a PVC and bind the PVC to a PV, see Storage - CSI. A minimal PVC example is also shown after this table.
        Example: "<PVC name>"

      • storage.pvc.mountPath
        Description: The container path to which the PVC is mounted.
        Example: "/mnt"

      HDFS

      If you use HDFS to store logs, you must configure the following parameters.

      • spark.history.fs.logDirectory
        Description: The URL of the log directory.
        Example: hdfs://namenode:port/spark/spark-events

    • (Optional) Configure the Service type and port

      By default, the system creates a Service to expose the web UI of Spark History Server. You can configure the service.type and service.port parameters to specify the Service type and port.

      • service.type
        Description: The Service type. Valid values: ClusterIP, NodePort, and LoadBalancer.
        Default value: ClusterIP

      • service.port
        Description: The port used to access the web UI.
        Default value: 18080
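      After the component is deployed, you can check the type and port of the Service. The following command assumes the default release name spark-history-server and the spark-operator namespace, which are also used in Step 2; adjust the names to match your installation.

        kubectl get service spark-history-server-service --namespace spark-operator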

    • (Optional) Configure environment variables

      You can add environment variables to the env parameter to configure Spark History Server.

      • SPARK_DAEMON_MEMORY
        Description: The amount of memory allocated to Spark History Server.
        Default value: 1g

      • SPARK_DAEMON_JAVA_OPTS
        Description: The JVM options of Spark History Server.
        Default value: ""

      • SPARK_DAEMON_CLASSPATH
        Description: The classpath of Spark History Server.
        Default value: ""

      • SPARK_PUBLIC_DNS
        Description: The public address of Spark History Server. If this address is not specified, links to application history use the internal address of the server and may become invalid.
        Default value: ""

      • SPARK_HISTORY_OPTS
        Description: A group of spark.history.* configuration items.
        Default value: ""

      You can add attributes to the sparkConf parameter to configure Spark History Server. The following table describes the commonly used attributes.

      • spark.history.fs.update.interval
        Description: The interval at which the log directory is scanned for updated logs.
        Default value: 10s

      • spark.history.fs.retainedApplications
        Description: The maximum number of application UIs to cache.
        Default value: 50

      • spark.history.ui.port
        Description: The port of Spark History Server.
        Default value: 18080

      • spark.history.fs.cleaner.enabled
        Description: Specifies whether to periodically delete event logs.
        Default value: false

      • spark.history.fs.cleaner.interval
        Description: When spark.history.fs.cleaner.enabled=true is set, event logs are deleted at the interval specified by this parameter.
        Default value: 1d

      • spark.history.fs.cleaner.maxAge
        Description: When spark.history.fs.cleaner.enabled=true is set, event logs older than the age specified by this parameter are deleted.
        Default value: 7d

      • spark.history.fs.cleaner.maxNum
        Description: When spark.history.fs.cleaner.enabled=true is set, this parameter specifies the maximum number of event logs to retain. When the number of event logs exceeds this threshold, Spark deletes the excess event logs.
        Default value: Int.MaxValue

      • spark.history.fs.driverlog.cleaner.enabled
        Description: Specifies whether to periodically delete driver logs.
        Default value: spark.history.fs.cleaner.enabled

      • spark.history.fs.driverlog.cleaner.interval
        Description: When spark.history.fs.driverlog.cleaner.enabled=true is set, driver logs are deleted at the interval specified by this parameter.
        Default value: spark.history.fs.cleaner.interval

      • spark.history.fs.driverlog.cleaner.maxAge
        Description: When spark.history.fs.driverlog.cleaner.enabled=true is set, driver logs older than the age specified by this parameter are deleted.
        Default value: spark.history.fs.cleaner.maxAge

      • spark.history.fs.numReplayThreads
        Description: The number of threads used to process event log files.
        Default value: 25% of the available vCPUs

      • spark.history.store.maxDiskUsage
        Description: The maximum amount of disk space used to cache application history.
        Default value: 10g
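      For example, to automatically clean up old event logs, you can set the cleaner attributes in the sparkConf parameter. The following snippet is a sketch; the exact structure depends on how the component renders the sparkConf parameter.

        sparkConf:
          spark.history.fs.cleaner.enabled: "true"   # Periodically delete old event logs.
          spark.history.fs.cleaner.interval: 1d      # Run the cleaner once a day.
          spark.history.fs.cleaner.maxAge: 7d        # Delete event logs older than 7 days.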

Step 2: Access the web UI of Spark History Server

The default Service type is ClusterIP. To access the web UI of Spark History Server, you must map the Service port to local port 18080. To do this, perform the following steps. To access the web UI through an existing Server Load Balancer (SLB) instance, refer to Use an existing SLB instance to expose an application.

Important

Port forwarding configured by using the kubectl port-forward command is suitable only for testing environments but not suitable for production environments. Exercise caution when you use this command.

  1. Run the following command to configure port forwarding:

    RELEASE_NAME=spark-history-server
    RELEASE_NAMESPACE=spark-operator
    
    SERVICE_NAME=${RELEASE_NAME}-service
    SERVICE_PORT=$(kubectl get service ${SERVICE_NAME} --namespace ${RELEASE_NAMESPACE} -o jsonpath="{.spec.ports[0].port}")
    
    echo "Now you can go to http://127.0.0.1:18080 to visit spark history server."
    kubectl port-forward --namespace ${RELEASE_NAMESPACE} services/${SERVICE_NAME} 18080:${SERVICE_PORT}

    Expected output:

    Now you can go to http://127.0.0.1:18080 to visit spark history server.
    Forwarding from 127.0.0.1:18080 -> 18080
    Forwarding from [::1]:18080 -> 18080
  2. Use a web browser to access the web UI of Spark History Server through http://127.0.0.1:18080.


Step 3: Enable event logging for a Spark job

After ack-spark-history-server is installed, you need to enable event logging for your Spark job. Configure the following parameters to enable event logging and store event logs.

• spark.eventLog.enabled
  Description: Specifies whether to enable event logging. Valid values: true and false.
  Example: true

• spark.eventLog.dir
  Description: The directory in which event logs are stored. Supported formats: oss://<Bucket name>/spark/spark-events (OSS path), hdfs://namenode:port/spark/spark-events (HDFS path), and file:///tmp/spark/spark-events (local path).
  Example: oss://<Bucket name>/spark/spark-events
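In a SparkApplication managed by the Spark operator, these two parameters are specified in the sparkConf field, as shown in the following excerpt. The complete manifest is shown in the use scenario below.

  sparkConf:
    spark.eventLog.enabled: "true"
    spark.eventLog.dir: oss://<Bucket name>/spark/spark-events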

Use scenario: Store the logs of a Spark job in OSS

The following example shows how to use OSS to store the logs of a Spark job.

  1. Build a Spark container image

    Container images provided by the open source Spark project do not contain the JAR packages required to access OSS. Therefore, you must build a custom Spark container image that adds the JAR packages of the Hadoop OSS SDK, and then push the image to your image repository. Make sure that the versions of the JAR packages that you add match the Hadoop version used by Spark. The following code block shows a sample Dockerfile. For more information about how to use Container Registry to build a container image, see Use a Container Registry Enterprise Edition instance to build an image.

    ARG SPARK_IMAGE=registry-cn-hangzhou.ack.aliyuncs.com/dev/spark:3.5.2
    
    FROM ${SPARK_IMAGE}
    
    # Add dependency for Hadoop Aliyun OSS support
    ADD --chmod=644 https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aliyun/3.3.4/hadoop-aliyun-3.3.4.jar ${SPARK_HOME}/jars
    ADD --chmod=644 https://repo1.maven.org/maven2/com/aliyun/oss/aliyun-sdk-oss/3.17.4/aliyun-sdk-oss-3.17.4.jar ${SPARK_HOME}/jars
    ADD --chmod=644 https://repo1.maven.org/maven2/org/jdom/jdom2/2.0.6.1/jdom2-2.0.6.1.jar ${SPARK_HOME}/jars
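    For example, assuming that the preceding Dockerfile is saved in the current directory and that you push the image to the repository used later in this topic, you can build and push the image with the following commands. Replace the image address with your own repository and log on to the repository with docker login first if necessary.

    docker build -t registry-cn-hangzhou.ack.aliyuncs.com/ack-demo/spark:3.5.2-oss .
    docker push registry-cn-hangzhou.ack.aliyuncs.com/ack-demo/spark:3.5.2-oss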
  2. Create a Secret

    Create a Secret in the namespace of the Spark job to store credentials used to access OSS.

    1. Create a Secret YAML file named spark-oss-secret.yaml to store credentials used to access OSS.

      apiVersion: v1
      kind: Secret
      metadata:
        name: spark-oss-secret
        namespace: default
      stringData:
        # The AccessKey ID of your Alibaba Cloud account.
        OSS_ACCESS_KEY_ID: ""
        # The AccessKey secret of your Alibaba Cloud account.
        OSS_ACCESS_KEY_SECRET: ""
    2. Run the following command to create a Secret:

      kubectl apply -f spark-oss-secret.yaml

      Expected output:

      secret/spark-oss-secret created
  3. Submit a Spark job

    The following code block shows a sample SparkApplication that has event logging enabled. You can modify the parameters on demand.

    • image
      Description: The address of the Spark container image.
      Example: registry-cn-hangzhou.ack.aliyuncs.com/ack-demo/spark:3.5.2-oss

    • fs.oss.endpoint
      Description: The OSS endpoint.
      Example: oss-cn-beijing-internal.aliyuncs.com

    • spark.eventLog.dir
      Description: The path of event logs. Make sure that the log path already exists. Otherwise, the job throws an error when it runs.
      Example: oss://<Bucket name>/spark/spark-events

    1. Create a SparkApplication YAML file named spark-pi.yaml.

      apiVersion: sparkoperator.k8s.io/v1beta2
      kind: SparkApplication
      metadata:
        name: spark-pi-oss
        namespace: default
      spec:
        type: Scala
        mode: cluster
        image: registry-cn-hangzhou.ack.aliyuncs.com/ack-demo/spark:3.5.2-oss
        mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.2.jar
        mainClass: org.apache.spark.examples.SparkPi
        sparkVersion: 3.5.2
        hadoopConf:
          fs.AbstractFileSystem.oss.impl: org.apache.hadoop.fs.aliyun.oss.OSS
          fs.oss.impl: org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem
          # The OSS endpoint.
          fs.oss.endpoint: oss-cn-beijing-internal.aliyuncs.com
          fs.oss.credentials.provider: com.aliyun.oss.common.auth.EnvironmentVariableCredentialsProvider
        sparkConf:
          spark.eventLog.enabled: "true"
          # The path of event logs.
          spark.eventLog.dir: oss://<Bucket name>/spark/spark-events
        driver:
          cores: 1
          coreLimit: 1200m
          memory: 512m
          serviceAccount: spark-operator-spark
          envFrom:
          - secretRef:
              name: spark-oss-secret
        executor:
          instances: 1
          cores: 1
          coreLimit: 1200m
          memory: 512m
          envFrom:
          - secretRef:
              name: spark-oss-secret
        restartPolicy:
          type: Never
    2. Run the following command to submit the Spark job. After the job is completed, enter http://127.0.0.1:18080 into the address bar of a web browser to view information about the Spark job.

      kubectl apply -f spark-pi.yaml

      Expected output:

      sparkapplication.sparkoperator.k8s.io/spark-pi-oss created
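      You can then check the status of the job by querying the SparkApplication resource. The following command assumes the default namespace used in the preceding manifest. After the job is completed, its event log is written to the directory specified by spark.eventLog.dir and appears in the Spark History Server web UI.

      kubectl get sparkapplication spark-pi-oss --namespace default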