Spark jobs running in a Container Service for Kubernetes (ACK) cluster generate logs distributed across many pods, making centralized log management difficult. By integrating with Simple Log Service (SLS), you can:
- Collect structured logs from all Spark driver and executor pods automatically.
- Query and analyze Spark job logs across a specified time range.
- Filter logs by application name, version, role, and submission ID.
This topic describes how to configure the full log collection pipeline: building a Spark container image with structured logging support, deploying a Logtail configuration to collect pod logs, and querying the results in Simple Log Service.
Prerequisites
Before you begin, ensure that you have:
- The ack-spark-operator component installed. For more information, see Step 1: Install the ack-spark-operator component.
- A Simple Log Service project created. For more information, see Manage a project.
- Logtail components installed in your ACK cluster. For more information, see Install Logtail components in an ACK cluster.
- A container image registry to which you can push your custom Spark image.
How it works
The pipeline works as follows:
- A custom Spark container image adds the Log4j2 JSON template layout library, enabling structured JSONL output.
- A ConfigMap configures Log4j2 to write logs in JSONL format using the Elastic Common Schema (ECS) template, both to stdout and to a file at /opt/spark/logs/spark.log.
- A Logtail configuration (AliyunLogConfig) tells Simple Log Service to collect logs from containers that match the Spark Operator pod label and container name pattern, then parses the JSONL fields and extracts timestamps.
- After a Spark job runs, its logs are available in the specified Logstore for querying and analysis.
Step 1: Build a Spark container image
Create the following Dockerfile. This example uses Spark 3.5.3 and adds the log4j-layout-template-json dependency to the Spark classpath, which enables JSONL output through Log4j2's JsonTemplateLayout.
# Replace <SPARK_IMAGE> with your Spark base image.
ARG SPARK_IMAGE=<SPARK_IMAGE>
FROM ${SPARK_IMAGE}
# Add the log4j-layout-template-json dependency to the Spark classpath.
ADD --chown=spark:spark --chmod=644 https://repo1.maven.org/maven2/org/apache/logging/log4j/log4j-layout-template-json/2.24.1/log4j-layout-template-json-2.24.1.jar ${SPARK_HOME}/jars
Build and push the image to your registry:
docker build -t <your-registry>/<your-image-name>:<tag> .
docker push <your-registry>/<your-image-name>:<tag>
Replace the placeholders with your actual registry and image details.
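For example, assuming an Alibaba Cloud Container Registry (ACR) instance in the cn-hangzhou region (the registry endpoint, namespace, and tag below are illustrative placeholders, not values from this topic):
docker build -t registry.cn-hangzhou.aliyuncs.com/my-namespace/spark:3.5.3-log4j-json .
docker push registry.cn-hangzhou.aliyuncs.com/my-namespace/spark:3.5.3-log4j-json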
Step 2: Configure Log4j2 logs
Create a file named spark-log-conf.yaml with the following content. This ConfigMap sets the log level to INFO and configures both the console and file appenders to output logs in JSONL format using the ECS template. For more information, see Collect Log4j logs.
apiVersion: v1
kind: ConfigMap
metadata:
  name: spark-log-conf
  namespace: default
data:
  log4j2.properties: |
    # Set everything to be logged to the console and file
    rootLogger.level = info
    rootLogger.appenderRefs = console, file
    rootLogger.appenderRef.console.ref = STDOUT
    rootLogger.appenderRef.file.ref = FileAppender
    appender.console.name = STDOUT
    appender.console.type = Console
    appender.console.layout.type = JsonTemplateLayout
    appender.console.layout.eventTemplateUri = classpath:EcsLayout.json
    appender.file.name = FileAppender
    appender.file.type = File
    appender.file.fileName = /opt/spark/logs/spark.log
    appender.file.layout.type = JsonTemplateLayout
    appender.file.layout.eventTemplateUri = classpath:EcsLayout.json
Apply the ConfigMap:
kubectl apply -f spark-log-conf.yaml
Expected output:
configmap/spark-log-conf created
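Optionally, confirm that the Log4j2 properties were stored as expected. This is a quick check using a standard kubectl command against the ConfigMap created above:
kubectl get configmap spark-log-conf -n default -o yaml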
Step 3: Create a Logtail configuration
Create a file named aliyun-log-config.yaml with the following content. Replace <SLS_PROJECT> with your Simple Log Service project name and <SLS_LOGSTORE> with your Logstore name. If the Logstore does not exist, Simple Log Service creates it automatically.
The configuration filters for pods launched by the Spark Operator using the label sparkoperator.k8s.io/launched-by-spark-operator: "true", which the Spark Operator sets automatically on all driver and executor pods. Logs are collected from /opt/spark/logs, parsed as JSON, and the @timestamp field is extracted as the log time. For more information about AliyunLogConfig fields, see Use AliyunLogConfig to manage a Logtail configuration.
apiVersion: log.alibabacloud.com/v1alpha1
kind: AliyunLogConfig
metadata:
  name: spark
  namespace: default
spec:
  # (Optional) The name of the project. Default value: k8s-log-<Your_Cluster_ID>.
  project: <SLS_PROJECT>
  # The name of the Logstore. If the specified Logstore does not exist, Simple Log Service automatically creates it.
  logstore: <SLS_LOGSTORE>
  # The Logtail configuration.
  logtailConfig:
    # The name of the Logtail configuration.
    configName: spark
    # The type of the data source. The value file specifies text logs.
    inputType: file
    # The configurations of the log input.
    inputDetail:
      # The directory in which the log file is located.
      logPath: /opt/spark/logs
      # The name of the log file. Wildcard characters are supported.
      filePattern: '*.log'
      # The encoding of the log file.
      fileEncoding: utf8
      # The log type.
      logType: json_log
      localStorage: true
      key:
        - content
      logBeginRegex: .*
      logTimezone: ''
      discardNonUtf8: false
      discardUnmatch: true
      preserve: true
      preserveDepth: 0
      regex: (.*)
      outputType: LogService
      topicFormat: none
      adjustTimezone: false
      enableRawLog: false
      # Collect text logs from containers.
      dockerFile: true
      # Advanced configurations.
      advanced:
        # Collect container metadata.
        collect_containers_flag: true
        # Kubernetes-specific Logtail configurations.
        k8s:
          # Filter pods based on the pod label.
          IncludeK8sLabel:
            sparkoperator.k8s.io/launched-by-spark-operator: "true"
          # Filter containers based on the container name.
          K8sContainerRegex: "^spark-kubernetes-(driver|executor)$"
          # Additional log tag configurations.
          ExternalK8sLabelTag:
            spark-app-name: spark-app-name
            spark-version: spark-version
            spark-role: spark-role
            spark-app-selector: spark-app-selector
            sparkoperator.k8s.io/submission-id: sparkoperator.k8s.io/submission-id
      # The log processing plug-ins.
      plugin:
        processors:
          # Split logs into lines.
          - type: processor_split_log_string
            detail:
              SplitKey: content
              SplitSep: ''
          # Parse the JSON fields.
          - type: processor_json
            detail:
              ExpandArray: false
              ExpandConnector: ''
              ExpandDepth: 0
              IgnoreFirstConnector: false
              SourceKey: content
              KeepSource: false
              KeepSourceIfParseError: true
              NoKeyError: false
              UseSourceKeyAsPrefix: false
          # Extract the log timestamp.
          - type: processor_strptime
            detail:
              SourceKey: '@timestamp'
              Format: '%Y-%m-%dT%H:%M:%S.%fZ'
              KeepSource: false
              AdjustUTCOffset: true
              UTCOffset: 0
              AlarmIfFail: false
Apply the configuration:
kubectl apply -f aliyun-log-config.yaml
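Before checking the console, you can confirm from the command line that the AliyunLogConfig resource was accepted. This is a quick check against the custom resource created above; if the configuration was applied successfully, its status section typically reports a success code:
kubectl get aliyunlogconfig spark -n default -o yaml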
To verify that the Logstore and Logtail configuration were created:
- Log on to the Simple Log Service console.
- In the Projects section, click your project.
- Choose Log Storage > Logstores. Click the > icon next to the target Logstore. Choose Data Import > Logtail Configurations.
- Click the Logtail configuration to view its details.
Step 4: Submit a sample Spark job
Create a file named spark-pi.yaml with the following content. Replace <SPARK_IMAGE> with the image that you built and pushed in Step 1. The sparkConfigMap field references the ConfigMap created in Step 2, which injects the Log4j2 configuration into the Spark pods.
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: <SPARK_IMAGE>
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.3.jar
  arguments:
    - "5000"
  sparkVersion: 3.5.3
  sparkConfigMap: spark-log-conf
  driver:
    cores: 1
    memory: 512m
    serviceAccount: spark-operator-spark
  executor:
    instances: 1
    cores: 1
    memory: 4g
Submit the job:
kubectl apply -f spark-pi.yaml
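You can watch the application until its status reaches COMPLETED. This is a convenience check against the SparkApplication resource created above:
kubectl get sparkapplication spark-pi -n default -w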
After the job completes, check the last 10 lines of the driver pod log to confirm JSONL output:
kubectl logs --tail=10 spark-pi-driver
Expected output:
{"@timestamp":"2024-11-20T11:45:48.487Z","ecs.version":"1.2.0","log.level":"WARN","message":"Kubernetes client has been closed.","process.thread.name":"-937428334-pool-19-thread-1","log.logger":"org.apache.spark.scheduler.cluster.k8s.ExecutorPodsWatchSnapshotSource"}
{"@timestamp":"2024-11-20T11:45:48.585Z","ecs.version":"1.2.0","log.level":"INFO","message":"MapOutputTrackerMasterEndpoint stopped!","process.thread.name":"dispatcher-event-loop-7","log.logger":"org.apache.spark.MapOutputTrackerMasterEndpoint"}
{"@timestamp":"2024-11-20T11:45:48.592Z","ecs.version":"1.2.0","log.level":"INFO","message":"MemoryStore cleared","process.thread.name":"main","log.logger":"org.apache.spark.storage.memory.MemoryStore"}
{"@timestamp":"2024-11-20T11:45:48.592Z","ecs.version":"1.2.0","log.level":"INFO","message":"BlockManager stopped","process.thread.name":"main","log.logger":"org.apache.spark.storage.BlockManager"}
{"@timestamp":"2024-11-20T11:45:48.596Z","ecs.version":"1.2.0","log.level":"INFO","message":"BlockManagerMaster stopped","process.thread.name":"main","log.logger":"org.apache.spark.storage.BlockManagerMaster"}
{"@timestamp":"2024-11-20T11:45:48.598Z","ecs.version":"1.2.0","log.level":"INFO","message":"OutputCommitCoordinator stopped!","process.thread.name":"dispatcher-event-loop-1","log.logger":"org.apache.spark.scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint"}
{"@timestamp":"2024-11-20T11:45:48.602Z","ecs.version":"1.2.0","log.level":"INFO","message":"Successfully stopped SparkContext","process.thread.name":"main","log.logger":"org.apache.spark.SparkContext"}
{"@timestamp":"2024-11-20T11:45:48.604Z","ecs.version":"1.2.0","log.level":"INFO","message":"Shutdown hook called","process.thread.name":"shutdown-hook-0","log.logger":"org.apache.spark.util.ShutdownHookManager"}
{"@timestamp":"2024-11-20T11:45:48.604Z","ecs.version":"1.2.0","log.level":"INFO","message":"Deleting directory /var/data/spark-f783cf2e-44db-452c-83c9-738f9c894ef9/spark-2caa5814-bd32-431c-a9f9-a32208b34fbb","process.thread.name":"shutdown-hook-0","log.logger":"org.apache.spark.util.ShutdownHookManager"}
{"@timestamp":"2024-11-20T11:45:48.606Z","ecs.version":"1.2.0","log.level":"INFO","message":"Deleting directory /tmp/spark-dacdfd95-f166-4b23-9312-af9052730417","process.thread.name":"shutdown-hook-0","log.logger":"org.apache.spark.util.ShutdownHookManager"}
Each JSONL log entry contains the following fields:
| Field | Description |
|---|---|
| @timestamp | The time when the log entry was generated. |
| ecs.version | The ECS (Elastic Common Schema) version. |
| log.level | The log level, such as INFO or WARN. |
| message | The log message. |
| process.thread.name | The name of the thread that generated the log entry. |
| log.logger | The name of the logger that recorded the log entry. |
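Because the output is JSONL, you can also filter it locally with a tool such as jq. The following sketch assumes jq is installed on your machine and that the driver pod still exists; it prints the message field of WARN-level entries:
kubectl logs spark-pi-driver | jq -r 'select(."log.level" == "WARN") | .message'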
Step 5: Verify log collection
After the Spark job completes, confirm that logs have reached your Logstore before using them for analysis.
Log on to the Simple Log Service console and open your project. In the Logstore, set a time range covering the job execution window and run a query. You should see JSONL-format entries with the ECS fields described above.
If no logs appear, check the following (see the command sketch after this list):
- The Logtail configuration is active and the Logstore name matches <SLS_LOGSTORE> in your AliyunLogConfig.
- The Spark pods have the label sparkoperator.k8s.io/launched-by-spark-operator: "true", which the Spark Operator sets automatically.
- The log file exists at /opt/spark/logs/spark.log inside the driver or executor pod.
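The following commands sketch how to check the last two items with standard kubectl calls. Note that the exec command works only while the driver pod is still running:
kubectl get pod spark-pi-driver -n default --show-labels
kubectl exec spark-pi-driver -n default -- ls -l /opt/spark/logs/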
Step 6: Query and analyze Spark logs
With logs flowing into Simple Log Service, use the query and analysis features to filter and aggregate Spark job logs by time range, log level, application name, or Spark role (driver vs. executor).
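For example, the following query statements are common starting points, assuming field indexes are configured for the JSONL fields listed above and tag indexing is enabled for the tags added by ExternalK8sLabelTag (adjust field and tag names to match your Logstore index):
log.level: WARN
__tag__:spark-role: driver
* | SELECT "log.level" AS level, COUNT(*) AS cnt GROUP BY level ORDER BY cnt DESC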
(Optional) Step 7: Clean up
After testing, remove the resources to avoid unnecessary costs.
Delete the Spark job:
kubectl delete -f spark-pi.yaml
Delete the Logtail configuration:
kubectl delete -f aliyun-log-config.yaml
Delete the Log4j2 ConfigMap:
kubectl delete -f spark-log-conf.yaml