Collect Spark driver and executor logs into Simple Log Service (SLS) for centralized querying and troubleshooting of EMR on ACK jobs.
Prerequisites
Before you begin, ensure that you have:
- A Spark cluster created in the EMR on ACK console. See Get started with EMR on ACK.
- Simple Log Service activated. See Quick start: Use Logtail to collect and analyze ECS text logs.
Set up log collection
Step 1: Enable the Logtail component
Enable the Logtail component for Simple Log Service in your ACK cluster. See Collect container logs from ACK clusters.
If Logtail is already enabled, skip this step.
Step 2: Open the SLS project console
- Log on to the Container Service for Kubernetes console.
- On the Clusters page, click the name of your cluster, or click Details in the Actions column.
- On the Basic Information page, go to the Cluster Resources section and click the link in the Log Service Project row.
The SLS project console opens.
Step 3: Create two Logstores
On the Logstores tab, create two Logstores:
- spark-driver-log: collects Spark driver logs
- spark-executor-log: collects Spark executor logs

For instructions, see Create a project and a Logstore.
Step 4: Configure the spark-driver-log Logstore
- In the spark-driver-log Logstore, create a Logtail configuration with Kubernetes - Standard Output as the data source.
- Under Data Collection > Logtail Configurations, select an existing Kubernetes machine group.
- Switch to editor mode and enter the following configuration:

```json
{
  "inputs": [
    {
      "detail": {
        "IncludeEnv": {
          "SPARKLOGENV": "spark-driver"
        },
        "Stderr": true,
        "Stdout": true,
        "BeginLineCheckLength": 10,
        "BeginLineRegex": "\\d+/\\d+/\\d+.*"
      },
      "type": "service_docker_stdout"
    }
  ]
}
```

Key fields:

| Field | Value | Required | Description |
|---|---|---|---|
| SPARKLOGENV | spark-driver | Yes | Environment variable filter; collects logs only from Spark driver containers |
| Stdout | true | Yes | Collect standard output |
| Stderr | true | Yes | Collect standard error |
| BeginLineCheckLength | 10 | Yes | Number of characters checked to identify the start of a new log line |
| BeginLineRegex | \d+/\d+/\d+.* | Yes | Regex pattern that marks the beginning of a log entry (matches Spark's timestamp format) |
| type | service_docker_stdout | Yes | Input plugin for collecting container stdout/stderr |
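The BeginLineRegex value controls how Logtail groups multiline output: only lines matching the pattern start a new log entry, so stack traces stay attached to the entry above them. You can sanity-check the pattern locally; a minimal sketch (the sample log lines are illustrative):

```python
import re

# The actual pattern behind the JSON-escaped "\\d+/\\d+/\\d+.*" value:
# it matches Spark's leading date stamp, e.g. "24/05/01 10:15:32".
begin_line = re.compile(r"\d+/\d+/\d+.*")

sample_lines = [
    "24/05/01 10:15:32 INFO SparkContext: Running Spark version 3.3.1",
    "java.lang.RuntimeException: boom",
    "    at org.apache.spark.SparkContext.<init>(SparkContext.scala:85)",
    "24/05/01 10:15:33 ERROR TaskSetManager: Task 0 failed",
]

for line in sample_lines:
    # Lines that match start a new log entry; the rest are treated
    # as continuations of the previous entry.
    starts_new_entry = begin_line.match(line) is not None
    print(starts_new_entry, line)
```

Only the two timestamped lines start new entries; the exception message and stack-trace line are folded into the entry that precedes them.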
Step 5: Configure the spark-executor-log Logstore
Repeat Step 4 for the spark-executor-log Logstore, but change SPARKLOGENV to spark-executor:
```json
{
  "inputs": [
    {
      "detail": {
        "IncludeEnv": {
          "SPARKLOGENV": "spark-executor"
        },
        "Stderr": true,
        "Stdout": true,
        "BeginLineCheckLength": 10,
        "BeginLineRegex": "\\d+/\\d+/\\d+.*"
      },
      "type": "service_docker_stdout"
    }
  ]
}
```
All other fields are identical to the driver configuration.
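Because the two configurations differ only in the SPARKLOGENV value, you may prefer to generate them from a template rather than maintain two copies by hand. A minimal sketch (the helper function name is illustrative, not part of any SDK):

```python
import json


def logtail_stdout_config(role: str) -> dict:
    """Build the Logtail stdout-collection config for a Spark role,
    where role is "spark-driver" or "spark-executor"."""
    return {
        "inputs": [
            {
                "detail": {
                    "IncludeEnv": {"SPARKLOGENV": role},
                    "Stderr": True,
                    "Stdout": True,
                    "BeginLineCheckLength": 10,
                    "BeginLineRegex": "\\d+/\\d+/\\d+.*",
                },
                "type": "service_docker_stdout",
            }
        ]
    }


# Emit both configs; paste each into the matching Logstore's editor.
for role in ("spark-driver", "spark-executor"):
    print(json.dumps(logtail_stdout_config(role), indent=2))
```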
Step 6: Enable indexing
Enable indexing for both Logstores so you can run queries. See Create indexes.
Query Spark logs
After completing the setup, open the SLS project console and navigate to the Logstore for the log type you want to inspect:
| Log type | Logstore |
|---|---|
| Driver logs | spark-driver-log |
| Executor logs | spark-executor-log |
Use the search bar in each Logstore to filter logs by keywords, time range, or field values.
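For example, with indexing enabled, a keyword query in the spark-driver-log Logstore's search bar such as the following returns only driver log entries that contain both terms (SLS query syntax supports and/or/not operators):

```
ERROR and Exception
```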