All Products
Search
Document Center

AnalyticDB:Spark SQL diagnostics

Last Updated:Mar 28, 2026

Use Spark SQL diagnostics to troubleshoot performance bottlenecks in your Spark SQL queries. For example, you can:

  • Visualize the execution plan tree to identify the slowest operators at a glance.

  • Detect join data bloat — when a join outputs more rows than it takes in.

  • Detect scanned data skew — when data is unevenly distributed across tables, causing long tail effect.

Diagnostics are available only for Spark SQL queries that completed successfully in the last 14 days.

Prerequisites

Before you begin, ensure that you have:

Diagnose a Spark SQL query

  1. Log on to the AnalyticDB for MySQL console. In the upper-left corner, select a region. In the left-side navigation pane, click ClustersData Lakehouse Edition. On the Data Lakehouse Edition tab, find the target cluster and click the cluster ID.

  2. In the left-side navigation pane, choose Diagnostics and Optimization > Spark SQL Diagnostics and Optimization. The query list shows recent Spark SQL queries available for diagnostics. The following table describes the columns.

    ColumnDescription
    SQLThe executed SQL statement.
    Query IDThe ID of the query.
    Execution IDThe sequence ID of the SQL statement in the Spark SQL application.
    StatusThe execution status. Valid values: Completed, Running, Failed.
    Started AtThe time when the query was submitted.
    Execution DurationThe total time to execute the SQL statement.
    Maximum Operator Exclusive TimeThe longest time an operator held resources exclusively during execution.
    Peak MemoryThe peak memory usage of the query.
    Scanned DataThe amount of data returned from the storage layer to the compute layer.
    ActionsClick Diagnose to open the execution details and diagnostic results for the query.
  3. Find the query to diagnose and click Diagnose in the Actions column.

  4. On the Query Properties page, click the Execution Plan tab to view the hierarchy chart of the execution plan tree.

  5. Click Query-level Diagnostic Results to the right of the execution plan tree to view the diagnostic findings and optimization suggestions for the query.

Query-level diagnostic results

AnalyticDB for MySQL analyzes query-level statistics and surfaces optimization suggestions when it detects known performance patterns.

Data bloat in a join

What it means: The join operator outputs more rows than it takes in. This indicates an unreasonable join that consumes excessive computing and memory resources, slowing down the query.

Fix: Filter out rows that are not involved in the join before the join runs, to reduce the amount of data the join operator must process.

Skew in the amount of scanned data

What it means: The amount of data scanned differs significantly across tables during query execution. The long tail effect — where the slowest task determines overall completion time — extends data read time and degrades query performance.

Fix: Select distribution key columns that distribute data more evenly across partitions to reduce per-table scan volume.