Use Spark SQL diagnostics to troubleshoot performance bottlenecks in your Spark SQL queries. For example, you can:
Visualize the execution plan tree to identify the slowest operators at a glance.
Detect join data bloat — when a join outputs more rows than it takes in.
Detect scanned data skew — when data is unevenly distributed across tables, causing long tail effect.
Diagnostics are available only for Spark SQL queries that completed successfully in the last 14 days.
Prerequisites
Before you begin, ensure that you have:
An AnalyticDB for MySQL Data Lakehouse Edition clusterData Lakehouse Edition
A job resource group created for the cluster. For more information, see Create a resource group
A database account created for the cluster:
Alibaba Cloud account: create a privileged account. For more information, see the "Create a privileged account" section in Create a database account
Resource Access Management (RAM) user: create both a privileged account and a standard account, then associate the standard account with the RAM user. For more information, see Create a database account and Associate or disassociate a database account with or from a RAM user
AnalyticDB for MySQL authorized to assume the AliyunADBSparkProcessingDataRole role. For more information, see Perform authorization
Diagnose a Spark SQL query
Log on to the AnalyticDB for MySQL console. In the upper-left corner, select a region. In the left-side navigation pane, click ClustersData Lakehouse Edition. On the Data Lakehouse Edition tab, find the target cluster and click the cluster ID.
In the left-side navigation pane, choose Diagnostics and Optimization > Spark SQL Diagnostics and Optimization. The query list shows recent Spark SQL queries available for diagnostics. The following table describes the columns.
Column Description SQL The executed SQL statement. Query ID The ID of the query. Execution ID The sequence ID of the SQL statement in the Spark SQL application. Status The execution status. Valid values: Completed, Running, Failed. Started At The time when the query was submitted. Execution Duration The total time to execute the SQL statement. Maximum Operator Exclusive Time The longest time an operator held resources exclusively during execution. Peak Memory The peak memory usage of the query. Scanned Data The amount of data returned from the storage layer to the compute layer. Actions Click Diagnose to open the execution details and diagnostic results for the query. Find the query to diagnose and click Diagnose in the Actions column.
On the Query Properties page, click the Execution Plan tab to view the hierarchy chart of the execution plan tree.
Click Query-level Diagnostic Results to the right of the execution plan tree to view the diagnostic findings and optimization suggestions for the query.
Query-level diagnostic results
AnalyticDB for MySQL analyzes query-level statistics and surfaces optimization suggestions when it detects known performance patterns.
Data bloat in a join
What it means: The join operator outputs more rows than it takes in. This indicates an unreasonable join that consumes excessive computing and memory resources, slowing down the query.
Fix: Filter out rows that are not involved in the join before the join runs, to reduce the amount of data the join operator must process.
Skew in the amount of scanned data
What it means: The amount of data scanned differs significantly across tables during query execution. The long tail effect — where the slowest task determines overall completion time — extends data read time and degrades query performance.
Fix: Select distribution key columns that distribute data more evenly across partitions to reduce per-table scan volume.