SQL performance diagnosis and optimization FAQ - AnalyticDB

The SQL diagnostics feature in AnalyticDB for MySQL collects statistics at the query, stage, and operator levels. This page describes the eight operator-level diagnosis results, what causes each issue, and how to fix it.

Note

To view operator-level diagnosis results in the console, see View diagnostic results.

Diagnosis types at a glance

Diagnosis result	Key signal	Primary fix
Low aggregation rate	High input-to-output ratio in GROUP BY	Skip partial aggregation; redistribute data
Filter conditions not pushed down	Indexes exist but filter runs outside storage layer	Remove blocking hint or function; rebuild index
Data expansion in a join	Output row count exceeds total input row count	Remove duplicates or adjust join order
Large right table in a join	Right (builder) table is large; high memory usage	Rewrite left join to right join
Cross joins exist	JOIN with no join condition; Cartesian product returned	Add join conditions
Scan operators read many columns	High column count in SELECT; large disk I/O	Remove unnecessary columns from SELECT
Data skew in scanned data amount	Uneven row counts across storage nodes	Choose better distribution columns
Inefficient indexes	Low input-to-output ratio in filter operator despite index use	Move filtering to compute nodes

The aggregation rate of an aggregation operator is low

What it means

The aggregation rate is the ratio of input data size to output data size after rows are grouped by GROUP BY columns and aggregated within each group. A low ratio means few rows are being collapsed per group, so the aggregation is not reducing data volume effectively.

In AnalyticDB for MySQL, a GROUP BY operation runs in two steps: partial aggregation on each node, then final aggregation across nodes. When there are many distinct groups, partial aggregation cannot reduce the data transferred over the network — but it still consumes computing resources on every node.

How to confirm

In the operator details, compare the input row count and output row count for the aggregation operator. If output rows are close to input rows, the aggregation rate is low.

Fix

Skip the partial aggregation step, redistribute data across nodes by GROUP BY key, then perform final aggregation. For details, see Grouping and aggregation query optimization.

Filter conditions are not pushed down

What it means

By default, AnalyticDB for MySQL indexes all columns at storage time, allowing filter conditions to be evaluated at the storage layer before data is read. When filter pushdown is disabled, filtering runs later in the execution pipeline, causing more data to be read and transferred.

Filter pushdown is disabled in three situations:

Hint or cluster configuration disables it — the no_index_columns or filter_not_pushdown_columns hint is used in the query, or the adb_config filter_not_pushdown_columns cluster configuration is set.
A function wraps the filter column — functions such as CAST prevent index evaluation at the storage layer.
The column has no index — the no_index keyword was used when creating the table, or the no_index statement deleted the index after table creation.

How to confirm

In the operator details, check whether the filter operator runs above the scan operator (post-scan filtering) rather than inside it. The scan operator's output row count will be high relative to what you'd expect after filtering.

Fix

If a hint or cluster configuration is blocking pushdown, check why it was added and remove it if it is no longer needed. For details, see Filter conditions without pushdown.
If a function such as CAST is wrapping the column, consider writing the data in the converted form and removing the function from the query.
If the column has no index, investigate why the index is missing and restore it if appropriate.

Data expansion occurs in a join

What it means

The data expansion rate of a join is the ratio of output rows to input rows, where input rows equal the sum of rows in the left and right tables. For a well-designed join, output rows are fewer than input rows. When output rows exceed input rows, data expansion occurs — the join is multiplying rows rather than filtering them — which consumes large amounts of memory and computing resources and slows the query.

Data expansion has two distinct root causes:

Duplicate values — both tables have many duplicate values on the join key, causing one-to-many or many-to-many matches.
Incorrect join order — the join order chosen by the optimizer amplifies row counts at an early stage, inflating all subsequent operations.

How to confirm

In the operator details, compare the output row count of the join operator against the sum of the input row counts from the left and right tables. An output count larger than the combined input count confirms data expansion.

Fix

If duplicate values are the cause, filter out duplicates before the join using a subquery or deduplication step.
If join order is the cause, manually adjust the join order. For details, see Manually adjust join orders.

The right table in a join is large in size

What it means

In AnalyticDB for MySQL, the right table in a join is the builder table — the table used to build a hash or set structure in memory before the join runs. A large builder table consumes significant memory and can affect overall cluster stability.

A large right table has one of two causes:

LEFT JOIN forces the right table to be the builder — the SQL contains a LEFT JOIN clause. Because the right table in a left join must always be the builder table, a large right table cannot be swapped to the left side automatically.
Stale statistics cause a bad size estimate — AnalyticDB for MySQL uses statistics to estimate table sizes when determining join order. If statistics have expired, the optimizer may incorrectly assign the larger table to the right (builder) side.

How to confirm

In the operator details, check the input row count for the right (build) side of the join operator. A high row count on the build side, combined with high memory usage in the operator, confirms this issue.

Fix

Rewrite the LEFT JOIN as a right join so the smaller table becomes the builder table. For details, see Rewrite left join to right join.

We recommend that you rewrite the left join to the right join. For more information, see Rewrite left join to right join.
If estimation errors occur due to statistics expiration, you can Submit a ticket to contact technical support.

Cross joins exist

What it means

A cross join is a JOIN operation without join conditions. It returns the Cartesian product of the left and right tables — every row in the left table is paired with every row in the right table. If both tables are large, the result set grows to left_rows × right_rows, which can severely affect cluster stability.

How to confirm

In the query plan, look for a join operator with no join condition predicate. In the operator details, the output row count will be approximately equal to the product of the two input row counts.

Fix

Add join conditions to eliminate the Cartesian product.

Scan operators read a large number of columns

What it means

Scan operators filter data and read detailed rows from the storage layer of AnalyticDB for MySQL. When the SELECT statement lists many columns, the scan operator must read a large volume of data from disk. This drives up disk I/O consumption and can affect cluster stability.

How to confirm

In the operator details, check the column count read by the scan operator. A high column count combined with a large scanned data volume confirms this issue.

Fix

Rewrite the SQL statement to remove columns not needed by the query. Replace SELECT * with an explicit column list.

Data skew occurs in the scanned data amount

What it means

AnalyticDB for MySQL uses a distributed execution architecture. For large tables, you specify distribution columns that control how rows are spread across storage nodes at write time. If the distribution column values are unevenly distributed — for example, if one value accounts for most of the rows — data concentrates on a small number of nodes. During reads, those nodes become bottlenecks with long tail latency, which slows the overall query.

How to confirm

In the operator details, compare the scanned data amounts across storage nodes for the scan operator. A large difference between the node with the most data and the node with the least data confirms data skew.

Fix

Choose distribution columns with high cardinality and even value distribution. For guidance on diagnosing and resolving distribution column skew, see Diagnostics on distribution field skew.

Indexes are inefficient

What it means

AnalyticDB for MySQL uses indexes to filter data at the storage layer. When a filter operator has a low input-to-output ratio — meaning most rows survive the filter despite index use — the index is not selectively narrowing down the data. This suggests the index is not being applied as expected, and filtering is not reducing the data volume before it reaches the compute layer.

How to confirm

In the operator details, check the input row count and output row count for the filter operator. If the output count is close to the input count even though an index is in use, the index is ineffective for this filter.

Fix

Move the filtering to compute nodes instead of pushing the filter condition to the storage layer. For details, see Filter conditions without pushdown.