The SQL diagnostics feature in AnalyticDB for MySQL collects statistics at the query, stage, and operator levels. This page describes the eight operator-level diagnosis results, what causes each issue, and how to fix it.
To view operator-level diagnosis results in the console, see View diagnostic results.
Diagnosis types at a glance
| Diagnosis result | Key signal | Primary fix |
|---|---|---|
| Low aggregation rate | High input-to-output ratio in GROUP BY | Skip partial aggregation; redistribute data |
| Filter conditions not pushed down | Indexes exist but filter runs outside storage layer | Remove blocking hint or function; rebuild index |
| Data expansion in a join | Output row count exceeds total input row count | Remove duplicates or adjust join order |
| Large right table in a join | Right (builder) table is large; high memory usage | Rewrite left join to right join |
| Cross joins exist | JOIN with no join condition; Cartesian product returned | Add join conditions |
| Scan operators read many columns | High column count in SELECT; large disk I/O | Remove unnecessary columns from SELECT |
| Data skew in scanned data amount | Uneven row counts across storage nodes | Choose better distribution columns |
| Inefficient indexes | Low input-to-output ratio in filter operator despite index use | Move filtering to compute nodes |
The aggregation rate of an aggregation operator is low
What it means
The aggregation rate is the ratio of input data size to output data size after rows are grouped by GROUP BY columns and aggregated within each group. A low ratio means few rows are being collapsed per group, so the aggregation is not reducing data volume effectively.
In AnalyticDB for MySQL, a GROUP BY operation runs in two steps: partial aggregation on each node, then final aggregation across nodes. When there are many distinct groups, partial aggregation cannot reduce the data transferred over the network — but it still consumes computing resources on every node.
How to confirm
In the operator details, compare the input row count and output row count for the aggregation operator. If output rows are close to input rows, the aggregation rate is low.
Fix
Skip the partial aggregation step, redistribute data across nodes by GROUP BY key, then perform final aggregation. For details, see Grouping and aggregation query optimization.
Filter conditions are not pushed down
What it means
By default, AnalyticDB for MySQL indexes all columns at storage time, allowing filter conditions to be evaluated at the storage layer before data is read. When filter pushdown is disabled, filtering runs later in the execution pipeline, causing more data to be read and transferred.
Filter pushdown is disabled in three situations:
Hint or cluster configuration disables it — the
no_index_columnsorfilter_not_pushdown_columnshint is used in the query, or theadb_config filter_not_pushdown_columnscluster configuration is set.A function wraps the filter column — functions such as
CASTprevent index evaluation at the storage layer.The column has no index — the
no_indexkeyword was used when creating the table, or theno_indexstatement deleted the index after table creation.
How to confirm
In the operator details, check whether the filter operator runs above the scan operator (post-scan filtering) rather than inside it. The scan operator's output row count will be high relative to what you'd expect after filtering.
Fix
If a hint or cluster configuration is blocking pushdown, check why it was added and remove it if it is no longer needed. For details, see Filter conditions without pushdown.
If a function such as
CASTis wrapping the column, consider writing the data in the converted form and removing the function from the query.If the column has no index, investigate why the index is missing and restore it if appropriate.
Data expansion occurs in a join
What it means
The data expansion rate of a join is the ratio of output rows to input rows, where input rows equal the sum of rows in the left and right tables. For a well-designed join, output rows are fewer than input rows. When output rows exceed input rows, data expansion occurs — the join is multiplying rows rather than filtering them — which consumes large amounts of memory and computing resources and slows the query.
Data expansion has two distinct root causes:
Duplicate values — both tables have many duplicate values on the join key, causing one-to-many or many-to-many matches.
Incorrect join order — the join order chosen by the optimizer amplifies row counts at an early stage, inflating all subsequent operations.
How to confirm
In the operator details, compare the output row count of the join operator against the sum of the input row counts from the left and right tables. An output count larger than the combined input count confirms data expansion.
Fix
If duplicate values are the cause, filter out duplicates before the join using a subquery or deduplication step.
If join order is the cause, manually adjust the join order. For details, see Manually adjust join orders.
The right table in a join is large in size
What it means
In AnalyticDB for MySQL, the right table in a join is the builder table — the table used to build a hash or set structure in memory before the join runs. A large builder table consumes significant memory and can affect overall cluster stability.
A large right table has one of two causes:
LEFT JOIN forces the right table to be the builder — the SQL contains a
LEFT JOINclause. Because the right table in a left join must always be the builder table, a large right table cannot be swapped to the left side automatically.Stale statistics cause a bad size estimate — AnalyticDB for MySQL uses statistics to estimate table sizes when determining join order. If statistics have expired, the optimizer may incorrectly assign the larger table to the right (builder) side.
How to confirm
In the operator details, check the input row count for the right (build) side of the join operator. A high row count on the build side, combined with high memory usage in the operator, confirms this issue.
Fix
Rewrite the LEFT JOIN as a right join so the smaller table becomes the builder table. For details, see Rewrite left join to right join.
We recommend that you rewrite the left join to the right join. For more information, see Rewrite left join to right join.
If estimation errors occur due to statistics expiration, you can Submit a ticket to contact technical support.
Cross joins exist
What it means
A cross join is a JOIN operation without join conditions. It returns the Cartesian product of the left and right tables — every row in the left table is paired with every row in the right table. If both tables are large, the result set grows to left_rows × right_rows, which can severely affect cluster stability.
How to confirm
In the query plan, look for a join operator with no join condition predicate. In the operator details, the output row count will be approximately equal to the product of the two input row counts.
Fix
Add join conditions to eliminate the Cartesian product.
Scan operators read a large number of columns
What it means
Scan operators filter data and read detailed rows from the storage layer of AnalyticDB for MySQL. When the SELECT statement lists many columns, the scan operator must read a large volume of data from disk. This drives up disk I/O consumption and can affect cluster stability.
How to confirm
In the operator details, check the column count read by the scan operator. A high column count combined with a large scanned data volume confirms this issue.
Fix
Rewrite the SQL statement to remove columns not needed by the query. Replace SELECT * with an explicit column list.
Data skew occurs in the scanned data amount
What it means
AnalyticDB for MySQL uses a distributed execution architecture. For large tables, you specify distribution columns that control how rows are spread across storage nodes at write time. If the distribution column values are unevenly distributed — for example, if one value accounts for most of the rows — data concentrates on a small number of nodes. During reads, those nodes become bottlenecks with long tail latency, which slows the overall query.
How to confirm
In the operator details, compare the scanned data amounts across storage nodes for the scan operator. A large difference between the node with the most data and the node with the least data confirms data skew.
Fix
Choose distribution columns with high cardinality and even value distribution. For guidance on diagnosing and resolving distribution column skew, see Diagnostics on distribution field skew.
Indexes are inefficient
What it means
AnalyticDB for MySQL uses indexes to filter data at the storage layer. When a filter operator has a low input-to-output ratio — meaning most rows survive the filter despite index use — the index is not selectively narrowing down the data. This suggests the index is not being applied as expected, and filtering is not reducing the data volume before it reaches the compute layer.
How to confirm
In the operator details, check the input row count and output row count for the filter operator. If the output count is close to the input count even though an index is in use, the index is ineffective for this filter.
Fix
Move the filtering to compute nodes instead of pushing the filter condition to the storage layer. For details, see Filter conditions without pushdown.