Operator-level diagnosis results

AnalyticDB for MySQL provides the SQL diagnostics feature, which collects statistics on SQL queries at the query, stage, and operator levels, uses the statistics to diagnose issues, and provides optimization suggestions. This topic describes how to view and analyze operator-level diagnosis results.

Diagnosis result types

Note

For more information about how to view operator-level diagnostic results, see View diagnostic results.

- The aggregation rate of an aggregation operator is low
- Filter conditions are not pushed down
- Data expansion occurs in a join
- The right table in a join is large in size
- Cross joins exist
- Scan operators read a large number of columns
- Data skew occurs in the scanned data amount
- Indexes are inefficient

The aggregation rate of an aggregation operator is low

Problem
The aggregation rate of an aggregation operator is the ratio of the input data size to the output data size after data is grouped by the GROUP BY columns and aggregated within each group. A lower ratio indicates a lower aggregation rate and a less effective aggregation. In AnalyticDB for MySQL, a GROUP BY operation consists of two steps: partial aggregation and final aggregation. If the GROUP BY columns produce a large number of groups, the aggregation rate is low. In this case, the partial aggregation step consumes a large amount of computing resources but cannot reduce the amount of data that is transferred over the network.
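The following Python sketch illustrates why the number of groups matters. It simulates the partial aggregation step on four hypothetical nodes and uses row counts as a stand-in for data size. The data, node count, and function names are assumptions for illustration and are not part of AnalyticDB for MySQL.

```python
from collections import Counter

def partial_aggregate(rows):
    """Partial aggregation on one node: count rows per GROUP BY key."""
    return Counter(rows)

def partial_agg_stats(nodes):
    """Return (rows read, rows emitted by partial aggregation) across all nodes."""
    rows_in = sum(len(rows) for rows in nodes)
    # Each node emits one row per distinct key; these rows are sent over the network.
    rows_out = sum(len(partial_aggregate(rows)) for rows in nodes)
    return rows_in, rows_out

def aggregation_rate(rows_in, rows_out):
    """Input-to-output ratio; a value close to 1 means almost no reduction."""
    return rows_in / rows_out

NODES, ROWS_PER_NODE = 4, 1000
# Few groups: the GROUP BY key has only 10 distinct values on each node.
few_groups = [[i % 10 for i in range(ROWS_PER_NODE)] for _ in range(NODES)]
# Many groups: every row has its own GROUP BY key value.
many_groups = [[n * ROWS_PER_NODE + i for i in range(ROWS_PER_NODE)] for n in range(NODES)]

for name, nodes in (("few groups", few_groups), ("many groups", many_groups)):
    rows_in, rows_out = partial_agg_stats(nodes)
    print(f"{name}: {rows_in} -> {rows_out} rows, rate {aggregation_rate(rows_in, rows_out):.1f}")
```

With few groups, 4,000 rows shrink to 40 (rate 100.0). With many groups, the partial step still does all the counting work but ships all 4,000 rows (rate 1.0), which is the situation in which skipping partial aggregation can pay off.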
Suggestion
You can skip the partial aggregation step, redistribute data across nodes, and then perform final aggregation directly. For more information, see Grouping and aggregation query optimization.

Filter conditions are not pushed down

Problem
By default, AnalyticDB for MySQL creates indexes for all columns of a table when data is written. These indexes accelerate data filtering during queries. However, AnalyticDB for MySQL does not push down filter conditions in the following scenarios:
- The filter condition pushdown feature is disabled because the no_index_columns or filter_not_pushdown_columns hint is used in the query statement, or because the filter_not_pushdown_columns parameter is configured for the cluster by using adb_config.
- Functions such as CAST are used in the filter conditions.
- The columns in the filter conditions do not have indexes. For example, the no_index keyword was specified for the columns when the table was created, or the indexes were deleted after the table was created.
Suggestion
- If filter condition pushdown is disabled by a hint in the query statement or by a cluster configuration, determine why the hint or configuration is used and whether it can be removed. For more information, see Filter conditions without pushdown.
- If a function is used in a filter condition, consider applying the function when data is written, so that the column already stores the converted values, and then remove the function from the query.
- If a filter condition is not pushed down because the related columns do not have indexes, check why the columns do not have indexes and whether indexes can be created for them.
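The following Python sketch uses a simplified inverted index (stored value to row IDs) to illustrate why a function such as CAST in a filter condition prevents an index lookup, and how applying the conversion at write time makes the lookup possible again. The index structure, data, and function names are hypothetical and do not reflect the internal index implementation of AnalyticDB for MySQL.

```python
from collections import defaultdict

# Hypothetical string column that stores numeric codes.
rows = ["007", "42", "7", "100", "42"]

def build_index(values):
    """Simplified inverted index built at write time: value -> row IDs."""
    index = defaultdict(list)
    for row_id, value in enumerate(values):
        index[value].append(row_id)
    return index

raw_index = build_index(rows)

def filter_pushed_down(value):
    """`col = value`: answered with a single index lookup."""
    return raw_index.get(value, [])

def filter_with_cast(number):
    """`CAST(col AS INT) = number`: the index is keyed on raw strings,
    so every row must be read and converted before it is compared."""
    return [row_id for row_id, value in enumerate(rows) if int(value) == number]

# Suggested fix: convert the values when data is written, so the index is
# keyed on the converted values and the query no longer needs CAST.
converted_index = build_index([int(value) for value in rows])

def filter_on_converted(number):
    """`col_int = number`: again a single index lookup."""
    return converted_index.get(number, [])

print(filter_pushed_down("42"))  # [1, 4]
print(filter_with_cast(7))       # [0, 2]
print(filter_on_converted(7))    # [0, 2]
```

Note that a lookup of "7" in the raw index returns only row 2 and misses "007", which is why the CAST predicate has to fall back to checking every row.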

Data expansion occurs in a join

Problem
The data expansion rate of a join is the ratio of the number of output rows to the number of input rows, where the number of input rows is the sum of the rows in the left table and the rows in the right table. With an appropriate join condition, the number of output rows is typically smaller than the number of input rows. If the number of output rows exceeds the number of input rows, data expansion occurs. Data expansion consumes large amounts of computing and memory resources and slows down queries.
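As a worked illustration of the expansion rate formula, the following Python sketch counts the output rows of an inner equi-join from the join key frequencies on each side (for each key, the left count multiplied by the right count). The function names and data are hypothetical.

```python
from collections import Counter

def join_output_rows(left_keys, right_keys):
    """Rows produced by an inner equi-join: sum over keys of left count x right count."""
    left, right = Counter(left_keys), Counter(right_keys)
    return sum(count * right[key] for key, count in left.items() if key in right)

def expansion_rate(left_keys, right_keys):
    """Output rows divided by input rows (left rows + right rows)."""
    return join_output_rows(left_keys, right_keys) / (len(left_keys) + len(right_keys))

# Unique keys on both sides: 1,500 input rows, 500 output rows, no expansion.
print(expansion_rate(list(range(1000)), list(range(500))))
# 100 duplicates of the same key on each side: 200 input rows, 10,000 output rows.
print(expansion_rate([1] * 100, [1] * 100))  # 50.0
```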
Suggestion
- If the data expansion is caused by data characteristics, such as a large number of duplicate join key values in both the left and right tables, remove the duplicate values before the join.
- If data expansion is caused by an inappropriate join order, you can manually adjust the join order. For more information, see Manually adjust join orders.
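The following Python sketch shows how join order changes the size of the intermediate result even though the final result is the same. It joins three hypothetical tables with a simple in-memory hash join. The tables, column names, and helper function are assumptions for illustration and do not represent how AnalyticDB for MySQL executes or orders joins.

```python
from collections import defaultdict

def hash_join(left, right, key):
    """Inner equi-join of two lists of dicts on `key` (build on right, probe with left)."""
    build = defaultdict(list)
    for r in right:
        build[r[key]].append(r)
    return [{**l, **r} for l in left for r in build.get(l[key], [])]

# Hypothetical tables: 10 customers with 100 orders and 100 visits each, and 1 VIP customer.
orders = [{"cust": i % 10, "order_id": i} for i in range(1000)]
visits = [{"cust": i % 10, "visit_id": i} for i in range(1000)]
vip = [{"cust": 0}]

# Order A: (orders JOIN visits) JOIN vip. The first join expands to 100,000 rows.
step_a = hash_join(orders, visits, "cust")
final_a = hash_join(step_a, vip, "cust")
# Order B: (orders JOIN vip) JOIN visits. The first join keeps only 100 rows.
step_b = hash_join(orders, vip, "cust")
final_b = hash_join(step_b, visits, "cust")

print(len(step_a), len(step_b))    # 100000 100
print(len(final_a), len(final_b))  # 10000 10000
```

Both orders return the same 10,000 rows, but order A materializes a 100,000-row intermediate result (2,000 input rows, expansion rate 50) that order B avoids.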

The right table in a join is large in size

Problem
In AnalyticDB for MySQL, the right table in a join is the build table, which is used to build a hash table or set structure in memory. A large right table can occupy a large amount of memory and affect the overall stability of the cluster. The right table in a join may be large for the following reasons:
- The SQL statement contains a LEFT JOIN clause. During execution, the right table of a left join must be used as the build table. If this table is large, a large amount of memory is consumed.
- AnalyticDB for MySQL estimates the sizes of the left and right tables inaccurately, for example, because statistics are outdated. As a result, a large table may be selected as the right table.
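The following Python sketch models a hash-based LEFT JOIN in which, as described above, the right table is the build side. It shows that the hash table held in memory grows with the right table no matter how small the left table is. This is a simplified, hypothetical model, not the execution engine of AnalyticDB for MySQL.

```python
from collections import defaultdict

def left_hash_join(left, right, key):
    """LEFT JOIN sketch: build a hash table from the right table, then probe it
    with each left row. Returns (joined rows, rows held in the hash table)."""
    build = defaultdict(list)  # kept in memory for the whole join
    for r in right:
        build[r[key]].append(r)
    joined = []
    for l in left:  # left rows are streamed, not stored
        matches = build.get(l[key])
        if matches:
            joined.extend({**l, **r} for r in matches)
        else:
            joined.append(dict(l))  # unmatched left row; right columns would be NULL
    held = sum(len(rows) for rows in build.values())
    return joined, held

small_left = [{"id": i} for i in range(3)]
large_right = [{"id": i, "payload": i} for i in range(100_000)]
joined, held = left_hash_join(small_left, large_right, "id")
print(len(joined), held)  # 3 100000
```

Only 3 rows are returned, yet all 100,000 right-table rows must be held in memory during the join.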
Suggestion
We recommend that you rewrite the LEFT JOIN as a RIGHT JOIN. For more information, see Rewrite left join to right join.

Cross joins exist

Problem
A cross join is a join without join conditions. It returns the Cartesian product of the rows in the left and right tables. If both tables are large, a cross join can consume a large amount of resources and severely affect the stability of the AnalyticDB for MySQL cluster.
Suggestion
Add join conditions to eliminate cross joins.
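The following Python sketch compares the row count of a cross join with that of the same join with a join condition. The tables are hypothetical. Because a cross join returns len(left) × len(right) rows, two tables with one million rows each would produce 10^12 rows.

```python
from itertools import product

def cross_join(left, right):
    """No join condition: every left row is paired with every right row."""
    return list(product(left, right))

def equi_join(left, right):
    """Join condition left.user_id = right.user_id (nested loop for clarity)."""
    return [(l, r) for l, r in product(left, right) if l[0] == r[0]]

orders = [(i % 100, i) for i in range(1000)]    # (user_id, order_id)
users = [(u, f"user-{u}") for u in range(100)]  # (user_id, name)

print(len(cross_join(orders, users)))  # 100000
print(len(equi_join(orders, users)))   # 1000
```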

Scan operators read a large number of columns

Problem
Scan operators filter data and read raw data at the storage layer of AnalyticDB for MySQL. If a SELECT statement reads a large number of columns, a large amount of raw data is read. This consumes a large amount of disk I/O resources and affects the overall stability of the AnalyticDB for MySQL cluster.
Suggestion
Optimize your SQL statement by removing unnecessary columns from the SELECT clause. For example, list only the columns that you need instead of using SELECT *.
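The following Python sketch estimates how much data a scan reads as a function of the selected columns, assuming a column-oriented layout in which only the requested columns are read. The table size and column widths are hypothetical.

```python
# Hypothetical table: 10 million rows and the average width of each column in bytes.
ROWS = 10_000_000
COLUMN_WIDTHS = {"id": 8, "user_id": 8, "amount": 8, "status": 4, "address": 200, "remark": 500}

def bytes_scanned(columns, rows=ROWS):
    """Bytes read when only the listed columns are scanned (columnar layout assumed)."""
    return rows * sum(COLUMN_WIDTHS[c] for c in columns)

select_all = bytes_scanned(COLUMN_WIDTHS)     # SELECT *
select_two = bytes_scanned(["id", "amount"])  # SELECT id, amount
print(select_all, select_two, select_all / select_two)  # 7280000000 160000000 45.5
```

Under these assumptions, selecting only the two needed columns reads about 45 times less data than SELECT *.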

Data skew occurs in the scanned data amount

Problem
AnalyticDB for MySQL uses a distributed architecture. Typically, you must specify distribution columns for large tables. When data is written, rows are distributed to different storage nodes based on the values of the distribution columns. If these values are unevenly distributed, data is unevenly stored across nodes. When data is read, the nodes that store more data take longer to complete the scan, which creates a long tail that slows down the entire query.
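The following Python sketch hashes hypothetical distribution column values to eight nodes and reports a skew ratio: the rows on the busiest node divided by the average rows per node. A ratio close to 1 means the data is evenly distributed. The MD5-based hash, node count, and data are assumptions for illustration; AnalyticDB for MySQL uses its own distribution logic.

```python
import hashlib
from collections import Counter

NUM_NODES = 8

def node_for(value):
    """Map a distribution column value to a node by hashing it (MD5 used for illustration)."""
    digest = hashlib.md5(str(value).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_NODES

def skew_ratio(values):
    """Rows on the busiest node divided by the average number of rows per node."""
    per_node = Counter(node_for(v) for v in values)
    return max(per_node.values()) / (len(values) / NUM_NODES)

# Good distribution column: 80,000 distinct user IDs.
user_ids = list(range(80_000))
# Poor distribution column: 90% of rows share the same city.
cities = ["hangzhou"] * 72_000 + [f"city-{i}" for i in range(8_000)]

print(round(skew_ratio(user_ids), 2))  # close to 1
print(round(skew_ratio(cities), 2))    # at least 7.2
```

With the city column, one node stores at least 72,000 of the 80,000 rows, so that node scans more than seven times the average and becomes the long tail of the query.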
Suggestion
You can select appropriate distribution columns to mitigate data skew in the scanned data amount. For more information, see Diagnostics on distribution field skew.

Indexes are inefficient

Problem
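If AnalyticDB for MySQL uses indexes to filter data but the ratio of the input data size to the output data size of the filter operator is low, most of the data passes the filter. In this case, the indexes filter out little data and do not work as efficiently as expected.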
Suggestion
You can disable pushdown for these filter conditions so that data is filtered on compute nodes instead of by indexes. For more information, see Filter conditions without pushdown.
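The following Python sketch computes the input-to-output ratio of a filter and applies a threshold to decide whether an index-based filter is worthwhile. The threshold and function names are hypothetical illustrations, not AnalyticDB for MySQL parameters; base the actual decision on the diagnosis results.

```python
def filter_ratio(input_rows, output_rows):
    """Input-to-output ratio of a filter; close to 1 means few rows are filtered out."""
    return float("inf") if output_rows == 0 else input_rows / output_rows

# Hypothetical threshold, for illustration only.
MIN_USEFUL_RATIO = 2.0

def index_filter_is_useful(input_rows, output_rows, threshold=MIN_USEFUL_RATIO):
    """True if the filter removes enough rows to justify an index lookup."""
    return filter_ratio(input_rows, output_rows) >= threshold

# `status = 'paid'` keeps 95% of 1,000,000 rows: the index removes little data.
print(index_filter_is_useful(1_000_000, 950_000))  # False
# `order_id = 42` keeps 1 row: the index is highly selective.
print(index_filter_is_useful(1_000_000, 1))        # True
```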