AnalyticDB for MySQL provides the SQL diagnostics feature to separately collect statistics for SQL query information at the query, stage, and operator levels, use the statistics to diagnose issues, and then provide optimization suggestions. This topic describes how to view and analyze operator-level diagnosis results.

Diagnosis result types

Note For more information about how to view operator-level diagnostic results, see View diagnosis results.

The aggregation rate of an aggregation operator is low

  • Problem

    The aggregation rate of an aggregation operator refers to the ratio of the input data size to the output data size in after data is grouped based on the GROUP BY column and aggreated in each group. A lower ratio indicates a lower aggregation rate and worse aggregation effects. In AnalyticDB for MySQL, a GROUP BY operation consists of two steps: partial aggregation and final aggregation. A large number of aggregation operator groups can cause a low aggregation rate. In the partial aggregation step, the amount of data to be transferred over networks cannot be reduced but a large amount of computing resources are consumed.

  • Suggestion

    You can choose to skip the partial aggregation step, redistribute data among each node, and then perform final aggregation. For more information, see Grouping and aggregation query optimization.

Filter conditions are not pushed down

  • Problem
    By default, AnalyticDB for MySQL creates indexes for all columns in the table during data storage. You can use these indexes to accelerate data filtering when you query data. AnalyticDB for MySQL does not push down filter conditions in the following scenarios:
    • If the no_index_columns or filter_not_pushdown_columns hint is used in query statements, or the adb_config filter_not_pushdown_columns configuration is used in clusters, the filter condition pushdown feature is disabled.
    • Functions such as CAST are used in filter conditions.
    • Related columns in filter conditions do not have indexes. For example, the no_index keyword is used when you create a table, or the no_index statement is executed to delete indexes after a table is created.
  • Suggestion
    • If the filter condition pushdown feature is disabled because the hint is used in a query statement or the cluster uses the configuration, check why the hint or configuration is used and determine whether the hint or configuration can be canceled. For more information, see Filter conditions without pushdown.
    • If you use a function, you can choose whether to directly use the function to write data and remove the function during query.
    • If a filter condition is not pushed down because related columns in the filter condition do not have indexes, you must check why the columns do not have indexes.

Data expansion occurs in a join

  • Problem

    The data expansion rate of a join is the ratio of the number of output rows to the number of input rows. The number of input rows is the sum of the number of rows in the left table and the number of rows in the right table. For an appropriate join condition, the number of output rows is smaller than that of input rows. If the number of output rows is greater than that of input rows, data expansion occurs. This causes a large amount of computing and memory resources to be occupied. Therefore, queries become slow.

  • Suggestion
    • If data expansion in the join is caused by data characteristics such as large numbers of duplicate values in both the left and right tables, you can filter out all duplicate values from the join.
    • If data expansion is caused by an inappropriate join order, you can manually adjust the join order. For more information, see Manually adjust join orders.

The right table in a join is large in size

  • Problem
    In AnalyticDB for MySQL, the right table in a join refers to the builder table that is used to build a hash or set structure in the memory. The right table that is large in size may occupy a large amount of memory resources and affect the overall stability of clusters. The right table in a join may be large in size due to the following reasons:
    • An SQL statement contains the LEFT JOIN clause. The right table in a left join must be used as the builder table during execution. If the right table in the left join is large in size, a large amount of memory resources are consumed.
    • When AnalyticDB for MySQL estimates data sizes of the left and right tables, estimation is inaccurate due to reasons such as statistics expiration.
  • Suggestion
    • We recommend that you rewrite the left join to the right join. For more information, see Rewrite left join to right join.
    • If estimation errors occur due to statistics expiration, you can submit a ticket to contact technical support.

Cross joins exist

  • Problem

    A cross join is a JOIN operation without join conditions that returns the Cartesian product of rows from the left and right tables in the join. If both the left and right tables are large in size, the stability of AnalyticDB for MySQL clusters is greatly affected.

  • Suggestion

    You can choose to add join conditions to eliminate cross joins.

Scan operators read a large number of columns

  • Problem

    Scan operators filter data and read detailed data at the storage layer of AnalyticDB for MySQL. If the SELECT statement contains a large number of columns and a large amount of detailed data is read, a large amount of disk I/O resources are occupied and the overall stability of AnalyticDB for MySQL clusters is affected.

  • Suggestion

    You can optimize your SQL statement to reduce unnecessary columns in the SELECT statement.

Data skew occurs in the scanned data amount

  • Problem

    AnalyticDB for MySQL is a distributed execution architecture. Typically, you must specify distribution columns for data in large tables. During data write, data is distributed to different storage nodes based on the distribution columns. If the values of the distribution columns are unevenly distributed, data is unevenly stored on each node. When data is read, each node has a long tail of time, which affects the final query effect.

  • Suggestion

    You can select appropriate distribution columns to mitigate data skew in the scanned data amount. For more information, see Distribution field skew diagnosis.

Indexes are inefficient

  • Problem

    If AnalyticDB for MySQL uses indexes to filter data while the input to output data size ratio of a filter operator is low, data may not be filtered by indexes as expected.

  • Suggestion

    You can choose to use filter operations on compute nodes instead of pushing down filter conditions. For more information, see Filter conditions without pushdown.