Due to factors such as skewed data distribution and complex queries, the query performance may not meet your expectations. One way to identify potential performance issues is to check execution plans.

Common issues that affect the query performance

  • Join method and inner and outer tables

    Typically, AnalyticDB for MySQL determines the left and right tables to access for inner and outer joins based on the join method.

    In a hash join, the right table is used to build the hash table, whereas the left table probes the hash table for a hash match. Typically, the right table must be smaller than the left table to avoid additional overhead and reduce the size of the hash table. You can check the number of data records after the tables are joined and filtered to determine whether the left and right tables are selected in a proper way. However, other factors such as intermediate join results and the join type complicate the selection of the left and right tables.

  • Join Order

    One of the key challenges to query optimizers is to optimize the join order. This is also a classic NP-hard problem. The query optimizer of AnalyticDB for MySQL provides two join reordering policies for different loads: Left Deep Tree (LDT) and Bushy Tree (BT). The LDT policy is suitable for simple queries, and the BT policy is suitable for complex queries. The BT policy allows the tables that enable efficient filtering to be joined first to generate an optimal join order for complex queries.

    The optimal join order for a complex query is difficult for engineers to determine. If a large number of tables are joined in a data-intensive query, the query requires senior experts to perform query optimization. We recommend that you prioritize the tables that enable efficient filtering in a join order. This provides a quick method for you to filter out undesired data at the earliest opportunity.

  • Distributed aggregation

    AnalyticDB for MySQL provides distributed aggregation capabilities that allow aggregation operations to be performed in several stages based on the data volume. In most cases, the query optimizer of AnalyticDB for MySQL is able to select the optimal aggregation plan. However, if the data distribution is skewed, the query optimizer makes inaccurate estimates, which affects the aggregation performance. For example, AnalyticDB for MySQL performs an aggregation in two stages. In the first stage, AnalyticDB for MySQL performs a partial aggregation on each compute node. This reduces the volume of data to be aggregated. In the second stage, AnalyticDB for MySQL performs a final aggregation based on a reshuffle of the aggregated columns. If the data distribution is skewed or some columns to be aggregated are unique, the partial aggregation cannot reduce the volume of data to be aggregated but causes additional performance overhead.

  • Other issues

    This topic lists some common issues related to execution plans. For example, a large table that has no filter condition is found during a full table scan, or the filter conditions are not pushed down to the storage engine. These complex issues must be troubleshooted by the expert service team of AnalyticDB for MySQL.

Improve an execution plan

  • Collect table statistics

    The query optimizer of AnalyticDB for MySQL estimates the overhead of an execution plan based on table statistics. Therefore, accurate statistics are crucial to generate execution plans. If the generated execution plan is not optimal, the most effective troubleshooting method is to update table statistics to ensure that the query optimizer obtains up-to-date and accurate statistics.

    If you have not executed the ANALYZE statement to collect table statistics, we recommend that you execute the ANALYZE TABLE table_name; statement to obtain the statistics of a single table. To collect statistics of all tables in a database, you can execute the USE db_name;ANALYZE TABLE ALL; statement. After the statistics are collected, the query optimizer can generate optimal execution plans for most queries. However, in scenarios such as skewed data distributions, the query optimizer may not be able to generate the optimal execution plan. In this case, some special optimization methods are required. One of the most widely used methods is to adjust the join order.

  • Adjust the join order

    AnalyticDB for MySQL can select the optimal join order except in some scenarios where data is unevenly distributed or complex queries are involved. To improve the query performance, you can introduce the reorder_joins parameter as a join hint to specify whether to manually adjust the join order.

    • Introduce /* + reorder_joins=false */; as a join hint to enable manual adjustment of the join order. Then, you can change the order of the table names in the FROM clause to generate a join order for the query.
    • Introduce /* + reorder_joins=true */; as a join hint to disable manual adjustment of the join order. The query optimizer of AnalyticDB for MySQL can automatically generate the optimal join order in most cases.
    For example, for tables named nation, region, and customer, the join order generated by AnalyticDB for MySQL is to join the region table, the nation table, and the customer table in sequence. If you find a more effective join order based on the actual query performance, you can introduce a join hint in the request header to change the join order. An example of a more efficient alternative is to join the customer table, the nation table, and the region table in sequence.
    /*+ reorder_joins=false */
           EXPLAIN SELECT count(*)
           FROM    customer, nation, region
           WHERE c_nationkey = n_nationkey
           AND n_regionkey = r_regionkey
           AND r_name = 'ASIA';                       
           | Plan Summary  |
           +---------------+
           1- Output[ Query plan ]
           2  -> Aggregate (FINAL)
           3    -> LocalExchange[SINGLE]
           4      -> Exchange[GATHER]
           5        -> Aggregate (PARTIAL)
           6          -> InnerJoin[Hash Join]
           7            - Project
           8              -> InnerJoin[Hash Join]
           9                - ScanProject {table: customer}
           10                  -> TableScan {table: customer}
           11                -> LocalExchange[HASH]
           12                  -> Exchange[REPLICATE]
           13                    - ScanProject {table: nation}
           14                      -> TableScan {table: nation}
           15            -> LocalExchange[HASH]
           16              -> Exchange[REPLICATE]
           17                - ScanProject {table: region}
           18                  -> TableScan {table: region}                    
  • Plan Hint For Join Order
    AnalyticDB for MySQL also supports the Plan Hint For Join Order feature to help you change the join order. You can include leading_join_order in the request header to change the join order.
    Note To help you identify a table that is referenced multiple times in a query, you must specify an alias for the table when you use the Plan Hint For Join Order feature.

    After you enable the Plan Hint For Join Order feature for the SQL statement shown in the preceding example, the same join order is generated.

     /*+ leading_join_order=((c n) r) */
          EXPLAIN SELECT count(*)
          FROM    nation n , region r, customer c
          WHERE c_nationkey = n_nationkey
          AND n_regionkey = r_regionkey
          AND r_name = 'ASIA';                    

Execution plan optimization is a broad field. This topic discusses some basic optimization methods for AnalyticDB for MySQL. More optimization best practices are supplemented in a continuous way.