Due to factors such as skewed data distribution and complex queries, query performance may fail to meet your expectations. One way to identify potential performance issues is to check execution plans.
Common issues that affect query performance
- Join method and inner and outer tables
Typically, AnalyticDB for MySQL determines the left and right tables to access for inner and outer joins based on the join method.
In a hash join, the right table is used to build a hash table, whereas the left table probes the hash table for a hash match. Typically, the right table must be smaller than the left table in size to prevent additional overhead and reduce the size of the hash table. You can check the number of data entries after the tables are joined and filtered to determine whether the left and right tables are selected in a proper way. However, other factors such as intermediate join results and the join type complicate the selection of the left and right tables.
- Join Order
One of the key challenges to query optimizers is to optimize the join order. This is also a classic NP-hard problem. The query optimizer of AnalyticDB for MySQL provides two join reordering policies for different loads: Left Deep Tree (LDT) and Bushy Tree (BT). The LDT policy is suitable for simple queries, and the BT policy is suitable for complex queries. The BT policy allows the tables that enable efficient filtering to be joined first to generate an optimal join order for complex queries.
The optimal join order for a complex query is difficult for engineers to determine. If a large number of tables are joined in a data-intensive query, the query requires senior experts to perform query optimization. We recommend that you prioritize the tables that enable efficient filtering in a join order. This provides a quick method for you to filter out undesired data at the earliest opportunity.
- Distributed aggregation
AnalyticDB for MySQL provides distributed aggregation capabilities that allow aggregation operations to be performed in several stages based on the data volume. In most cases, the query optimizer of AnalyticDB for MySQL is able to select the optimal aggregation plan. However, if the data distribution is skewed, the query optimizer makes inaccurate estimates. This affects the aggregation performance. For example, AnalyticDB for MySQL performs an aggregation in two stages. In the first stage, AnalyticDB for MySQL performs a partial aggregation on each compute node. This reduces the volume of data to be aggregated. In the second stage, AnalyticDB for MySQL performs a final aggregation based on a reshuffle of the aggregated columns. If the data distribution is skewed or some columns to be aggregated are unique, the partial aggregation cannot reduce the volume of data to be aggregated but causes additional performance overhead.
- Other issues
This topic lists only common issues related to execution plans. Complex issues must be troubleshooted by the expert service team of AnalyticDB for MySQL. For example, a large table that has no filter condition is found during a full table scan or filter conditions are not pushed down to the storage engine.
Improve execution plans
- Collect table statistics
The query optimizer of AnalyticDB for MySQL estimates the overhead of execution plans based on table statistics and selects an optimal execution plan. Statistics are automatically collected without the need of manual operations. For more information, see the "Automatic statistics collection" section of the Statistics topic. If the system has not collected statistics of a new table, you can execute
SELECT * FROM information_schema.column_statistics;
to check whether the information_schema.column_statistics table contains statistics of the new table. This system table is available only for clusters of V3.1.6 and later. If the system table does not contain statistics of the new table, you can execute a statement to manually collect statistics before automatic collection. For more information about manually collecting statistics, see the "Manually collect statistics" section of the Statistics topic. - Change the join order
AnalyticDB for MySQL can select the optimal join order except in some scenarios where data is unevenly distributed or complex queries are involved. To improve the query performance, you can introduce the
reorder_joins
parameter as a hint to specify whether to manually change the join order.- Add the
/* + reorder_joins=false */;
hint to enable manual change of the join order. Then, you can change the order of the table names in the FROM clause to generate a join order for the query. - Add the
/* + reorder_joins=true */;
hint to disable manual change of the join order. The query optimizer of AnalyticDB for MySQL can automatically generate the optimal join order in most cases.
For example, for tables named nation, region, and customer, the join order generated by AnalyticDB for MySQL is region > nation > customer. If you find the join order of customer > nation > region is more effective based on the actual query performance, you can add a hint to the beginning of the query to change the join order./*+ reorder_joins=false */ EXPLAIN SELECT count(*) FROM customer, nation, region WHERE c_nationkey = n_nationkey AND n_regionkey = r_regionkey AND r_name = 'ASIA';
| Plan Summary | +---------------+ 1- Output[ Query plan ] 2 -> Aggregate (FINAL) 3 -> LocalExchange[SINGLE] 4 -> Exchange[GATHER] 5 -> Aggregate (PARTIAL) 6 -> InnerJoin[Hash Join] 7 - Project 8 -> InnerJoin[Hash Join] 9 - ScanProject {table: customer} 10 -> TableScan {table: customer} 11 -> LocalExchange[HASH] 12 -> Exchange[REPLICATE] 13 - ScanProject {table: nation} 14 -> TableScan {table: nation} 15 -> LocalExchange[HASH] 16 -> Exchange[REPLICATE] 17 - ScanProject {table: region} 18 -> TableScan {table: region}
- Add the
- Enable the plan hint for join order feature
AnalyticDB for MySQL also supports the
plan hint for join order
feature to help you change the join order. You can add theleading_join_order
hint to the beginning of the query to change the join order in AnalyticDB for MySQL.Note To help you identify a table that is referenced multiple times in a query, you must specify an alias for the table when you use the plan hint for join order feature.After you enable the plan hint for join order feature for the SQL statement shown in the preceding example, the same join order is generated.
/*+ leading_join_order=((c n) r) */ EXPLAIN SELECT count(*) FROM nation n , region r, customer c WHERE c_nationkey = n_nationkey AND n_regionkey = r_regionkey AND r_name = 'ASIA';
Execution plan optimization is a broad field. This topic discusses basic optimization methods for AnalyticDB for MySQL. More optimization best practices are supplemented in a continuous way.