This topic describes how PolarDB for MySQL uses cost-based query transformation (CBQT) to improve the execution efficiency of complex queries.
Prerequisites
The version of the cluster is PolarDB for MySQL 8.0 and the revision version of the cluster is 8.0.2.2.1 or later. For information about how to view the version of your cluster, see Query the engine version.
Background
MySQL Community Edition optimizes queries mainly based on rules. It takes costs into consideration only for the subquery, instead of the entire query. This is ineffective for complex queries due to reasons such as cardinality, access methods, and JOIN orders. When PolarDB for MySQL executes complex queries, it performs cost-based query transformation, which is a more effective method. For a complex query, CBQT combines all the possible transformation methods into a collection that is named state space. Then, CBQT selects the execution plan with the lowest costs for the query. The following figure describes the cost-based query transformation process. CBQT collects the transformation methods A and B and generates the following combinations in the state space: None (no transformation), A (transformation A only), B (transformation B only), and AB (both transformation A and B). The combinations in the state space each correspond to an execution plan. CBQT then selects the execution plan with the lowest costs. In this example, Plan 2 is selected, which corresponds to transformation A.
Subquery decorrelation that uses GROUP BY clauses is implemented based on cost-based query transformation. The derived merge feature of MySQL Community Edition can also perform cost-based query transformation. For example, in the following query, the fact table
f1
and dimension table d1
are joined, and the fact table f2
and dimension table d2
are joined. Then the result tables of the preceding joins are joined. In MySQL Community Edition, the derived merge feature is implemented based on rule-based query transformation. If MySQL Community Edition is used to execute the query, the d1
and d2
tables are merged into the outer query block. SELECT *
FROM d1
JOIN f1 ON d1.c1 = f1.c1
LEFT JOIN
(SELECT d2.c2 d2_c2,
f2.c3 f2_c3
FROM d2,
f2
WHERE d2.c1 = f2.c1) derived ON derived.d2_c2 = d1.c2
AND derived.f2_c3 = f1.c3;
The following figure illustrates the execution plan for the query in MySQL Community Edition.In this execution plan, d1
and f1
are not associated with d2
and f2
by indexes. After d1
and f1
are joined, the d2
and f2
tables must be separately joined in sequence, which consumes a large amount of computing resources. If PolarDB for MySQL is used to execute the query, CBQT determines that the derived merge method is not ideal for the query. Therefore, it uses another execution plan, as illustrated in the following figure.In this execution plan, the result set of joining f1
and d1
and the result set of joining f2
and d2
are joined, which leads to higher execution efficiency. Use cost-based query transformation
CBQT is controlled by using the cbqt_enabled parameter. The default value is
ON
. CBQT may spend a long period of time selecting the optimal execution plan. To avoid impact on short-lived queries, the cbqt_cost_threshold parameter is provided to specify a threshold for triggering CBQT for a query. CBQT is triggered only when the cost of a query exceeds this parameter value. The derived merge feature in MySQL Community Edition is controlled by the derived_merge_cost_based
variable of the polar_optimizer_switch parameter. The default value of the variable is OFF
. The following table describes the parameters.Parameter | Level | Description |
cbqt_enabled | Global and session | Specifies whether to enable CBQT. Default value: ON. Valid values:
|
cbqt_cost_threshold | Global and session | The threshold for triggering CBQT for a query. CBQT is triggered only when the cost of a query exceeds this parameter value. Value values: 0 to 18446744073709551615. Default value: 100000. |
cbqt_timeout | Global and session | The timeout period for CBQT. If the time consumed for finding the optimal execution plan is longer than the specified period of time, CBQT stops looking for the optimal plan, and uses the best plan it has found to execute the query. Value values: 0 to 18446744073709551615. Default value: 200. Unit: milliseconds. Note If you set the value to 0, no timeout period is specified. |
polar_optimizer_switch | Global and session | Specifies whether to enable the query optimization features of PolarDB. Variable description:
|