All Products
Search
Document Center

PolarDB:Cost-based query transformation

Last Updated:Jan 30, 2024

This topic describes how PolarDB for MySQL uses cost-based query transformation (CBQT) to improve the execution efficiency of complex queries.

Prerequisites

The version of the cluster is PolarDB for MySQL 8.0 and the revision version of the cluster is 8.0.2.2.1 or later. For information about how to view the version of your cluster, see Query the engine version.

Background

MySQL Community Edition optimizes queries mainly based on rules. It takes costs into consideration only for the subquery, instead of the entire query. This is ineffective for complex queries due to reasons such as cardinality, access methods, and JOIN orders. When PolarDB for MySQL executes complex queries, it performs cost-based query transformation, which is a more effective method. For a complex query, CBQT combines all the possible transformation methods into a collection that is named state space. Then, CBQT selects the execution plan with the lowest costs for the query. The following figure describes the cost-based query transformation process. CBQT collects the transformation methods A and B and generates the following combinations in the state space: None (no transformation), A (transformation A only), B (transformation B only), and AB (both transformation A and B). The combinations in the state space each correspond to an execution plan. CBQT then selects the execution plan with the lowest costs. In this example, Plan 2 is selected, which corresponds to transformation A. Cost-based query transformation
Subquery decorrelation that uses GROUP BY clauses is implemented based on cost-based query transformation. The derived merge feature of MySQL Community Edition can also perform cost-based query transformation. For example, in the following query, the fact table f1 and dimension table d1 are joined, and the fact table f2 and dimension table d2 are joined. Then the result tables of the preceding joins are joined. In MySQL Community Edition, the derived merge feature is implemented based on rule-based query transformation. If MySQL Community Edition is used to execute the query, the d1 and d2 tables are merged into the outer query block.
SELECT *
FROM d1
JOIN f1 ON d1.c1 = f1.c1
LEFT JOIN
  (SELECT d2.c2 d2_c2,
          f2.c3 f2_c3
   FROM d2,
        f2
   WHERE d2.c1 = f2.c1) derived ON derived.d2_c2 = d1.c2
AND derived.f2_c3 = f1.c3;
The following figure illustrates the execution plan for the query in MySQL Community Edition.derived mergeIn this execution plan, d1 and f1 are not associated with d2 and f2 by indexes. After d1 and f1 are joined, the d2 and f2 tables must be separately joined in sequence, which consumes a large amount of computing resources. If PolarDB for MySQL is used to execute the query, CBQT determines that the derived merge method is not ideal for the query. Therefore, it uses another execution plan, as illustrated in the following figure.New execution planIn this execution plan, the result set of joining f1 and d1 and the result set of joining f2 and d2 are joined, which leads to higher execution efficiency.

Use cost-based query transformation

CBQT is controlled by using the cbqt_enabled parameter. The default value is ON. CBQT may spend a long period of time selecting the optimal execution plan. To avoid impact on short-lived queries, the cbqt_cost_threshold parameter is provided to specify a threshold for triggering CBQT for a query. CBQT is triggered only when the cost of a query exceeds this parameter value. The derived merge feature in MySQL Community Edition is controlled by the derived_merge_cost_based variable of the polar_optimizer_switch parameter. The default value of the variable is OFF. The following table describes the parameters.
ParameterLevelDescription
cbqt_enabledGlobal and sessionSpecifies whether to enable CBQT. Default value: ON. Valid values:
  • ON
  • OFF
cbqt_cost_thresholdGlobal and sessionThe threshold for triggering CBQT for a query. CBQT is triggered only when the cost of a query exceeds this parameter value. Value values: 0 to 18446744073709551615. Default value: 100000.
cbqt_timeoutGlobal and sessionThe timeout period for CBQT. If the time consumed for finding the optimal execution plan is longer than the specified period of time, CBQT stops looking for the optimal plan, and uses the best plan it has found to execute the query. Value values: 0 to 18446744073709551615. Default value: 200. Unit: milliseconds.
Note If you set the value to 0, no timeout period is specified.
polar_optimizer_switchGlobal and sessionSpecifies whether to enable the query optimization features of PolarDB. Variable description:
  • unnest_use_window_function: specifies whether to decorrelate subqueries by using window functions. Default value: ON. Valid values:
    • ON
    • OFF
  • unnest_use_group_by: specifies whether to decorrelate subqueries by using GROUP BY clauses. This subquery decorrelation method is implemented based on CBQT. Default value: ON. Valid values:
    • ON
    • OFF
  • derived_merge_cost_based: specifies whether the derived merge feature is implemented based on CBQT. Default value: OFF. Valid values:
    • ON
    • OFF