All Products
Search
Document Center

PolarDB:Cost-based query transformation (CBQT)

Last Updated:Apr 15, 2025

PolarDB for PostgreSQL supports cost-based query transformation (CBQT) that can perform a certain type of transformation based on the execution cost. This significantly improves the execution efficiency of some complex queries.

Background information

Query transformation refers to the process of transforming a query statement to another equivalent query statement. In PostgreSQL, common query transformations include subquery pull-up, outer join removal, expression preprocessing, useless join removal, and predicate pushdown. These transformations are performed based on equivalence rules. After performing these transformations, a query plan is optimized. PostgreSQL always attempts to perform these transformations.

However, after performing some other transformations, such as sublink pushdown and OR to UNION ALL conversion, a query plan may not be optimized. Therefore, PolarDB for PostgreSQL performs query transformations based on CBQT.

For a complex query, the CBQT framework collects cost-based query transformations that can be performed in each query block and forms a state space. CBQT searches the state space based on the specified strategy to select the state with the lowest cost.

As shown in the following figure, for the input SQL statement, CBQT collects cost-based query transformations A and B in Query Block 1 and Query Block 2. The state space consists of the following states:

  • None: No transformation is performed.

  • [1, A]: Transformation A is performed for Query Block 1.

  • [1, B]: Transformation B is performed for Query Block 1.

  • [2, A]: Transformation A is performed for Query Block 2.

CBQT performs these transformations in the state space in turn. Under the default search strategy linear, state [1, B] can optimize the query plan the most. As a result, [1, B] is used in Plan 4.

image

Prerequisites

Your PolarDB for PostgreSQL cluster runs the following database engine version:

  • PolarDB for PostgreSQL 14 whose revision version is 2.0.14.13.28.0 or later

  • PolarDB for PostgreSQL 11 whose revision version is 2.0.11.15.44.0 or later

Note

You can view the revision version in the console or execute the SHOW polardb_version; statement to query the revision version. To upgrade the revision version, see Version management.

Usage

Configure the following parameters to manage CBQT provided by PolarDB for PostgreSQL:

Note

For clusters whose revision version is 2.0.14.15.29.0 or later, you can directly configure the parameters in the PolarDB console. For more information, see Configure cluster parameters. You can also connect to the cluster to configure the parameters. For information about how to connect to a cluster, see Connect to a PolarDB for PostgreSQL cluster.

Parameter

Description

polar_enable_cbqt

Specifies whether to enable the CBQT feature. Valid values:

  • off (default)

  • on

polar_cbqt_cost_threshold

The cost threshold for enabling CBQT. Valid values: [0, +∞). Default value: 50000. CBQT is enabled only if the cost of the original plan exceeds the threshold.

polar_cbqt_strategy

The search strategy for the CBQT state space. Valid values:

  • linear (default): linear search. If this strategy is used, PolarDB compares the execution costs before and after each state is applied, and selects the optimal plan.

  • twophase: two-phase search. If this strategy is used, PolarDB compares the execution costs when all states are applied or all queries are not applied, and then selects the optimal plan.

polar_cbqt_iteration_limit

The number of iterations of CBQT. Valid values: [1, +∞). Default value: 10.

A greater number of iterations indicates a higher possibility of selecting the optimal plan and more time consumption. A smaller number of iterations indicates a lower possibility of selecting the optimal plan and less time consumption.

The following cost-based query rewrite features are supported:

Parameter

Description

polar_cbqt_convert_or_to_union_all_mode

Converts the OR clause to UNION ALL to improve query efficiency. For more information, see Convert an OR condition into a UNION ALL of two or more separate queries:.

polar_cbqt_pushdown_sublink

Pushes the sublink down to a subquery and uses the index in the subquery to generate a parameterized path to improve the query efficiency. For more information, see Sublink pushdown.

Examples

In this example, the sublink pushdown feature is used to demonstrate the CBQT feature and its parameters.

  1. Create the test table and insert test data.

    CREATE TABLE t_small(a int);
    CREATE TABLE t_big(a int, b int, c int);
    
    CREATE INDEX ON t_big(a);
    
    INSERT INTO t_big SELECT i, i, i FROM generate_series(1, 1000000)i;
    INSERT INTO t_small VALUES(1), (1000000);
    
    ANALYZE t_small, t_big;
  2. Disable the CBQT feature and enable the sublink pushdown feature. In this case, the sublink pushdown feature does not take effect. A full table scan is performed on the t_big table, which is inefficient.

    -- Disable the CBQT feature
    SET polar_enable_cbqt to off;
    
    -- Enable the sublink pushdown feature
    SET polar_cbqt_pushdown_sublink to on;
    
    EXPLAIN ANALYZE SELECT * FROM (SELECT a, sum(b) b FROM t_big GROUP BY a)v WHERE a IN (SELECT a FROM t_small);

    Sample result:

                                                                       QUERY PLAN
    -------------------------------------------------------------------------------------------------------------------------------------------------
     Merge Semi Join  (cost=1.46..59511.17 rows=10000 width=12) (actual time=0.052..1274.435 rows=2 loops=1)
       Merge Cond: (t_big.a = t_small.a)
       ->  GroupAggregate  (cost=0.42..46910.13 rows=1000000 width=12) (actual time=0.033..1151.005 rows=1000000 loops=1)
             Group Key: t_big.a
             ->  Index Scan using t_big_a_idx on t_big  (cost=0.42..31910.13 rows=1000000 width=8) (actual time=0.022..433.821 rows=1000000 loops=1)
       ->  Sort  (cost=1.03..1.03 rows=2 width=4) (actual time=0.015..0.016 rows=2 loops=1)
             Sort Key: t_small.a
             Sort Method: quicksort  Memory: 25kB
             ->  Seq Scan on t_small  (cost=0.00..1.02 rows=2 width=4) (actual time=0.005..0.006 rows=2 loops=1)
     Planning Time: 0.904 ms
     Execution Time: 1274.539 ms
    (11 rows)
  3. Enable the CBQT feature and the sublink pushdown feature. In this case, the sublink pushdown feature takes effect. The a in (select a from t_small) clause is pushed down to the subquery. A parameterized path is generated for the t_big table based on the join condition, which significantly reduces the amount of data to be scanned and the execution time.

    -- Enable the CBQT feature
    SET polar_enable_cbqt to on;
    
    -- Enable the sublink pushdown feature
    SET polar_cbqt_pushdown_sublink to on;
    
    EXPLAIN ANALYZE SELECT * FROM (SELECT a, sum(b) b FROM t_big GROUP BY a)v WHERE a IN (SELECT a FROM t_small);

    Sample result:

                                                                 QUERY PLAN
    -------------------------------------------------------------------------------------------------------------------------------------
     GroupAggregate  (cost=17.96..17.99 rows=2 width=12) (actual time=0.060..0.063 rows=2 loops=1)
       Group Key: t_big.a
       ->  Sort  (cost=17.96..17.96 rows=2 width=8) (actual time=0.052..0.053 rows=2 loops=1)
             Sort Key: t_big.a
             Sort Method: quicksort  Memory: 25kB
             ->  Nested Loop  (cost=1.46..17.95 rows=2 width=8) (actual time=0.032..0.046 rows=2 loops=1)
                   ->  Unique  (cost=1.03..1.04 rows=2 width=4) (actual time=0.014..0.018 rows=2 loops=1)
                         ->  Sort  (cost=1.03..1.03 rows=2 width=4) (actual time=0.013..0.014 rows=2 loops=1)
                               Sort Key: t_small.a
                               Sort Method: quicksort  Memory: 25kB
                               ->  Seq Scan on t_small  (cost=0.00..1.02 rows=2 width=4) (actual time=0.006..0.007 rows=2 loops=1)
                   ->  Index Scan using t_big_a_idx on t_big  (cost=0.42..8.44 rows=1 width=8) (actual time=0.009..0.010 rows=1 loops=2)
                         Index Cond: (a = t_small.a)
     Planning Time: 0.644 ms
     Execution Time: 0.150 ms
    (15 rows)
  4. If the execution cost of an SQL statement does not reach the cost threshold of CBQT, the CBQT feature is not enabled. The original query plan is used. In the following example, the cost of the original query plan is 59511.17 and the polar_cbqt_cost_threshold parameter is set to 500000:

    -- Enable the CBQT feature
    SET polar_enable_cbqt to on;
    
    --- Set the CBQT cost threshold
    SET polar_cbqt_cost_threshold to 500000;
    
    -- Enable the sublink pushdown feature
    SET polar_cbqt_pushdown_sublink to on;
    
    EXPLAIN ANALYZE SELECT * FROM (SELECT a, sum(b) b FROM t_big GROUP BY a)v WHERE a IN (SELECT a FROM t_small);

    Sample result:

                                                                       QUERY PLAN
    -------------------------------------------------------------------------------------------------------------------------------------------------
     Merge Semi Join  (cost=1.46..59511.17 rows=10000 width=12) (actual time=0.059..1253.452 rows=2 loops=1)
       Merge Cond: (t_big.a = t_small.a)
       ->  GroupAggregate  (cost=0.42..46910.13 rows=1000000 width=12) (actual time=0.041..1127.255 rows=1000000 loops=1)
             Group Key: t_big.a
             ->  Index Scan using t_big_a_idx on t_big  (cost=0.42..31910.13 rows=1000000 width=8) (actual time=0.029..414.488 rows=1000000 loops=1)
       ->  Sort  (cost=1.03..1.03 rows=2 width=4) (actual time=0.014..0.015 rows=2 loops=1)
             Sort Key: t_small.a
             Sort Method: quicksort  Memory: 25kB
             ->  Seq Scan on t_small  (cost=0.00..1.02 rows=2 width=4) (actual time=0.005..0.006 rows=2 loops=1)
     Planning Time: 0.280 ms
     Execution Time: 1253.558 ms
    (11 rows)
  5. Set the search strategy for the CBQT state space. In the following example, two sublinks can be pushed down, but pushing down only the second sublink is optimal.

    • Set polar_cbqt_strategy to linear, which is the linear search strategy. CBQT selects the optimal plan.

      -- Enable the CBQT feature
      SET polar_enable_cbqt to on;
      
      --- Set the search strategy for the CBQT state space
      SET polar_cbqt_strategy to linear;
      
      -- Enable the sublink pushdown feature
      SET polar_cbqt_pushdown_sublink to on;
      
      EXPLAIN SELECT * FROM (SELECT a, sum(b) b FROM t_big GROUP BY a)v WHERE a IN (SELECT a FROM t_big) UNION ALL SELECT * FROM (SELECT a, sum(b) b FROM t_big GROUP BY a)v WHERE a IN (SELECT a FROM t_small);

      Sample result:

                                                       QUERY PLAN
      -------------------------------------------------------------------------------------------------------------
       Append  (cost=0.85..105692.60 rows=500002 width=12)
         ->  Merge Semi Join  (cost=0.85..98174.56 rows=500000 width=12)
               Merge Cond: (t_big_1.a = t_big.a)
               ->  GroupAggregate  (cost=0.42..46910.13 rows=1000000 width=12)
                     Group Key: t_big_1.a
                     ->  Index Scan using t_big_a_idx on t_big t_big_1  (cost=0.42..31910.13 rows=1000000 width=8)
               ->  Index Only Scan using t_big_a_idx on t_big  (cost=0.42..26264.42 rows=1000000 width=4)
         ->  GroupAggregate  (cost=17.96..17.99 rows=2 width=12)
               Group Key: t_big_2.a
               ->  Sort  (cost=17.96..17.96 rows=2 width=8)
                     Sort Key: t_big_2.a
                     ->  Nested Loop  (cost=1.46..17.95 rows=2 width=8)
                           ->  Unique  (cost=1.03..1.04 rows=2 width=4)
                                 ->  Sort  (cost=1.03..1.03 rows=2 width=4)
                                       Sort Key: t_small.a
                                       ->  Seq Scan on t_small  (cost=0.00..1.02 rows=2 width=4)
                           ->  Index Scan using t_big_a_idx on t_big t_big_2  (cost=0.42..8.44 rows=1 width=8)
                                 Index Cond: (a = t_small.a)
      (18 rows)
    • Set polar_cbqt_strategy to twophase, which is the two-phase search strategy. Both sublinks are pushed down.

      -- Enable the CBQT feature
      SET polar_enable_cbqt to on;
      
      --- Set the search strategy for the CBQT state space
      SET polar_cbqt_strategy to twophase;
      
      -- Enable the sublink pushdown feature
      SET polar_cbqt_pushdown_sublink to on;
      
      EXPLAIN SELECT * FROM (SELECT a, sum(b) b FROM t_big GROUP BY a)v WHERE a IN (SELECT a FROM t_big) UNION ALL SELECT * FROM (SELECT a, sum(b) b FROM t_big GROUP BY a)v WHERE a IN (SELECT a FROM t_small);

      Sample result:

                                                          QUERY PLAN
      ------------------------------------------------------------------------------------------------------------------
       Append  (cost=0.85..113192.60 rows=1000002 width=12)
         ->  GroupAggregate  (cost=0.85..88174.56 rows=1000000 width=12)
               Group Key: t_big.a
               ->  Merge Semi Join  (cost=0.85..73174.56 rows=1000000 width=8)
                     Merge Cond: (t_big.a = t_big_1.a)
                     ->  Index Scan using t_big_a_idx on t_big  (cost=0.42..31910.13 rows=1000000 width=8)
                     ->  Index Only Scan using t_big_a_idx on t_big t_big_1  (cost=0.42..26264.42 rows=1000000 width=4)
         ->  GroupAggregate  (cost=17.96..17.99 rows=2 width=12)
               Group Key: t_big_2.a
               ->  Sort  (cost=17.96..17.96 rows=2 width=8)
                     Sort Key: t_big_2.a
                     ->  Nested Loop  (cost=1.46..17.95 rows=2 width=8)
                           ->  Unique  (cost=1.03..1.04 rows=2 width=4)
                                 ->  Sort  (cost=1.03..1.03 rows=2 width=4)
                                       Sort Key: t_small.a
                                       ->  Seq Scan on t_small  (cost=0.00..1.02 rows=2 width=4)
                           ->  Index Scan using t_big_a_idx on t_big t_big_2  (cost=0.42..8.44 rows=1 width=8)
                                 Index Cond: (a = t_small.a)
      (18 rows)
  6. Set the number of iterations of CBQT. Set polar_cbqt_iteration_limit to 1 to limit the number of iterations of CBQT to 1. In the preceding scenario, even if pushing down only the second sublink is optimal, CBQT does not apply that state due to the limited number of iterations.

    -- Enable the CBQT feature
    SET polar_enable_cbqt to on;
    
    --- Set the search strategy for the CBQT state space
    SET polar_cbqt_strategy to twophase;
    
    --- Set the number of CBQT iterations
    SET polar_cbqt_iteration_limit to 1;
    
    -- Enable the sublink pushdown feature
    SET polar_cbqt_pushdown_sublink to on;
    
    EXPLAIN SELECT * FROM (SELECT a, sum(b) b FROM t_big GROUP BY a)v WHERE a IN (SELECT a FROM t_big) UNION ALL SELECT * FROM (SELECT a, sum(b) b FROM t_big GROUP BY a)v WHERE a IN (SELECT a FROM t_small);

    Sample result:

                                                        QUERY PLAN
    ------------------------------------------------------------------------------------------------------------------
     Append  (cost=0.85..113192.60 rows=1000002 width=12)
       ->  GroupAggregate  (cost=0.85..88174.56 rows=1000000 width=12)
             Group Key: t_big.a
             ->  Merge Semi Join  (cost=0.85..73174.56 rows=1000000 width=8)
                   Merge Cond: (t_big.a = t_big_1.a)
                   ->  Index Scan using t_big_a_idx on t_big  (cost=0.42..31910.13 rows=1000000 width=8)
                   ->  Index Only Scan using t_big_a_idx on t_big t_big_1  (cost=0.42..26264.42 rows=1000000 width=4)
       ->  GroupAggregate  (cost=17.96..17.99 rows=2 width=12)
             Group Key: t_big_2.a
             ->  Sort  (cost=17.96..17.96 rows=2 width=8)
                   Sort Key: t_big_2.a
                   ->  Nested Loop  (cost=1.46..17.95 rows=2 width=8)
                         ->  Unique  (cost=1.03..1.04 rows=2 width=4)
                               ->  Sort  (cost=1.03..1.03 rows=2 width=4)
                                     Sort Key: t_small.a
                                     ->  Seq Scan on t_small  (cost=0.00..1.02 rows=2 width=4)
                         ->  Index Scan using t_big_a_idx on t_big t_big_2  (cost=0.42..8.44 rows=1 width=8)
                               Index Cond: (a = t_small.a)
    (18 rows)