PolarDB for PostgreSQL supports cost-based query transformation (CBQT) that can perform a certain type of transformation based on the execution cost. This significantly improves the execution efficiency of some complex queries.
Background information
Query transformation refers to the process of transforming a query statement to another equivalent query statement. In PostgreSQL, common query transformations include subquery pull-up, outer join removal, expression preprocessing, useless join removal, and predicate pushdown. These transformations are performed based on equivalence rules. After performing these transformations, a query plan is optimized. PostgreSQL always attempts to perform these transformations.
However, after performing some other transformations, such as sublink pushdown and OR to UNION ALL conversion, a query plan may not be optimized. Therefore, PolarDB for PostgreSQL performs query transformations based on CBQT.
For a complex query, the CBQT framework collects cost-based query transformations that can be performed in each query block and forms a state space. CBQT searches the state space based on the specified strategy to select the state with the lowest cost.
As shown in the following figure, for the input SQL statement, CBQT collects cost-based query transformations A and B in Query Block 1 and Query Block 2. The state space consists of the following states:
None: No transformation is performed.
[1, A]: Transformation A is performed for Query Block 1.
[1, B]: Transformation B is performed for Query Block 1.
[2, A]: Transformation A is performed for Query Block 2.
CBQT performs these transformations in the state space in turn. Under the default search strategy linear, state [1, B] can optimize the query plan the most. As a result, [1, B] is used in Plan 4.
Prerequisites
Your PolarDB for PostgreSQL cluster runs the following database engine version:
PolarDB for PostgreSQL 14 whose revision version is 2.0.14.13.28.0 or later
PolarDB for PostgreSQL 11 whose revision version is 2.0.11.15.44.0 or later
You can view the revision version in the console or execute the SHOW polardb_version; statement to query the revision version. To upgrade the revision version, see Version management.
Usage
Configure the following parameters to manage CBQT provided by PolarDB for PostgreSQL:
For clusters whose revision version is 2.0.14.15.29.0 or later, you can directly configure the parameters in the PolarDB console. For more information, see Configure cluster parameters. You can also connect to the cluster to configure the parameters. For information about how to connect to a cluster, see Connect to a PolarDB for PostgreSQL cluster.
Parameter | Description |
polar_enable_cbqt | Specifies whether to enable the CBQT feature. Valid values:
|
polar_cbqt_cost_threshold | The cost threshold for enabling CBQT. Valid values: |
polar_cbqt_strategy | The search strategy for the CBQT state space. Valid values:
|
polar_cbqt_iteration_limit | The number of iterations of CBQT. Valid values: A greater number of iterations indicates a higher possibility of selecting the optimal plan and more time consumption. A smaller number of iterations indicates a lower possibility of selecting the optimal plan and less time consumption. |
The following cost-based query rewrite features are supported:
Parameter | Description |
polar_cbqt_convert_or_to_union_all_mode | Converts the OR clause to UNION ALL to improve query efficiency. For more information, see Convert an OR condition into a UNION ALL of two or more separate queries:. |
polar_cbqt_pushdown_sublink | Pushes the sublink down to a subquery and uses the index in the subquery to generate a parameterized path to improve the query efficiency. For more information, see Sublink pushdown. |
Examples
In this example, the sublink pushdown feature is used to demonstrate the CBQT feature and its parameters.
Create the test table and insert test data.
CREATE TABLE t_small(a int); CREATE TABLE t_big(a int, b int, c int); CREATE INDEX ON t_big(a); INSERT INTO t_big SELECT i, i, i FROM generate_series(1, 1000000)i; INSERT INTO t_small VALUES(1), (1000000); ANALYZE t_small, t_big;Disable the CBQT feature and enable the sublink pushdown feature. In this case, the sublink pushdown feature does not take effect. A full table scan is performed on the
t_bigtable, which is inefficient.-- Disable the CBQT feature SET polar_enable_cbqt to off; -- Enable the sublink pushdown feature SET polar_cbqt_pushdown_sublink to on; EXPLAIN ANALYZE SELECT * FROM (SELECT a, sum(b) b FROM t_big GROUP BY a)v WHERE a IN (SELECT a FROM t_small);Sample result:
QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------- Merge Semi Join (cost=1.46..59511.17 rows=10000 width=12) (actual time=0.052..1274.435 rows=2 loops=1) Merge Cond: (t_big.a = t_small.a) -> GroupAggregate (cost=0.42..46910.13 rows=1000000 width=12) (actual time=0.033..1151.005 rows=1000000 loops=1) Group Key: t_big.a -> Index Scan using t_big_a_idx on t_big (cost=0.42..31910.13 rows=1000000 width=8) (actual time=0.022..433.821 rows=1000000 loops=1) -> Sort (cost=1.03..1.03 rows=2 width=4) (actual time=0.015..0.016 rows=2 loops=1) Sort Key: t_small.a Sort Method: quicksort Memory: 25kB -> Seq Scan on t_small (cost=0.00..1.02 rows=2 width=4) (actual time=0.005..0.006 rows=2 loops=1) Planning Time: 0.904 ms Execution Time: 1274.539 ms (11 rows)Enable the CBQT feature and the sublink pushdown feature. In this case, the sublink pushdown feature takes effect. The
a in (select a from t_small)clause is pushed down to the subquery. A parameterized path is generated for thet_bigtable based on the join condition, which significantly reduces the amount of data to be scanned and the execution time.-- Enable the CBQT feature SET polar_enable_cbqt to on; -- Enable the sublink pushdown feature SET polar_cbqt_pushdown_sublink to on; EXPLAIN ANALYZE SELECT * FROM (SELECT a, sum(b) b FROM t_big GROUP BY a)v WHERE a IN (SELECT a FROM t_small);Sample result:
QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------- GroupAggregate (cost=17.96..17.99 rows=2 width=12) (actual time=0.060..0.063 rows=2 loops=1) Group Key: t_big.a -> Sort (cost=17.96..17.96 rows=2 width=8) (actual time=0.052..0.053 rows=2 loops=1) Sort Key: t_big.a Sort Method: quicksort Memory: 25kB -> Nested Loop (cost=1.46..17.95 rows=2 width=8) (actual time=0.032..0.046 rows=2 loops=1) -> Unique (cost=1.03..1.04 rows=2 width=4) (actual time=0.014..0.018 rows=2 loops=1) -> Sort (cost=1.03..1.03 rows=2 width=4) (actual time=0.013..0.014 rows=2 loops=1) Sort Key: t_small.a Sort Method: quicksort Memory: 25kB -> Seq Scan on t_small (cost=0.00..1.02 rows=2 width=4) (actual time=0.006..0.007 rows=2 loops=1) -> Index Scan using t_big_a_idx on t_big (cost=0.42..8.44 rows=1 width=8) (actual time=0.009..0.010 rows=1 loops=2) Index Cond: (a = t_small.a) Planning Time: 0.644 ms Execution Time: 0.150 ms (15 rows)If the execution cost of an SQL statement does not reach the cost threshold of CBQT, the CBQT feature is not enabled. The original query plan is used. In the following example, the cost of the original query plan is 59511.17 and the
polar_cbqt_cost_thresholdparameter is set to 500000:-- Enable the CBQT feature SET polar_enable_cbqt to on; --- Set the CBQT cost threshold SET polar_cbqt_cost_threshold to 500000; -- Enable the sublink pushdown feature SET polar_cbqt_pushdown_sublink to on; EXPLAIN ANALYZE SELECT * FROM (SELECT a, sum(b) b FROM t_big GROUP BY a)v WHERE a IN (SELECT a FROM t_small);Sample result:
QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------- Merge Semi Join (cost=1.46..59511.17 rows=10000 width=12) (actual time=0.059..1253.452 rows=2 loops=1) Merge Cond: (t_big.a = t_small.a) -> GroupAggregate (cost=0.42..46910.13 rows=1000000 width=12) (actual time=0.041..1127.255 rows=1000000 loops=1) Group Key: t_big.a -> Index Scan using t_big_a_idx on t_big (cost=0.42..31910.13 rows=1000000 width=8) (actual time=0.029..414.488 rows=1000000 loops=1) -> Sort (cost=1.03..1.03 rows=2 width=4) (actual time=0.014..0.015 rows=2 loops=1) Sort Key: t_small.a Sort Method: quicksort Memory: 25kB -> Seq Scan on t_small (cost=0.00..1.02 rows=2 width=4) (actual time=0.005..0.006 rows=2 loops=1) Planning Time: 0.280 ms Execution Time: 1253.558 ms (11 rows)Set the search strategy for the CBQT state space. In the following example, two sublinks can be pushed down, but pushing down only the second sublink is optimal.
Set polar_cbqt_strategy to linear, which is the linear search strategy. CBQT selects the optimal plan.
-- Enable the CBQT feature SET polar_enable_cbqt to on; --- Set the search strategy for the CBQT state space SET polar_cbqt_strategy to linear; -- Enable the sublink pushdown feature SET polar_cbqt_pushdown_sublink to on; EXPLAIN SELECT * FROM (SELECT a, sum(b) b FROM t_big GROUP BY a)v WHERE a IN (SELECT a FROM t_big) UNION ALL SELECT * FROM (SELECT a, sum(b) b FROM t_big GROUP BY a)v WHERE a IN (SELECT a FROM t_small);Sample result:
QUERY PLAN ------------------------------------------------------------------------------------------------------------- Append (cost=0.85..105692.60 rows=500002 width=12) -> Merge Semi Join (cost=0.85..98174.56 rows=500000 width=12) Merge Cond: (t_big_1.a = t_big.a) -> GroupAggregate (cost=0.42..46910.13 rows=1000000 width=12) Group Key: t_big_1.a -> Index Scan using t_big_a_idx on t_big t_big_1 (cost=0.42..31910.13 rows=1000000 width=8) -> Index Only Scan using t_big_a_idx on t_big (cost=0.42..26264.42 rows=1000000 width=4) -> GroupAggregate (cost=17.96..17.99 rows=2 width=12) Group Key: t_big_2.a -> Sort (cost=17.96..17.96 rows=2 width=8) Sort Key: t_big_2.a -> Nested Loop (cost=1.46..17.95 rows=2 width=8) -> Unique (cost=1.03..1.04 rows=2 width=4) -> Sort (cost=1.03..1.03 rows=2 width=4) Sort Key: t_small.a -> Seq Scan on t_small (cost=0.00..1.02 rows=2 width=4) -> Index Scan using t_big_a_idx on t_big t_big_2 (cost=0.42..8.44 rows=1 width=8) Index Cond: (a = t_small.a) (18 rows)Set polar_cbqt_strategy to twophase, which is the two-phase search strategy. Both sublinks are pushed down.
-- Enable the CBQT feature SET polar_enable_cbqt to on; --- Set the search strategy for the CBQT state space SET polar_cbqt_strategy to twophase; -- Enable the sublink pushdown feature SET polar_cbqt_pushdown_sublink to on; EXPLAIN SELECT * FROM (SELECT a, sum(b) b FROM t_big GROUP BY a)v WHERE a IN (SELECT a FROM t_big) UNION ALL SELECT * FROM (SELECT a, sum(b) b FROM t_big GROUP BY a)v WHERE a IN (SELECT a FROM t_small);Sample result:
QUERY PLAN ------------------------------------------------------------------------------------------------------------------ Append (cost=0.85..113192.60 rows=1000002 width=12) -> GroupAggregate (cost=0.85..88174.56 rows=1000000 width=12) Group Key: t_big.a -> Merge Semi Join (cost=0.85..73174.56 rows=1000000 width=8) Merge Cond: (t_big.a = t_big_1.a) -> Index Scan using t_big_a_idx on t_big (cost=0.42..31910.13 rows=1000000 width=8) -> Index Only Scan using t_big_a_idx on t_big t_big_1 (cost=0.42..26264.42 rows=1000000 width=4) -> GroupAggregate (cost=17.96..17.99 rows=2 width=12) Group Key: t_big_2.a -> Sort (cost=17.96..17.96 rows=2 width=8) Sort Key: t_big_2.a -> Nested Loop (cost=1.46..17.95 rows=2 width=8) -> Unique (cost=1.03..1.04 rows=2 width=4) -> Sort (cost=1.03..1.03 rows=2 width=4) Sort Key: t_small.a -> Seq Scan on t_small (cost=0.00..1.02 rows=2 width=4) -> Index Scan using t_big_a_idx on t_big t_big_2 (cost=0.42..8.44 rows=1 width=8) Index Cond: (a = t_small.a) (18 rows)
Set the number of iterations of CBQT. Set
polar_cbqt_iteration_limitto 1 to limit the number of iterations of CBQT to 1. In the preceding scenario, even if pushing down only the second sublink is optimal, CBQT does not apply that state due to the limited number of iterations.-- Enable the CBQT feature SET polar_enable_cbqt to on; --- Set the search strategy for the CBQT state space SET polar_cbqt_strategy to twophase; --- Set the number of CBQT iterations SET polar_cbqt_iteration_limit to 1; -- Enable the sublink pushdown feature SET polar_cbqt_pushdown_sublink to on; EXPLAIN SELECT * FROM (SELECT a, sum(b) b FROM t_big GROUP BY a)v WHERE a IN (SELECT a FROM t_big) UNION ALL SELECT * FROM (SELECT a, sum(b) b FROM t_big GROUP BY a)v WHERE a IN (SELECT a FROM t_small);Sample result:
QUERY PLAN ------------------------------------------------------------------------------------------------------------------ Append (cost=0.85..113192.60 rows=1000002 width=12) -> GroupAggregate (cost=0.85..88174.56 rows=1000000 width=12) Group Key: t_big.a -> Merge Semi Join (cost=0.85..73174.56 rows=1000000 width=8) Merge Cond: (t_big.a = t_big_1.a) -> Index Scan using t_big_a_idx on t_big (cost=0.42..31910.13 rows=1000000 width=8) -> Index Only Scan using t_big_a_idx on t_big t_big_1 (cost=0.42..26264.42 rows=1000000 width=4) -> GroupAggregate (cost=17.96..17.99 rows=2 width=12) Group Key: t_big_2.a -> Sort (cost=17.96..17.96 rows=2 width=8) Sort Key: t_big_2.a -> Nested Loop (cost=1.46..17.95 rows=2 width=8) -> Unique (cost=1.03..1.04 rows=2 width=4) -> Sort (cost=1.03..1.03 rows=2 width=4) Sort Key: t_small.a -> Seq Scan on t_small (cost=0.00..1.02 rows=2 width=4) -> Index Scan using t_big_a_idx on t_big t_big_2 (cost=0.42..8.44 rows=1 width=8) Index Cond: (a = t_small.a) (18 rows)