PolarDB for PostgreSQL provides a query optimization feature that can convert an OR condition into a UNION ALL of two or more separate queries. The conversion allows the database to select the most cost-effective execution path and results in a more efficient query execution.
Background information
The PostgreSQL optimizer has limited ability to optimize OR conditions in SQL queries. If the OR condition in a SQL query involves only a single table, and each part of the condition is properly indexed, the optimizer generates a BitmapOr index path. Example:
EXPLAIN SELECT * FROM my_test WHERE (id = 123 OR name = '123' OR salary = 123.0);
QUERY PLAN
---------------------------------------------------------------------------------------
Bitmap Heap Scan on my_test (cost=12.90..24.33 rows=3 width=15)
Recheck Cond: ((id = 123) OR ((name)::text = '123'::text) OR (salary = 123.0))
-> BitmapOr (cost=12.90..12.90 rows=3 width=0)
-> Bitmap Index Scan on my_test_id_idx (cost=0.00..4.30 rows=1 width=0)
Index Cond: (id = 123)
-> Bitmap Index Scan on my_test_name_idx (cost=0.00..4.30 rows=1 width=0)
Index Cond: ((name)::text = '123'::text)
-> Bitmap Index Scan on my_test_salary_idx (cost=0.00..4.30 rows=1 width=0)
Index Cond: (salary = 123.0)
(9 rows)If the OR condition in a SQL query involves multiple tables, the optimizer treats the parts of the OR condition as filter conditions applied after the tables are joined, which may result in a less efficient execution of the SQL query. Example:
EXPLAIN ANALYZE SELECT * FROM t1 JOIN t2 ON t1.id = t2.id WHERE (t1.num = 1 OR t2.cnt = 2);
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Hash Join (cost=299.00..660.50 rows=110 width=51) (actual time=5.992..15.673 rows=110 loops=1)
Hash Cond: (t1.id = t2.id)
Join Filter: ((t1.num = 1) OR (t2.cnt = 2))
Rows Removed by Join Filter: 9890
-> Seq Scan on t1 (cost=0.00..174.00 rows=10000 width=25) (actual time=0.012..2.080 rows=10000 loops=1)
-> Hash (cost=174.00..174.00 rows=10000 width=26) (actual time=5.855..5.857 rows=10000 loops=1)
Buckets: 16384 Batches: 1 Memory Usage: 704kB
-> Seq Scan on t2 (cost=0.00..174.00 rows=10000 width=26) (actual time=0.007..1.779 rows=10000 loops=1)
Planning Time: 1.237 ms
Execution Time: 15.836 ms
(10 rows)The parts of the OR condition in this query are treated as a single, combined condition, and the optimizer cannot use the index on the t1.num or t2.cnt column to accelerate the query. Instead, the optimizer needs to perform a full scan on the t1 and t2 tables. To optimize the query, you can convert the OR condition into a UNION ALL of two or more separate queries that each handle a part of the OR condition. Example:
EXPLAIN ANALYZE
SELECT * FROM t1 JOIN t2 ON t1.id = t2.id WHERE t1.num = 1
UNION ALL
SELECT * FROM t1 JOIN t2 ON t1.id = t2.id WHERE t2.cnt = 2 AND (t1.num != 1 OR (t1.num = 1) IS NULL);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Append (cost=85.48..412.26 rows=110 width=51) (actual time=0.350..4.832 rows=110 loops=1)
-> Hash Join (cost=85.48..297.98 rows=100 width=51) (actual time=0.349..4.653 rows=100 loops=1)
Hash Cond: (t2.id = t1.id)
-> Seq Scan on t2 (cost=0.00..174.00 rows=10000 width=26) (actual time=0.009..1.719 rows=10000 loops=1)
-> Hash (cost=84.23..84.23 rows=100 width=25) (actual time=0.318..0.320 rows=100 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 14kB
-> Bitmap Heap Scan on t1 (cost=5.06..84.23 rows=100 width=25) (actual time=0.065..0.265 rows=100 loops=1)
Recheck Cond: (num = 1)
Heap Blocks: exact=73
-> Bitmap Index Scan on t1_num_idx (cost=0.00..5.04 rows=100 width=0) (actual time=0.037..0.037 rows=100 loops=1)
Index Cond: (num = 1)
-> Nested Loop (cost=4.65..112.63 rows=10 width=51) (actual time=0.049..0.159 rows=10 loops=1)
-> Bitmap Heap Scan on t2 t2_1 (cost=4.36..33.46 rows=10 width=26) (actual time=0.026..0.045 rows=10 loops=1)
Recheck Cond: (cnt = 2)
Heap Blocks: exact=10
-> Bitmap Index Scan on t2_cnt_idx (cost=0.00..4.36 rows=10 width=0) (actual time=0.017..0.018 rows=10 loops=1)
Index Cond: (cnt = 2)
-> Index Scan using t1_id_idx on t1 t1_1 (cost=0.29..7.91 rows=1 width=25) (actual time=0.009..0.009 rows=1 loops=10)
Index Cond: (id = t2_1.id)
Filter: ((num <> 1) OR ((num = 1) IS NULL))
Planning Time: 1.150 ms
Execution Time: 5.014 ms
(22 rows)After the conversion, the optimizer can use indexes on the t1.num and t2.cnt columns to reduce the amount of data to process and improve the execution performance.
PolarDB for PostgreSQL provides a query optimization feature that can convert an OR condition into a UNION ALL of two or more separate queries. The conversion allows the database to select the most cost-effective execution path and results in a more efficient query execution.
Supported versions
Your PolarDB for PostgreSQL cluster runs one of the following database engine versions:
PostgreSQL 14 whose revision version is 14.13.27.0 or later
You can execute the following statement to query the database engine revision version of your PolarDB for PostgreSQL cluster:
SELECT version();To upgrade the database engine revision version of your cluster, see Upgrade the version.
Parameters
You can manage the conversion of an OR condition into a UNION ALL of two or more separate queries by using parameters. The following table describes the parameters:
Parameter | Description |
polar_cbqt_cost_threshold | The OR condition conversion threshold. If the estimated execution cost of a query that uses an OR condition exceeds the threshold, the optimizer converts the OR condition into a UNION ALL of separate queries. Valid values: If you set the parameter to 0, the system converts the OR condition of each query into a UNION ALL of separate queries. We recommend that you do not set this parameter to 0. This may increase the SQL execution time and degrade the performance. |
polar_cbqt_convert_or_to_union_all_mode | Specifies whether to enable the conversion of an OR condition into a UNION ALL of separate queries. Valid values:
|
Usage notes
If you set the
polar_cbqt_cost_thresholdparameter to 0, or thepolar_cbqt_convert_or_to_union_all_modeparameter to FORCE, the system forces the conversion of an OR condition into a UNION ALL of two or more separate queries. To force the conversion of the OR condition of a specific query into a UNION ALL of separate queries, we recommend that you use a hint. For more information, see the "Use hints" section of this topic.If the OR condition involves only one table, the optimizer does not convert the OR condition.
In forced conversion mode, the optimizer selects the execution path that has the lowest cost if multiple suitable queries exist.
If an OR condition contains more than six parameters, the optimizer does not convert the OR condition.
If a SQL query contains more than six OR conditions, the optimizer converts only the first six OR conditions.
Examples
Prepare data
CREATE TABLE t1(id int, num int, dsc text, log_date text);
CREATE TABLE t2(id int, cnt int, change text, op_date text);
INSERT INTO t1 SELECT i, i%100, 'test'||1, to_char('1990-10-10'::date + i, 'YYYY-MM-DD') FROM generate_series(1,10000)i;
INSERT INTO t2 SELECT i, i%1000, 'now'||i, to_char('1990-10-10'::date + i, 'YYYY-MM-DD') FROM generate_series(1,10000)i;
CREATE INDEX ON t1(id);
CREATE INDEX ON t1(num);
CREATE INDEX ON t2(id);
CREATE INDEX ON t2(cnt);
ANALYZE t1;
ANALYZE t2;Basic features
Disable the conversion of an OR condition into a UNION ALL of two or more separate queries.
SET polar_cbqt_cost_threshold to 100; SET polar_cbqt_convert_or_to_union_all_mode to off; EXPLAIN ANALYZE SELECT * FROM t1 JOIN t2 ON t1.id = t2.id WHERE (t1.num = 1 OR t2.cnt = 2);Sample result:
QUERY PLAN ------------------------------------------------------------------------------------------------------------------- Hash Join (cost=299.00..660.50 rows=110 width=51) (actual time=5.992..15.673 rows=110 loops=1) Hash Cond: (t1.id = t2.id) Join Filter: ((t1.num = 1) OR (t2.cnt = 2)) Rows Removed by Join Filter: 9890 -> Seq Scan on t1 (cost=0.00..174.00 rows=10000 width=25) (actual time=0.012..2.080 rows=10000 loops=1) -> Hash (cost=174.00..174.00 rows=10000 width=26) (actual time=5.855..5.857 rows=10000 loops=1) Buckets: 16384 Batches: 1 Memory Usage: 704kB -> Seq Scan on t2 (cost=0.00..174.00 rows=10000 width=26) (actual time=0.007..1.779 rows=10000 loops=1) Planning Time: 1.237 ms Execution Time: 15.836 ms (10 rows)Enable the conversion of an OR condition into a UNION ALL of two or more separate queries.
SET polar_cbqt_cost_threshold to 100; SET polar_cbqt_convert_or_to_union_all_mode to on; EXPLAIN ANALYZE SELECT * FROM t1 JOIN t2 ON t1.id = t2.id WHERE (t1.num = 1 OR t2.cnt = 2);Sample result:
QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------- Append (cost=85.48..411.16 rows=110 width=51) (actual time=0.396..4.822 rows=110 loops=1) -> Hash Join (cost=85.48..297.98 rows=100 width=51) (actual time=0.395..4.639 rows=100 loops=1) Hash Cond: (t2.id = t1.id) -> Seq Scan on t2 (cost=0.00..174.00 rows=10000 width=26) (actual time=0.010..1.750 rows=10000 loops=1) -> Hash (cost=84.23..84.23 rows=100 width=25) (actual time=0.333..0.335 rows=100 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 14kB -> Bitmap Heap Scan on t1 (cost=5.06..84.23 rows=100 width=25) (actual time=0.056..0.247 rows=100 loops=1) Recheck Cond: (num = 1) Heap Blocks: exact=73 -> Bitmap Index Scan on t1_num_idx (cost=0.00..5.04 rows=100 width=0) (actual time=0.028..0.028 rows=100 loops=1) Index Cond: (num = 1) -> Nested Loop (cost=4.65..112.63 rows=10 width=51) (actual time=0.049..0.164 rows=10 loops=1) -> Bitmap Heap Scan on t2 (cost=4.36..33.46 rows=10 width=26) (actual time=0.027..0.044 rows=10 loops=1) Recheck Cond: (cnt = 2) Heap Blocks: exact=10 -> Bitmap Index Scan on t2_cnt_idx (cost=0.00..4.36 rows=10 width=0) (actual time=0.019..0.019 rows=10 loops=1) Index Cond: (cnt = 2) -> Index Scan using t1_id_idx on t1 (cost=0.29..7.91 rows=1 width=25) (actual time=0.010..0.010 rows=1 loops=10) Index Cond: (id = t2.id) Filter: ((num <> 1) OR ((num = 1) IS NULL)) Planning Time: 2.903 ms Execution Time: 4.980 ms (22 rows)
After you enable the conversion of an OR condition into a UNION ALL of two or more separate queries, the execution plan can use the indexes on the t1.num and t2.cnt columns to achieve the same effect as a manual conversion.
Force the conversion of an OR condition into a UNION ALL of two or more separate queries
Enable the conversion of an OR condition into a UNION ALL of two or more separate queries.
SET polar_cbqt_cost_threshold to 100; SET polar_cbqt_convert_or_to_union_all_mode to on; EXPLAIN ANALYZE SELECT * FROM t1, t2 WHERE t1.dsc= t2.change AND (t1.log_date = '2024-01-01' OR t2.op_date = '2024-01-01');Sample result:
QUERY PLAN ------------------------------------------------------------------------------------------------------------------- Hash Join (cost=299.00..660.50 rows=2 width=51) (actual time=14.321..14.325 rows=0 loops=1) Hash Cond: (t1.dsc = t2.change) Join Filter: ((t1.log_date = '2024-01-01'::text) OR (t2.op_date = '2024-01-01'::text)) -> Seq Scan on t1 (cost=0.00..174.00 rows=10000 width=25) (actual time=0.016..3.204 rows=10000 loops=1) -> Hash (cost=174.00..174.00 rows=10000 width=26) (actual time=6.506..6.508 rows=10000 loops=1) Buckets: 16384 Batches: 1 Memory Usage: 704kB -> Seq Scan on t2 (cost=0.00..174.00 rows=10000 width=26) (actual time=0.006..1.755 rows=10000 loops=1) Planning Time: 0.932 ms Execution Time: 14.571 ms (9 rows)Force the conversion of an OR condition into a UNION ALL of two or more separate queries.
SET polar_cbqt_cost_threshold to 100; SET polar_cbqt_convert_or_to_union_all_mode to force; EXPLAIN ANALYZE SELECT * FROM t1, t2 WHERE t1.dsc= t2.change AND (t1.log_date = '2024-01-01' OR t2.op_date = '2024-01-01');Sample result:
QUERY PLAN ----------------------------------------------------------------------------------------------------------------- Append (cost=199.01..871.05 rows=2 width=51) (actual time=9.915..9.923 rows=0 loops=1) -> Hash Join (cost=199.01..410.52 rows=1 width=51) (actual time=5.046..5.050 rows=0 loops=1) Hash Cond: (t2.change = t1.dsc) -> Seq Scan on t2 (cost=0.00..174.00 rows=10000 width=26) (actual time=0.015..0.015 rows=1 loops=1) -> Hash (cost=199.00..199.00 rows=1 width=25) (actual time=5.014..5.016 rows=0 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 8kB -> Seq Scan on t1 (cost=0.00..199.00 rows=1 width=25) (actual time=5.013..5.013 rows=0 loops=1) Filter: (log_date = '2024-01-01'::text) Rows Removed by Filter: 10000 -> Hash Join (cost=199.01..460.52 rows=1 width=51) (actual time=4.865..4.867 rows=0 loops=1) Hash Cond: (t1.dsc = t2.change) -> Seq Scan on t1 (cost=0.00..224.00 rows=9999 width=25) (actual time=0.015..0.016 rows=1 loops=1) Filter: ((log_date <> '2024-01-01'::text) OR ((log_date = '2024-01-01'::text) IS NULL)) -> Hash (cost=199.00..199.00 rows=1 width=26) (actual time=4.828..4.829 rows=0 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 8kB -> Seq Scan on t2 (cost=0.00..199.00 rows=1 width=26) (actual time=4.827..4.827 rows=0 loops=1) Filter: (op_date = '2024-01-01'::text) Rows Removed by Filter: 10000 Planning Time: 0.777 ms Execution Time: 10.088 ms (20 rows)
Based on the total cost of execution plans before and after the conversion, the total cost of the execution plan without the conversion of the OR condition is lower. In non-forced mode, the optimizer determines whether to convert the OR condition based on the cost estimation and may not always use the converted path. In forced mode, the optimizer is forced to use the converted path regardless of the estimated cost. The forced mode is suitable for scenarios in which the optimizer cannot accurately estimate the cost and the conversion of the OR condition is required.
Use hints
To force the conversion of the OR condition of a specific query into a UNION ALL of two or more separate queries, you can also use the hint feature.
SET polar_cbqt_convert_or_to_union_all_mode to off;
EXPLAIN ANALYZE /*+ Set(polar_cbqt_convert_or_to_union_all_mode force) Set(polar_cbqt_cost_threshold 0) */ SELECT * FROM t1, t2 WHERE t1.dsc= t2.change and (t1.log_date = '2024-01-01' or t2.op_date = '2024-01-01');Sample result:
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Append (cost=199.01..871.05 rows=2 width=51) (actual time=9.684..9.691 rows=0 loops=1)
-> Hash Join (cost=199.01..410.52 rows=1 width=51) (actual time=4.711..4.714 rows=0 loops=1)
Hash Cond: (t2.change = t1.dsc)
-> Seq Scan on t2 (cost=0.00..174.00 rows=10000 width=26) (actual time=0.013..0.013 rows=1 loops=1)
-> Hash (cost=199.00..199.00 rows=1 width=25) (actual time=4.682..4.684 rows=0 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 8kB
-> Seq Scan on t1 (cost=0.00..199.00 rows=1 width=25) (actual time=4.681..4.681 rows=0 loops=1)
Filter: (log_date = '2024-01-01'::text)
Rows Removed by Filter: 10000
-> Hash Join (cost=199.01..460.52 rows=1 width=51) (actual time=4.969..4.970 rows=0 loops=1)
Hash Cond: (t1.dsc = t2.change)
-> Seq Scan on t1 (cost=0.00..224.00 rows=9999 width=25) (actual time=0.018..0.018 rows=1 loops=1)
Filter: ((log_date <> '2024-01-01'::text) OR ((log_date = '2024-01-01'::text) IS NULL))
-> Hash (cost=199.00..199.00 rows=1 width=26) (actual time=4.935..4.936 rows=0 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 8kB
-> Seq Scan on t2 (cost=0.00..199.00 rows=1 width=26) (actual time=4.934..4.934 rows=0 loops=1)
Filter: (op_date = '2024-01-01'::text)
Rows Removed by Filter: 10000
Planning Time: 0.798 ms
Execution Time: 9.858 ms
(20 rows)OR condition conversion threshold
In the basic feature example, the cost of the original SQL execution plan is 660.50. If you set the OR condition conversion threshold to a value higher than this value and execute the same SQL query again:
SET polar_cbqt_cost_threshold to 1000; -- Set the OR condition conversion threshold.
SET polar_cbqt_convert_or_to_union_all_mode to force; -- Force the conversion of the OR condition.
EXPLAIN ANALYZE SELECT * FROM t1 JOIN t2 ON t1.id = t2.id WHERE (t1.num = 1 OR t2.cnt = 2);Sample result:
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Hash Join (cost=299.00..660.50 rows=110 width=51) (actual time=6.374..15.802 rows=110 loops=1)
Hash Cond: (t1.id = t2.id)
Join Filter: ((t1.num = 1) OR (t2.cnt = 2))
Rows Removed by Join Filter: 9890
-> Seq Scan on t1 (cost=0.00..174.00 rows=10000 width=25) (actual time=0.011..2.038 rows=10000 loops=1)
-> Hash (cost=174.00..174.00 rows=10000 width=26) (actual time=6.266..6.268 rows=10000 loops=1)
Buckets: 16384 Batches: 1 Memory Usage: 704kB
-> Seq Scan on t2 (cost=0.00..174.00 rows=10000 width=26) (actual time=0.006..1.778 rows=10000 loops=1)
Planning Time: 0.663 ms
Execution Time: 16.036 ms
(10 rows)The preceding result indicates that even in forced mode, the optimizer still selects the original path. This is because the total cost of the original plan does not exceed the threshold. The optimizer does not convert the OR condition into a UNION ALL of two or more separate queries.