PolarDB for PostgreSQL and support a query optimization feature that converts OR clauses to the UNION ALL format. During execution plan generation, the optimizer attempts this conversion and selects a path based on cost to create a more optimal execution plan.
Background
The PostgreSQL optimizer has limited capabilities for optimizing OR filter conditions in SQL statements. If an OR clause involves only one table and each filter condition has a suitable index, the optimizer generates a BitmapOr Index Path. For example:
EXPLAIN SELECT * FROM my_test WHERE (id = 123 OR name = '123' OR salary = 123.0);
QUERY PLAN
---------------------------------------------------------------------------------------
Bitmap Heap Scan on my_test (cost=12.90..24.33 rows=3 width=15)
Recheck Cond: ((id = 123) OR ((name)::text = '123'::text) OR (salary = 123.0))
-> BitmapOr (cost=12.90..12.90 rows=3 width=0)
-> Bitmap Index Scan on my_test_id_idx (cost=0.00..4.30 rows=1 width=0)
Index Cond: (id = 123)
-> Bitmap Index Scan on my_test_name_idx (cost=0.00..4.30 rows=1 width=0)
Index Cond: ((name)::text = '123'::text)
-> Bitmap Index Scan on my_test_salary_idx (cost=0.00..4.30 rows=1 width=0)
Index Cond: (salary = 123.0)
(9 rows)If an OR clause involves multiple tables, the optimizer can only treat the clause as a filter condition after the join. This approach can reduce SQL execution efficiency. For example:
EXPLAIN ANALYZE SELECT * FROM t1 JOIN t2 ON t1.id = t2.id WHERE (t1.num = 1 OR t2.cnt = 2);
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Hash Join (cost=299.00..660.50 rows=110 width=51) (actual time=5.992..15.673 rows=110 loops=1)
Hash Cond: (t1.id = t2.id)
Join Filter: ((t1.num = 1) OR (t2.cnt = 2))
Rows Removed by Join Filter: 9890
-> Seq Scan on t1 (cost=0.00..174.00 rows=10000 width=25) (actual time=0.012..2.080 rows=10000 loops=1)
-> Hash (cost=174.00..174.00 rows=10000 width=26) (actual time=5.855..5.857 rows=10000 loops=1)
Buckets: 16384 Batches: 1 Memory Usage: 704kB
-> Seq Scan on t2 (cost=0.00..174.00 rows=10000 width=26) (actual time=0.007..1.779 rows=10000 loops=1)
Planning Time: 1.237 ms
Execution Time: 15.836 ms
(10 rows)The optimizer treats the preceding OR clause as a single unit and cannot use the indexes on t1.num or t2.cnt. This results in a full table scan of t1 and t2. In practice, an OR clause can be logically converted into a UNION ALL query with two or more branches. For example, the preceding statement can be rewritten as follows:
EXPLAIN ANALYZE
SELECT * FROM t1 JOIN t2 ON t1.id = t2.id WHERE t1.num = 1
UNION ALL
SELECT * FROM t1 JOIN t2 ON t1.id = t2.id WHERE t2.cnt = 2 AND (t1.num != 1 OR (t1.num = 1) IS NULL);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Append (cost=85.48..412.26 rows=110 width=51) (actual time=0.350..4.832 rows=110 loops=1)
-> Hash Join (cost=85.48..297.98 rows=100 width=51) (actual time=0.349..4.653 rows=100 loops=1)
Hash Cond: (t2.id = t1.id)
-> Seq Scan on t2 (cost=0.00..174.00 rows=10000 width=26) (actual time=0.009..1.719 rows=10000 loops=1)
-> Hash (cost=84.23..84.23 rows=100 width=25) (actual time=0.318..0.320 rows=100 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 14kB
-> Bitmap Heap Scan on t1 (cost=5.06..84.23 rows=100 width=25) (actual time=0.065..0.265 rows=100 loops=1)
Recheck Cond: (num = 1)
Heap Blocks: exact=73
-> Bitmap Index Scan on t1_num_idx (cost=0.00..5.04 rows=100 width=0) (actual time=0.037..0.037 rows=100 loops=1)
Index Cond: (num = 1)
-> Nested Loop (cost=4.65..112.63 rows=10 width=51) (actual time=0.049..0.159 rows=10 loops=1)
-> Bitmap Heap Scan on t2 t2_1 (cost=4.36..33.46 rows=10 width=26) (actual time=0.026..0.045 rows=10 loops=1)
Recheck Cond: (cnt = 2)
Heap Blocks: exact=10
-> Bitmap Index Scan on t2_cnt_idx (cost=0.00..4.36 rows=10 width=0) (actual time=0.017..0.018 rows=10 loops=1)
Index Cond: (cnt = 2)
-> Index Scan using t1_id_idx on t1 t1_1 (cost=0.29..7.91 rows=1 width=25) (actual time=0.009..0.009 rows=1 loops=10)
Index Cond: (id = t2_1.id)
Filter: ((num <> 1) OR ((num = 1) IS NULL))
Planning Time: 1.150 ms
Execution Time: 5.014 ms
(22 rows)After the statement is rewritten, the optimizer can use the indexes on t1.num and t2.cnt. This change reduces the amount of data processed and improves execution performance.
PolarDB for PostgreSQL and can attempt to convert suitable OR clauses into the UNION ALL format during execution plan generation. The optimizer then selects a path based on cost to obtain a more optimal execution plan.
Applicability
The following versions of PolarDB for PostgreSQL are supported: PostgreSQL 14 with minor engine version 2.0.14.13.27.0 or later.
You can view the minor engine version in the console or run the SHOW polardb_version; statement. If the minor engine version does not meet the requirement, you must upgrade the minor engine version.
Usage
The OR-to-UNION-ALL conversion feature is controlled by parameters. The following table describes the parameters and their functions.
Parameter Name | Description |
polar_cbqt_cost_threshold | Controls the total cost threshold of an execution plan for attempting an OR-to-UNION-ALL conversion. If the total cost of the original execution plan for a SQL statement does not exceed this threshold, the conversion is not attempted. The value ranges from A value of 0 means the conversion is attempted for all SQL statements. We do not recommend setting this parameter to 0. It may increase the planning time for all SQL statements and affect performance. |
polar_cbqt_convert_or_to_union_all_mode | Controls the OR-to-UNION-ALL conversion feature. Valid values:
|
Usage notes
Setting
polar_cbqt_cost_thresholdto 0 or settingpolar_cbqt_convert_or_to_union_all_modeto FORCE forces this optimization for all SQL statements. To force the OR-to-UNION-ALL optimization for a specific SQL statement, we recommend that you use a HINT. For more information, see Use HINTs.If the conditions in an OR clause involve only one table, the optimizer does not attempt the conversion.
In FORCE mode, if a SQL statement has multiple suitable OR clauses, the optimizer chooses the path with the lowest total cost.
If an OR clause has more than six arguments, the optimizer does not attempt the conversion.
If a SQL statement has more than six OR clauses, the optimizer considers only the first six clauses for conversion.
Examples
Prepare data
CREATE TABLE t1(id int, num int, dsc text, log_date text);
CREATE TABLE t2(id int, cnt int, change text, op_date text);
INSERT INTO t1 SELECT i, i%100, 'test'||1, to_char('1990-10-10'::date + i, 'YYYY-MM-DD') FROM generate_series(1,10000)i;
INSERT INTO t2 SELECT i, i%1000, 'now'||i, to_char('1990-10-10'::date + i, 'YYYY-MM-DD') FROM generate_series(1,10000)i;
CREATE INDEX ON t1(id);
CREATE INDEX ON t1(num);
CREATE INDEX ON t2(id);
CREATE INDEX ON t2(cnt);
ANALYZE t1;
ANALYZE t2;Basic feature
Disable the OR-to-UNION-ALL conversion feature.
SET polar_cbqt_cost_threshold to 100; SET polar_cbqt_convert_or_to_union_all_mode to off; EXPLAIN ANALYZE SELECT * FROM t1 JOIN t2 ON t1.id = t2.id WHERE (t1.num = 1 OR t2.cnt = 2);The following result is returned:
QUERY PLAN ------------------------------------------------------------------------------------------------------------------- Hash Join (cost=299.00..660.50 rows=110 width=51) (actual time=5.992..15.673 rows=110 loops=1) Hash Cond: (t1.id = t2.id) Join Filter: ((t1.num = 1) OR (t2.cnt = 2)) Rows Removed by Join Filter: 9890 -> Seq Scan on t1 (cost=0.00..174.00 rows=10000 width=25) (actual time=0.012..2.080 rows=10000 loops=1) -> Hash (cost=174.00..174.00 rows=10000 width=26) (actual time=5.855..5.857 rows=10000 loops=1) Buckets: 16384 Batches: 1 Memory Usage: 704kB -> Seq Scan on t2 (cost=0.00..174.00 rows=10000 width=26) (actual time=0.007..1.779 rows=10000 loops=1) Planning Time: 1.237 ms Execution Time: 15.836 ms (10 rows)Enable the OR-to-UNION-ALL conversion feature.
SET polar_cbqt_cost_threshold to 100; SET polar_cbqt_convert_or_to_union_all_mode to on; EXPLAIN ANALYZE SELECT * FROM t1 JOIN t2 ON t1.id = t2.id WHERE (t1.num = 1 OR t2.cnt = 2);The following result is returned:
QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------- Append (cost=85.48..411.16 rows=110 width=51) (actual time=0.396..4.822 rows=110 loops=1) -> Hash Join (cost=85.48..297.98 rows=100 width=51) (actual time=0.395..4.639 rows=100 loops=1) Hash Cond: (t2.id = t1.id) -> Seq Scan on t2 (cost=0.00..174.00 rows=10000 width=26) (actual time=0.010..1.750 rows=10000 loops=1) -> Hash (cost=84.23..84.23 rows=100 width=25) (actual time=0.333..0.335 rows=100 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 14kB -> Bitmap Heap Scan on t1 (cost=5.06..84.23 rows=100 width=25) (actual time=0.056..0.247 rows=100 loops=1) Recheck Cond: (num = 1) Heap Blocks: exact=73 -> Bitmap Index Scan on t1_num_idx (cost=0.00..5.04 rows=100 width=0) (actual time=0.028..0.028 rows=100 loops=1) Index Cond: (num = 1) -> Nested Loop (cost=4.65..112.63 rows=10 width=51) (actual time=0.049..0.164 rows=10 loops=1) -> Bitmap Heap Scan on t2 (cost=4.36..33.46 rows=10 width=26) (actual time=0.027..0.044 rows=10 loops=1) Recheck Cond: (cnt = 2) Heap Blocks: exact=10 -> Bitmap Index Scan on t2_cnt_idx (cost=0.00..4.36 rows=10 width=0) (actual time=0.019..0.019 rows=10 loops=1) Index Cond: (cnt = 2) -> Index Scan using t1_id_idx on t1 (cost=0.29..7.91 rows=1 width=25) (actual time=0.010..0.010 rows=1 loops=10) Index Cond: (id = t2.id) Filter: ((num <> 1) OR ((num = 1) IS NULL)) Planning Time: 2.903 ms Execution Time: 4.980 ms (22 rows)
After you enable the OR-to-UNION-ALL conversion feature, the optimizer can generate an execution plan that uses the indexes on t1.num and t2.cnt. This achieves the same effect as manually rewriting the query to use UNION ALL.
Force selection
Enable the OR-to-UNION-ALL conversion feature.
SET polar_cbqt_cost_threshold to 100; SET polar_cbqt_convert_or_to_union_all_mode to on; EXPLAIN ANALYZE SELECT * FROM t1, t2 WHERE t1.dsc= t2.change AND (t1.log_date = '2024-01-01' OR t2.op_date = '2024-01-01');The following result is returned:
QUERY PLAN ------------------------------------------------------------------------------------------------------------------- Hash Join (cost=299.00..660.50 rows=2 width=51) (actual time=14.321..14.325 rows=0 loops=1) Hash Cond: (t1.dsc = t2.change) Join Filter: ((t1.log_date = '2024-01-01'::text) OR (t2.op_date = '2024-01-01'::text)) -> Seq Scan on t1 (cost=0.00..174.00 rows=10000 width=25) (actual time=0.016..3.204 rows=10000 loops=1) -> Hash (cost=174.00..174.00 rows=10000 width=26) (actual time=6.506..6.508 rows=10000 loops=1) Buckets: 16384 Batches: 1 Memory Usage: 704kB -> Seq Scan on t2 (cost=0.00..174.00 rows=10000 width=26) (actual time=0.006..1.755 rows=10000 loops=1) Planning Time: 0.932 ms Execution Time: 14.571 ms (9 rows)Force the OR-to-UNION-ALL conversion.
SET polar_cbqt_cost_threshold to 100; SET polar_cbqt_convert_or_to_union_all_mode to force; EXPLAIN ANALYZE SELECT * FROM t1, t2 WHERE t1.dsc= t2.change AND (t1.log_date = '2024-01-01' OR t2.op_date = '2024-01-01');The following result is returned:
QUERY PLAN ----------------------------------------------------------------------------------------------------------------- Append (cost=199.01..871.05 rows=2 width=51) (actual time=9.915..9.923 rows=0 loops=1) -> Hash Join (cost=199.01..410.52 rows=1 width=51) (actual time=5.046..5.050 rows=0 loops=1) Hash Cond: (t2.change = t1.dsc) -> Seq Scan on t2 (cost=0.00..174.00 rows=10000 width=26) (actual time=0.015..0.015 rows=1 loops=1) -> Hash (cost=199.00..199.00 rows=1 width=25) (actual time=5.014..5.016 rows=0 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 8kB -> Seq Scan on t1 (cost=0.00..199.00 rows=1 width=25) (actual time=5.013..5.013 rows=0 loops=1) Filter: (log_date = '2024-01-01'::text) Rows Removed by Filter: 10000 -> Hash Join (cost=199.01..460.52 rows=1 width=51) (actual time=4.865..4.867 rows=0 loops=1) Hash Cond: (t1.dsc = t2.change) -> Seq Scan on t1 (cost=0.00..224.00 rows=9999 width=25) (actual time=0.015..0.016 rows=1 loops=1) Filter: ((log_date <> '2024-01-01'::text) OR ((log_date = '2024-01-01'::text) IS NULL)) -> Hash (cost=199.00..199.00 rows=1 width=26) (actual time=4.828..4.829 rows=0 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 8kB -> Seq Scan on t2 (cost=0.00..199.00 rows=1 width=26) (actual time=4.827..4.827 rows=0 loops=1) Filter: (op_date = '2024-01-01'::text) Rows Removed by Filter: 10000 Planning Time: 0.777 ms Execution Time: 10.088 ms (20 rows)
A comparison of the total costs of the execution plans shows that the cost is lower without the conversion. In non-forced mode, the optimizer attempts the conversion based on the cost threshold but does not always choose the converted path. In FORCE mode, the optimizer is forced to choose the converted path. The FORCE mode is useful for scenarios in which the optimizer cannot accurately estimate costs and you need to force the query plan to use the OR-to-UNION-ALL conversion.
Use HINTs
To force the OR-to-UNION-ALL optimization for a specific SQL statement, you can also use HINTs for SQL-level control.
SET polar_cbqt_convert_or_to_union_all_mode to off;
EXPLAIN ANALYZE /*+ Set(polar_cbqt_convert_or_to_union_all_mode force) Set(polar_cbqt_cost_threshold 0) */ SELECT * FROM t1, t2 WHERE t1.dsc= t2.change and (t1.log_date = '2024-01-01' or t2.op_date = '2024-01-01');The following result is returned:
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Append (cost=199.01..871.05 rows=2 width=51) (actual time=9.684..9.691 rows=0 loops=1)
-> Hash Join (cost=199.01..410.52 rows=1 width=51) (actual time=4.711..4.714 rows=0 loops=1)
Hash Cond: (t2.change = t1.dsc)
-> Seq Scan on t2 (cost=0.00..174.00 rows=10000 width=26) (actual time=0.013..0.013 rows=1 loops=1)
-> Hash (cost=199.00..199.00 rows=1 width=25) (actual time=4.682..4.684 rows=0 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 8kB
-> Seq Scan on t1 (cost=0.00..199.00 rows=1 width=25) (actual time=4.681..4.681 rows=0 loops=1)
Filter: (log_date = '2024-01-01'::text)
Rows Removed by Filter: 10000
-> Hash Join (cost=199.01..460.52 rows=1 width=51) (actual time=4.969..4.970 rows=0 loops=1)
Hash Cond: (t1.dsc = t2.change)
-> Seq Scan on t1 (cost=0.00..224.00 rows=9999 width=25) (actual time=0.018..0.018 rows=1 loops=1)
Filter: ((log_date <> '2024-01-01'::text) OR ((log_date = '2024-01-01'::text) IS NULL))
-> Hash (cost=199.00..199.00 rows=1 width=26) (actual time=4.935..4.936 rows=0 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 8kB
-> Seq Scan on t2 (cost=0.00..199.00 rows=1 width=26) (actual time=4.934..4.934 rows=0 loops=1)
Filter: (op_date = '2024-01-01'::text)
Rows Removed by Filter: 10000
Planning Time: 0.798 ms
Execution Time: 9.858 ms
(20 rows)Conversion threshold
In the Basic feature example, the total cost of the original execution plan is 660.50. Set the conversion threshold to a value higher than 660.50 and run the same SQL query again:
SET polar_cbqt_cost_threshold to 1000; -- Set the threshold.
SET polar_cbqt_convert_or_to_union_all_mode to force; -- Force the selection of the converted path.
EXPLAIN ANALYZE SELECT * FROM t1 JOIN t2 ON t1.id = t2.id WHERE (t1.num = 1 OR t2.cnt = 2);The following result is returned:
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Hash Join (cost=299.00..660.50 rows=110 width=51) (actual time=6.374..15.802 rows=110 loops=1)
Hash Cond: (t1.id = t2.id)
Join Filter: ((t1.num = 1) OR (t2.cnt = 2))
Rows Removed by Join Filter: 9890
-> Seq Scan on t1 (cost=0.00..174.00 rows=10000 width=25) (actual time=0.011..2.038 rows=10000 loops=1)
-> Hash (cost=174.00..174.00 rows=10000 width=26) (actual time=6.266..6.268 rows=10000 loops=1)
Buckets: 16384 Batches: 1 Memory Usage: 704kB
-> Seq Scan on t2 (cost=0.00..174.00 rows=10000 width=26) (actual time=0.006..1.778 rows=10000 loops=1)
Planning Time: 0.663 ms
Execution Time: 16.036 ms
(10 rows)The result shows that the optimizer still chooses the original path, even in FORCE mode. This is because the total cost of the original plan does not exceed the threshold, so the optimizer does not attempt the OR clause conversion.