全部產品
Search
文件中心

PolarDB:OR子句轉UNION ALL

更新時間:Jan 14, 2026

PolarDB PostgreSQL版支援OR子句轉UNION ALL的查詢最佳化功能,能夠在產生計劃期間嘗試將合適的OR子句轉換成UNION ALL形式,再進行基於代價的直接選取,得到更優的執行計畫。

背景

目前,PostgreSQL最佳化器對SQL中的OR子句過濾條件的最佳化能力較為有限。如果OR子句中的過濾條件僅涉及一張表,且所有過濾條件上均具備適當的索引,則最佳化器會為此類情境產生一個BitmapOr的Index Path。例如:

EXPLAIN SELECT * FROM my_test WHERE (id = 123 OR name = '123' OR salary = 123.0);
                                      QUERY PLAN
---------------------------------------------------------------------------------------
 Bitmap Heap Scan on my_test  (cost=12.90..24.33 rows=3 width=15)
   Recheck Cond: ((id = 123) OR ((name)::text = '123'::text) OR (salary = 123.0))
   ->  BitmapOr  (cost=12.90..12.90 rows=3 width=0)
         ->  Bitmap Index Scan on my_test_id_idx  (cost=0.00..4.30 rows=1 width=0)
               Index Cond: (id = 123)
         ->  Bitmap Index Scan on my_test_name_idx  (cost=0.00..4.30 rows=1 width=0)
               Index Cond: ((name)::text = '123'::text)
         ->  Bitmap Index Scan on my_test_salary_idx  (cost=0.00..4.30 rows=1 width=0)
               Index Cond: (salary = 123.0)
(9 rows)

如果OR子句涉及多張表,最佳化器只能將該OR子句視為串連後的過濾條件,這可能導致SQL執行效率降低。例如:

EXPLAIN ANALYZE SELECT * FROM t1 JOIN t2 ON t1.id = t2.id WHERE (t1.num = 1 OR t2.cnt = 2);
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=299.00..660.50 rows=110 width=51) (actual time=5.992..15.673 rows=110 loops=1)
   Hash Cond: (t1.id = t2.id)
   Join Filter: ((t1.num = 1) OR (t2.cnt = 2))
   Rows Removed by Join Filter: 9890
   ->  Seq Scan on t1  (cost=0.00..174.00 rows=10000 width=25) (actual time=0.012..2.080 rows=10000 loops=1)
   ->  Hash  (cost=174.00..174.00 rows=10000 width=26) (actual time=5.855..5.857 rows=10000 loops=1)
         Buckets: 16384  Batches: 1  Memory Usage: 704kB
         ->  Seq Scan on t2  (cost=0.00..174.00 rows=10000 width=26) (actual time=0.007..1.779 rows=10000 loops=1)
 Planning Time: 1.237 ms
 Execution Time: 15.836 ms
(10 rows)

上述OR子句被當作一個整體,最佳化器無法使用t1.num或者t2.cnt上的索引,導致t1t2進行全表掃描。實際上,OR子句在邏輯上可以轉化為UNION ALL,包含兩個或多個分支的查詢形式。例如,上述樣本可以改寫為:

EXPLAIN ANALYZE
SELECT * FROM t1 JOIN t2 ON t1.id = t2.id WHERE t1.num = 1
UNION ALL
SELECT * FROM t1 JOIN t2 ON t1.id = t2.id WHERE t2.cnt = 2 AND (t1.num != 1 OR (t1.num = 1) IS NULL);
                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Append  (cost=85.48..412.26 rows=110 width=51) (actual time=0.350..4.832 rows=110 loops=1)
   ->  Hash Join  (cost=85.48..297.98 rows=100 width=51) (actual time=0.349..4.653 rows=100 loops=1)
         Hash Cond: (t2.id = t1.id)
         ->  Seq Scan on t2  (cost=0.00..174.00 rows=10000 width=26) (actual time=0.009..1.719 rows=10000 loops=1)
         ->  Hash  (cost=84.23..84.23 rows=100 width=25) (actual time=0.318..0.320 rows=100 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 14kB
               ->  Bitmap Heap Scan on t1  (cost=5.06..84.23 rows=100 width=25) (actual time=0.065..0.265 rows=100 loops=1)
                     Recheck Cond: (num = 1)
                     Heap Blocks: exact=73
                     ->  Bitmap Index Scan on t1_num_idx  (cost=0.00..5.04 rows=100 width=0) (actual time=0.037..0.037 rows=100 loops=1)
                           Index Cond: (num = 1)
   ->  Nested Loop  (cost=4.65..112.63 rows=10 width=51) (actual time=0.049..0.159 rows=10 loops=1)
         ->  Bitmap Heap Scan on t2 t2_1  (cost=4.36..33.46 rows=10 width=26) (actual time=0.026..0.045 rows=10 loops=1)
               Recheck Cond: (cnt = 2)
               Heap Blocks: exact=10
               ->  Bitmap Index Scan on t2_cnt_idx  (cost=0.00..4.36 rows=10 width=0) (actual time=0.017..0.018 rows=10 loops=1)
                     Index Cond: (cnt = 2)
         ->  Index Scan using t1_id_idx on t1 t1_1  (cost=0.29..7.91 rows=1 width=25) (actual time=0.009..0.009 rows=1 loops=10)
               Index Cond: (id = t2_1.id)
               Filter: ((num <> 1) OR ((num = 1) IS NULL))
 Planning Time: 1.150 ms
 Execution Time: 5.014 ms
(22 rows)

改寫後最佳化器利用t1.numt2.cnt上的索引,減少中間處理的資料量,從而提高執行效能。

PolarDB PostgreSQL版在產生計劃期間,能夠嘗試將適當的OR子句轉換為UNION ALL形式,隨後基於代價進行直接選取,從而最終獲得更優的執行計畫。

適用範圍

支援的PolarDB PostgreSQL版的版本如下: PostgreSQL 14,且核心小版本需為2.0.14.13.27.0及以上。

說明

您可在控制台查看核心小版本號碼,也可以通過SHOW polardb_version;語句查看。如未滿足核心小版本要求,請升級核心小版本

使用說明

OR子句轉換為UNION ALL的能力受參數控制,相關的參數名稱及其作用如下所示:

參數名稱

描述

polar_cbqt_cost_threshold

用於控制對SQL嘗試進行OR轉換的執行計畫總代價閾值,如果 SQL 原始的執行計畫總代價沒有超過閾值,則不會進行OR轉換。取值範圍:[0, +∞),預設值為50000。

設定參數為0時,表示對任何SQL都嘗試進行OR轉換。不建議設定為0,可能會讓所有SQL計劃時間變長,影響效能。

polar_cbqt_convert_or_to_union_all_mode

用於控制OR子句轉換UNION ALL功能的開關,取值如下:

  • OFF(預設):關閉OR子句轉UNION ALL功能。

  • ON:開啟OR子句轉UNION ALL功能。

  • FORCE:開啟OR子句轉UNION ALL功能,如果有轉換後的路徑,則強制選擇轉換後的路徑(可能並不是總代價最低的路徑)。

注意事項

  • 設定polar_cbqt_cost_threshold為0,或者設定polar_cbqt_convert_or_to_union_all_mode為FORCE,將會強制SQL運用最佳化。對於指定SQL強制OR子句轉UNION ALL最佳化,建議使用HINT來設定,詳情請參考使用HINT

  • 如果OR子句中的條件僅涉及一張表,最佳化器將不會嘗試對OR子句進行轉換。

  • 在強制模式下,如果SQL中存在多個合適的OR子句,將選擇總代價最小的路徑。

  • 如果一個OR子句的參數超過6個,最佳化器將不會對該OR子句進行轉換。

  • 如果一個SQL的OR子句數量超過6個,將只對前6個OR子句進行轉換選擇。

樣本

資料準備

CREATE TABLE t1(id int, num int, dsc text, log_date text);
CREATE TABLE t2(id int, cnt int, change text, op_date text);

INSERT INTO t1 SELECT i, i%100, 'test'||1, to_char('1990-10-10'::date + i, 'YYYY-MM-DD') FROM generate_series(1,10000)i;
INSERT INTO t2 SELECT i, i%1000, 'now'||i, to_char('1990-10-10'::date + i, 'YYYY-MM-DD') FROM generate_series(1,10000)i;

CREATE INDEX ON t1(id);
CREATE INDEX ON t1(num);
CREATE INDEX ON t2(id);
CREATE INDEX ON t2(cnt);

ANALYZE t1;
ANALYZE t2;

準系統

  • 關閉OR子句轉換為UNION ALL功能

    SET polar_cbqt_cost_threshold to 100;
    SET polar_cbqt_convert_or_to_union_all_mode to off;
    
    EXPLAIN ANALYZE SELECT * FROM t1 JOIN t2 ON t1.id = t2.id WHERE (t1.num = 1 OR t2.cnt = 2);

    返回結果如下:

                                                        QUERY PLAN
    -------------------------------------------------------------------------------------------------------------------
     Hash Join  (cost=299.00..660.50 rows=110 width=51) (actual time=5.992..15.673 rows=110 loops=1)
       Hash Cond: (t1.id = t2.id)
       Join Filter: ((t1.num = 1) OR (t2.cnt = 2))
       Rows Removed by Join Filter: 9890
       ->  Seq Scan on t1  (cost=0.00..174.00 rows=10000 width=25) (actual time=0.012..2.080 rows=10000 loops=1)
       ->  Hash  (cost=174.00..174.00 rows=10000 width=26) (actual time=5.855..5.857 rows=10000 loops=1)
             Buckets: 16384  Batches: 1  Memory Usage: 704kB
             ->  Seq Scan on t2  (cost=0.00..174.00 rows=10000 width=26) (actual time=0.007..1.779 rows=10000 loops=1)
     Planning Time: 1.237 ms
     Execution Time: 15.836 ms
    (10 rows)
  • 開啟OR子句轉換為UNION ALL功能

    SET polar_cbqt_cost_threshold to 100;
    SET polar_cbqt_convert_or_to_union_all_mode to on;
    
    EXPLAIN ANALYZE SELECT * FROM t1 JOIN t2 ON t1.id = t2.id WHERE (t1.num = 1 OR t2.cnt = 2);

    返回結果如下:

                                                                   QUERY PLAN
    -----------------------------------------------------------------------------------------------------------------------------------------
     Append  (cost=85.48..411.16 rows=110 width=51) (actual time=0.396..4.822 rows=110 loops=1)
       ->  Hash Join  (cost=85.48..297.98 rows=100 width=51) (actual time=0.395..4.639 rows=100 loops=1)
             Hash Cond: (t2.id = t1.id)
             ->  Seq Scan on t2  (cost=0.00..174.00 rows=10000 width=26) (actual time=0.010..1.750 rows=10000 loops=1)
             ->  Hash  (cost=84.23..84.23 rows=100 width=25) (actual time=0.333..0.335 rows=100 loops=1)
                   Buckets: 1024  Batches: 1  Memory Usage: 14kB
                   ->  Bitmap Heap Scan on t1  (cost=5.06..84.23 rows=100 width=25) (actual time=0.056..0.247 rows=100 loops=1)
                         Recheck Cond: (num = 1)
                         Heap Blocks: exact=73
                         ->  Bitmap Index Scan on t1_num_idx  (cost=0.00..5.04 rows=100 width=0) (actual time=0.028..0.028 rows=100 loops=1)
                               Index Cond: (num = 1)
       ->  Nested Loop  (cost=4.65..112.63 rows=10 width=51) (actual time=0.049..0.164 rows=10 loops=1)
             ->  Bitmap Heap Scan on t2  (cost=4.36..33.46 rows=10 width=26) (actual time=0.027..0.044 rows=10 loops=1)
                   Recheck Cond: (cnt = 2)
                   Heap Blocks: exact=10
                   ->  Bitmap Index Scan on t2_cnt_idx  (cost=0.00..4.36 rows=10 width=0) (actual time=0.019..0.019 rows=10 loops=1)
                         Index Cond: (cnt = 2)
             ->  Index Scan using t1_id_idx on t1  (cost=0.29..7.91 rows=1 width=25) (actual time=0.010..0.010 rows=1 loops=10)
                   Index Cond: (id = t2.id)
                   Filter: ((num <> 1) OR ((num = 1) IS NULL))
     Planning Time: 2.903 ms
     Execution Time: 4.980 ms
    (22 rows)

綜上,開啟OR子句轉UNION ALL功能後,執行計畫中可以利用t1.numt2.cnt上的索引,達到和手動改寫成UNION ALL相同的效果。

強制選擇

  • 開啟OR子句轉換為UNION ALL功能。

    SET polar_cbqt_cost_threshold to 100;
    SET polar_cbqt_convert_or_to_union_all_mode to on;
    
    EXPLAIN ANALYZE SELECT * FROM t1, t2 WHERE t1.dsc= t2.change AND (t1.log_date = '2024-01-01' OR t2.op_date = '2024-01-01');

    返回結果如下:

                                                        QUERY PLAN
    -------------------------------------------------------------------------------------------------------------------
     Hash Join  (cost=299.00..660.50 rows=2 width=51) (actual time=14.321..14.325 rows=0 loops=1)
       Hash Cond: (t1.dsc = t2.change)
       Join Filter: ((t1.log_date = '2024-01-01'::text) OR (t2.op_date = '2024-01-01'::text))
       ->  Seq Scan on t1  (cost=0.00..174.00 rows=10000 width=25) (actual time=0.016..3.204 rows=10000 loops=1)
       ->  Hash  (cost=174.00..174.00 rows=10000 width=26) (actual time=6.506..6.508 rows=10000 loops=1)
             Buckets: 16384  Batches: 1  Memory Usage: 704kB
             ->  Seq Scan on t2  (cost=0.00..174.00 rows=10000 width=26) (actual time=0.006..1.755 rows=10000 loops=1)
     Planning Time: 0.932 ms
     Execution Time: 14.571 ms
    (9 rows)
  • 強制選擇OR子句轉換為UNION ALL功能。

    SET polar_cbqt_cost_threshold to 100;
    SET polar_cbqt_convert_or_to_union_all_mode to force;
    
    EXPLAIN ANALYZE SELECT * FROM t1, t2 WHERE t1.dsc= t2.change AND (t1.log_date = '2024-01-01' OR t2.op_date = '2024-01-01');

    返回結果如下:

                                                       QUERY PLAN
    -----------------------------------------------------------------------------------------------------------------
     Append  (cost=199.01..871.05 rows=2 width=51) (actual time=9.915..9.923 rows=0 loops=1)
       ->  Hash Join  (cost=199.01..410.52 rows=1 width=51) (actual time=5.046..5.050 rows=0 loops=1)
             Hash Cond: (t2.change = t1.dsc)
             ->  Seq Scan on t2  (cost=0.00..174.00 rows=10000 width=26) (actual time=0.015..0.015 rows=1 loops=1)
             ->  Hash  (cost=199.00..199.00 rows=1 width=25) (actual time=5.014..5.016 rows=0 loops=1)
                   Buckets: 1024  Batches: 1  Memory Usage: 8kB
                   ->  Seq Scan on t1  (cost=0.00..199.00 rows=1 width=25) (actual time=5.013..5.013 rows=0 loops=1)
                         Filter: (log_date = '2024-01-01'::text)
                         Rows Removed by Filter: 10000
       ->  Hash Join  (cost=199.01..460.52 rows=1 width=51) (actual time=4.865..4.867 rows=0 loops=1)
             Hash Cond: (t1.dsc = t2.change)
             ->  Seq Scan on t1  (cost=0.00..224.00 rows=9999 width=25) (actual time=0.015..0.016 rows=1 loops=1)
                   Filter: ((log_date <> '2024-01-01'::text) OR ((log_date = '2024-01-01'::text) IS NULL))
             ->  Hash  (cost=199.00..199.00 rows=1 width=26) (actual time=4.828..4.829 rows=0 loops=1)
                   Buckets: 1024  Batches: 1  Memory Usage: 8kB
                   ->  Seq Scan on t2  (cost=0.00..199.00 rows=1 width=26) (actual time=4.827..4.827 rows=0 loops=1)
                         Filter: (op_date = '2024-01-01'::text)
                         Rows Removed by Filter: 10000
     Planning Time: 0.777 ms
     Execution Time: 10.088 ms
    (20 rows)

綜上,對比轉換前後的執行計畫總代價,可發現不進行OR子句轉換的總代價更低。在非強制模式下,將按執行計畫總代價閾值設定進行OR子句轉換嘗試,最佳化器不一定會選擇經過OR子句轉換後的路徑。然而,在強制模式下,最佳化器則會強制選擇經過OR子句轉換後的路徑。強制模式適用於最佳化器無法準確估算代價,且需要固定查詢選擇OR轉換為UNION ALL的情境。

使用HINT

對於指定SQL強制OR子句轉UNION ALL最佳化,也可以使用HINT功能進行SQL層級的功能控制。

SET polar_cbqt_convert_or_to_union_all_mode to off;

EXPLAIN ANALYZE /*+ Set(polar_cbqt_convert_or_to_union_all_mode force) Set(polar_cbqt_cost_threshold 0) */ SELECT * FROM t1, t2 WHERE t1.dsc= t2.change and (t1.log_date = '2024-01-01' or t2.op_date = '2024-01-01');

返回結果如下:

                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 Append  (cost=199.01..871.05 rows=2 width=51) (actual time=9.684..9.691 rows=0 loops=1)
   ->  Hash Join  (cost=199.01..410.52 rows=1 width=51) (actual time=4.711..4.714 rows=0 loops=1)
         Hash Cond: (t2.change = t1.dsc)
         ->  Seq Scan on t2  (cost=0.00..174.00 rows=10000 width=26) (actual time=0.013..0.013 rows=1 loops=1)
         ->  Hash  (cost=199.00..199.00 rows=1 width=25) (actual time=4.682..4.684 rows=0 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 8kB
               ->  Seq Scan on t1  (cost=0.00..199.00 rows=1 width=25) (actual time=4.681..4.681 rows=0 loops=1)
                     Filter: (log_date = '2024-01-01'::text)
                     Rows Removed by Filter: 10000
   ->  Hash Join  (cost=199.01..460.52 rows=1 width=51) (actual time=4.969..4.970 rows=0 loops=1)
         Hash Cond: (t1.dsc = t2.change)
         ->  Seq Scan on t1  (cost=0.00..224.00 rows=9999 width=25) (actual time=0.018..0.018 rows=1 loops=1)
               Filter: ((log_date <> '2024-01-01'::text) OR ((log_date = '2024-01-01'::text) IS NULL))
         ->  Hash  (cost=199.00..199.00 rows=1 width=26) (actual time=4.935..4.936 rows=0 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 8kB
               ->  Seq Scan on t2  (cost=0.00..199.00 rows=1 width=26) (actual time=4.934..4.934 rows=0 loops=1)
                     Filter: (op_date = '2024-01-01'::text)
                     Rows Removed by Filter: 10000
 Planning Time: 0.798 ms
 Execution Time: 9.858 ms
(20 rows)

轉換閾值

準系統樣本中,原始SQL執行計畫的總cost值為660.50。因此,我們將轉換閾值設定為高於此值,並再次執行相同的SQL查詢:

SET polar_cbqt_cost_threshold to 1000; -- 設定閾值
SET polar_cbqt_convert_or_to_union_all_mode to force; -- 強制選擇轉換後的路徑

EXPLAIN ANALYZE SELECT * FROM t1 JOIN t2 ON t1.id = t2.id WHERE (t1.num = 1 OR t2.cnt = 2);

返回結果如下:

                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=299.00..660.50 rows=110 width=51) (actual time=6.374..15.802 rows=110 loops=1)
   Hash Cond: (t1.id = t2.id)
   Join Filter: ((t1.num = 1) OR (t2.cnt = 2))
   Rows Removed by Join Filter: 9890
   ->  Seq Scan on t1  (cost=0.00..174.00 rows=10000 width=25) (actual time=0.011..2.038 rows=10000 loops=1)
   ->  Hash  (cost=174.00..174.00 rows=10000 width=26) (actual time=6.266..6.268 rows=10000 loops=1)
         Buckets: 16384  Batches: 1  Memory Usage: 704kB
         ->  Seq Scan on t2  (cost=0.00..174.00 rows=10000 width=26) (actual time=0.006..1.778 rows=10000 loops=1)
 Planning Time: 0.663 ms
 Execution Time: 16.036 ms
(10 rows)

可以看出,即使在強制模式下,最佳化器仍然選擇了原始路徑。這是由於原始計劃的總代價未超過閾值,因此不會嘗試進行OR子句的轉化。