OR 句の UNION ALL への変換 - PolarDB - Alibaba Cloud ドキュメントセンター

PolarDB for PostgreSQL とは、OR 句を UNION ALL 形式に変換するクエリ最適化機能をサポートしています。実行計画の生成中に、オプティマイザーはこの変換を試み、コストに基づいてパスを選択し、より最適な実行計画を作成します。

背景情報

PostgreSQL のオプティマイザーは、SQL 文の OR フィルター条件を最適化する機能が限られています。OR 句が 1 つのテーブルのみに関与し、各フィルター条件に適切なインデックスがある場合、オプティマイザーは BitmapOr インデックスパスを生成します。例：

EXPLAIN SELECT * FROM my_test WHERE (id = 123 OR name = '123' OR salary = 123.0);
                                      QUERY PLAN
---------------------------------------------------------------------------------------
 Bitmap Heap Scan on my_test  (cost=12.90..24.33 rows=3 width=15)
   Recheck Cond: ((id = 123) OR ((name)::text = '123'::text) OR (salary = 123.0))
   ->  BitmapOr  (cost=12.90..12.90 rows=3 width=0)
         ->  Bitmap Index Scan on my_test_id_idx  (cost=0.00..4.30 rows=1 width=0)
               Index Cond: (id = 123)
         ->  Bitmap Index Scan on my_test_name_idx  (cost=0.00..4.30 rows=1 width=0)
               Index Cond: ((name)::text = '123'::text)
         ->  Bitmap Index Scan on my_test_salary_idx  (cost=0.00..4.30 rows=1 width=0)
               Index Cond: (salary = 123.0)
(9 rows)

OR 句が複数のテーブルに関与する場合、オプティマイザーは JOIN 後のフィルター条件として句を扱うことしかできません。このアプローチは、SQL の実行効率を低下させる可能性があります。例：

EXPLAIN ANALYZE SELECT * FROM t1 JOIN t2 ON t1.id = t2.id WHERE (t1.num = 1 OR t2.cnt = 2);
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=299.00..660.50 rows=110 width=51) (actual time=5.992..15.673 rows=110 loops=1)
   Hash Cond: (t1.id = t2.id)
   Join Filter: ((t1.num = 1) OR (t2.cnt = 2))
   Rows Removed by Join Filter: 9890
   ->  Seq Scan on t1  (cost=0.00..174.00 rows=10000 width=25) (actual time=0.012..2.080 rows=10000 loops=1)
   ->  Hash  (cost=174.00..174.00 rows=10000 width=26) (actual time=5.855..5.857 rows=10000 loops=1)
         Buckets: 16384  Batches: 1  Memory Usage: 704kB
         ->  Seq Scan on t2  (cost=0.00..174.00 rows=10000 width=26) (actual time=0.007..1.779 rows=10000 loops=1)
 Planning Time: 1.237 ms
 Execution Time: 15.836 ms
(10 rows)

オプティマイザーは、先行する OR 句を単一のユニットとして扱い、t1.num または t2.cnt のインデックスを使用できません。これにより、t1 と t2 の全表スキャンが発生します。実際には、OR 句は論理的に 2 つ以上のブランチを持つ UNION ALL クエリに変換できます。たとえば、先行する文は次のように書き換えることができます：

EXPLAIN ANALYZE
SELECT * FROM t1 JOIN t2 ON t1.id = t2.id WHERE t1.num = 1
UNION ALL
SELECT * FROM t1 JOIN t2 ON t1.id = t2.id WHERE t2.cnt = 2 AND (t1.num != 1 OR (t1.num = 1) IS NULL);
                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Append  (cost=85.48..412.26 rows=110 width=51) (actual time=0.350..4.832 rows=110 loops=1)
   ->  Hash Join  (cost=85.48..297.98 rows=100 width=51) (actual time=0.349..4.653 rows=100 loops=1)
         Hash Cond: (t2.id = t1.id)
         ->  Seq Scan on t2  (cost=0.00..174.00 rows=10000 width=26) (actual time=0.009..1.719 rows=10000 loops=1)
         ->  Hash  (cost=84.23..84.23 rows=100 width=25) (actual time=0.318..0.320 rows=100 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 14kB
               ->  Bitmap Heap Scan on t1  (cost=5.06..84.23 rows=100 width=25) (actual time=0.065..0.265 rows=100 loops=1)
                     Recheck Cond: (num = 1)
                     Heap Blocks: exact=73
                     ->  Bitmap Index Scan on t1_num_idx  (cost=0.00..5.04 rows=100 width=0) (actual time=0.037..0.037 rows=100 loops=1)
                           Index Cond: (num = 1)
   ->  Nested Loop  (cost=4.65..112.63 rows=10 width=51) (actual time=0.049..0.159 rows=10 loops=1)
         ->  Bitmap Heap Scan on t2 t2_1  (cost=4.36..33.46 rows=10 width=26) (actual time=0.026..0.045 rows=10 loops=1)
               Recheck Cond: (cnt = 2)
               Heap Blocks: exact=10
               ->  Bitmap Index Scan on t2_cnt_idx  (cost=0.00..4.36 rows=10 width=0) (actual time=0.017..0.018 rows=10 loops=1)
                     Index Cond: (cnt = 2)
         ->  Index Scan using t1_id_idx on t1 t1_1  (cost=0.29..7.91 rows=1 width=25) (actual time=0.009..0.009 rows=1 loops=10)
               Index Cond: (id = t2_1.id)
               Filter: ((num <> 1) OR ((num = 1) IS NULL))
 Planning Time: 1.150 ms
 Execution Time: 5.014 ms
(22 rows)

文が書き換えられた後、オプティマイザーは t1.num と t2.cnt のインデックスを使用できます。この変更により、処理されるデータ量が減少し、実行パフォーマンスが向上します。

PolarDB for PostgreSQL とは、実行計画の生成中に、適切な OR 句を UNION ALL 形式に変換しようと試みることができます。その後、オプティマイザーはコストに基づいてパスを選択し、より最適な実行計画を取得します。

適用範囲

PolarDB for PostgreSQL の以下のバージョンがサポートされています：PostgreSQL 14 (マイナーエンジンバージョン 2.0.14.13.27.0 以降)。

説明

コンソールでマイナーエンジンバージョンを確認するか、SHOW polardb_version; 文を実行して確認できます。マイナーエンジンバージョンが要件を満たさない場合は、マイナーエンジンバージョンをアップグレードする必要があります。

使用方法

OR から UNION ALL への変換機能は、パラメーターによってコントロールされます。次の表に、パラメーターとその機能を示します。

パラメーター名

説明

polar_cbqt_cost_threshold

実行計画の合計コストのしきい値をコントロールし、OR から UNION ALL への変換を試行するかどうかを決定します。SQL 文の元の実行計画の合計コストがこのしきい値を超えない場合、変換は試行されません。有効値は [0, +∞) です。デフォルト値は 50000 です。

値を 0 にすると、すべての SQL 文で変換が試行されます。このパラメーターを 0 に設定することは推奨しません。すべての SQL 文の計画時間が増加し、パフォーマンスに影響を与える可能性があります。

polar_cbqt_convert_or_to_union_all_mode

OR から UNION ALL への変換機能をコントロールします。有効な値は次のとおりです：

OFF (デフォルト)：機能を無効にします。
ON：OR 句を UNION ALL に変換します。
FORCE：機能を有効にし、変換されたパスが存在する場合、オプティマイザーにそのパスを強制的に選択させます。このパスは、合計コストが最も低いとは限りません。

注意事項

polar_cbqt_cost_threshold を 0 に設定するか、polar_cbqt_convert_or_to_union_all_mode を FORCE に設定すると、すべての SQL 文でこの最適化が強制されます。特定の SQL 文で OR から UNION ALL への最適化を強制するには、HINT を使用することを推奨します。詳細については、「HINT の使用」をご参照ください。
OR 句の条件が 1 つのテーブルのみに関与する場合、オプティマイザーは変換を試行しません。
FORCE モードで、SQL 文に複数の適切な OR 句がある場合、オプティマイザーは合計コストが最も低いパスを選択します。
OR 句に 6 つを超える引数がある場合、オプティマイザーは変換を試行しません。
SQL 文に 6 つを超える OR 句がある場合、オプティマイザーは最初の 6 つの句のみを変換の対象とします。

例

データの準備

CREATE TABLE t1(id int, num int, dsc text, log_date text);
CREATE TABLE t2(id int, cnt int, change text, op_date text);

INSERT INTO t1 SELECT i, i%100, 'test'||1, to_char('1990-10-10'::date + i, 'YYYY-MM-DD') FROM generate_series(1,10000)i;
INSERT INTO t2 SELECT i, i%1000, 'now'||i, to_char('1990-10-10'::date + i, 'YYYY-MM-DD') FROM generate_series(1,10000)i;

CREATE INDEX ON t1(id);
CREATE INDEX ON t1(num);
CREATE INDEX ON t2(id);
CREATE INDEX ON t2(cnt);

ANALYZE t1;
ANALYZE t2;

基本機能

OR から UNION ALL への変換機能を無効にします。

SET polar_cbqt_cost_threshold to 100;
SET polar_cbqt_convert_or_to_union_all_mode to off;

EXPLAIN ANALYZE SELECT * FROM t1 JOIN t2 ON t1.id = t2.id WHERE (t1.num = 1 OR t2.cnt = 2);

次の結果が返されます：

                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=299.00..660.50 rows=110 width=51) (actual time=5.992..15.673 rows=110 loops=1)
   Hash Cond: (t1.id = t2.id)
   Join Filter: ((t1.num = 1) OR (t2.cnt = 2))
   Rows Removed by Join Filter: 9890
   ->  Seq Scan on t1  (cost=0.00..174.00 rows=10000 width=25) (actual time=0.012..2.080 rows=10000 loops=1)
   ->  Hash  (cost=174.00..174.00 rows=10000 width=26) (actual time=5.855..5.857 rows=10000 loops=1)
         Buckets: 16384  Batches: 1  Memory Usage: 704kB
         ->  Seq Scan on t2  (cost=0.00..174.00 rows=10000 width=26) (actual time=0.007..1.779 rows=10000 loops=1)
 Planning Time: 1.237 ms
 Execution Time: 15.836 ms
(10 rows)

OR から UNION ALL への変換機能を有効にします。

SET polar_cbqt_cost_threshold to 100;
SET polar_cbqt_convert_or_to_union_all_mode to on;

EXPLAIN ANALYZE SELECT * FROM t1 JOIN t2 ON t1.id = t2.id WHERE (t1.num = 1 OR t2.cnt = 2);

次の結果が返されます：

                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Append  (cost=85.48..411.16 rows=110 width=51) (actual time=0.396..4.822 rows=110 loops=1)
   ->  Hash Join  (cost=85.48..297.98 rows=100 width=51) (actual time=0.395..4.639 rows=100 loops=1)
         Hash Cond: (t2.id = t1.id)
         ->  Seq Scan on t2  (cost=0.00..174.00 rows=10000 width=26) (actual time=0.010..1.750 rows=10000 loops=1)
         ->  Hash  (cost=84.23..84.23 rows=100 width=25) (actual time=0.333..0.335 rows=100 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 14kB
               ->  Bitmap Heap Scan on t1  (cost=5.06..84.23 rows=100 width=25) (actual time=0.056..0.247 rows=100 loops=1)
                     Recheck Cond: (num = 1)
                     Heap Blocks: exact=73
                     ->  Bitmap Index Scan on t1_num_idx  (cost=0.00..5.04 rows=100 width=0) (actual time=0.028..0.028 rows=100 loops=1)
                           Index Cond: (num = 1)
   ->  Nested Loop  (cost=4.65..112.63 rows=10 width=51) (actual time=0.049..0.164 rows=10 loops=1)
         ->  Bitmap Heap Scan on t2  (cost=4.36..33.46 rows=10 width=26) (actual time=0.027..0.044 rows=10 loops=1)
               Recheck Cond: (cnt = 2)
               Heap Blocks: exact=10
               ->  Bitmap Index Scan on t2_cnt_idx  (cost=0.00..4.36 rows=10 width=0) (actual time=0.019..0.019 rows=10 loops=1)
                     Index Cond: (cnt = 2)
         ->  Index Scan using t1_id_idx on t1  (cost=0.29..7.91 rows=1 width=25) (actual time=0.010..0.010 rows=1 loops=10)
               Index Cond: (id = t2.id)
               Filter: ((num <> 1) OR ((num = 1) IS NULL))
 Planning Time: 2.903 ms
 Execution Time: 4.980 ms
(22 rows)

OR から UNION ALL への変換機能を有効にすると、オプティマイザーは t1.num と t2.cnt のインデックスを使用する実行計画を生成できます。これにより、クエリを手動で書き換えて UNION ALL を使用するのと同じ効果が得られます。

強制選択

OR から UNION ALL への変換機能を有効にします。

SET polar_cbqt_cost_threshold to 100;
SET polar_cbqt_convert_or_to_union_all_mode to on;

EXPLAIN ANALYZE SELECT * FROM t1, t2 WHERE t1.dsc= t2.change AND (t1.log_date = '2024-01-01' OR t2.op_date = '2024-01-01');

次の結果が返されます：

                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=299.00..660.50 rows=2 width=51) (actual time=14.321..14.325 rows=0 loops=1)
   Hash Cond: (t1.dsc = t2.change)
   Join Filter: ((t1.log_date = '2024-01-01'::text) OR (t2.op_date = '2024-01-01'::text))
   ->  Seq Scan on t1  (cost=0.00..174.00 rows=10000 width=25) (actual time=0.016..3.204 rows=10000 loops=1)
   ->  Hash  (cost=174.00..174.00 rows=10000 width=26) (actual time=6.506..6.508 rows=10000 loops=1)
         Buckets: 16384  Batches: 1  Memory Usage: 704kB
         ->  Seq Scan on t2  (cost=0.00..174.00 rows=10000 width=26) (actual time=0.006..1.755 rows=10000 loops=1)
 Planning Time: 0.932 ms
 Execution Time: 14.571 ms
(9 rows)

OR から UNION ALL への変換を強制します。

SET polar_cbqt_cost_threshold to 100;
SET polar_cbqt_convert_or_to_union_all_mode to force;

EXPLAIN ANALYZE SELECT * FROM t1, t2 WHERE t1.dsc= t2.change AND (t1.log_date = '2024-01-01' OR t2.op_date = '2024-01-01');

次の結果が返されます：

                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 Append  (cost=199.01..871.05 rows=2 width=51) (actual time=9.915..9.923 rows=0 loops=1)
   ->  Hash Join  (cost=199.01..410.52 rows=1 width=51) (actual time=5.046..5.050 rows=0 loops=1)
         Hash Cond: (t2.change = t1.dsc)
         ->  Seq Scan on t2  (cost=0.00..174.00 rows=10000 width=26) (actual time=0.015..0.015 rows=1 loops=1)
         ->  Hash  (cost=199.00..199.00 rows=1 width=25) (actual time=5.014..5.016 rows=0 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 8kB
               ->  Seq Scan on t1  (cost=0.00..199.00 rows=1 width=25) (actual time=5.013..5.013 rows=0 loops=1)
                     Filter: (log_date = '2024-01-01'::text)
                     Rows Removed by Filter: 10000
   ->  Hash Join  (cost=199.01..460.52 rows=1 width=51) (actual time=4.865..4.867 rows=0 loops=1)
         Hash Cond: (t1.dsc = t2.change)
         ->  Seq Scan on t1  (cost=0.00..224.00 rows=9999 width=25) (actual time=0.015..0.016 rows=1 loops=1)
               Filter: ((log_date <> '2024-01-01'::text) OR ((log_date = '2024-01-01'::text) IS NULL))
         ->  Hash  (cost=199.00..199.00 rows=1 width=26) (actual time=4.828..4.829 rows=0 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 8kB
               ->  Seq Scan on t2  (cost=0.00..199.00 rows=1 width=26) (actual time=4.827..4.827 rows=0 loops=1)
                     Filter: (op_date = '2024-01-01'::text)
                     Rows Removed by Filter: 10000
 Planning Time: 0.777 ms
 Execution Time: 10.088 ms
(20 rows)

実行計画の合計コストを比較すると、変換しない方がコストが低いことがわかります。非強制モードでは、オプティマイザーはコストのしきい値に基づいて変換を試みますが、常に変換されたパスを選択するわけではありません。FORCE モードでは、オプティマイザーは強制的に変換されたパスを選択します。FORCE モードは、オプティマイザーがコストを正確に推定できず、クエリ計画に OR から UNION ALL への変換を強制する必要があるシナリオで役立ちます。

HINT の使用

特定の SQL 文で OR から UNION ALL への最適化を強制するには、HINT を使用して SQL レベルでコントロールすることもできます。

SET polar_cbqt_convert_or_to_union_all_mode to off;

EXPLAIN ANALYZE /*+ Set(polar_cbqt_convert_or_to_union_all_mode force) Set(polar_cbqt_cost_threshold 0) */ SELECT * FROM t1, t2 WHERE t1.dsc= t2.change and (t1.log_date = '2024-01-01' or t2.op_date = '2024-01-01');

次の結果が返されます：

                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 Append  (cost=199.01..871.05 rows=2 width=51) (actual time=9.684..9.691 rows=0 loops=1)
   ->  Hash Join  (cost=199.01..410.52 rows=1 width=51) (actual time=4.711..4.714 rows=0 loops=1)
         Hash Cond: (t2.change = t1.dsc)
         ->  Seq Scan on t2  (cost=0.00..174.00 rows=10000 width=26) (actual time=0.013..0.013 rows=1 loops=1)
         ->  Hash  (cost=199.00..199.00 rows=1 width=25) (actual time=4.682..4.684 rows=0 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 8kB
               ->  Seq Scan on t1  (cost=0.00..199.00 rows=1 width=25) (actual time=4.681..4.681 rows=0 loops=1)
                     Filter: (log_date = '2024-01-01'::text)
                     Rows Removed by Filter: 10000
   ->  Hash Join  (cost=199.01..460.52 rows=1 width=51) (actual time=4.969..4.970 rows=0 loops=1)
         Hash Cond: (t1.dsc = t2.change)
         ->  Seq Scan on t1  (cost=0.00..224.00 rows=9999 width=25) (actual time=0.018..0.018 rows=1 loops=1)
               Filter: ((log_date <> '2024-01-01'::text) OR ((log_date = '2024-01-01'::text) IS NULL))
         ->  Hash  (cost=199.00..199.00 rows=1 width=26) (actual time=4.935..4.936 rows=0 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 8kB
               ->  Seq Scan on t2  (cost=0.00..199.00 rows=1 width=26) (actual time=4.934..4.934 rows=0 loops=1)
                     Filter: (op_date = '2024-01-01'::text)
                     Rows Removed by Filter: 10000
 Planning Time: 0.798 ms
 Execution Time: 9.858 ms
(20 rows)

変換のしきい値

「基本機能」の例では、元の実行計画の合計 cost は 660.50 です。変換のしきい値を 660.50 より高い値に設定し、同じ SQL クエリを再度実行します：

SET polar_cbqt_cost_threshold to 1000; -- しきい値を設定します。
SET polar_cbqt_convert_or_to_union_all_mode to force; -- 変換されたパスの選択を強制します。

EXPLAIN ANALYZE SELECT * FROM t1 JOIN t2 ON t1.id = t2.id WHERE (t1.num = 1 OR t2.cnt = 2);

次の結果が返されます：

                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=299.00..660.50 rows=110 width=51) (actual time=6.374..15.802 rows=110 loops=1)
   Hash Cond: (t1.id = t2.id)
   Join Filter: ((t1.num = 1) OR (t2.cnt = 2))
   Rows Removed by Join Filter: 9890
   ->  Seq Scan on t1  (cost=0.00..174.00 rows=10000 width=25) (actual time=0.011..2.038 rows=10000 loops=1)
   ->  Hash  (cost=174.00..174.00 rows=10000 width=26) (actual time=6.266..6.268 rows=10000 loops=1)
         Buckets: 16384  Batches: 1  Memory Usage: 704kB
         ->  Seq Scan on t2  (cost=0.00..174.00 rows=10000 width=26) (actual time=0.006..1.778 rows=10000 loops=1)
 Planning Time: 0.663 ms
 Execution Time: 16.036 ms
(10 rows)

この結果は、FORCE モードであっても、オプティマイザーが依然として元のパスを選択することを示しています。これは、元の計画の合計コストがしきい値を超えていないため、オプティマイザーが OR 句の変換を試行しないためです。