プランの高速化 - PolarDB - Alibaba Cloud ドキュメントセンター

パーティションテーブルに数百〜数千のパーティションが存在する場合、標準的なクエリ計画処理は、同じクエリを非パーティション化テーブルで実行するよりも遅くなる可能性があります。PolarDB for PostgreSQL (Compatible with Oracle) では、この課題に対応するため PartitionedTableScan オペレーターを提供しています。このオペレーターは、標準的な Append ベースの計画方式を単一パスのメカニズムに置き換え、クエリ結果を変更することなく、計画時間とメモリ使用量を大幅に削減します。

仕組み

PostgreSQL のクエリオプテマイザーは、パーティションテーブルに対して各パーティションごとに実行計画を生成し、それらを Append ノードの下に結合します。パーティション数が数千程度であれば、この方式は十分に機能します。しかし、特に多階層のサブパーティション構成を採用している場合など、パーティション数が増加すると、計画時間およびメモリ使用量も比例して増大します。

例：2 階層ハッシュパーティションスキーム

PolarDB for PostgreSQL (Compatible with Oracle) では、パーティション数に制限はありません。第 1 階層に 100 個のパーティションを持ち、それぞれに 100 個のサブパーティションを持つテーブルの場合、合計で 10,000 個のパーティションが存在します。

CREATE TABLE part_hash (a int, b int, c varchar(10))
PARTITION BY HASH(a) SUBPARTITION BY HASH (b) PARTITIONS 100 SUBPARTITIONS 100;

単一テーブルクエリ：Append と PartitionedTableScan の比較

Append（標準）を使用した場合：

explain analyze select * from part_hash;
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Append  (cost=0.00..344500.00 rows=16300000 width=22)
   ->  Seq Scan on part_hash_sys0102  (cost=0.00..26.30 rows=1630 width=22)
   ->  Seq Scan on part_hash_sys0103  (cost=0.00..26.30 rows=1630 width=22)
   ->  Seq Scan on part_hash_sys0104  (cost=0.00..26.30 rows=1630 width=22)
  ...
  ...
  ...
   ->  Seq Scan on part_hash_sys10198  (cost=0.00..26.30 rows=1630 width=22)
   ->  Seq Scan on part_hash_sys10199  (cost=0.00..26.30 rows=1630 width=22)
   ->  Seq Scan on part_hash_sys10200  (cost=0.00..26.30 rows=1630 width=22)
 Planning Time: 3183.644 ms
 Execution Time: 633.779 ms
(10003 rows)
Total Memory: 216852KB

PartitionedTableScan を使用した場合：

explain analyze select * from part_hash;
----------------------------------------------------------------------------------------------------------------------------------------------------
PartitionedTableScan on part_hash  (cost=0.00..1.00 rows=1 width=22) (actual time=134.348..134.352 rows=0 loops=1)(Iteration partition number 10000)
   Scan Partitions: part_hash_sys0102, part_hash_sys0103, ...part_hash_sys10198, part_hash_sys10199, part_hash_sys10200
   ->  Seq Scan on part_hash  (cost=0.00..1.00 rows=1 width=22)
 Planning Time: 293.778 ms
 Execution Time: 384.202 ms
(5 rows)
Total Memory: 40276KB

計画時間が 3,183 ms から 293 ms に短縮されています。同じテーブルの非パーティション化バージョンでは、計画時間が約 0.1 ms であるため、PartitionedTableScan にもオーバーヘッドは残りますが、そのコストは実用可能なレベルまで低下します。

結合クエリ：Append と PartitionedTableScan の比較

この問題は結合操作においてさらに顕著になります。10,000 パーティションを持つテーブルを 2 つ使用する場合：

create table part_hash2 (a int, b int, c varchar(10))
PARTITION by HASH(a) SUBPARTITION by HASH (b) PARTITIONS 100 SUBPARTITIONS 100;

Append を使用した場合：

explain analyze select count(*) from part_hash a join part_hash2 b on a.a=b.b where b.c = '0001';
                                                                            QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=48970442.90..48970442.91 rows=1 width=8) (actual time=6466.854..6859.935 rows=1 loops=1)
   ->  Gather  (cost=48970442.68..48970442.89 rows=2 width=8) (actual time=397.780..6859.902 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=48969442.68..48969442.69 rows=1 width=8) (actual time=4.748..11.768 rows=1 loops=3)
               ->  Merge Join  (cost=1403826.01..42177776.01 rows=2716666667 width=0) (actual time=4.736..11.756 rows=0 loops=3)
                     Merge Cond: (a.a = b.b)
                     ->  Sort  (cost=1093160.93..1110135.93 rows=6790000 width=4) (actual time=4.734..8.588 rows=0 loops=3)
                           Sort Key: a.a
                           Sort Method: quicksort  Memory: 25kB
                           Worker 0:  Sort Method: quicksort  Memory: 25kB
                           Worker 1:  Sort Method: quicksort  Memory: 25kB
                           ->  Parallel Append  (cost=0.00..229832.35 rows=6790000 width=4) (actual time=4.665..8.518 rows=0 loops=3)
                                 ->  Parallel Seq Scan on part_hash_sys0102 a  (cost=0.00..19.59 rows=959 width=4) (actual time=0.001..0.001 rows=0 loops=1)
                                 ->  Parallel Seq Scan on part_hash_sys0103 a_1  (cost=0.00..19.59 rows=959 width=4) (actual time=0.001..0.001 rows=0 loops=1)
                                 ->  Parallel Seq Scan on part_hash_sys0104 a_2  (cost=0.00..19.59 rows=959 width=4) (actual time=0.001..0.001 rows=0 loops=1)
                                  ...
                     ->  Sort  (cost=310665.08..310865.08 rows=80000 width=4) (never executed)
                           Sort Key: b.b
                           ->  Append  (cost=0.00..304150.00 rows=80000 width=4) (never executed)
                                 ->  Seq Scan on part_hash2_sys0102 b  (cost=0.00..30.38 rows=8 width=4) (never executed)
                                       Filter: ((c)::text = '0001'::text)
                                 ->  Seq Scan on part_hash2_sys0103 b_1  (cost=0.00..30.38 rows=8 width=4) (never executed)
                                       Filter: ((c)::text = '0001'::text)
                                 ->  Seq Scan on part_hash2_sys0104 b_2  (cost=0.00..30.38 rows=8 width=4) (never executed)
                                       Filter: ((c)::text = '0001'::text)
                                  ...

 Planning Time: 221082.616 ms
 Execution Time: 9500.148 ms
(30018 rows)
Total Memory: 679540KB

PartitionedTableScan を使用した場合：

explain analyze select count(*) from part_hash a join part_hash2 b on a.a=b.b where b.c = '0001';
                                                                            QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate  (cost=2.02..2.03 rows=1 width=8) (actual time=152.322..152.326 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..2.02 rows=1 width=0) (actual time=152.308..152.311 rows=0 loops=1)
         Join Filter: (a.a = b.b)
         ->  PartitionedTableScan on part_hash a  (cost=0.00..1.00 rows=1 width=4) (actual time=152.305..152.306 rows=0 loops=1)(Iteration partition number 10000)
               Scan Partitions: part_hash_sys0102, part_hash_sys0103,, part_hash_sys10198, part_hash_sys10199, part_hash_sys10200
               ->  Seq Scan on part_hash a  (cost=0.00..1.00 rows=1 width=4)
         ->  PartitionedTableScan on part_hash2 b  (cost=0.00..1.00 rows=1 width=4) (never executed)
               ->  Seq Scan on part_hash2 b  (cost=0.00..1.00 rows=1 width=4)
                     Filter: ((c)::text = '0001'::text)
 Planning Time: 732.952 ms
 Execution Time: 436.927 ms
(11 rows)
Total Memory: 68104KB

概要（10,000 パーティションのテーブル）

メトリック	`Append`	`PartitionedTableScan`
単一クエリの計画時間	3,183.644 ms	293.778 ms
単一クエリメモリ	216,852 KB	40,276 KB
結合クエリの計画時間	221,082.616 ms	732.952 ms
結合クエリのメモリ使用量	679,540 KB	68,104 KB

PartitionedTableScan が最も効果を発揮するケース

PartitionedTableScan は、OLAP スタイルのワークロードで最大の効果を発揮します。このようなワークロードでは、すべてまたは大部分のパーティションをスキャンする必要があり、パーティションプルーニングによって多数のパーティションを除外できません。一方、パーティションプルーニングによりスキャン対象が少数のパーティションに絞られる OLTP のポイントクエリでは、改善効果は小さくなります。

制限事項

PolarDB for PostgreSQL (Compatible with Oracle) 1.0 および 2.0 クラスターの両方が PartitionedTableScan をサポートしていますが、クラスターのリビジョンバージョンが V1.1.32 以降である必要があります。
説明
リビジョンバージョンがそれ以前のクラスターについては、テクニカルサポートまでお問い合わせいただき、PartitionedTableScan を有効にしてください。
SELECT 文のみがサポートされます。DML ステートメントはサポートされません。
PartitionedTableScan はパーティションワイズ結合と互換性がありません。パーティションワイズ結合が有効になっている場合、オプティマイザーは PartitionedTableScan を使用しません。

PartitionedTableScan の有効化

以下の例では、次のテーブルを使用します。

CREATE TABLE prt1 (a int, b int, c varchar) PARTITION BY Hash(a) partitions 16;

しきい値パラメーターの設定

PartitionedTableScan は、パーティション数に基づいて自動的に有効になります。polar_num_parts_for_partitionedscan パラメーターを使用してこの動作を制御できます。

パラメーター	有効な値	デフォルト	説明
`polar_num_parts_for_partitionedscan`	`-1` ～ `INT_MAX`	`64`	`PartitionedTableScan` を使用するパーティション数のしきい値です。`-1` を設定すると常にこのオペレーターを使用し、`0` を設定すると無効になります。

パーティション数に関係なく PartitionedTableScan を強制的に使用するには、次のように設定します。

SET polar_num_parts_for_partitionedscan to -1;
explain select * from prt1;
                           QUERY PLAN
-----------------------------------------------------------------
 PartitionedTableScan on prt1  (cost=0.00..1.00 rows=1 width=40)
   ->  Seq Scan on prt1  (cost=0.00..1.00 rows=1 width=40)
(2 rows)

クエリヒントの使用

PARTEDSCAN(table alias) ヒントワードを使用すると、パラメーターを変更せずに特定のクエリに対して PartitionedTableScan を有効にできます。

EXPLAIN select /*+PARTEDSCAN(prt1) */ select * from prt1;
                           QUERY PLAN
-----------------------------------------------------------------
 PartitionedTableScan on prt1  (cost=0.00..1.00 rows=1 width=40)
   ->  Seq Scan on prt1  (cost=0.00..1.00 rows=1 width=40)
(2 rows)

並列実行

PartitionedTableScan は、PartitionedTableScan Append を通じて並列実行をサポートします。次の 2 つのモードが利用可能です。

パーティション間モード

各ワーカーが 1 つのパーティションを処理します。プランには Parallel PartitionedTableScan オペレーターが表示されます。

EXPLAIN (COSTS OFF) select /*+PARTEDSCAN(prt1) */ * from prt1;
                 QUERY PLAN
---------------------------------------------
 Gather
   Workers Planned: 4
   ->  Parallel PartitionedTableScan on prt1
         ->  Seq Scan on prt1
(4 rows)

ハイブリッドモード

ワーカーは、パーティション内およびパーティション間の両方で並列に実行されます。プランには同様に Parallel PartitionedTableScan が表示されますが、リーフレベルでは Parallel Seq Scan が使用されます。

EXPLAIN (COSTS OFF) select /*+PARTEDSCAN(prt1) */ * from prt1;
                 QUERY PLAN
---------------------------------------------
 Gather
   Workers Planned: 8
   ->  Parallel PartitionedTableScan on prt1
         ->  Parallel Seq Scan on prt1
(4 rows)

オプティマイザーは、各モードのコストモデルに基づいて、推定コストが低いモードを選択します。

パーティションプルーニング

PartitionedTableScan は、3 つの段階すべてでパーティションプルーニングをサポートします。詳細については、「パーティションプルーニング」をご参照ください。

パフォーマンステスト

説明

これらの結果は開発環境での測定値であり、参考情報として提供されています。実際の結果は構成やワークロードによって異なります。パーティション数のみを変数として使用し、その他の環境設定はすべて一定に保っています。

テストクエリ：

explain select  * from prt1 where b = 10;

explain select /*+PARTEDSCAN(prt1) */ * from prt1 where b = 10;

テスト環境： pgbench -i --scale=10、pgbench -c 64 -j 64 -n -T60

計画時間

パーティション	`Append` の計画時間	`PartitionedTableScan` の計画時間
16	0.266 ms	0.067 ms
32	1.820 ms	0.258 ms
64	3.654 ms	0.402 ms
128	7.010 ms	0.664 ms
256	14.095 ms	1.247 ms
512	27.697 ms	2.328 ms
1024	73.176 ms	4.165 ms

メモリ使用量

パーティション	`追記` メモリ	`PartitionedTableScan` のメモリ使用量
16	1,170 KB	1,044 KB
32	1,240 KB	1,044 KB
64	2,120 KB	1,624 KB
128	2,244 KB	1,524 KB
256	2,888 KB	2,072 KB
512	4,720 KB	3,012 KB
1024	8,236 KB	5,280 KB

1 秒あたりのクエリ数（QPS）

パーティション	`Append` QPS	`PartitionedTableScan` QPS
16	25,318	93,950
32	10,906	61,879
64	5,281	30,839
128	2,195	16,684
256	920	8,372
512	92	3,708
1024	21	1,190

改善効果はパーティション数の増加に伴って大きくなります。1,024 パーティションの場合、PartitionedTableScan は Append と比較して約 56 倍の QPS を達成しています。

PolarDB:プランの高速化

仕組み