This topic describes the PartitionedTable Scan feature, including its background, usage, and performance.
Background
PolarDB for PostgreSQL does not limit the number of partitions in a partitioned table. If a table has more than two levels of partitions, the number of partitions increases exponentially.
For example, a table has two levels of partitions. The first level has 100 hash partitions. Each first-level partition is then divided into 100 hash subpartitions. The entire partitioned table has 10,000 partitions. If you query this partitioned table, the query plan is as follows:
explain analyze select * from part_hash;
QUERY PLAN
-----------------------------------------------------------------------------
Append (cost=0.00..344500.00 rows=16300000 width=22)
-> Seq Scan on part_hash_sys0102 (cost=0.00..26.30 rows=1630 width=22)
-> Seq Scan on part_hash_sys0103 (cost=0.00..26.30 rows=1630 width=22)
-> Seq Scan on part_hash_sys0104 (cost=0.00..26.30 rows=1630 width=22)
...
...
...
-> Seq Scan on part_hash_sys10198 (cost=0.00..26.30 rows=1630 width=22)
-> Seq Scan on part_hash_sys10199 (cost=0.00..26.30 rows=1630 width=22)
-> Seq Scan on part_hash_sys10200 (cost=0.00..26.30 rows=1630 width=22)
Planning Time: 3183.644 ms
Execution Time: 633.779 ms
(10003 rows)The results show that the query is slow. This is because the optimizer handles partitioned tables by first generating an optimal plan for each partition and then using the Append operator to combine these plans into a final plan for the partitioned table. If a partitioned table has a small number of partitions, this process is fast and users will not notice a delay. However, as the number of partitions grows, the process becomes noticeably slower. Queries on partitioned tables can become much slower than queries on standard tables.
For example, in the preceding SQL statement, the table part_hash has 10,000 partitions. Its Planning Time is about 3 seconds. In contrast, the Planning Time for a query on a standard table is only 0.1 milliseconds. This is a difference of several hundred times. In addition to a long Planning Time, the query process also uses a large amount of memory, which can cause an out-of-memory (OOM) error.
This drawback of partitioned tables is more obvious in join queries:
CREATE TABLE part_hash2 (a int, b int, c varchar(10))
PARTITION BY HASH(a) SUBPARTITION BY HASH (b) PARTITIONS 100 SUBPARTITIONS 100;
explain analyze select count(*) from part_hash a join part_hash2 b on a.a=b.b where b.c = '0001';
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=48970442.90..48970442.91 rows=1 width=8) (actual time=6466.854..6859.935 rows=1 loops=1)
-> Gather (cost=48970442.68..48970442.89 rows=2 width=8) (actual time=397.780..6859.902 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=48969442.68..48969442.69 rows=1 width=8) (actual time=4.748..11.768 rows=1 loops=3)
-> Merge Join (cost=1403826.01..42177776.01 rows=2716666667 width=0) (actual time=4.736..11.756 rows=0 loops=3)
Merge Cond: (a.a = b.b)
-> Sort (cost=1093160.93..1110135.93 rows=6790000 width=4) (actual time=4.734..8.588 rows=0 loops=3)
Sort Key: a.a
Sort Method: quicksort Memory: 25kB
Worker 0: Sort Method: quicksort Memory: 25kB
Worker 1: Sort Method: quicksort Memory: 25kB
-> Parallel Append (cost=0.00..229832.35 rows=6790000 width=4) (actual time=4.665..8.518 rows=0 loops=3)
-> Parallel Seq Scan on part_hash_sys0102 a (cost=0.00..19.59 rows=959 width=4) (actual time=0.001..0.001 rows=0 loops=1)
-> Parallel Seq Scan on part_hash_sys0103 a_1 (cost=0.00..19.59 rows=959 width=4) (actual time=0.001..0.001 rows=0 loops=1)
-> Parallel Seq Scan on part_hash_sys0104 a_2 (cost=0.00..19.59 rows=959 width=4) (actual time=0.001..0.001 rows=0 loops=1)
...
-> Sort (cost=310665.08..310865.08 rows=80000 width=4) (never executed)
Sort Key: b.b
-> Append (cost=0.00..304150.00 rows=80000 width=4) (never executed)
-> Seq Scan on part_hash2_sys0102 b (cost=0.00..30.38 rows=8 width=4) (never executed)
Filter: ((c)::text = '0001'::text)
-> Seq Scan on part_hash2_sys0103 b_1 (cost=0.00..30.38 rows=8 width=4) (never executed)
Filter: ((c)::text = '0001'::text)
-> Seq Scan on part_hash2_sys0104 b_2 (cost=0.00..30.38 rows=8 width=4) (never executed)
Filter: ((c)::text = '0001'::text)
...
Planning Time: 221082.616 ms
Execution Time: 9500.148 ms
(30018 rows)This shows that during a full table scan, a partitioned table is less efficient than a standard table. This happens because the query has no conditions to limit the scan to specific partitions. All advantages of a partitioned table are lost. You can use partition pruning to limit a query to a small number of partitions. However, some online analytical processing (OLAP) scenarios require a full table scan of the entire partitioned table.
Overview
To solve this problem, PolarDB for PostgreSQL provides the PartitionedTableScan operator. This query operator for partitioned tables is more efficient than the Append operator. It significantly reduces Planning Time and consumes less memory, which helps prevent Out of Memory (OOM) errors.
The following shows the Planning Time and memory usage for two separate SQL queries run with the PartitionedTableScan operator.
explain analyze select * from part_hash;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
PartitionedTableScan on part_hash (cost=0.00..1.00 rows=1 width=22) (actual time=134.348..134.352 rows=0 loops=1)(Iteration partition number 10000)
Scan Partitions: part_hash_sys0102, part_hash_sys0103, ...part_hash_sys10198, part_hash_sys10199, part_hash_sys10200
-> Seq Scan on part_hash (cost=0.00..1.00 rows=1 width=22)
Planning Time: 293.778 ms
Execution Time: 384.202 ms
(5 rows)
explain analyze select count(*) from part_hash a join part_hash2 b on a.a=b.b where b.c = '0001';
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=2.02..2.03 rows=1 width=8) (actual time=152.322..152.326 rows=1 loops=1)
-> Nested Loop (cost=0.00..2.02 rows=1 width=0) (actual time=152.308..152.311 rows=0 loops=1)
Join Filter: (a.a = b.b)
-> PartitionedTableScan on part_hash a (cost=0.00..1.00 rows=1 width=4) (actual time=152.305..152.306 rows=0 loops=1)(Iteration partition number 10000)
Scan Partitions: part_hash_sys0102, part_hash_sys0103,, part_hash_sys10198, part_hash_sys10199, part_hash_sys10200
-> Seq Scan on part_hash a (cost=0.00..1.00 rows=1 width=4)
-> PartitionedTableScan on part_hash2 b (cost=0.00..1.00 rows=1 width=4) (never executed)
-> Seq Scan on part_hash2 b (cost=0.00..1.00 rows=1 width=4)
Filter: ((c)::text = '0001'::text)
Planning Time: 732.952 ms
Execution Time: 436.927 ms
(11 rows)As you can see, in this example, the Planning Time is significantly reduced after using the PartitionedTableScan operator. The following table shows a comparison of the specific values:
Type |
|
|
Single Query Planning Time | 3183.644ms | 293.778ms |
Join Query Planning Time | 221082.616ms | 732.952ms |
Applicability
The
PartitionedTableScanoperator is supported only on PolarDB for PostgreSQL 14 with a minor engine version of 2.0.11.9.32.0 or later.PartitionedTableScancurrently supports onlySELECTstatements. It does not support DML statements.PartitionedTableScandoes not support partition-wise join. If you enable the partition-wise join feature, aPartitionedTableScanplan is not generated.
The PartitionedTable Scan feature is available only for clusters with a minor engine version of 2.0.11.9.32.0 or later. If you have an existing cluster with an earlier minor engine version and need to use this feature, contact us to enable it.
Usage
The following examples show how to use the PartitionedTableScan feature.
First, create a partitioned table.
CREATE TABLE prt1 (a int, b int, c varchar) PARTITION BY Hash(a) partitions 16;Enable PartitionedTableScan using a parameter
You can enable or disable the PartitionedTableScan feature by setting the polar_num_parts_for_pts parameter.
Parameter | Valid values | Default value | Description |
polar_num_parts_for_pts | -1 to INT_MAX | 0 | Specifies the threshold for the number of subpartitions in a partitioned table that triggers the
|
The following is an example:
SET polar_num_parts_for_pts to -1;
explain select * from prt1;
QUERY PLAN
-----------------------------------------------------------------
PartitionedTableScan on prt1 (cost=0.00..1.00 rows=1 width=40)
-> Seq Scan on prt1 (cost=0.00..1.00 rows=1 width=40)
(2 rows)Use a HINT
Use the HINT syntax PARTEDSCAN(table alias). The following is an example:
EXPLAIN select /*+PARTEDSCAN(prt1) */ select * from prt1;
QUERY PLAN
-----------------------------------------------------------------
PartitionedTableScan on prt1 (cost=0.00..1.00 rows=1 width=40)
-> Seq Scan on prt1 (cost=0.00..1.00 rows=1 width=40)
(2 rows)Parallel query
PolarDB for PostgreSQL supports parallel query for partitioned tables. This feature enables efficient queries on large datasets. Similar to Append, PartitionedTableScan also supports parallel query. However, the parallel query implementation for PartitionedTableScan is called Parallel PartitionedTableScan and supports only two modes of parallelism: inter-partition parallelism and hybrid parallelism.

Inter-partition parallelism
Inter-partition parallelism means that each worker queries a different partition. This allows multiple workers to query the entire partitioned table in parallel.
EXPLAIN (COSTS OFF) select /*+PARTEDSCAN(prt1) */ * from prt1;
QUERY PLAN
---------------------------------------------
Gather
Workers Planned: 4
-> Parallel PartitionedTableScan on prt1
-> Seq Scan on prt1
(4 rows)As shown in the preceding plan, four parallel workers are started for the partitioned table (Workers Planned: 4). Each worker is responsible for querying one partition. A clear indicator is the operator named Parallel PartitionedTableScan.
Hybrid parallelism
Hybrid parallelism means that execution can run in parallel both between partitions and within each partition. This provides the highest degree of parallelism for a parallel query.
EXPLAIN (COSTS OFF) select /*+PARTEDSCAN(prt1) */ * from prt1;
QUERY PLAN
---------------------------------------------
Gather
Workers Planned: 8
-> Parallel PartitionedTableScan on prt1
-> Parallel Seq Scan on prt1
(4 rows)As shown in the preceding plan, the query uses eight workers for parallel execution (Workers Planned: 8). Queries can run in parallel between partitions and also within each partition. A clear indicator is the operator named Parallel PartitionedTableScan.
Both parallel modes have their own cost models. The optimizer chooses the optimal one.
Partition pruning
Like Append, PartitionedTableScan supports partition pruning in three stages. For more information, see Partition pruning.
Performance comparison
PartitionedTableScan is more efficient than Append. The following test data shows a performance comparison between PartitionedTableScan and Append.
The following data is from a development environment and is for reference only. Actual performance may vary based on your configuration and conditions. The purpose of this test is to compare the performance of Append and PartitionedTableScan in a consistent environment where the operator is the only variable.
The following SQL statements are used for testing:
explain select * from prt1 where b = 10;
explain select /*+PARTEDSCAN(prt1) */ * from prt1 where b = 10; Planning time of a single SQL statement
Number of partitions | Append Planning Time | PartitionedTableScan Planning Time |
16 | 0.266ms | 0.067ms |
32 | 1.820ms | 0.258ms |
64 | 3.654ms | 0.402ms |
128 | 7.010ms | 0.664ms |
256 | 14.095ms | 1.247ms |
512 | 27.697ms | 2.328ms |
1024 | 73.176ms | 4.165ms |
Memory (Memory usage of a single SQL statement)
Number of partitions | Append Memory | PartitionedTableScan Memory |
16 | 1,170 KB | 1,044 KB |
32 | 1,240 KB | 1,044 KB |
64 | 2,120 KB | 1,624 KB |
128 | 2,244 KB | 1,524 KB |
256 | 2,888 KB | 2,072 KB |
512 | 4,720 KB | 3,012 KB |
1024 | 8,236 KB | 5,280 KB |
Queries per second (QPS)
pgbench -i --scale=10
pgbench -c 64 -j 64 -n -T60
Query:
explain select * from prt1 where b = 10;
explain select /*+PARTEDSCAN(prt1) */ * from prt1 where b = 10; Number of partitions | Append QPS | PartitionedTableScan QPS |
16 | 25,318 | 93,950 |
32 | 10,906 | 61,879 |
64 | 5,281 | 30,839 |
128 | 2,195 | 16,684 |
256 | 920 | 8,372 |
512 | 92 | 3,708 |
1024 | 21 | 1,190 |
Conclusion
The preceding comparison between PartitionedTableScan and Append shows that PartitionedTableScan significantly improves performance over Append as the number of partitions increases. Therefore, if your partitioned tables have many partitions and a long Planning Time, we recommend using PartitionedTableScan for optimization.