This topic describes the PartitionedTableScan feature, providing information on its background, usage methods, and statistics on performance improvement.
Background
In , no limits are imposed on the number of partitions in a partitioned table. If a table has two or more partition levels, the number of partitions increases exponentially.
For example, a table is hash-partitioned into 100 partitions, and then each partition is hash-partitioned into 100 subpartitions. There would be 10,000 partitions in total for this partitioned table. Use the EXPLAIN command to obtain the query execution plan:
explain analyze select * from part_hash;
QUERY PLAN
-----------------------------------------------------------------------------
Append (cost=0.00..344500.00 rows=16300000 width=22)
-> Seq Scan on part_hash_sys0102 (cost=0.00..26.30 rows=1630 width=22)
-> Seq Scan on part_hash_sys0103 (cost=0.00..26.30 rows=1630 width=22)
-> Seq Scan on part_hash_sys0104 (cost=0.00..26.30 rows=1630 width=22)
...
...
...
-> Seq Scan on part_hash_sys10198 (cost=0.00..26.30 rows=1630 width=22)
-> Seq Scan on part_hash_sys10199 (cost=0.00..26.30 rows=1630 width=22)
-> Seq Scan on part_hash_sys10200 (cost=0.00..26.30 rows=1630 width=22)
Planning Time: 3183.644 ms
Execution Time: 633.779 ms
(10003 rows)
Total Memory: 216852KBAs is shown in the results, the query planning takes a relatively long time, which is related to how the optimizer generates an execution plan for a partitioned table: it picks an optimal plan for each partition and appends these plans into a single plan for the entire partitioned table. If the partitioned table only has a small number of partitions, the process would be fast and deliver satisfactory user experience. However, as the number of partitions grows, the process becomes slower, and you may find partitioned tables perform worse than non-partitioned tables.
In the previous query, the table part_hash has 10,000 partitions. Its planning time is 3 seconds. In contrast, if the table were not partitioned, the planning time for this query would be around 0.1 millisecond only, which makes for a hundred-time difference. In addition to long planning time, the query for the partitioned table also causes high memory usage, which could lead to an OOM issue.
The performance is even worse when joins are included in a query, as is shown in the following query plan:
CREATE TABLE part_hash2 (a int, b int, c varchar(10))
PARTITION BY HASH(a) SUBPARTITION BY HASH (b) PARTITIONS 100 SUBPARTITIONS 100;
explain analyze select count(*) from part_hash a join part_hash2 b on a.a=b.b where b.c = '0001';
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=48970442.90..48970442.91 rows=1 width=8) (actual time=6466.854..6859.935 rows=1 loops=1)
-> Gather (cost=48970442.68..48970442.89 rows=2 width=8) (actual time=397.780..6859.902 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=48969442.68..48969442.69 rows=1 width=8) (actual time=4.748..11.768 rows=1 loops=3)
-> Merge Join (cost=1403826.01..42177776.01 rows=2716666667 width=0) (actual time=4.736..11.756 rows=0 loops=3)
Merge Cond: (a.a = b.b)
-> Sort (cost=1093160.93..1110135.93 rows=6790000 width=4) (actual time=4.734..8.588 rows=0 loops=3)
Sort Key: a.a
Sort Method: quicksort Memory: 25kB
Worker 0: Sort Method: quicksort Memory: 25kB
Worker 1: Sort Method: quicksort Memory: 25kB
-> Parallel Append (cost=0.00..229832.35 rows=6790000 width=4) (actual time=4.665..8.518 rows=0 loops=3)
-> Parallel Seq Scan on part_hash_sys0102 a (cost=0.00..19.59 rows=959 width=4) (actual time=0.001..0.001 rows=0 loops=1)
-> Parallel Seq Scan on part_hash_sys0103 a_1 (cost=0.00..19.59 rows=959 width=4) (actual time=0.001..0.001 rows=0 loops=1)
-> Parallel Seq Scan on part_hash_sys0104 a_2 (cost=0.00..19.59 rows=959 width=4) (actual time=0.001..0.001 rows=0 loops=1)
...
-> Sort (cost=310665.08..310865.08 rows=80000 width=4) (never executed)
Sort Key: b.b
-> Append (cost=0.00..304150.00 rows=80000 width=4) (never executed)
-> Seq Scan on part_hash2_sys0102 b (cost=0.00..30.38 rows=8 width=4) (never executed)
Filter: ((c)::text = '0001'::text)
-> Seq Scan on part_hash2_sys0103 b_1 (cost=0.00..30.38 rows=8 width=4) (never executed)
Filter: ((c)::text = '0001'::text)
-> Seq Scan on part_hash2_sys0104 b_2 (cost=0.00..30.38 rows=8 width=4) (never executed)
Filter: ((c)::text = '0001'::text)
...
Planning Time: 221082.616 ms
Execution Time: 9500.148 ms
(30018 rows)
Total Memory: 679540KBThe results show that the performance of a full table query of a partitioned table is lower than that of a non-partitioned table. This is because a full table query does not include any condition that can limit the query to a specific partition. Although partition pruning can be used to limit queries to some partitions, some online analytical processing (OLAP) queries still require a full table scan.
Overview
To resolve this performance issue, provides the PartitionedTableScan operator for querying partitioned tables. It is more efficient than the Append operator, significantly reducing planning time and avoiding OOM because less memory is consumed.
If PartitionedTableScan is used, the two SQL statements would be executed with much less planning time and memory consumption.
explain analyze select * from part_hash;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
PartitionedTableScan on part_hash (cost=0.00..1.00 rows=1 width=22) (actual time=134.348..134.352 rows=0 loops=1)(Iteration partition number 10000)
Scan Partitions: part_hash_sys0102, part_hash_sys0103, ...part_hash_sys10198, part_hash_sys10199, part_hash_sys10200
-> Seq Scan on part_hash (cost=0.00..1.00 rows=1 width=22)
Planning Time: 293.778 ms
Execution Time: 384.202 ms
(5 rows)
Total Memory: 40276KB
explain analyze select count(*) from part_hash a join part_hash2 b on a.a=b.b where b.c = '0001';
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=2.02..2.03 rows=1 width=8) (actual time=152.322..152.326 rows=1 loops=1)
-> Nested Loop (cost=0.00..2.02 rows=1 width=0) (actual time=152.308..152.311 rows=0 loops=1)
Join Filter: (a.a = b.b)
-> PartitionedTableScan on part_hash a (cost=0.00..1.00 rows=1 width=4) (actual time=152.305..152.306 rows=0 loops=1)(Iteration partition number 10000)
Scan Partitions: part_hash_sys0102, part_hash_sys0103,, part_hash_sys10198, part_hash_sys10199, part_hash_sys10200
-> Seq Scan on part_hash a (cost=0.00..1.00 rows=1 width=4)
-> PartitionedTableScan on part_hash2 b (cost=0.00..1.00 rows=1 width=4) (never executed)
-> Seq Scan on part_hash2 b (cost=0.00..1.00 rows=1 width=4)
Filter: ((c)::text = '0001'::text)
Planning Time: 732.952 ms
Execution Time: 436.927 ms
(11 rows)
Total Memory: 68104KBBoth planning time and total memory are significantly reduced. See the following table for details:
Item |
|
|
Single Query Planning Time | 3183.644ms | 293.778ms |
Single Total Memory | 216852 KB | 40276 KB |
Join Query Planning Time | 221082.616ms | 732.952ms |
Join Total Memory | 679540 KB | 68104 KB |
Supported versions and limitations
Only
SELECTcommands can be executed usingPartitionedTableScan. DML statements are not supported.PartitionedTableScancannot be used together with partition-wise join. If the partition-wise join feature is enabled, execution plans will not include thePartitionedTableScanoperator.
If you want to enable PartitionedTableScan for a cluster whose revision version is earlier than 1.1.32, contact technical support.
Usage
This section describes how to use PartitionedTableScan with examples.
Start with creating a partitioned table.
CREATE TABLE prt1 (a int, b int, c varchar) PARTITION BY Hash(a) partitions 16;Enable PartitionedTableScan
You can enable or disable the PartitionedTableScan feature by configuring the polar_num_parts_for_pts parameter.
Parameter | Valid values | Default value | Description |
polar_num_parts_for_pts | -1 to INT_MAX | 0 | The number of partitions beyond which the
|
Example:
SET polar_num_parts_for_pts to -1;
explain select * from prt1;
QUERY PLAN
-----------------------------------------------------------------
PartitionedTableScan on prt1 (cost=0.00..1.00 rows=1 width=40)
-> Seq Scan on prt1 (cost=0.00..1.00 rows=1 width=40)
(2 rows)Use query hints
You can use the hint syntax PARTEDSCAN (table alias) to dictate the use of PartitionedTableScan.
EXPLAIN select /*+PARTEDSCAN(prt1) */ select * from prt1;
QUERY PLAN
-----------------------------------------------------------------
PartitionedTableScan on prt1 (cost=0.00..1.00 rows=1 width=40)
-> Seq Scan on prt1 (cost=0.00..1.00 rows=1 width=40)
(2 rows)Parallel query for PartitionedTableScan
supports parallel query against a partitioned table for the Append operator (see Parallel append), reducing the time taken to query large amounts of data. Parallel query is also supported for the PartitionedTableScan operator. However, unlike the Append operator, only two modes of parallelism are supported for thePartitionedTableScan operator (or the Parallel PartitionedTableScan operator in the context of parallel querying): inter-partition parallelism and hybrid parallelism.

Inter-partition parallelism
In this mode, each worker handles a partition, and they work in parallel.
EXPLAIN (COSTS OFF) select /*+PARTEDSCAN(prt1) */ * from prt1;
QUERY PLAN
---------------------------------------------
Gather
Workers Planned: 4
-> Parallel PartitionedTableScan on prt1
-> Seq Scan on prt1
(4 rows)In the preceding query plan, four workers are used in the execution, each of which works on a single partition. The Parallel PartitionedTableScan operator is the indicator of parallelism.
Hybrid parallelism
In hybrid mode, the system handles the execution in parallel both within and across partitions.
EXPLAIN (COSTS OFF) select /*+PARTEDSCAN(prt1) */ * from prt1;
QUERY PLAN
---------------------------------------------
Gather
Workers Planned: 8
-> Parallel PartitionedTableScan on prt1
-> Parallel Seq Scan on prt1
(4 rows)In the preceding query plan, eight workers are used to execute the query in parallel within single partitions and on different partitions. The Parallel PartitionedTableScan operator is the indicator of parallelism.
Each of the modes has its cost model, and the optimizer selects the mode that is less costly.
Partition pruning
Similar to Append, PartitionedTableScan supports partition pruning in all three stages. For more information, see Partition pruning.
Performance comparison
PartitionedTableScan delivers higher efficiency. This performance test shows the performance improvement made by PartitionedTableScan.
The following results come from the test environment and are provided only for reference. The actual results may vary based on different configurations and conditions. The tests are conducted in environments with consistent configurations to compare the performance between Append and PartitionedTableScan. The number of partitions is the only variable.
The following SQL statements are used for testing:
explain select * from prt1 where b = 10;
explain select /*+PARTEDSCAN(prt1) */ * from prt1 where b = 10; Planning Time of a single SQL statement
Number of partitions |
| |
16 | 0.266ms | 0.067ms |
32 | 1.820ms | 0.258ms |
64 | 3.654ms | 0.402ms |
128 | 7.010ms | 0.664ms |
256 | 14.095ms | 1.247ms |
512 | 27.697ms | 2.328ms |
1024 | 73.176ms | 4.165ms |
Memory usage of a single SQL statement
Number of partitions |
|
|
16 | 1,170 KB | 1,044 KB |
32 | 1,240 KB | 1,044 KB |
64 | 2,120 KB | 1,624 KB |
128 | 2,244 KB | 1,524 KB |
256 | 2,888 KB | 2,072 KB |
512 | 4,720 KB | 3,012 KB |
1024 | 8,236 KB | 5,280 KB |
Query per Second (QPS)
pgbench -i --scale=10
pgbench -c 64 -j 64 -n -T60
Query:
explain select * from prt1 where b = 10;
explain select /*+PARTEDSCAN(prt1) */ * from prt1 where b = 10; Number of partitions |
|
|
16 | 25,318 | 93,950 |
32 | 10,906 | 61,879 |
64 | 5,281 | 30,839 |
128 | 2,195 | 16,684 |
256 | 920 | 8,372 |
512 | 92 | 3,708 |
1024 | 21 | 1,190 |
Conclusion
The preceding performance test results show that PartitionedTableScan can significantly improve performance, especially when a large number of partitions are involved. Therefore, we recommend using PartitionedTableScan to improve the performance of querying a table with an excessive number of partitions.