PolarDB for PostgreSQL (Compatible with Oracle) supports the partition pruning feature, which can significantly speed up queries on partitioned tables.
Overview
When partition pruning is enabled, the query planner examines the definition of each partition and try to prove that the partition does not need to be scanned because it does not contain any rows meeting the WHERE
clause. When the planner can prove this, the query planner excludes or prunes the partition from the query plan. Partition pruning significantly reduces the amount of data retrieved from disk and shortens processing time, thus improving query performance and resource utilization.
PolarDB for PostgreSQL (Compatible with Oracle) may use static or dynamic pruning based on the actual SQL statement.
Static pruning is performed during compilation, with the information of the partitions accessed beforehand. A typical scenario for static pruning is when an SQL statement contains a WHERE condition with a constant literal on the partition key column.
Dynamic pruning is performed at run-time, meaning that the exact partitions to be accessed by a statement are not known beforehand. An example scenario for dynamic pruning is when the WHERE condition uses operators or functions.
Partition pruning affects the statistics of the object on which the pruning is performed and the execution plan of the statement that is executed on the object.
The system limits the data scan only to the partitions that may contain the desired records. Both static pruning and dynamic pruning improve the query performance by excluding partitions from an execution plan.
Partition pruning cannot optimize queries on subpartitioned tables or queries on tables that are partitioned by using the RANGE method based on multiple columns.
Similar to constraint exclusion, partition pruning optimizes a query that contains a WHERE clause only when the qualifiers in the WHERE clause use a specified format. For more information, see Constraint exclusion.
Parameters
Parameter | Description |
polar_comp_enable_pruning | Specifies whether to enable partition pruning. Valid values:
|
Usage notes
By partition type
Queries for list-partitioned tables
For list-partitioned tables, partition pruning can quickly prune the queries that contain WHERE clauses. An operator in a WHERE clause can be equal to (=), IS NULL, or IS NOT NULL.
Example:
------ Specify a list-partitioned table. CREATE TABLE sales_hist(..., country text, ...) PARTITION BY LIST(country) ( PARTITION americas VALUES('US', 'CA', 'MX'), PARTITION europe VALUES('BE', 'NL', 'FR'), PARTITION asia VALUES('JP', 'PK', 'CN'), PARTITION others VALUES(DEFAULT) ) ------ Extract the information that contains the following WHERE clauses from the list-partitioned table. WHERE country = 'US' WHERE country IS NULL;
If the first WHERE clause is used, partition pruning excludes the europe, asia, and others partitions because the partitions do not meet
WHERE country = 'US'
.If the second WHERE clause is used, partition pruning excludes the americas, europe, and asia partitions because these partitions do not meet
WHERE country IS NULL
.
Queries for range-partitioned tables
For range-partitioned tables, partition pruning can quickly prune the queries that contain WHERE clauses. An operator in a WHERE clause can be equal to (=), greater than (>), greater than or equal to (>=), less than (<), less than or equal to (<=), IS NULL, IS NOT NULL, or an expression consisting of BETWEEN and AND. Examples:
WHERE size > 100 AND size <= 200 WHERE size BETWEEN 100 AND 200
NotePartition pruning does not support expressions that contain the OR or IN operator.
Example:
------ Specify a range-partitioned table. CREATE TABLE boxes(id int, size int, color text) PARTITION BY RANGE(size) ( PARTITION small VALUES LESS THAN(100), PARTITION medium VALUES LESS THAN(200), PARTITION large VALUES LESS THAN(300) ) ------ Extract the information that contains the following WHERE clauses from the range-partitioned table. WHERE size > 100-- Scan the medium and large partitions. WHERE size >= 100-- Scan the medium and large partitions. WHERE size =100-- Scan the medium partition. WHERE size <= 100-- Scan the small and medium partitions. WHERE size < 100-- Scan the small partition. WHERE size > 100 AND size < 199-- Scan the medium partition. WHERE size BETWEEN 100 AND 199-- Scan the medium partition. WHERE color = 'red' AND size = 100 -- Scan the medium partition. WHERE color = 'red' AND (size > 100 AND size < 199) -- Scan the medium partition.
By stage
In PolarDB for PostgreSQL (Compatible with Oracle), conditional expressions are categorized into those that are immutable, stable, and volatile. The three types of conditional expressions correspond to three types of pruning.
If a conditional expression is immutable, such as a constant literal, partition pruning is performed in the optimizer stage.
If a conditional expression is stable, such as an expression that contains
now()
, partition pruning is performed in the early executor stage.If a conditional expression is volatile, such as an expression that contains
random()
, partition pruning is performed in the later executor stage.
Partition pruning in the optimizer stage
The following example shows how partition pruning is performed in the optimizer stage.
In this example, a table named measurement
that contains the following four partitions is created by using the logdate
field as the partition key: measurement_y2023q1
,measurement_y2023q2
, measurement_y2023q3
, measurement_y2023q4
. Each partition corresponds to data in a quarter of 2023.
CREATE TABLE measurement(
city_id int not null,
logdate date not null,
peaktemp int,
unitsales int
) PARTITION BY RANGE (logdate);
CREATE TABLE measurement_y2023q1 PARTITION OF measurement
FOR VALUES FROM ('2023-01-01') TO ('2023-04-01');
CREATE TABLE measurement_y2023q2 PARTITION OF measurement
FOR VALUES FROM ('2023-04-01') TO ('2023-07-01');
CREATE TABLE measurement_y2023q3 PARTITION OF measurement
FOR VALUES FROM ('2023-07-01') TO ('2023-10-01');
CREATE TABLE measurement_y2023q4 PARTITION OF measurement
FOR VALUES FROM ('2023-10-01') TO ('2024-04-01');
Query the measurement table based on the following condition: logdate
>= DATE '2023-10-01
'. As shown in the results of the EXPLAIN
command, the partitions for the first, second, and third quarters are pruned by default. The system does not query data in the three partitions because their data does not meet the condition logdate >= DATE '2023-10-01'
. In this example, partition pruning is performed in the optimizer stage. The filter condition is specified based on the partition key logdate
, and DATE '2023-10-01'
used in the conditional expression is constant. The conditional expression can be calculated when the optimizer is initialized.
EXPLAIN SELECT * FROM measurement WHERE logdate >= DATE '2023-10-01';
QUERY PLAN
-----------------------------------------------------------------------------
Append (cost=0.00..34.09 rows=567 width=20)
-> Seq Scan on measurement_y2023q4 (cost=0.00..31.25 rows=567 width=20)
Filter: (logdate >= '01-OCT-23 00:00:00'::date)
(3 rows)
Partition pruning in the early executor stage
The following example uses the measurement
table created in the first example to describe how partition pruning is performed in the early executor stage.
The following example uses the same table measurement
and performs the same query as the first example. However, the conditional expression used in the following example changes from a constant expression to a stable expression now()
. The new conditional expression cannot be calculated in the optimizer stage, but can be calculated in the early executor stage. If the month is July 2023, the two partitions for the first and second quarters are removed (Subplans Removed: 2
). Only the partitions for the third and fourth quarters need to be scanned. In this example, partition pruning is performed in the early executor stage. The filter condition is specified based on the partition key logdate
, and the conditional expression now()
is stable. The conditional expression can be calculated in this stage.
EXPLAIN SELECT * FROM measurement WHERE logdate >= now();
QUERY PLAN
-----------------------------------------------------------------------------
Append (cost=0.00..153.34 rows=2268 width=20)
Subplans Removed: 2
-> Seq Scan on measurement_y2023q3 (cost=0.00..35.50 rows=567 width=20)
Filter: (logdate >= now())
-> Seq Scan on measurement_y2023q4 (cost=0.00..35.50 rows=567 width=20)
Filter: (logdate >= now())
(6 rows)
Partition pruning in the later executor stage
The following example uses the measurement
table created in the first example to describe how partition pruning is performed in the later executor stage.
The following example uses the same table and performs the same query as the first example. However, the conditional expression used in the following example changes from a constant expression to a volatile subjoin (select to_date('2023-10-1'))
. The new conditional expression cannot be calculated in the early executor stage, but can only be calculated in the later executor stage.
As shown in the results of the EXPLAIN ANALYZE
command, the partitions for the first three quarters are tagged (never executed)
, which means the partitions are pruned. In this example, partition pruning is performed in the later executor stage. This type of partition pruning is applicable to volatile expressions, subqueries, subjoins, and expressions of join conditions. The filter condition is specified based on the partition key logdate
, and the conditional expression (select to_date('2023-10-1'))
is a volatile subjoin. The conditional expression can be calculated in this stage.
EXPLAIN ANALYZE SELECT * FROM measurement WHERE logdate >= (select to_date('2023-10-1'));
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Append (cost=0.01..136.35 rows=2268 width=20) (actual time=0.067..0.068 rows=0 loops=1)
InitPlan 1 (returns $0)
-> Result (cost=0.00..0.01 rows=1 width=8) (actual time=0.051..0.053 rows=1 loops=1)
-> Seq Scan on measurement_y2023q1 (cost=0.00..31.25 rows=567 width=20) (never executed)
Filter: (logdate >= $0)
-> Seq Scan on measurement_y2023q2 (cost=0.00..31.25 rows=567 width=20) (never executed)
Filter: (logdate >= $0)
-> Seq Scan on measurement_y2023q3 (cost=0.00..31.25 rows=567 width=20) (never executed)
Filter: (logdate >= $0)
-> Seq Scan on measurement_y2023q4 (cost=0.00..31.25 rows=567 width=20) (actual time=0.004..0.004 rows=0 loops=1)
Filter: (logdate >= $0)
Examples
SELECT
UPDATE and DELETE
Additional information
Constraint exclusion
Parameters
Parameter | Description |
constraint_exclusion | Specifies whether to enable constraint exclusion. Valid values:
|
If you enable constraint exclusion, the server checks the constraint defined for each partition to determine whether the partition meets a query.
When you execute a SELECT statement that does not contain a WHERE clause, the query planner recommends an execution plan that searches the entire table.
When you execute a SELECT statement that contains a WHERE clause, the query planner determines a partition to store records and sends query fragments to a specified partition. Then, the query planner prunes the partitions that cannot contain the records from an execution plan.
If you do not use partitioned tables, we recommend that you disable constraint exclusion to improve performance.
Differences between partition pruning and constraint exclusion
Partition pruning and constraint exclusion are different in the following aspects:
When the system performs a partition pruning, it takes the relationship between partitions into consideration. This is not the case for a constraint exclusion.
For example, when searching for a specific record in a list-partitioned table, in accordance with partition pruning, the system knows the partition where the record is stored. However, during a constraint exclusion, the system must check the constraint defined for each partition.
Partition pruning is performed early in the optimization stage to reduce the number of partitions that the planner needs to look at. However, constraint exclusion is performed in the later stage of optimization.