All Products
Search
Document Center

PolarDB:Partition pruning

Last Updated:Apr 17, 2024

PolarDB for PostgreSQL (Compatible with Oracle) supports the partition pruning feature, which can significantly speed up queries on partitioned tables.

Overview

With partition pruning enabled, the query planner examines the definition of each partition and try to prove that the partition need not be scanned because it could not contain any rows meeting the query's WHERE clause. When the planner can prove this, it excludes, or prunes, the partition from the query plan. Partition pruning dramatically reduces the amount of data retrieved from disk and shortens processing time, thus improving query performance and resource utilization.

Depending upon the actual SQL statement, PolarDB for PostgreSQL (Compatible with Oracle) may use static or dynamic pruning.

  • Static pruning happens during compilation, with the information about the partitions accessed beforehand. A typical scenario for static pruning is when an SQL statement contains a WHERE condition with a constant literal on the partition key column.

  • Dynamic pruning happens at run-time, meaning that the exact partitions to be accessed by a statement are not known beforehand. An example scenario for dynamic pruning is when the WHERE condition uses operators or functions.

Partition pruning affects the statistics of the objects where pruning occurs and also affects the execution plan of a statement.

The system will limit the data scan only to the partitions that could contain the desired records. Both dynamic and static pruning will omit the unneeded partitions in the execution plans, which can improve performance.

Partition pruning and constraint exclusion

Partition pruning and constraint exclusion are different in the following aspects:

  • When the system performs a partition pruning, it takes the relationship between partitions into consideration. This is not the case for a constraint exclusion.

    For example, when looking for a specific record in a list-partitioned table, in accordance with partition pruning, the system knows the partition where the record is stored. However, during a constraint exclusion, the system must check the constraint defined for each partition.

  • Partition pruning happens early in the optimization stage to reduce the number of partitions that the planner needs to look at. However, constraint-based exclusion takes place in the later stage of optimization.

Partition pruning in difference stages

In PolarDB for PostgreSQL (Compatible with Oracle), conditions are categorized into those that are immutable, stable, and volatile. For each of the category, there is a corresponding partition pruning method.

  • For an immutable condition, such as a constant literal, the system performs partition pruning in the optimizer stage.

  • For a stable condition, such as a now() function, partition pruning takes place in the early executor stage.

  • For a random condition, such as a random() function, partition pruning is performed in the later executor stage.

Partition pruning in the optimizer stage

In this example, a table measurement that contains the following four partitions is created using the logdate field as the partition key: measurement_y2023q1,measurement_y2023q2, measurement_y2023q3, measurement_y2023q4. Each of the partition corresponds to the data in a quarter of 2023.

CREATE TABLE measurement(
    city_id         int not null,
    logdate         date not null,
    peaktemp        int,
    unitsales       int
) PARTITION BY RANGE (logdate);
CREATE TABLE measurement_y2023q1 PARTITION OF measurement
    FOR VALUES FROM ('2023-01-01') TO ('2023-04-01');
CREATE TABLE measurement_y2023q2 PARTITION OF measurement
    FOR VALUES FROM ('2023-04-01') TO ('2023-07-01');
CREATE TABLE measurement_y2023q3 PARTITION OF measurement
    FOR VALUES FROM ('2023-07-01') TO ('2023-10-01');
CREATE TABLE measurement_y2023q4 PARTITION OF measurement
    FOR VALUES FROM ('2023-10-01') TO ('2024-04-01');

A query is executed to obtain all records whose logdate value is larger than or equal to 2023-10-01. In the execution plan returned from an EXPLAIN command, the records for the first three quarters are all pruned, for their logdate values do not meet the logdate >= DATE '2023-10-01' condition. Thus, these records are not scanned during the execution. This partition pruning is performed during the optimizer stage, because the condition on the logdate partition key is a constant literal DATE '2023-10-01', which can be calculated in this stage.

EXPLAIN SELECT * FROM measurement WHERE logdate >= DATE '2023-10-01';
                                 QUERY PLAN                                  
-----------------------------------------------------------------------------
 Append  (cost=0.00..34.09 rows=567 width=20)
   ->  Seq Scan on measurement_y2023q4  (cost=0.00..31.25 rows=567 width=20)
         Filter: (logdate >= '01-OCT-23 00:00:00'::date)
(3 rows)

Partition pruning in the early executor stage

In this example, the table and query are the same, but the condition is changed from a constant value to a now() expression. The expression cannot be calculated by the optimizer, but can be calculated early into the execution stage. Suppose that the query is executed on a day of July, 2023, the execution plan shows that the partitions for the first two quarters are pruned (indicated by Subplans Removed: 2). Only the measurement_y2023q3 and measurement_y2023q4 partitions are scanned. This partition pruning is performed in the early executor stage, because the condition on the logdate partition key is a stable expression now(), which can be calculated in this stage.

EXPLAIN SELECT * FROM measurement WHERE logdate >= now();
                                 QUERY PLAN                                  
-----------------------------------------------------------------------------
 Append  (cost=0.00..153.34 rows=2268 width=20)
   Subplans Removed: 2
   ->  Seq Scan on measurement_y2023q3  (cost=0.00..35.50 rows=567 width=20)
         Filter: (logdate >= now())
   ->  Seq Scan on measurement_y2023q4  (cost=0.00..35.50 rows=567 width=20)
         Filter: (logdate >= now())
(6 rows)

Partition pruning in the later executor stage

Again, the table and the query are the same, but this time, the condition is a volatile expression (select to_date('2023-10-1')). It can only be calculated in the later executor stage.

As is shown in the execution plan, the partitions for the first three quarters are tagged (never executed), which means they are pruned. This pruning mode occurs for volatile expressions, subqueries, subjoins, and join conditions.

EXPLAIN ANALYZE SELECT * FROM measurement WHERE logdate >= (select  to_date('2023-10-1'));
                                                      QUERY PLAN                                                       
-----------------------------------------------------------------------------------------------------------------------
 Append  (cost=0.01..136.35 rows=2268 width=20) (actual time=0.067..0.068 rows=0 loops=1)
   InitPlan 1 (returns $0)
     ->  Result  (cost=0.00..0.01 rows=1 width=8) (actual time=0.051..0.053 rows=1 loops=1)
   ->  Seq Scan on measurement_y2023q1  (cost=0.00..31.25 rows=567 width=20) (never executed)
         Filter: (logdate >= $0)
   ->  Seq Scan on measurement_y2023q2  (cost=0.00..31.25 rows=567 width=20) (never executed)
         Filter: (logdate >= $0)
   ->  Seq Scan on measurement_y2023q3  (cost=0.00..31.25 rows=567 width=20) (never executed)
         Filter: (logdate >= $0)
   ->  Seq Scan on measurement_y2023q4  (cost=0.00..31.25 rows=567 width=20) (actual time=0.004..0.004 rows=0 loops=1)
         Filter: (logdate >= $0)