The definition and usage of partition pruning - PolarDB - Alibaba Cloud Documentation Center

PolarDB for PostgreSQL (Compatible with Oracle) supports the partition pruning feature, which can significantly speed up queries on partitioned tables.

Overview

When partition pruning is enabled, the query planner examines the definition of each partition and try to prove that the partition does not need to be scanned because it does not contain any rows meeting the WHERE clause. When the planner can prove this, the query planner excludes or prunes the partition from the query plan. Partition pruning significantly reduces the amount of data retrieved from disk and shortens processing time, thus improving query performance and resource utilization.

PolarDB for PostgreSQL (Compatible with Oracle) may use static or dynamic pruning based on the actual SQL statement.

Static pruning is performed during compilation, with the information of the partitions accessed beforehand. A typical scenario for static pruning is when an SQL statement contains a WHERE condition with a constant literal on the partition key column.
Dynamic pruning is performed at run-time, meaning that the exact partitions to be accessed by a statement are not known beforehand. An example scenario for dynamic pruning is when the WHERE condition uses operators or functions.

Note

Partition pruning affects the statistics of the object on which the pruning is performed and the execution plan of the statement that is executed on the object.
The system limits the data scan only to the partitions that may contain the desired records. Both static pruning and dynamic pruning improve the query performance by excluding partitions from an execution plan.
Partition pruning cannot optimize queries on subpartitioned tables or queries on tables that are partitioned by using the RANGE method based on multiple columns.

Similar to constraint exclusion, partition pruning optimizes a query that contains a WHERE clause only when the qualifiers in the WHERE clause use a specified format. For more information, see Constraint exclusion.

Parameters

Parameter

Description

polar_comp_enable_pruning

Specifies whether to enable partition pruning. Valid values:

on (default): Partition pruning is enabled to quickly prune specific queries.
off: Partition pruning is disabled.

Usage notes

By partition type

Queries for list-partitioned tables

For list-partitioned tables, partition pruning can quickly prune the queries that contain WHERE clauses. An operator in a WHERE clause can be equal to (=), IS NULL, or IS NOT NULL.
Example:
```
------ Specify a list-partitioned table.
CREATE TABLE sales_hist(..., country text, ...) 
    PARTITION BY LIST(country) (
    PARTITION americas VALUES('US', 'CA', 'MX'), 
    PARTITION europe VALUES('BE', 'NL', 'FR'), 
    PARTITION asia VALUES('JP', 'PK', 'CN'), 
    PARTITION others VALUES(DEFAULT)
) 
------ Extract the information that contains the following WHERE clauses from the list-partitioned table.
WHERE country = 'US' WHERE country IS NULL;          
```
If the first WHERE clause is used, partition pruning excludes the europe, asia, and others partitions because the partitions do not meet WHERE country = 'US'.
If the second WHERE clause is used, partition pruning excludes the americas, europe, and asia partitions because these partitions do not meet WHERE country IS NULL.

Queries for range-partitioned tables

For range-partitioned tables, partition pruning can quickly prune the queries that contain WHERE clauses. An operator in a WHERE clause can be equal to (=), greater than (>), greater than or equal to (>=), less than (<), less than or equal to (<=), IS NULL, IS NOT NULL, or an expression consisting of BETWEEN and AND. Examples:

WHERE size > 100 AND size <= 200 WHERE size BETWEEN 100 AND 200

Note

Partition pruning does not support expressions that contain the OR or IN operator.

Example:

------ Specify a range-partitioned table.
CREATE TABLE boxes(id int, size int, color text) 
  PARTITION BY RANGE(size)
(
    PARTITION small VALUES LESS THAN(100),
    PARTITION medium VALUES LESS THAN(200),
    PARTITION large VALUES LESS THAN(300)
)
------ Extract the information that contains the following WHERE clauses from the range-partitioned table. 
WHERE size > 100-- Scan the medium and large partitions.
WHERE size >= 100-- Scan the medium and large partitions.
WHERE size =100-- Scan the medium partition.
WHERE size <= 100-- Scan the small and medium partitions.
WHERE size < 100-- Scan the small partition.
WHERE size > 100 AND size < 199-- Scan the medium partition.
WHERE size BETWEEN 100 AND 199-- Scan the medium partition.
WHERE color = 'red' AND size = 100  -- Scan the medium partition.
WHERE color = 'red' AND (size > 100 AND size < 199) -- Scan the medium partition.

By stage

In PolarDB for PostgreSQL (Compatible with Oracle), conditional expressions are categorized into those that are immutable, stable, and volatile. The three types of conditional expressions correspond to three types of pruning.

If a conditional expression is immutable, such as a constant literal, partition pruning is performed in the optimizer stage.
If a conditional expression is stable, such as an expression that contains now(), partition pruning is performed in the early executor stage.
If a conditional expression is volatile, such as an expression that contains random(), partition pruning is performed in the later executor stage.

Partition pruning in the optimizer stage

The following example shows how partition pruning is performed in the optimizer stage.

In this example, a table named measurement that contains the following four partitions is created by using the logdate field as the partition key: measurement_y2023q1,measurement_y2023q2, measurement_y2023q3, measurement_y2023q4. Each partition corresponds to data in a quarter of 2023.

CREATE TABLE measurement(
    city_id         int not null,
    logdate         date not null,
    peaktemp        int,
    unitsales       int
) PARTITION BY RANGE (logdate);
CREATE TABLE measurement_y2023q1 PARTITION OF measurement
    FOR VALUES FROM ('2023-01-01') TO ('2023-04-01');
CREATE TABLE measurement_y2023q2 PARTITION OF measurement
    FOR VALUES FROM ('2023-04-01') TO ('2023-07-01');
CREATE TABLE measurement_y2023q3 PARTITION OF measurement
    FOR VALUES FROM ('2023-07-01') TO ('2023-10-01');
CREATE TABLE measurement_y2023q4 PARTITION OF measurement
    FOR VALUES FROM ('2023-10-01') TO ('2024-04-01');

Query the measurement table based on the following condition: logdate >= DATE '2023-10-01'. As shown in the results of the EXPLAIN command, the partitions for the first, second, and third quarters are pruned by default. The system does not query data in the three partitions because their data does not meet the condition logdate >= DATE '2023-10-01'. In this example, partition pruning is performed in the optimizer stage. The filter condition is specified based on the partition key logdate, and DATE '2023-10-01' used in the conditional expression is constant. The conditional expression can be calculated when the optimizer is initialized.

EXPLAIN SELECT * FROM measurement WHERE logdate >= DATE '2023-10-01';
                                 QUERY PLAN                                  
-----------------------------------------------------------------------------
 Append  (cost=0.00..34.09 rows=567 width=20)
   ->  Seq Scan on measurement_y2023q4  (cost=0.00..31.25 rows=567 width=20)
         Filter: (logdate >= '01-OCT-23 00:00:00'::date)
(3 rows)

Partition pruning in the early executor stage

The following example uses the measurement table created in the first example to describe how partition pruning is performed in the early executor stage.

The following example uses the same table measurement and performs the same query as the first example. However, the conditional expression used in the following example changes from a constant expression to a stable expression now(). The new conditional expression cannot be calculated in the optimizer stage, but can be calculated in the early executor stage. If the month is July 2023, the two partitions for the first and second quarters are removed (Subplans Removed: 2). Only the partitions for the third and fourth quarters need to be scanned. In this example, partition pruning is performed in the early executor stage. The filter condition is specified based on the partition key logdate, and the conditional expression now() is stable. The conditional expression can be calculated in this stage.

EXPLAIN SELECT * FROM measurement WHERE logdate >= now();
                                 QUERY PLAN                                  
-----------------------------------------------------------------------------
 Append  (cost=0.00..153.34 rows=2268 width=20)
   Subplans Removed: 2
   ->  Seq Scan on measurement_y2023q3  (cost=0.00..35.50 rows=567 width=20)
         Filter: (logdate >= now())
   ->  Seq Scan on measurement_y2023q4  (cost=0.00..35.50 rows=567 width=20)
         Filter: (logdate >= now())
(6 rows)

Partition pruning in the later executor stage

The following example uses the measurement table created in the first example to describe how partition pruning is performed in the later executor stage.

The following example uses the same table and performs the same query as the first example. However, the conditional expression used in the following example changes from a constant expression to a volatile subjoin (select to_date('2023-10-1')). The new conditional expression cannot be calculated in the early executor stage, but can only be calculated in the later executor stage.

As shown in the results of the EXPLAIN ANALYZE command, the partitions for the first three quarters are tagged (never executed), which means the partitions are pruned. In this example, partition pruning is performed in the later executor stage. This type of partition pruning is applicable to volatile expressions, subqueries, subjoins, and expressions of join conditions. The filter condition is specified based on the partition key logdate, and the conditional expression (select to_date('2023-10-1')) is a volatile subjoin. The conditional expression can be calculated in this stage.

EXPLAIN ANALYZE SELECT * FROM measurement WHERE logdate >= (select  to_date('2023-10-1'));
                                                      QUERY PLAN                                                       
-----------------------------------------------------------------------------------------------------------------------
 Append  (cost=0.01..136.35 rows=2268 width=20) (actual time=0.067..0.068 rows=0 loops=1)
   InitPlan 1 (returns $0)
     ->  Result  (cost=0.00..0.01 rows=1 width=8) (actual time=0.051..0.053 rows=1 loops=1)
   ->  Seq Scan on measurement_y2023q1  (cost=0.00..31.25 rows=567 width=20) (never executed)
         Filter: (logdate >= $0)
   ->  Seq Scan on measurement_y2023q2  (cost=0.00..31.25 rows=567 width=20) (never executed)
         Filter: (logdate >= $0)
   ->  Seq Scan on measurement_y2023q3  (cost=0.00..31.25 rows=567 width=20) (never executed)
         Filter: (logdate >= $0)
   ->  Seq Scan on measurement_y2023q4  (cost=0.00..31.25 rows=567 width=20) (actual time=0.004..0.004 rows=0 loops=1)
         Filter: (logdate >= $0)

Examples

SELECT

CREATE TABLE sales
(
  dept_no     number,   
  part_no     varchar2,
  country     varchar2(20),
  date        date,
  amount      number
)
PARTITION BY LIST(country)
(
  PARTITION europe VALUES('FRANCE', 'ITALY'),
  PARTITION asia VALUES('INDIA', 'PAKISTAN'),
  PARTITION americas VALUES('US', 'CANADA')
);

------Perform a constrained query that contains the EXPLAIN statement. In this case, the server scans only the sales_asia table and stores records whose country value in the table is INDIA.
EXPLAIN (COSTS OFF) SELECT * FROM sales WHERE country = 'INDIA';
                    QUERY PLAN                     
---------------------------------------------------
 Append
   ->  Seq Scan on sales
         Filter: ((country)::text = 'INDIA'::text)
   ->  Seq Scan on sales_asia
         Filter: ((country)::text = 'INDIA'::text)
(5 rows)

------Perform the following query to search records that are included in the partition key. In this case, the server scans only the sales_asia table and stores records whose country value in the table is INDIA.
EXPLAIN (COSTS OFF) SELECT * FROM sales WHERE dept_no = '30'; 
               QUERY PLAN               
-----------------------------------------
 Append
   ->  Seq Scan on sales
         Filter: (dept_no = 30::numeric)
   ->  Seq Scan on sales_europe
         Filter: (dept_no = 30::numeric)
   ->  Seq Scan on sales_asia
         Filter: (dept_no = 30::numeric)
   ->  Seq Scan on sales_americas
         Filter: (dept_no = 30::numeric)
(9 rows)

Constraint exclusion can also be used for queries on subpartitioned tables. The following statement provides an example on how to create a subpartitioned table:

CREATE TABLE sales
(
  dept_no     number,
  part_no     varchar2,
  country     varchar2(20),
  date        date,
  amount      number
)
PARTITION BY RANGE(date) SUBPARTITION BY LIST (country)
(
  PARTITION "2011" VALUES LESS THAN('01-JAN-2012')
  (
    SUBPARTITION europe_2011 VALUES ('ITALY', 'FRANCE'),
    SUBPARTITION asia_2011 VALUES ('PAKISTAN', 'INDIA'),
    SUBPARTITION americas_2011 VALUES ('US', 'CANADA')
  ),
  PARTITION "2012" VALUES LESS THAN('01-JAN-2013')
  (
    SUBPARTITION europe_2012 VALUES ('ITALY', 'FRANCE'),
    SUBPARTITION asia_2012 VALUES ('PAKISTAN', 'INDIA'),
    SUBPARTITION americas_2012 VALUES ('US', 'CANADA')
  ),
  PARTITION "2013" VALUES LESS THAN('01-JAN-2014')
  (
    SUBPARTITION europe_2013 VALUES ('ITALY', 'FRANCE'),
    SUBPARTITION asia_2013 VALUES ('PAKISTAN', 'INDIA'),
    SUBPARTITION americas_2013 VALUES ('US', 'CANADA')
  )
);

------ When you query the sales table, the query planner prunes all partitions or subpartitions that may not include desired result sets from the search path.
EXPLAIN (COSTS OFF) SELECT * FROM sales WHERE country = 'US' AND date = 'Dec 12, 2012';
                                 QUERY PLAN                                 
-----------------------------------------------------------------------------
 Append
   ->  Seq Scan on sales
         Filter: (((country)::text = 'US'::text) AND (date = '12-DEC-12 00:00:00'::timestamp without time zone))
   ->  Seq Scan on sales_2012
         Filter: (((country)::text = 'US'::text) AND (date = '12-DEC-12 00:00:00'::timestamp without time zone))
   ->  Seq Scan on sales_americas_2012
         Filter: (((country)::text = 'US'::text) AND (date = '12-DEC-12 00:00:00'::timestamp without time zone))
(7 rows)

UPDATE and DELETE

CREATE TABLE t1_hash (id int , value int) PARTITION BY hash(id) partitions 4;

------ UPDATE operation
EXPLAIN UPDATE t1_hash SET value = value+1 WHERE id = least(1,2);
                               QUERY PLAN                                
-------------------------------------------------------------------------
 Update on t1_hash  (cost=0.00..92.18 rows=24 width=14)
   Update on t1_hash_p1
   Update on t1_hash_p2 t1_hash
   ->  Append  (cost=0.00..92.18 rows=24 width=14)
         Subplans Removed: 1
         ->  Seq Scan on t1_hash_p1  (cost=0.00..46.03 rows=12 width=14)
               Filter: (id = LEAST(1, 2))
(7 rows)

------ DELETE operation
EXPLAIN DELETE FROM t1_hash WHERE id = least(1,2);
                               QUERY PLAN                                
-------------------------------------------------------------------------
 Delete on t1_hash  (cost=0.00..92.12 rows=24 width=10)
   Delete on t1_hash_p1
   Delete on t1_hash_p2 t1_hash
   ->  Append  (cost=0.00..92.12 rows=24 width=10)
         Subplans Removed: 1
         ->  Seq Scan on t1_hash_p1  (cost=0.00..46.00 rows=12 width=10)
               Filter: (id = LEAST(1, 2))
(7 rows)

Additional information

Constraint exclusion

Parameters

Parameter

Description

constraint_exclusion

Specifies whether to enable constraint exclusion. Valid values:

partition (default): Constraint exclusion is enabled.
on: Constraint exclusion is enabled.
off: Constraint exclusion is disabled.

If you enable constraint exclusion, the server checks the constraint defined for each partition to determine whether the partition meets a query.

When you execute a SELECT statement that does not contain a WHERE clause, the query planner recommends an execution plan that searches the entire table.
When you execute a SELECT statement that contains a WHERE clause, the query planner determines a partition to store records and sends query fragments to a specified partition. Then, the query planner prunes the partitions that cannot contain the records from an execution plan.

If you do not use partitioned tables, we recommend that you disable constraint exclusion to improve performance.

Differences between partition pruning and constraint exclusion

Partition pruning and constraint exclusion are different in the following aspects:

When the system performs a partition pruning, it takes the relationship between partitions into consideration. This is not the case for a constraint exclusion.
For example, when searching for a specific record in a list-partitioned table, in accordance with partition pruning, the system knows the partition where the record is stored. However, during a constraint exclusion, the system must check the constraint defined for each partition.
Partition pruning is performed early in the optimization stage to reduce the number of partitions that the planner needs to look at. However, constraint exclusion is performed in the later stage of optimization.