What are partitions; what are their usage scenarios and advantages - PolarDB

This topic provides an overview of the features and benefits of partitioned tables in PolarDB for PostgreSQL (Compatible with Oracle).

Introduction

By partitioning a table in PolarDB for PostgreSQL (Compatible with Oracle), you physically separate a table or index to smaller pieces, called partitions, for easier management. Each partition is an object that has its name and storage properties. To database administrators, the partitions can be managed collectively or separately, which allows for much better flexibility in their work. However, applications are not aware of the partitioning of the tables. Existing query statements and DML commands do not need to be modified for table partitioning.

Partitions of a partitioned table are the same in terms of logical properties like column names, data types, and constraints, but can have different physical properties, such as compression, configuration of physical storage, and tablespace.

Partitioning brings benefits to many types of applications, especially those that involve massive data volumes. For OLTP databases, partitioning improves manageability and availability, while for OLAP databases, the major benefits lie in performance and manageability.

Scenario

Partition a table when it is excessively large, such as when the tablespace is larger than the physical memory of the database server. This can improve database performance. For example, you may want to partition a table if it is larger than 2 GB in size.
Partition a table that stores historical data and is expected to receive new data. For example, a large table is used to store the data of a year. You can store the data of the most recent month in a separate and updatable partition, and the data of the previous months in other read-only partitions.

Benefits

Higher performance
Partitioning can significantly improve query performance, especially when the most frequently accessed rows can be limited in one or several partitions. Partitioning replaces the upper levels of an index tree. When a query involves one or a few specific partitions, the database system can perform a sequential scan on just those partitions instead of relying on indexes. This can improve performance and manageability because the system processes contiguous data chunks rather than records scattered throughout the entire table.
Easier management
As standalone objects, partitions can be managed individually or collectively. Because DDL operations can be performed on partitions, in addition to the entire table or an index, you can use partitioning to break down resource-intensive tasks, such as rebuilding indexes or tables. A single partition can be moved at a time. Therefore, when an issue occurs, only the relevant partition needs to be moved, instead of the entire table. What is more, you can perform batch operations on data records in the unit of partitions. For example, when you want to remove data from a table, you only need to use DROP TABLE to delete the relevant partition or use ALTER TABLE DETACH PARTITION to remove the partition from the parent table. VACUUM operations are not required to reclaim storage space after these operations, which is another advantage over batch deletion.
Reduced resource contention
In some OLTP databases, partitioning can reduce the contention of resources, such as when a DML statement is executed on multiple partitions.
Higher availability
When a partition becomes unavailable, the rest of the partitioned table can still be accessed. Query optimizer automatically removes unavailable partitions from query plans to avoid impact on queries.
Lower storage costs
The infrequently accessed partitions can be dumped to slower, cheaper storage medium to save cost.

Take note that only excessively large tables benefit from partitioning. We recommend that you partition a table when its size reaches the size of the physical memory of your database server.

How it works

Partitioning introduces complexity to the structure, but this is transparent to users. This section describes the characteristics and mechanism of partitioning. Understanding these can help you effectively use partitioned tables.

Example 1

CREATE TABLE measurement (
    city_id         int not null,
    logdate         date not null,
    peaktemp        int,
    unitsales       int
) PARTITION BY RANGE (logdate);
CREATE TABLE measurement_y2006m02 PARTITION OF measurement
    FOR VALUES FROM ('2006-02-01') TO ('2006-03-01');

CREATE TABLE measurement_y2006m03 PARTITION OF measurement
    FOR VALUES FROM ('2006-03-01') TO ('2006-04-01');

...
CREATE TABLE measurement_y2007m11 PARTITION OF measurement
    FOR VALUES FROM ('2007-11-01') TO ('2007-12-01');

CREATE TABLE measurement_y2007m12 PARTITION OF measurement
    FOR VALUES FROM ('2007-12-01') TO ('2008-01-01')
    TABLESPACE fasttablespace;

CREATE TABLE measurement_y2008m01 PARTITION OF measurement
    FOR VALUES FROM ('2008-01-01') TO ('2008-02-01')
    WITH (parallel_workers = 4)
    TABLESPACE fasttablespace;

Partition key

A partition key is a single column or a combination of multiple columns. It is used to determine the partitions the data records belong to. For a partitioned table, each row needs to be explicitly assigned to a single partition. PolarDB for PostgreSQL (Compatible with Oracle) automatically routes the operations on data records, such as insertion, update, or deletion, to the relevant partitions based on the partition key.

In example 1, logdate is the partition key of the partitioned table named measurement. The bounds of partitions of the table measurement are determined by logdate values.

Partitioning strategy

PolarDB for PostgreSQL (Compatible with Oracle) supports multiple partitioning strategies.

Range partitioning
The data is distributed based on a range of values of the partitioning key, such as dates or specific identifiers for your business. Each partition has a non-inclusive upper bound. For example, if the partition key values of a partition are from 1 to 10, and those of another are from 10 to 20, 10 belongs to the second partition, not the first. The measurement table in Example 1 is a range-partitioned table.
Interval range partitioning is a partitioning strategy extended from range partitioning. For more information, see Interval range partitioning.
List partitioning
Example 2
```
CREATE TABLE department(deptno INT4 Primary Key,dname VARCHAR(50), location VARCHAR(100)) PARTITION BY LIST (deptno);
CREATE TABLE department_p1 partition of department for values in (10, 20);
CREATE TABLE department_p1 partition of department for values in (30, 40);
```
When list-partitioning a table, you specify the list of key values for each partition. The department table in Example 2 is a list-partitioned table. The partition key values that are allowed in the partitions are both explicitly specified. For example, in the department_p1 partition, records are stored only if their deptno value is 10 or 20, and in department_p2, the values in the deptno column are all 30 or 40.
Hash partitioning
Hash partitioning is performed based on a specified modulus and remainder. Each partition stores the records where the hash value of the partition key divided by the specified modulus will produce the specified remainder.
Example 3
```
create table idxpart (i int) partition by hash (i);
create table idxpart0 partition of idxpart for values with (modulus 2, remainder 0);
create table idxpart1 partition of idxpart for values with (modulus 2, remainder 1);
```
In this example, the idxpart table is hash-partitioned. In the idxpart0 partition, the hash values of i of all records can be divided by 2 with a remainder of 0. In the idxpart1 partition, the hash values of i is divided by 2 with a remainder of 1.

Multi-level partitioning

Multi-level partitioning refers to the practice of partitioning an already partitioned table, creating subpartitions which can also be further partitioned.

PolarDB for PostgreSQL (Compatible with Oracle) supports an unlimited number of partitioning levels, but we recommend that you do not create levels beyond three. Excessive levels of partitioning may undermine the manageability of partitions and even deteriorate query performance.

You can use different partitioning strategies on different levels of partitioning. For example, you can use range partitioning for the first level, hash partitioning for the second, and list partitioning for the third.

Example 4

CREATE TABLE measurement (
    city_id         int not null,
    logdate         date not null,
    peaktemp        int,
    unitsales       int
) PARTITION BY RANGE (logdate);

CREATE TABLE measurement_y2006m03 PARTITION OF measurement
    FOR VALUES FROM ('2006-03-01') TO ('2006-04-01') PARTITION BY Hash (city_id);

CREATE TABLE measurement_y2006m03_hash1 PARTITION OF measurement_y2006m03
    for values with (modulus 2, remainder 0) PARTITION BY List (peaktemp);

CREATE TABLE measurement_y2006m03_hash1_l1 PARTITION OF measurement_y2006m03_hash1 for values in (10, 20);

Syntax

For more information about the commands for managing partitions, such as creating, adding, merging, splitting, and dropping partitions, see Command list for partitioned tables.