All Products
Search
Document Center

PolarDB:Overview

Last Updated:Apr 18, 2024

This topic provides an overview of the features and benefits of partitioned tables in PolarDB for PostgreSQL (Compatible with Oracle).

Introduction

By partitioning a table in , you physically separate a table or index to smaller tables, named partitions, for easier management. Each partition is an object that has their name and storage properties. To database administrators, the partitions can be managed collectively or separately, which allows for much better flexibility in their work. However, applications are not aware of the partitioning of the tables. Existing query statements and DML commands do not need to be modified for table partitioning.

Partitions of a partitioned table are the same in terms of logical properties like column names, data types, and constraints, but can have different physical properties, such as compression, configuration of physical storage, and tablespace.

Partitioning brings benefits to many types of applications, especially those that involve massive data volumes. For OLTP databases, partitioning improves manageability and availability, while for OLAP databases, the major benefits lie in performance and manageability.

Scenarios

  • Partition a table when it is excessively large, such as when the table space is larger than the physical memory of the database server. This can improve database performance. For example, you may want to partition a table if it is larger than 2 GB in size.

  • Partition a large table when it stores historical data, and new data needs to be written into the latest partition. For example, a large table is used to store the historical data of a year. You can store the data of the most recent month in a separate partition where data is updatable. You can then store the data of the previous months in other read-only partitions.

Benefits

  • Higher performance

    Partitioning may significantly improve query performance, especially when the most frequently accessed rows can be limited in one or several partitions. Partitions can be used to replace the upper trees of indexes. When a query involves one or a few specific partitions, the database system can perform a sequential scan on just those partitions instead of relying on indexes. This can improve performance and manageability because the system processes contiguous data chunks rather than records scattered throughout the entire table.

  • Easier management

    As standalone objects, partitions can be managed individually or collectively, and DDL operations can be performed on partitions, instead of the entire table or index. Therefore, you can break down resource-intensive tasks, such as rebuilding indexes or tables, so that only a single partition is moved at a time. When an issue occurs, only the related partition needs to be moved, instead of the entire table. What's more, you can perform batch operations on data records in the unit of partitions. For example, when you want to remove data from a table, you only need to use DROP TABLE to delete the partition or use ALTER TABLE DETACH PARTITION to remove the partition from the parent table. VACUUM operations are not required to reclaim storage space after these operations, which is another advantage over batch deletion.

  • Reduced resource contention

    In some OLTP databases, partitioning can reduce the contention of resources, such as when a DML statement is executed on multiple partitions.

  • Higher availability

    When a partition becomes unavailable, the rest of the partitioned table can still be accessed. Query optimizer automatically removes unavailable partitions from query plans to avoid impact on queries.

  • Lower storage costs

    The infrequently accessed partitions can be dumped to slower, cheaper storage medium to save cost.

Take note that the preceding benefits are valid only when the partitioned table is large in size. We recommend that you partition a table when its size reaches the size of the physical memory of your database server.

image

How it works

Partitioned tables are more complex than non-partitioned tables in structure, but their structures are transparent to applications. This section introduces the features and mechanism of partitioning to facilitate the use of partition tables.

Example 1

CREATE TABLE measurement (
    city_id         int not null,
    logdate         date not null,
    peaktemp        int,
    unitsales       int
) PARTITION BY RANGE (logdate);
CREATE TABLE measurement_y2006m02 PARTITION OF measurement
    FOR VALUES FROM ('2006-02-01') TO ('2006-03-01');

CREATE TABLE measurement_y2006m03 PARTITION OF measurement
    FOR VALUES FROM ('2006-03-01') TO ('2006-04-01');

...
CREATE TABLE measurement_y2007m11 PARTITION OF measurement
    FOR VALUES FROM ('2007-11-01') TO ('2007-12-01');

CREATE TABLE measurement_y2007m12 PARTITION OF measurement
    FOR VALUES FROM ('2007-12-01') TO ('2008-01-01')
    TABLESPACE fasttablespace;

CREATE TABLE measurement_y2008m01 PARTITION OF measurement
    FOR VALUES FROM ('2008-01-01') TO ('2008-02-01')
    WITH (parallel_workers = 4)
    TABLESPACE fasttablespace;

Partition key

A partition key is a single column or a combination of multiple columns. It is used to determine the partitions the data records belong to. For a partitioned table, each row needs to be explicitly assigned to a single partition. PolarDB for PostgreSQL (Compatible with Oracle) automatically routes the operations on data records, such as insertion, update, or deletion, to the relevant partitions based on the partition key.

In example 1, logdate is the partition key of the partitioned table named measurement. The bounds of partitions of the table measurement are determined by logdate values.

Partitioning strategy

PolarDB for PostgreSQL (Compatible with Oracle) supports multiple partitioning strategies.

  • Range partitioning

    The data is distributed based on a range of values of the partitioning key, such as dates or specific identifiers for your business. Each partition has a non-inclusive upper bound. For example, if the partition key values of a partition are from 1 to 10, and those of another are from 10 to 20, 10 belongs to the second partition, not the first. The measurement table in Example 1 is a range-partitioned table.

    Interval range partitioning is a partitioning strategy extended from range partitioning. For more information, see Interval range partitioning.

  • List partitioning

    Example 2

    CREATE TABLE department(deptno INT4 Primary Key,dname VARCHAR(50), location VARCHAR(100)) PARTITION BY LIST (deptno);
    CREATE TABLE department_p1 partition of department for values in (10, 20);
    CREATE TABLE department_p1 partition of department for values in (30, 40);

    When list-partitioning a table, you specify the key-value pair for the partition key of each partition. The department table in Example 2 is a list-partitioned table. The partition key values that are allowed in the partitions are both explicitly specified. For example, in the department_p1 partition, records are stored only if their deptno value is 10 or 20, and in department_p2, the values in the deptno column are all 30 or 40.

  • Hash partitioning

    Hash partitioning is performed based on division and remainders.

    Example 3

    create table idxpart (i int) partition by hash (i);
    create table idxpart0 partition of idxpart for values with (modulus 2, remainder 0);
    create table idxpart1 partition of idxpart for values with (modulus 2, remainder 1);

    In this example, the idxpart table is hash-partitioned. For example, in the idxpart0 partition, the i values of all records can be divided by 2 with a remainder 0. In the idxpart1 partition, the i values of all records can be divided by 2 with a remainder 1.

Multi-level partitioning

Multi-level partitioning refers to the practice of partitioning an already partitioned table, creating subpartitions which can also be further partitioned.

PolarDB for PostgreSQL (Compatible with Oracle) supports an unlimited number of partitioning levels, but we recommend that you do not create levels beyond three. Excessive levels of partitioning may undermine the manageability of partitions and even deteriorate query performance.

You can use different partitioning strategies on different levels of partitioning. For example, you can use range partitioning for the first level, hash partitioning for the second, and list partitioning for the third.

Example 4

CREATE TABLE measurement (
    city_id         int not null,
    logdate         date not null,
    peaktemp        int,
    unitsales       int
) PARTITION BY RANGE (logdate);

CREATE TABLE measurement_y2006m03 PARTITION OF measurement
    FOR VALUES FROM ('2006-03-01') TO ('2006-04-01') PARTITION BY Hash (city_id);

CREATE TABLE measurement_y2006m03_hash1 PARTITION OF measurement_y2006m03
    for values with (modulus 2, remainder 0) PARTITION BY List (peaktemp);

CREATE TABLE measurement_y2006m03_hash1_l1 PARTITION OF measurement_y2006m03_hash1 for values in (10, 20);

Syntax

For more information about the commands for managing partitions, such as creating, adding, merging, splitting, and deleting partitions, see Command list for partitioned tables.