A large table can be divided into smaller storage units called partitions. After you specify query criteria, only the data in partitions that meet the criteria is scanned. This improves query performance.
Table partitioning types supported
- Range partitioning: Data is divided based on a numerical range, such as date.
- List partitioning: Data is divided based on a list of values, such as city attributes.
- Multi-level partitioning: Data is divided based on both a numerical range and a list of values.
The preceding figure shows an example of a multi-level partitioned table. Data in level-1 partitions is divided by month, and that in level-2 partitions is divided based on the values of regions.
Range partitioned table
You can have AnalyticDB for PostgreSQL automatically generate partitions by giving a START value, an END value, and an EVERY clause that defines the partition increment value. By default, START values are always inclusive and END values are always exclusive. Example:
CREATE TABLE sales (id int, date date, amt decimal(10,2)) DISTRIBUTED BY (id) PARTITION BY RANGE (date) ( START (date '2016-01-01') INCLUSIVE END (date '2017-01-01') EXCLUSIVE EVERY (INTERVAL '1 day') );
You can create a range partitioned table that uses a single numeric data type column as the partition key. Example:
CREATE TABLE rank (id int, rank int, year int, gender char(1), count int) DISTRIBUTED BY (id) PARTITION BY RANGE (year) ( START (2006) END (2016) EVERY (1), DEFAULT PARTITION extra );
List partitioned table
A list partitioned table can use any data type column that allows equality comparisons as its partition key. For list partitions, you must declare a partition specification for every partition (list value) you want to create. Example:
CREATE TABLE rank (id int, rank int, year int, gender char(1), count int ) DISTRIBUTED BY (id) PARTITION BY LIST (gender) ( PARTITION girls VALUES ('F'), PARTITION boys VALUES ('M'), DEFAULT PARTITION other );
Multi-level partitioned table
CREATE TABLE sales (id int, year int, month int, day int, region text) DISTRIBUTED BY (id) PARTITION BY RANGE (year) SUBPARTITION BY RANGE (month) SUBPARTITION TEMPLATE ( START (1) END (13) EVERY (1), DEFAULT SUBPARTITION other_months ) SUBPARTITION BY LIST (region) SUBPARTITION TEMPLATE ( SUBPARTITION usa VALUES ('usa'), SUBPARTITION europe VALUES ('europe'), SUBPARTITION asia VALUES ('asia'), DEFAULT SUBPARTITION other_regions) ( START (2008) END (2016) EVERY (1), DEFAULT PARTITION outlying_years);
Granularity of table partitioning
For a table partitioned by time, the granularity may be a day, week, or month. A finer granularity results in a smaller amount of data in each partition but a larger number of partitions. There are no absolute standards regarding the number of partitions. In most cases, the number of partitions is about 200. This number is relatively large. A large number of partitions has many impacts. For example, it slows operations of the query optimizer and many maintenance operations (such as VACUUM).
Optimization of queries on partitioned tables
explain select * from sales where year = 2008 and month = 1 and day = 3 and region = 'usa';
The query criteria fall on the level-3 subpartition 'usa' of the level-2 subpartition 1 of level-1 partition 2008. Therefore, only the data in the level-3 subpartition 'usa' is scanned during query. As shown in the following query plan, only one of the 468 level-3 subpartitions is read.
Gather Motion 4:1 (slice1; segments: 4) (cost=0.00..431.00 rows=1 width=24) -> Sequence (cost=0.00..431.00 rows=1 width=24) -> Partition Selector for sales (dynamic scan id: 1) (cost=10.00..100.00 rows=25 width=4) Filter: year = 2008 AND month = 1 AND region = 'usa'::text Partitions selected: 1 (out of 468) -> Dynamic Table Scan on sales (dynamic scan id: 1) (cost=0.00..431.00 rows=1 width=24) Filter: year = 2008 AND month = 1 AND day = 3 AND region = 'usa'::text
Partition definition query
SELECT partitionboundary, partitiontablename, partitionname, partitionlevel, partitionrank FROM pg_partitions WHERE tablename='sales';
AnalyticDB for PostgreSQL supports diverse partition-related operations such as adding, dropping, clearing, and splitting. For more information, visit Greenplum documentation.