A large table can be divided into smaller storage units called partitions. After you specify query criteria, only the data in partitions that meet the criteria is scanned. This improves query performance.

Table partitioning types supported

  • Range partitioning: Data is divided based on a numerical range, such as date.
  • List partitioning: Data is divided based on a list of values, such as city attributes.
  • Multi-level partitioning: Data is divided based on both a numerical range and a list of values.

The preceding figure shows an example of a multi-level partitioned table. Data in level-1 partitions is divided by month, and that in level-2 partitions is divided based on the values of regions.

Range partitioned table

You can have AnalyticDB for PostgreSQL automatically generate partitions by giving a START value, an END value, and an EVERY clause that defines the partition increment value. By default, START values are always inclusive and END values are always exclusive. Example:

CREATE TABLE sales (id int, date date, amt decimal(10,2))
DISTRIBUTED BY (id)
PARTITION BY RANGE (date)
( START (date '2016-01-01') INCLUSIVE
   END (date '2017-01-01') EXCLUSIVE
   EVERY (INTERVAL '1 day') );

You can create a range partitioned table that uses a single numeric data type column as the partition key. Example:

CREATE TABLE rank (id int, rank int, year int, gender char(1), count int)
DISTRIBUTED BY (id)
PARTITION BY RANGE (year)
( START (2006) END (2016) EVERY (1), 
  DEFAULT PARTITION extra ); 

List partitioned table

A list partitioned table can use any data type column that allows equality comparisons as its partition key. For list partitions, you must declare a partition specification for every partition (list value) you want to create. Example:

CREATE TABLE rank (id int, rank int, year int, gender 
char(1), count int ) 
DISTRIBUTED BY (id)
PARTITION BY LIST (gender)
( PARTITION girls VALUES ('F'), 
  PARTITION boys VALUES ('M'), 
  DEFAULT PARTITION other );

Multi-level partitioned table

You can create a table with multi-level partitions. The following example shows a three-level partition design where the sales table is partitioned by year, month, and then region.
CREATE TABLE sales (id int, year int, month int, day int, 
region text)
DISTRIBUTED BY (id)
PARTITION BY RANGE (year)            
  SUBPARTITION BY RANGE (month)     
  SUBPARTITION TEMPLATE (
       START (1) END (13) EVERY (1), 
       DEFAULT SUBPARTITION other_months )
  SUBPARTITION BY LIST (region)    
    SUBPARTITION TEMPLATE (
       SUBPARTITION usa VALUES ('usa'),
       SUBPARTITION europe VALUES ('europe'),
       SUBPARTITION asia VALUES ('asia'),
       DEFAULT SUBPARTITION other_regions)
( START (2008) END (2016) EVERY (1),
  DEFAULT PARTITION outlying_years);

Granularity of table partitioning

For a table partitioned by time, the granularity may be a day, week, or month. A finer granularity results in a smaller amount of data in each partition but a larger number of partitions. There are no absolute standards regarding the number of partitions. In most cases, the number of partitions is about 200. This number is relatively large. A large number of partitions has many impacts. For example, it slows operations of the query optimizer and many maintenance operations (such as VACUUM).

Optimization of queries on partitioned tables

AnalyticDB for PostgreSQL supports partition truncating for partitioned tables. Only required partitions are scanned based on query criteria, which improves query performance. The following query is used as an example:
explain 
  select * from sales 
  where year = 2008 
    and  month = 1 
    and  day = 3 
    and region = 'usa';

The query criteria fall on the level-3 subpartition 'usa' of the level-2 subpartition 1 of level-1 partition 2008. Therefore, only the data in the level-3 subpartition 'usa' is scanned during query. As shown in the following query plan, only one of the 468 level-3 subpartitions is read.

Gather Motion 4:1  (slice1; segments: 4)  (cost=0.00..431.00 rows=1 width=24)
  ->  Sequence  (cost=0.00..431.00 rows=1 width=24)
        ->  Partition Selector for sales (dynamic scan id: 1)  (cost=10.00..100.00 rows=25 width=4)
              Filter: year = 2008 AND month = 1 AND region = 'usa'::text
              Partitions selected:  1 (out of 468)
        ->  Dynamic Table Scan on sales (dynamic scan id: 1)  (cost=0.00..431.00 rows=1 width=24)
              Filter: year = 2008 AND month = 1 AND day = 3 AND region = 'usa'::text

Partition definition query

You can use the following SQL statement to query the definitions of all partitions in a table:
SELECT 
  partitionboundary, 
  partitiontablename, 
  partitionname,
  partitionlevel, 
  partitionrank
FROM pg_partitions 
WHERE tablename='sales';

References

AnalyticDB for PostgreSQL supports diverse partition-related operations such as adding, dropping, clearing, and splitting. For more information, visit Greenplum documentation.