HASH partitioning - PolarDB - Alibaba Cloud Documentation Center

This topic describes the HASH partitioning policy for PolarDB-X databases in automatic partitioning mode.

Syntax

In PolarDB-X, data is distributed to different partitions based on the consistent hashing algorithm. The HASH partitioning policy allows you to use the expressions that contain partitioning functions to process the values of partition key columns.

The routing algorithm for HASH partitioning in PolarDB-X is different from that in MySQL.

CREATE TABLE ... 
PARTITION BY HASH(partition_expr) 
PARTITIONS number;

partition_expr:
    partition_column_list	
  | partition_func(partition_column)

partition_column_list:
  partition_column_list[, partition_column, partition_column, ...]
  
# Define partitioning functions.
partition_func:
     
  | TO_DAYS
  | TO_MONTHS
  | TO_WEEKS
  | TO_SECOND
  | UNIX_TIMESTAMP
  | MONTH
  | DAYOFWEEK
  | DAYOFMONTH
  | DAYOFYEAR
  | SUBSTR
  | SUBSTRING
  | RIGHT
  | LEFT

Note

For more information about the differences between HASH partitioning and KEY partitioning, see the Comparison between KEY partitioning and HASH partitioning table of the "Overview" topic.

Usage notes

HASH partitioning allows you to use partitioning functions for single-column partition keys. However, the supported data types of partition key columns vary based on the partitioning function.
HASH partitioning does not allow you to use partitioning functions for vector partition keys and does not support hot data partition splitting.
You cannot use nested partitioning functions for partition key columns.
By default, a partitioned table can contain up to 8,192 partitions.
By default, a partition key can consist of up to five partition key columns.
If you use a column of a time zone-sensitive data type such as TIMESTAMP as a partition key column, you must use the UNIX_TIMESTAMP partitioning function for the partition key column.

Examples

Use a partitioning function

Specify the birthday column of the DATETIME type as the single-column partition key, use the TO_DAYS partitioning function to convert the values of the birthday column to the number of days, and then perform HASH partitioning. Set the number of partitions to 8.

CREATE TABLE tb_h_fn(
 id bigint not null auto_increment, 
 bid int, 
 name varchar(30),
 birthday datetime not null,
 primary key(id)
) 
PARTITION BY HASH(TO_DAYS(birthday)) 
PARTITIONS 8;

Use a single-column partition key

Specify the id column as the single-column partition key for HASH partitioning and set the number of partitions to 8.

CREATE TABLE tb_h(
 id bigint not null auto_increment, 
 bid int, 
 name varchar(30),
 birthday datetime not null,
 primary key(id)
) 
PARTITION BY HASH(id) 
PARTITIONS 8;

Use a vector partition key

Use the vector partition key that consists of the id and bid columns for HASH partitioning and set the number of partitions to 8.

Important

The id and bid columns are used to calculate hash values. Queries that contain equality conditions can meet partition pruning conditions only if the equality conditions of the queries involve both the id and bid columns.

CREATE TABLE tb_h2(
 id bigint not null auto_increment, 
 bid int, 
 name varchar(30),
 birthday datetime not null,
 primary key(id)
) 
PARTITION BY HASH(id,bid) 
PARTITIONS 8;

Limits

Limits on data types

Integer types: BIGINT, BIGINT UNSINGEDINT, INT, INT UNSINGED, MEDIUMINT, MEDIUMINT UNSINGED, SMALLINT, SMALLINT UNSINGED, TINYINT, and TINYINT UNSINGED
Date and time types: DATETIME, DATE, and TIMESTAMP
String types: CHAR and VARCHR

Data distribution balancing

KEY partitioning and HASH partitioning are implemented based on the built-in consistent hashing algorithm MurmurHash3. This algorithm is widely tested in the industry and is proven to have low data collision and high performance. However, MurmurHash3 cannot ensure the balance of data distribution in each partition in all cases, especially when the number of different values of the partition key is relatively small.
When you use KEY partitioning or HASH partitioning, data distribution across different partitions becomes balanced if the number of different values of the partition key is greater than 3,000 based on the MurmurHash3 algorithm. Data is distributed in a more balanced way if the partition key has more different values.