CO_HASH partitioning - PolarDB - Alibaba Cloud Documentation Center

Supported versions

The version of PolarDB-X must be 5.4.18-17047709 or later.

Applicable scenarios

In e-commerce scenarios, the values of two or more columns in a business order table may have similarity. For example, the last N digits of the order_id column are the same as those of the buyer_id column. Due to the large amount of data in the business order table, you can horizontally split the order table based on the last N digits of the order_id and buyer_id columns. This way, applications can route data to the same physical partition in order placement scenarios regardless of whether they read or write data based on the order ID or buyer ID. This prevents a large number of cross-database transactions.

PolarDB-X supports the CO_HASH partitioning policy to resolve the issues in the following scenarios:

The values of two or more columns in a table to be partitioned have similarity. For example, the last or first N digits of a column are the same as those of another column in a table.
Data can be separately routed based on different partition key columns at the same time in a partitioned table. These partition key columns are independent of each other. Queries in which equality conditions involve one of the partition key columns can meet partition pruning conditions.

Syntax

In PolarDB-X, you can use the CO_HASH partitioning policy to route data based on different values of multiple partition key columns at the same time.

CREATE TABLE ... 
PARTITION BY CO_HASH(partition_expr_list) 
PARTITIONS number;

partition_expr_list:
  partition_expr, partition_expr [, partition_expr, ...]
  
partition_expr:
    partition_column	
  | partition_func(partition_column)


# Define partitioning functions.
partition_func:
     
  | LEFT
  | SUBSTR
  | SUBSTRING

Note

For more information about the differences between CO_HASH partitioning and HASH or KEY partitioning, see the Comparison between COHASH partitioning, HASH partitioning, and KEY partitioning table of the "Overview" topic.

Usage notes

You cannot use nested partitioning functions for partition key columns, such as SUBSTR(SUBSTR(c1,-6),4).
The data types of all partition key columns must be identical in the following terms:
- Character sets and collation.
- Length and precision definition.
By default, a partitioned table can contain up to 8,192 partitions.
By default, a partition key can consist of up to five partition key columns.
CO_HASH partitioning supports only the following partitioning functions:
- RIGHT
- LEFT
- SUBSTR

Examples

For example, the last six digits of the order_id column are the same as those of the buyer_id column in each row of a table named orders.

If you want to partition the orders table based on the last six digits of the order_id and buyer_id columns, and route the values of the order_id and buyer_id columns in a row to the same partition, you can use the following syntax:

CREATE TABLE t_orders(
 id bigint not null auto_increment, 
 seller_id bigint,
 order_id bigint, 
 buyer_id bigint,
 order_time datetime not null,
 primary key(id)
) 
PARTITION BY CO_HASH(
  	RIGHT(`order_id`,6) /* Obtain the last six digits of Column c1. */,
  	RIGHT(`buyer_id`,6) /* Obtain the last six digits of Column c2. */
)  
PARTITIONS 8;

For more information about how to use the CO_HASH partitioning policy and other partitioning functions, see Partitioning functions.

Limits

Limits on data types

Integer types: BIGINT, BIGINT UNSINGEDINT, INT, INT UNSINGED, MEDIUMINT, MEDIUMINT UNSINGED, SMALLINT, SMALLINT UNSINGED, TINYINT, and TINYINT UNSINGED
Date and time types: DATETIME, DATE, and TIMESTAMP
String types: CHAR and VARCHR
Fixed-point type: DECIMAL, for which the number of digits in the fractional part must be 0

Limits on partition key columns

You must make sure the similarity between the values of different partition key columns. PolarDB-X only checks the routing results of partitioned tables. The values of different partition key columns in a CO_HASH partitioned table have similarity. When you write a row of data to the CO_HASH partitioned table, the values of different partition key columns in the row must be routed to the same partition. However, PolarDB-X does not ensure the similarity between the values of different partition key columns. Therefore, you must make sure the similarity between the values of different partition key columns. PolarDB-X only checks the routing results of partitioned tables and does not verify the similarity of data. For example, the last four digits of the c1 column are the same as those of the c2 column in your table. In PolarDB-X, a row whose value of Column c1 is 1001234 and value of Column c2 is 1320 can be routed to Partition 0. In this case, the last four digits of the c1 and c2 columns are different.
PolarDB-X imposes limits on using DML statements to modify the values of partition key columns. Due to the similarity between the values of multiple partition key columns in CO_HASH partitioned tables, PolarDB-X imposes limits on using DML statements to modify the values of partition key columns. This prevents errors in the routing results of CO_HASH partitioned tables.
- When you execute the INSERT or REPLACE statement, if the values of different partition key columns in the same row in the VALUES clause are to be routed to different partitions based on the results generated by the routing algorithm, the INSERT or REPLACE statement is prohibited and an error is reported.
- When you execute the UPDATE or UPSERT statement, if the SET clause modifies the value of a partition key column, the values of all partition key columns must be modified at the same time. For example, if the c1 and c2 columns are specified as partition key columns, you must execute the following statement: UPDATE t1 SET c1='xx',c2='yy' WHERE id=1. After the SET clause modifies the values of partition key columns, if the values of different partition key columns in the same row are to be routed to different partitions, the UPDATE or UPSERT statement is prohibited and an error is reported.
- After you partition a global secondary index (GSI) by using the CO_HASH partitioning policy, if a DML operation performed on the primary table, such as INSERT or UPDATE, causes that the values of different partition key columns in the same row in the GSI of the primary table are routed to different partitions, the DML operation is prohibited and an error is reported.
PolarDB-X automatically processes the values whose prefix is 0 for partition key columns of integer types in a CO_HASH partitioned table. Due to the similarity between the values of partition key columns in a CO_HASH partitioned table, you need to use partitioning functions such as SUBSTR, LEFT, and RIGHT to define partition key columns. After values of some integer types are truncated, 0 may be added as the prefix of these values. For example, the last four digits of the c1 column are the same as those of the c2 column in your table. You want to insert a row whose value of Column c1 is 1000034 and value of Column c2 is 34. In this case, the last four digits of Column c1 are 0034. In a CO_HASH partitioned table, after all original values of a partition key column of an integer type are truncated, the new values are automatically converted into the integer type of the partition key column and then are routed. Therefore, the string 0034 is converted into the string 34. Then, the hash value of the string 34 is calculated and routed to a partition. This way, the prefix of the string 0034 is automatically processed.