STR_HASH - PolarDB - Alibaba Cloud Documentation Center

Note

Table shards that are created with the STR_HASH function are applicable only to point query scenarios. Range queries for a service trigger a full table scan, which causes a slow response.

Limits

The partitioning key must be a string (CHAR or VARCHAR).
The parameters of the STR_HASH function cannot be modified after a table is created.
The version of the PolarDB-X 1.0 instance must be 5.3.5 or later.

Scenarios

The STR_HASH function is applicable to scenarios that require precise routing where only one table or database shard corresponds to the value of one partitioning key. This value must be a string.
For example, an Internet finance application partitions data into database shards by year and month of the year by using the YYYYMM function, and then partitions data into table shards by order ID. In this application, the last three characters of each order ID are an integer that ranges from 000 to 999. This application is required to route the last three characters of each order ID in a physical database shard to one physical table shard. Therefore, the application uses the YYYYMM function to partition data into database shards and then uses the STR_HASH function to partition data into table shards. To meet the requirements of the application, each database shard contains 1,024 table shards. The following SQL statement is used to create the required shards:
```
create table test_str_hash_tb (    
    id int NOT NULL AUTO_INCREMENT,
    order_id varchar(30) NOT NULL, 
    create_time datetime DEFAULT NULL,
    primary key(id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 
dbpartition by YYYYMM(`create_time`) 
tbpartition by STR_HASH(`order_id`, -1, 3, 1) tbpartitions 1024;
```
This SQL statement finds the last three characters of each order ID, converts them into an integer (000 to 999), and then performs the modulo operation to calculate the appropriate table shard. The total number of table shards is 1,024. The routing result ensures that each physical table shard corresponds to the value of only one partitioning key. The default partitioning function of PolarDB-X 1.0 cannot achieve this effect because the hashCode function may convert strings into integers that are not unique. One physical table shard may correspond to the values of multiple partitioning keys.
Typical scenarios of point query
The STR_HASH function is applicable to scenarios where a string is used as the partitioning key. Point query is used in most scenarios, such as querying transaction orders and logistics orders by ID.

Syntax

The STR_HASH function allows you to truncate the string value of a partitioning key into a substring by specifying the start position subscript and end position subscript. Then, the function uses this substring as a string or an integer input to calculate the routes of specific physical database shards and table shards. For more information, see the following syntax:

STR_HASH( shardKey [, startIndex, endIndex [, valType [, randSeed ] ] ] )

Table 1. Parameters
Parameter	Description
shardKey	The name of the partitioning key column.
startIndex	The start position subscript of the substring. The positions of the characters in the original string start at 0. This means that the value 0 indicates the first character in the original string. To disable truncation, retain the default value of -1.
endIndex	The end position subscript of the target substring. The positions of the characters in the original string start at 0. This means that the value 0 indicates the first character in the original string. To disable truncation, retain the default value of -1. Note Note the following values of startIndex and endIndex: For `startIndex == j && endIndex = k (j>=0, k>=0 ,k>j)`, the `[ j, k )` range of the original string is used as the substring. For example: For the ABCDEFG string, the value of the `[1,5)` range is `BCDE`. For the ABCDEFG string, the value of the `[2,2)` range is `''`. For the ABCDEFG string, the value of the `[4,100)` range is `EFG`. For the ABCDEFG string, the value of the `[100,105)` range is `''`. For `startIndex == -1 && endIndex = k (k>=0)`, the last K characters of the original string are used as the substring. If the original string contains fewer than K characters, the entire string is used as the substring. For `startIndex = k && endIndex == -1 (k>=0)`, the first K characters of the original string are used as the substring. If the original string contains fewer than K characters, the entire string is used as the substring. For `startIndex == -1 && endIndex == -1`, no truncation is performed. The substring is the same as the original string.
valType	The type of the substring that is used for calculating routes of database shards and table shards. Valid values: 0 (default): PolarDB-X 1.0 uses the substring as a string to calculate routes. 1: PolarDB-X 1.0 uses the substring as an integer to calculate routes. The integer value of the substring cannot be greater than 9223372036854775807 and cannot be a floating-point number.
randSeed	The value of a random seed that PolarDB-X 1.0 uses to calculate the hash value of routes when the substring is used as a string. This value is used only when the STR_HASH function cannot achieve even data distribution by using the default random seed. The default value is 31. You can set this parameter to other values such as 131, 13131, and 1313131. Note This parameter can be set only when valType is set to 0. After you set this parameter, you must redistribute data by manually exporting all data and then importing it with the new partitioning algorithm.

Examples

Assume that the data type of order_id is VARCHAR(32). You want to use order_id as the partitioning key to partition data into four database shards and eight table shards.

Assume that you want to use the last four characters of the order_id string as an integer to calculate the routes of the database shards and table shards. You can use the following SQL statement to create tables:

create table test_str_hash_tb (    
    id int NOT NULL AUTO_INCREMENT,
    order_id varchar(32) NOT NULL, 
    create_time datetime DEFAULT NULL,
    primary key(id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 
dbpartition by STR_HASH(`order_id`, -1, 4, 1) 
tbpartition by STR_HASH(`order_id`, -1, 4, 1) tbpartitions 2;

Assume that you want to use the characters from the third character (starIndex = 2) to the seventh character (endIndex = 7) of the order_id string as a substring to calculate the routes of the database and table shards. You can use the following SQL statement to create tables:
```
create table test_str_hash_tb (    
    id int NOT NULL AUTO_INCREMENT,
    order_id varchar(32) NOT NULL, 
    create_time datetime DEFAULT NULL,
    primary key(id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 
dbpartition by STR_HASH(`order_id`, 2, 7) 
tbpartition by STR_HASH(`order_id`, 2, 7) tbpartitions 2;
```

Assume that you want to use the first five characters of the order_id string as a substring to calculate the routes of the database shards and table shards. You can use the following SQL statement to create tables:

create table test_str_hash_tb (    
    id int NOT NULL AUTO_INCREMENT,
    order_id varchar(32) NOT NULL, 
    create_time datetime DEFAULT NULL,
    primary key(id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 
dbpartition by STR_HASH(`order_id`, 5, -1) 
tbpartition by STR_HASH(`order_id`, 5, -1) tbpartitions 2;

FAQ

Q: What are the differences between dbpartition by STR_HASH(order\_id) and dbpartition by HASH(order\_id)?

A: STR_HASH and HASH both use the value of a string to calculate the hash routes of database shards and table shards. However, they use different route algorithms. STR_HASH allows you to truncate the original string into a substring, which it uses to create tables. STR_HASH also uses the UNI_HASH algorithm when the routes of database shards and table shards are calculated based on the hash value of a string. HASH performs a simple modulo operation to obtain the hash value of a string.