This topic describes how to use the STR_HASH function.
Description
You can use the STR_HASH function to specify the start position subscript and end position subscript of a string. This way, the string value of a shard key can be truncated into a substring. Then, the function uses this substring as a string or an integer to calculate the routes of specific physical database shards and table shards. The STR_HASH function uses the following syntax:
STR_HASH( shardKey [, startIndex, endIndex [, valType [, randSeed ] ] ] )
Parameter | Description |
---|---|
shardKey | The name of the shard key column. |
startIndex | The start position subscript of the substring. The positions of the characters in the original string start from 0. The value of 0 specifies the first character in the original string. The default value of -1 specifies that the string is not truncated. |
endIndex | The end position subscript of the substring. The positions of the characters in the
original string start from 0. The value of 0 specifies the first character in the
original string. The default value of -1 specifies that the string is not truncated.
When you configure startIndex and endIndex, take note of the following items:
|
valType | The type of the substring that is used to calculate the routes of database shards
and table shards. Valid values:
|
randSeed | The value of a random seed that DRDS uses to calculate the hash value of routes when
the substring is used as a string. This value is used only when the STR_HASH function
cannot evenly distribute data by using the default random seed. The default value
is 31. You can set this parameter to other values such as 131, 13131, and 1313131.
Note
|
Note
Table shards that are created by using the STR_HASH function are suitable only for point queries. If table shards are used for range queries, a full table scan is triggered. This results in slow queries.
Limits
- A shard key must be of the CHAR or VARCHAR type. The values of the shard key must be strings.
- The parameters of the STR_HASH function cannot be modified after a table is created.
- The shard key that is of the STRING type is case-sensitive when data routing is performed.
Scenarios
- The STR_HASH function is suitable for scenarios that require precise routing where
each table shard or database shard corresponds only to one value of the shard key.
This value must be a string. For example, an Internet finance application partitions
data into database shards by year and month of the year by using the YYYYMM function,
and then partitions data into table shards by order ID. In this application, the last
three characters of each order ID are an integer that ranges from 000 to 999. This
application is required to route the records related to the last three characters
of each order ID in a physical database shard to one physical table shard. The application
uses the YYYYMM function to partition data into database shards and then uses the
STR_HASH function to partition data into table shards. To meet the requirements of
the application, each database shard must contain 1,024 table shards. The following
SQL statement can be used to create the required shards:
create table test_str_hash_tb ( id int NOT NULL AUTO_INCREMENT, order_id varchar(30) NOT NULL, create_time datetime DEFAULT NULL, primary key(id) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 dbpartition by YYYYMM(`create_time`) tbpartition by STR_HASH(`order_id`, -1, 3, 1) tbpartitions 1024;
This SQL statement finds the last three characters of each order ID, converts them into an integer that ranges from 000 to 999, and then performs the modulo operation to calculate the table shard. The total number of table shards is 1,024. The routing result ensures that each physical table shard corresponds to only one value of the shard key. By default, DRDS uses the HASH function for sharding. The default sharding function cannot achieve this effect because the hashCode function may convert strings into integers that are not unique. This way, one physical table shard may correspond to multiple different values of the shard key.
- The STR_HASH function is suitable for scenarios in which a string is used as a shard key. In most cases, this function is used when point queries are run. For example, you can use this function when you want to query transaction orders and logistics orders by ID.
Examples
- The last four characters of the string specified by order_id need to be used as an
integer to calculate the routes of the database shards and table shards. In this case,
you can execute the following SQL statement:
create table test_str_hash_tb ( id int NOT NULL AUTO_INCREMENT, order_id varchar(32) NOT NULL, create_time datetime DEFAULT NULL, primary key(id) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 dbpartition by STR_HASH(`order_id`, -1, 4, 1) tbpartition by STR_HASH(`order_id`, -1, 4, 1) tbpartitions 2;
- The characters from the third character (starIndex = 2) to the seventh character (endIndex
= 7) in the string specified by order_id need to be used as a substring to calculate
the routes of the database shards and table shards. In this case, you can execute
the following SQL statement:
create table test_str_hash_tb ( id int NOT NULL AUTO_INCREMENT, order_id varchar(32) NOT NULL, create_time datetime DEFAULT NULL, primary key(id) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 dbpartition by STR_HASH(`order_id`, 2, 7) tbpartition by STR_HASH(`order_id`, 2, 7) tbpartitions 2;
- The first five characters of the string specified by order_id need to be used as a
substring to calculate the routes of the database shards and table shards. In this
case, you can execute the following SQL statement to create tables:
create table test_str_hash_tb ( id int NOT NULL AUTO_INCREMENT, order_id varchar(32) NOT NULL, create_time datetime DEFAULT NULL, primary key(id) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 dbpartition by STR_HASH(`order_id`, 5, -1) tbpartition by STR_HASH(`order_id`, 5, -1) tbpartitions 2;
FAQ
What are the differences between dbpartition by STR_HASH(order_id)
and dbpartition by HASH(order_id)
?
Although STR_HASH and HASH use the value of a string to calculate the hash routes of database shards and table shards, they use different route algorithms. STR_HASH allows you to truncate a string into a substring when you create a table. STR_HASH also uses the UNI_HASH algorithm when the routes of database shards and table shards are calculated based on the hash value of a string. HASH performs a simple modulo operation to obtain the hash value of a string.