This topic describes how to use the STR_HASH function.

Description

You can use the STR_HASH function to specify the start position subscript and end position subscript of a string. This way, the string value of a shard key can be truncated into a substring. Then, the function uses this substring as a string or an integer to calculate the routes of specific physical database shards and table shards. The STR_HASH function uses the following syntax:

STR_HASH( shardKey [, startIndex, endIndex [, valType [, randSeed ] ] ] )
Parameter Description
shardKey The name of the shard key column.
startIndex The start position subscript of the substring. The positions of the characters in the original string start from 0. The value of 0 specifies the first character in the original string. The default value of -1 specifies that the string is not truncated.
endIndex The end position subscript of the substring. The positions of the characters in the original string start from 0. The value of 0 specifies the first character in the original string. The default value of -1 specifies that the string is not truncated.
When you configure startIndex and endIndex, take note of the following items:
  • For startIndex == j && endIndex = k (j>=0, k>=0 ,k>j), the [ j, k ) range of the original string is used as the substring. The following list provides examples on the range:
    • For the ABCDEFG string, the value of the [1,5) range is BCDE.
    • For the ABCDEFG string, the value of the [2,2) range is ''.
    • For the ABCDEFG string, the value of the [4,100) range is EFG.
    • For the ABCDEFG string, the value of the [100,105) range is ''.
  • For startIndex == -1 && endIndex = k (k>=0), the last k characters of the original string are used as the substring. If the original string contains fewer than k characters, the entire string is used as the substring.
  • For startIndex = k && endIndex == -1 (k>=0), the first k characters of the original string are used as the substring. If the original string contains fewer than k characters, the entire string is used as the substring.
  • For startIndex == -1 && endIndex == -1, the original string is not truncated. The substring is the same as the original string.
valType The type of the substring that is used to calculate the routes of database shards and table shards. Valid values:
  • 0: Distributed Relational Database Service (DRDS) uses the substring as a string to calculate routes. This value is the default value.
  • 1: DRDS uses the substring as an integer to calculate routes. The integer value of the substring cannot be greater than 9223372036854775807 and cannot be a floating-point number.
randSeed The value of a random seed that DRDS uses to calculate the hash value of routes when the substring is used as a string. This value is used only when the STR_HASH function cannot evenly distribute data by using the default random seed. The default value is 31. You can set this parameter to other values such as 131, 13131, and 1313131.
Note
  • This parameter can be set only if valType is set to 0.
  • After you configure this parameter, you must manually export all data and use a new shard algorithm to import the data. This way, data is redistributed.

Note

Table shards that are created by using the STR_HASH function are suitable only for point queries. If table shards are used for range queries, a full table scan is triggered. This results in slow queries.

Limits

  • A shard key must be of the CHAR or VARCHAR type. The values of the shard key must be strings.
  • The parameters of the STR_HASH function cannot be modified after a table is created.
  • The shard key that is of the STRING type is case-sensitive when data routing is performed.

Scenarios

  • The STR_HASH function is suitable for scenarios that require precise routing where each table shard or database shard corresponds only to one value of the shard key. This value must be a string. For example, an Internet finance application partitions data into database shards by year and month of the year by using the YYYYMM function, and then partitions data into table shards by order ID. In this application, the last three characters of each order ID are an integer that ranges from 000 to 999. This application is required to route the records related to the last three characters of each order ID in a physical database shard to one physical table shard. The application uses the YYYYMM function to partition data into database shards and then uses the STR_HASH function to partition data into table shards. To meet the requirements of the application, each database shard must contain 1,024 table shards. The following SQL statement can be used to create the required shards:
    create table test_str_hash_tb (    
        id int NOT NULL AUTO_INCREMENT,
        order_id varchar(30) NOT NULL, 
        create_time datetime DEFAULT NULL,
        primary key(id)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8 
    dbpartition by YYYYMM(`create_time`) 
    tbpartition by STR_HASH(`order_id`, -1, 3, 1) tbpartitions 1024;

    This SQL statement finds the last three characters of each order ID, converts them into an integer that ranges from 000 to 999, and then performs the modulo operation to calculate the table shard. The total number of table shards is 1,024. The routing result ensures that each physical table shard corresponds to only one value of the shard key. By default, DRDS uses the HASH function for sharding. The default sharding function cannot achieve this effect because the hashCode function may convert strings into integers that are not unique. This way, one physical table shard may correspond to multiple different values of the shard key.

  • The STR_HASH function is suitable for scenarios in which a string is used as a shard key. In most cases, this function is used when point queries are run. For example, you can use this function when you want to query transaction orders and logistics orders by ID.

Examples

The data type of order_id is VARCHAR(32). You want to use order_id as the shard key to partition data into four database shards and eight table shards.
  • The last four characters of the string specified by order_id need to be used as an integer to calculate the routes of the database shards and table shards. In this case, you can execute the following SQL statement:
    create table test_str_hash_tb (    
        id int NOT NULL AUTO_INCREMENT,
        order_id varchar(32) NOT NULL, 
        create_time datetime DEFAULT NULL,
        primary key(id)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8 
    dbpartition by STR_HASH(`order_id`, -1, 4, 1) 
    tbpartition by STR_HASH(`order_id`, -1, 4, 1) tbpartitions 2;
  • The characters from the third character (starIndex = 2) to the seventh character (endIndex = 7) in the string specified by order_id need to be used as a substring to calculate the routes of the database shards and table shards. In this case, you can execute the following SQL statement:
    create table test_str_hash_tb (    
        id int NOT NULL AUTO_INCREMENT,
        order_id varchar(32) NOT NULL, 
        create_time datetime DEFAULT NULL,
        primary key(id)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8 
    dbpartition by STR_HASH(`order_id`, 2, 7) 
    tbpartition by STR_HASH(`order_id`, 2, 7) tbpartitions 2;
  • The first five characters of the string specified by order_id need to be used as a substring to calculate the routes of the database shards and table shards. In this case, you can execute the following SQL statement to create tables:
    create table test_str_hash_tb (    
        id int NOT NULL AUTO_INCREMENT,
        order_id varchar(32) NOT NULL, 
        create_time datetime DEFAULT NULL,
        primary key(id)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8 
    dbpartition by STR_HASH(`order_id`, 5, -1) 
    tbpartition by STR_HASH(`order_id`, 5, -1) tbpartitions 2;

FAQ

What are the differences between dbpartition by STR_HASH(order_id) and dbpartition by HASH(order_id)?

Although STR_HASH and HASH use the value of a string to calculate the hash routes of database shards and table shards, they use different route algorithms. STR_HASH allows you to truncate a string into a substring when you create a table. STR_HASH also uses the UNI_HASH algorithm when the routes of database shards and table shards are calculated based on the hash value of a string. HASH performs a simple modulo operation to obtain the hash value of a string.