This topic describes how to determine appropriate shard keys in PolarDB-X 1.0.

Background information

A shard key is a field for database sharding or table sharding. The shard key is used to create horizontal sharding rules. PolarDB-X 1.0 uses sharding functions to generate computing results based on shard key values. Then, data is distributed to ApsaraDB RDS for MySQL instances based on the computing results.

When you perform table sharding, you must comply with the following primary principle: Determine the appropriate entities to which data belongs based on your business logic. You must make sure that most, core, or concurrent SQL operations are performed based on the entities. After you determine the entities, you can use the fields that identify the entities as your shard keys.

Examples

In most cases, business logic entities vary based on application scenarios. In the typical application scenarios in this topic, business logic entities can be determined in an easy manner and the fields that identify the entities can be used as the shard keys. Similarly, appropriate entities can also be determined in other application scenarios. The following examples are used to explain business logic entities and shard keys.

Note In most cases, we recommend that you do not use a column that allows null values as a shard key.
  • User-oriented Internet applications are designed to meet user requirements. Users are the business logic entities and the user ID field can be used as the shard key.
  • Seller-oriented e-commerce applications are designed to meet seller requirements. Sellers are the business logic entities and the seller ID field can be used as the shard key.
  • Gaming applications are designed to meet gamer requirements. Gamers are the business logic entities and the gamer ID field can be used as the shard key.
  • Online Internet of Vehicles (IoV) applications are designed based on vehicles. Vehicles are the business logic entities and the vehicle ID field can be used as the shard key.
  • Online tax applications are designed based on taxpayers. Taxpayers are the business logic entities and the taxpayer ID field can be used as the shard key.

For example, in a seller-oriented e-commerce application, you may need to perform horizontal sharding on the table that is created by executing the following statement:

CREATE TABLE sample_order (
  id INT(11) NOT NULL,
  sellerId INT(11) NOT NULL,
  trade_id INT(11) NOT NULL,
  buyer_id INT(11) NOT NULL,
  buyer_nick VARCHAR(64) DEFAULT NULL,
  PRIMARY KEY (id)
)

In this scenario, sellers are the business logic entities and you can use the sellerId field as the shard key. You can execute the following data definition language (DDL) statement to create a shared table:

CREATE TABLE sample_order (
  id INT(11) NOT NULL,
  sellerId INT(11) NOT NULL,
  trade_id INT(11) NOT NULL,
  buyer_id INT(11) NOT NULL,
  buyer_nick VARCHAR(64) DEFAULT NULL,
  PRIMARY KEY (id)
) DBPARTITION BY HASH(sellerId)

If appropriate business logic entities cannot be identified for shard keys in an easy manner, especially in traditional enterprise-level applications, use the following methods to determine shard keys:

  • Determine your shard key based on data distribution and data access requests. Make sure that your data is evenly distributed across shards if possible. PolarDB-X 1.0 supports global secondary indexes to ensure strong consistency. Global secondary indexes and parallel queries allow you to improve SQL concurrency and reduce the response time in this scenario.
  • Use the fields of the string data type and the fields of the date and time data type as your shard key for database sharding or table sharding. Note that the fields of the string data type store numeric values. This method is applicable to log retrieval.

For example, in a log system that records user operations, you may need to perform horizontal sharding on the log table that is created by executing the following statement:

CREATE TABLE user_log (
  userId INT(11) NOT NULL,
  name VARCHAR(64) NOT NULL,
  operation VARCHAR(128) DEFAULT NULL,
  actionDate DATE DEFAULT NULL
)

In this scenario, you can use the user ID and the time fields as the shard key to divide the table data by week. You can execute the following DDL statement to create a sharded table:

CREATE TABLE user_log (
  userId INT(11) NOT NULL,
  name VARCHAR(64) NOT NULL,
  operation VARCHAR(128) DEFAULT NULL,
  actionDate DATE DEFAULT NULL
) DBPARTITION BY HASH(userId) TBPARTITION BY WEEK(actionDate) TBPARTITIONS 7

For more information about how to determine shard keys and how to perform table sharding, see CREATE TABLE and Overview.