PolarDB-X Practice Series – Part 3 (2): Transparent Distributed Best Practices

Common distributed databases require users to set the database shard key. Users need to manually manage database shard rules. This brings a threshold for users to use distributed databases. Users need to have a clear understanding of the data distribution of the database and the structure of each table to make good use of distributed databases. PolarDB-X proposed the concept of transparent distribution. The details of distribution are transparent to users. Users only need to use distributed databases in the same way as stand-alone databases. This significantly reduces the threshold for users to use distributed databases.

Databases in AUTO mode are supported in PolarDB-X5.4.13 and later versions and are also known as databases in automatic partitioning mode. Databases in AUTO mode support the automatic partitioning feature. When you create a table, you do not need to specify a partition key. Data in the table is automatically and evenly distributed among partitions. You can also use standard MySQL statements to partition the table. Databases in AUTO mode support features provided for distributed databases (such as transparent distribution, auto scaling, and partition management).

Databases earlier than PolarDB-X5.4.13 are called databases in DRDS mode. Databases in DRDS mode do not support automatic partitioning. When you create a table, you can use the sharding syntax dedicated to PolarDB-X to specify a database shard key and a table shard key. If no database shard key and table shard key are specified, a non-partitioned table is created.

PolarDB-X V5.4.13 and later versions support databases in AUTO mode and DRDS mode. A PolarDB-X instance can contain databases in AUTO mode and DRDS mode.

Note:

In order to create a database in AUTO mode, the MODE='AUTO' must be specified in the CREATE DATABASE syntax.
If you do not specify the MODE parameter in the CREATE DATABASE syntax, a database in DRDS mode is created by default.
If your database is in AUTO mode, you cannot use the sharding syntax of DRDS databases to shard tables when you create tables. You can only create partitioned tables in databases that are in AUTO mode.
If your database is in DRDS mode, you cannot use the table partitioning syntax to shard tables when you create tables. You can only create sharded tables in databases that are in DRDS mode.

Use the MODE Parameter to Specify the Database Mode

PolarDB-X introduces the MODE parameter when creating a database to determine whether the created database is in AUTO mode or DRDS mode. The following table describes how to specify the MODE parameter.

Note: After the database is created, the mode of the database cannot be changed.

Automatic Partitioning and Manual Partitioning

Automatic Partitioning

You do not need to specify partitioning-related configuration items to configure automatic partitioning when you create a table, including a partition key or a partitioning policy. PolarDB-X can automatically select a partition key and perform horizontal partitioning on the table and its indexes. Only tables of databases in AUTO mode can be automatically partitioned. The automatic partitioning feature is unavailable for databases in DRDS mode.

Example:

You can execute the following statement in standard MySQL syntax without specifying partitioning-related configuration items to create a table named tb:

CREATE TABLE tb(a INT, b INT, PRIMARY KEY(a));

If you execute the preceding DDL statement on a database in DRDS mode, a non-partitioned table is created. The following figure shows an example:

Execute the SHOW statement to view the complete CREATE TABLE statement:

SHOW FULL CREATE TABLE tb \G
*************************** 1. row ***************************
       Table: tb
Create Table: CREATE TABLE `tb` (
    `a` int(11) NOT NULL,
    `b` int(11) DEFAULT NULL,
    PRIMARY KEY (`a`)
) ENGINE = InnoDB DEFAULT CHARSET = utf8mb4
1 row in set (0.02 sec)

If you execute the preceding DDL statement on a database in AUTO mode, a partitioned table is created. The table is automatically partitioned based on the primary key. The following figure shows an example:

Execute the SHOW statement to view the complete CREATE TABLE statement:

SHOW FULL CREATE TABLE tb \G
*************************** 1. row ***************************
       TABLE: tb
CREATE TABLE: CREATE PARTITION TABLE `tb` (
    `a` int(11) NOT NULL,
    `b` int(11) DEFAULT NULL,
    PRIMARY KEY (`a`)
) ENGINE = InnoDB DEFAULT CHARSET = utf8mb4
PARTITION BY KEY(`a`)
PARTITIONS 16
1 row in set (0.01 sec)

Therefore, in a database in AUTO mode, you only need to use the standard MySQL table creation syntax (including the index creation syntax) to create tables. The automatic partitioning feature of PolarDB-X allows applications to easily enjoy the benefits of distributed databases (such as Auto Scaling and partition management).

Manual Partitioning

When you use the manual partitioning method to create a table, you need to specify partitioning-related configuration items (such as the partition key and partitioning policy). The syntax used to create a table in a database in AUTO mode is different from the syntax used to create a table in a database in DRDS mode.

For databases in AUTO mode, you can use the standard MySQL syntax to create partitioned tables. Available partitioning policies include hash partitioning, range partitioning, and list partitioning.

In the following example, the PARTITION BY HASH(a) syntax is used, and the partition key a and the hash partitioning policy is used.

CREATE TABLE tb (a INT, b INT, PRIMARY KEY(a))
    -> PARTITION by HASH(a) PARTITIONS 4;
Query OK, 0 rows affected (0.83 sec)
SHOW FULL CREATE TABLE tb\G
*************************** 1. row ***************************
       TABLE: tb
CREATE TABLE: CREATE TABLE `tb` (
    `a` int(11) NOT NULL,
    `b` int(11) DEFAULT NULL,
    PRIMARY KEY (`a`)
) ENGINE = InnoDB DEFAULT CHARSET = utf8mb4
PARTITION BY KEY(`a`)
PARTITIONS 4
1 row in set (0.02 sec)

DRDS Mode: You can use the sharding syntax for databases in DRDS mode when you create tables. Only the hash policy is supported.

In the following example, the DBPARTITION BY HASH(a) TBPARTITION BY HASH(a) syntax is used and the column a is used as the sharding key.

CREATE TABLE tb (a INT, b INT, PRIMARY KEY(a))
    -> DBPARTITION by HASH(a)
    -> TBPARTITION by HASH(a)
    -> TBPARTITIONS 4;
Query OK, 0 rows affected (1.16 sec)
SHOW FULL CREATE TABLE tb\G
*************************** 1. row ***************************
       Table: tb
Create Table: CREATE TABLE `tb` (
    `a` int(11) NOT NULL,
    `b` int(11) DEFAULT NULL,
    PRIMARY KEY (`a`)
) ENGINE = InnoDB DEFAULT CHARSET = utf8mb4  dbpartition by hash(`a`) tbpartition by hash(`a`) tbpartitions 4
1 row in set (0.02 sec)

A Comparison between Routing Algorithms Used for Partitioned Tables and Sharded Tables

The routing algorithm used for partitioned tables is different from the routing algorithm used for sharded tables. The following figure shows the difference between the routing algorithms.

In the routing algorithm used for sharded tables, the hash values and the number of physical table shards are used to perform modulo operations. If you want to change the number of shards, the hash values of all data need to be recalculated. For example, if you want to change the number of shards from four to five, you need to recalculate hash values for all data. The routing algorithm used for table sharding in DRDS mode is not suitable for scenarios in which you want to change the number of shards.
The default routing algorithm used for table partitioning is a range-based consistent hashing algorithm. This algorithm allows you to perform operations on partitions (such as splitting and merging) to change the number of partitions. You do not need to rehash all data when you use this algorithm. In AUTO mode, you can change the number of partitions when tables are partitioned.

Core Features and Common Usage Scenarios of AUTO Mode

Split Hot Data to Handle Hot Data

PolarDB-X supports two processing methods for hot data:

Method 1: Migrate the partition of hot data to a specific data node to use dedicated storage resources to store hot data. This way, hot data does not affect the services that use non-hot data. Perform the following steps to migrate hot data to a specific data node:

Execute the following statement to extract hot data and store the hot data in a specified partition:

ALTER TABLEGROUP #tgName EXTRACT to PARTITION #hotPartitionName BY HOT VALUE(#keyVal)

Execute the following statement to migrate the partition of hot data to the specified data node:

ALTER TABLEGROUP #tgName MOVE PARTITIONS #hotPartitionName TO #dn

If the hot data breaks the single-point performance of the machine, you can use the second processing method in PolarDB-X to hash the hot data using the following command to better support the linear expansion of the business:

ALTER TABLEGROUP #tgName SPLIT INTO PARTITIONS #hotPartitionNamePrefix #N BY HOT VALUE(#keyVal);

The preceding statement can be executed to split hot data into multiple partitions based on the value of the keyVal parameter, add the specified prefix to the partition names, and evenly distribute the partitions among different data nodes. This way, the hot data is linearly distributed among different data nodes. This helps you resolve issues related to hot data.

Migrate Partitions to Balance Data among Data Nodes

In DRDS mode, a modulo operation is performed based on hash values to implement sharding in databases. A table shard is associated with a specific database shard. For example, Database shard N includes Table shard M. You cannot modify the relationships between database shards and table shards. If you want to split, merge, or migrate a table shard, you need to recalculate hash values for all data in the table.

You can only migrate database shards to a data node because table shards cannot be migrated to other data nodes. If you migrate a database shard that stores a large volume of data, data among data nodes cannot be balanced. The following figure shows the issue:

In AUTO mode, the consistent hashing routing algorithm is used for table partitioning. This way, you can merge, split, and migrate partitions in a flexible manner. For example, you can migrate your partitions to specified data nodes based on your business requirements. This operation does not affect irrelevant data in partitions. PolarDB-X can evenly distribute data to each data node by combining operations (such as partition merging and partition splitting with partition scheduling), as shown in the following figure. This balances data among data nodes.

Use the TTL Feature to Automatically Delete Historical Data

In specific business scenarios, the volume of business data may increase in a short period, and the frequency at which business data is queried may decrease over a period. As such, if data is always stored in PolarDB-X, it occupies storage space and reduces the efficiency of normal business queries. In this scenario, many businesses choose to archive historical data and delete it in PolarDB-X. This allows customers to quickly delete historical data without affecting existing businesses.

If PolarDB-X can delete historical data through DDL, the speed of data cleansing is significantly improved. PolarDB-X developed the TTL feature in AUTO mode to quickly delete historical data.

If you include statements that are used to configure the TTL feature in the CREATE TABLE statement, a TTL table is created after the CREATE TABLE statement is executed. The following sample statement can be executed to create a table named t_order. The t_order table is partitioned based on the values in the gmt_modified column. One partition is added each month. Each partition expires 12 months after the partition is created. Three partitions are created when the table is created.

CREATE TABLE t_order (
  id bigint NOT NULL AUTO_INCREMENT,
  gmt_modified DATETIME NOT NULL,
  PRIMARY KEY (id, gmt_modified)
)
PARTITION BY HASH(id)
PARTITIONS 16
-- The following sample code provides an example of the TTL syntax 
LOCAL PARTITION BY RANGE (gmt_modified)  -- The table is partitioned based on the values in the gmt_modified column. 
INTERVAL 1 MONTH -- One partition is added each month. 
EXPIRE AFTER 12 -- Each partition expires 12 months after the partition is created. 
PRE ALLOCATE 3;  -- Three partitions are created when the table is created.

Use the LOCALITY Attribute to Specify the Location Where Data Is Stored

The LOCALITY attribute is supported for databases in AUTO mode to allow you to specify locations based on multiple dimensions. You can use this attribute to specify the location where you want to store data.

You can specify the LOCALITY attribute for a database. For example, you can execute the following statement to specify the location of the db1 database:

CREATE DATABASE db1 MODE='AUTO' LOCALITY='dn=pxc-xdb-s-pxcexample'

You can specify the LOCALITY attribute for a table. For example, you can execute the following statement to specify the location of the tb table. By default, a non-partitioned table is stored in one data node. You can set the LOCALITY attribute to the physical location where you want to deploy the table.

CREATE TABLE tb (a INT, b INT, PRIMARY KEY(a)) LOCALITY='dn=pxc-xdb-s-pxcexample'

The LOCALITY attribute is supported for partitions in PolarDB-X V5.4.14 and later versions. If your business application is deployed across multiple regions, you can use a database in AUTO mode to store your business data and use the list partitioning method to partition the tables in which your business data is stored. You can also specify the system to store data generated in a region in the same data node.

CREATE TABLE orders_region(
 id int,
 country varchar(64),
 city varchar(64),
 order_time datetime not null)
PARTITION BY LIST COLUMNS(country,city)
(
  PARTITION p1 VALUES IN (('China','Hangzhou'), ('China','Beijing')) LOCALITY='dn=pxc-xdb-s-pxcexample1',
  PARTITION p2 VALUES IN (('United States','NewYork'),('United States','Chicago')) LOCALITY='dn=pxc-xdb-s-pxcexample2',
  PARTITION p3 VALUES IN (('Russian','Moscow')) LOCALITY='dn=pxc-xdb-s-pxcexample3'
);

A Comparison between the Features Provided for Databases in AUTO Mode and DRDS Mode

Compared with databases in DRDS mode, databases in AUTO mode support new features (such as automatic partitioning, hot data splitting, partition scheduling, and time to live (TTL) tables). The distribution feature provided for databases in AUTO mode is also optimized based on various dimensions (such as partition management and partition modification).

The following table describes the differences between the features provided for databases in AUTO mode and DRDS mode:

A Comparison between the Performance of Databases in DRDS Mode and Databases in AUTO Mode

The DRDS mode table sharding uses different routing algorithms from the AUTO mode table partitioning. In order to evaluate the performance difference between the two routing algorithms, Sysbench is used to benchmark PolarDB-X and observe their throughput (in QPS) in different Sysbench test scenarios.

Test Environment

PolarDB-X Instance Types: polarx.x4.2xlarge.2e
- CN (16C64G) × 2
- DN (16C64G) × 2
Version: 5.4.13-16415631
Table partitioning settings and table sharding settings:
- Table partitioning:
  - 32 partitions
  - Partitioning clause: PARTITION BY HASH(id) PARTITIONS 32
  - Total volume of data: 160 million rows
- Table sharding:
  - 32 physical table shards
  - Sharding clause: DBPARTITION BY HASH(id) TBPARTITION BY HASH(id) TBPARTITION 2
  - Total volume of data: 160 million rows

Test Scenario

The following list describes the scenarios in which Sysbench tests are performed:

oltp_point_select: Statements that only include equality conditions are executed to perform point read operations. The partition key of the table is used in the WHERE clause.
oltp_read_only: Statements that include the BETWEEN operator or an equality condition are executed to perform small-range and point read operations in a transaction. The partition key of the table is used in the WHERE clause.
oltp_read_write: Point read operations, point write operations, small-range read operations, and small-range write operations are performed based on the partition keys of the tables to process transactions.

Test Results

The following analysis results are obtained based on the preceding test results:

The consistent hashing routing algorithm used in table partitioning is more complex than the modular hash routing algorithm used in sharding. In scenarios in which oltp_point_select queries are performed, the QPS for table partitioning is similar to the QPS for sharding.
The expressions in the statements that are used to perform oltp_read_only queries and oltp_read_write queries are more complex than the expressions in the statements that are used to perform oltp_point_select queries because small-range queries are required in oltp_read_only queries and oltp_read_write queries. In this case, the overall QPS on the partitioned tables is approximately 33% higher than the overall QPS on the sharded tables because the partitioned tables are pruned.

Community

PolarDB-X Practice Series – Part 3 (2): Transparent Distributed Best Practices

Use the MODE Parameter to Specify the Database Mode

Automatic Partitioning and Manual Partitioning

Automatic Partitioning

Manual Partitioning

A Comparison between Routing Algorithms Used for Partitioned Tables and Sharded Tables

Core Features and Common Usage Scenarios of AUTO Mode

Split Hot Data to Handle Hot Data

Migrate Partitions to Balance Data among Data Nodes

Use the TTL Feature to Automatically Delete Historical Data

Use the LOCALITY Attribute to Specify the Location Where Data Is Stored

A Comparison between the Features Provided for Databases in AUTO Mode and DRDS Mode

A Comparison between the Performance of Databases in DRDS Mode and Databases in AUTO Mode

Test Environment

Test Scenario

Test Results

Read previous post:

Read next post:

ApsaraDB

You may also like

Comments

ApsaraDB

Related Products

PolarDB for PostgreSQL

PolarDB for Xscale

PolarDB for MySQL

ApsaraDB for OceanBase