This topic describes how to perform hot row optimization to improve the performance of databases.
One of the following PolarDB clusters is used:
- A cluster of PolarDB for MySQL 5.6 whose kernel version is 20200601 or later.
- A cluster of PolarDB for MySQL 8.0 whose kernel version is 18.104.22.168.10 or later.
The hot row optimization feature is not supported in the following scenarios:
- Hot rows are in a partitioned table.
- A trigger is defined on the table to which the hot rows belong.
Issues and solutions
Hot rows refer to the rows in a database that are frequently added, deleted, modified, or queried. When a transaction updates data in a row, the row is locked until the transaction is committed or rolled back. Each row can be updated by only one transaction at a time. Other transactions are pending. Requests to update a hot row are processed in serial order. Therefore, traditional database sharding and table sharding policies are ineffective to improve database performance.
On e-commerce platforms, limited-time sales and flash sales are frequently used for promotion. In these scenarios, a large number of requests to update a hot row are sent to the database within a short period of time. This results in row lock contention and queues for acquiring row locks. This degrades the system performance. If an update request waits for a long time before the update request is processed, services are adversely affected.
To fix the preceding issue and reduce the latency, hardware improvement is not enough. In this case, PolarDB optimizes the kernel of the database. The optimized kernel automatically identifies the requests to update hot rows and groups these requests to update the same hot row within a specific period of time. Different groups of the requests are parallelly processed in pipelines. This improves system performance.
How it works
- Convert serial processing to parallel processing
Parallel processing is the easiest way to improve the performance of a database system. However, it is difficult for the system to process all the requests to update the same hot row in parallel. PolarDB introduces pipelines to process these update requests in parallel to maximize the parallelization of hot rows processing.
The SQL statements that are executed to update hot rows are attached with the
commit_on_successlabel. The optimized MySQL kernel can automatically identify update requests that are attached with the autocommit or commit_on_success label. The kernel can also implement hashing on the collected update requests based on primary keys or unique keys. Then, the update requests that are allocated to the same bucket are processed and committed in groups and batches based on the arrival time.
To process these update requests in pipelines, two execution units are required to group these requests. When the update requests in the first group are collected and are ready to commit, the update requests in the second group start to be collected. When the update requests in the second group are collected and are ready to commit, the update requests in the first group have been committed. The next batch of the update requests starts to be collected in the first group. This ensures that the update requests are processed in parallel.
Multi-core CPUs are widely used. This pipelined processing method can make full use of hardware resources, optimize CPU utilization, and improve the parallel processing capability of database systems. This maximizes the throughput of the database systems.
- Eliminate the need to wait for acquiring row locks
To ensure the logical consistency of the data, a row must be locked when it is updated. If the requests of acquiring row locks are not immediately processed and wait in a queue, the request processing latency is increased and deadlock detection is triggered. This consumes additional resources.
The requests to update the same row are grouped in chronological order. The first update request in a group serves as a leader. The leader reads data in the row and locks the row. The other update requests in the same group serve as followers. If the row is locked by the leader, the followers can immediately acquire the row locks.
This reduces the number of times and time consumed for row locks, and improves the system performance.
- Reduce the traversals of B-tree indexes
MySQL manages data based on B-tree indexes. During each query, MySQL traverses all indexes to query the required row. It requires a long time to traverse a table if the table is large and has multiple index levels.
If the grouping mechanism is used, only the leader of each group needs to traverse indexes to query the required row. Then, the leader caches the updated row to the memory. The followers in the same group can read the row from the memory after they acquire the row locks, and do not need to traverse the indexes again.
This reduces the number of index traversals and the time consumed.
Use the hot row optimization feature
- Before you use the hot row optimization feature to improve the performance of a database,
you must specify the following parameters. For more information about how to set the
parameters, see Specify cluster parameters.
Parameter Description hotspot Specifies whether to enable the hot row optimization feature. Valid values: ON and OFF. Default value: OFF. hotspot_for_autocommit Specifies whether to enable the hot row optimization feature to optimize the UPDATE statement that is in
auto-commitmode. Valid values: ON and OFF. Default value: OFF.Note You can specify this parameter only when the hotspot parameter is set to ON.
hotspot_update_max_wait_time The time that is required for the leader in a group to wait for the followers to join the group in the process of request grouping. Unit: microseconds. Default value: 100 μs. hotspot_lock_type Specifies whether to use a new type of row lock when the system process requests to update the same data row in groups. Valid values: ON and OFF. Default value: OFF.Note If this parameter is set to ON, the requests to update the same hot row do not need to wait to acquire row locks. This improves the performance of the database system.Note
- The hotspot_for_autocommit, hotspot_update_max_wait_time, and hotspot_lock_type parameters are not available. To use these parameters,Submit a ticket.
- To use the hot row optimization feature, you must enable the binary logging feature.
Therefore, if you try to set the hotspot parameter to ON when the binary logging feature is disabled, an error is returned.
If you execute the
MySQL [(none)]> set global hotspot=ON;statement to set the hotspot parameter to ON when the binary logging feature is disabled, the following error is returned:
ERROR 3042 (HY000): Variable 'hotspot' cannot be enabled because 'bin logging' is enabled/disabled
For more information about how to enable binary logging, see Enable binary logging.
- When the binary logging feature is enabled globally but disabled for sessions, you can set the hotspot parameter to ON. However, hot row optimization does not take effect on the UPDATE statement.
- If you execute the
MySQL [(none)]> set global hotspot=ON;statement to set the hotspot parameter to ON when the rds_ic_reduce_hint_enable parameter is set to ON, the following error is returned:
ERROR 3042 (HY000): Variable 'hotspot' cannot be enabled because 'rds_ic_reduce_hint_enable' is enabled/disabled
- You can execute the following statement to check the parameter settings:
show variables like "hotspot%";
The following result is returned.
+------------------------------+-------+ | Variable_name | Value | +------------------------------+-------+ | hotspot | OFF | | hotspot_for_autocommit | OFF | | hotspot_lock_type | OFF | | hotspot_update_max_wait_time | 100 | +------------------------------+-------+
- You can execute the following statement to view the statistics of hot row optimization:
show global status like 'Group_update%';
- You can execute the following statements for a performance test.
- Create a table
CREATE TABLE sbtest(id INT UNSIGNED NOT NULL, c BIGINT UNSIGNED NOT NULL) PRIMARY KEY (id)
- Perform an UPDATE operation
UPDATE COMMIT_ON_SUCCESS ROLLBACK_ON_FAIL TARGET_AFFECT_ROW 1 sbtest SET c=c+1 WHERE id = 1
- Create a table
- Test tool: SysBench
- Scenarios and test results
- Scenario 1: One hot row and an 8-core CPU
In this scenario, an 8-core CPU is used and a large number of concurrent requests are sent to update a hot row. After hot row optimization is enabled, the performance of the database is improved by about 64 times.
- Scenario 2: One hot row and a 32-core CPU
In this scenario, a 32-core CPU is used and concurrent requests are sent to update a hot row. After hot row optimization is enabled, the peak queries per second (QPS) is increased by 63 times. When the number of concurrent requests reaches 8,000, the performance of the database is improved by 46 times.
- Scenario 3: Eight hot rows and a 32-core CPU
In this scenario, a 32-core CPU is used and concurrent requests are sent to update eight hot rows. After hot row optimization is enabled, the maximum of QPS is improved by 20 times.
If the number of concurrent requests reaches 16,000, database failures occur and no results are returned when hot row optimization is disabled. However, when hot row optimization is enabled, row data is updated as expected and QPS remains stable.
- Scenario 1: One hot row and an 8-core CPU