This topic describes how to use the hot row optimization feature to improve the performance of database systems.

Prerequisites

The version of the PolarDB cluster is PolarDB for MySQL 5.6 and the minor kernel version is 20200616 or later.

Note For information about how to upgrade a minor kernel version, see Upgrade the revision version.

Limits

The hot row optimization feature is not applicable in the following scenarios:

  • Hot rows are stored in a partitioned table.
  • The table that stores hot rows is associated with a trigger.

Considerations

Alibaba Cloud uses canary releases to provide the hot row optimization feature. To use this feature, .

Background information

In database systems, data in some rows are added, deleted, modified, and queried at a high frequency. These data rows are called hot rows. When a transaction updates data in a row, the row is locked until the transaction is committed or rolled back. Other transactions must wait. Transactions cannot update data in the same row at the same time. Requests to update a single hot row are processed in serial. Therefore, traditional database sharding policies are ineffective in improving database performance.

In e-commerce platforms, limited sales and flash sales are frequently used for promotion. In these promotion scenarios, a large number of requests to update a single hot row are sent to the backend database system in a short period. This results in row lock contention and queues for acquiring row locks. This degrades the system performance. If an update request waits for a long time before the update request is processed, services are affected.

In this case, you cannot achieve the expected low latency by improving your hardware configurations. To resolve this issue, PolarDB optimizes the kernel layer of databases. The optimized kernel layer can automatically identify the requests to update hot rows, and divide the requests to update the same hot row within a specified period into groups. Different groups of the requests are processed in parallel in pipelines. This improves system performance.

Optimizations

  • Convert serial processing to parallel processing in pipelines

    Parallel processing is the easiest way to improve the performance of a database system. However, it is difficult for the system to process all the requests of updating the same hot row in parallel. PolarDB introduces pipelines to process these update requests in parallel to the greatest extent.

    1

    The SQL statements that are executed to update hot rows are attached with the autocommit or commit_on_success label. The optimized MySQL kernel layer can automatically identify update requests that are attached with the autocommit or commit_on_success label. The kernel layer can also implement hashing on the collected update requests based on primary keys or unique keys. Then, the update requests that are allocated to the same bucket are processed and committed in groups and batches based on the time of arriving at the bucket.

    To allow these update requests to be processed in pipelines, two execution units are required to divide the update requests into two groups. When the update requests in the first group are collected and are ready to commit, the update requests in the second group start to be collected. When the update requests in the second group are collected and are ready to commit, the update requests in the first group have been committed. The next batch of the update requests start to be collected in the first group. This ensures that the update requests can be processed in parallel.

    Multi-core CPUs are widely used. This pipelined processing method can make full use of hardware resources, optimize CPU utilization, and improve the parallel processing capability of database systems. This maximizes the throughput of the database systems.

  • Eliminate the need to wait for acquiring row locks

    A data row must be locked when the data row is updated. This ensures the logical consistency of data. If the requests of acquiring row locks are not immediately processed and wait in a queue, the request processing latency is increased and deadlock detection is triggered. This consumes additional resources.

    The requests to update the same data row are grouped in chronological order. The first update request in a group serves as a leader. The leader reads data in the data row and locks the data row. The other update requests in the same group serve as followers. If the data row is locked by the leader, the followers can immediately acquire the row locks.

    This method reduces the number of times for acquiring row locks and saves time, which improves the performance of the entire database system.

  • Reduce the traversals of B-tree indexes

    MySQL manages data based on B-tree indexes. During each query, MySQL traverses all the indexes to locate the required data row. The traversal requires a long time if the table is large and has multiple index levels.

    If the grouping mechanism is used, only the leader of each group needs to traverse the indexes to locate the required data row. Then, the leader caches the updated data row to the memory. The followers in the same group can read the data row from the memory after they acquire the row locks, and do not need to traverse the indexes again.

    This reduces the number of index traversals and saves time.

Use the hot row optimization feature

  • Before you use the hot row optimization feature to improve the performance of a database system, you must specify the parameters that are described in the following table.
    Parameter Description
    hotspot Specifies whether to enable the hot row optimization feature. Valid values: ON and OFF. Default value: OFF.
    hotspot_for_autocommit Specifies whether to enable the hot row optimization feature to optimize the UPDATE statement that is in auto-commit mode. Valid values: ON and OFF. Default value: OFF.
    Note You can specify this parameter only when the hotspot parameter is set to ON.
    hotspot_update_max_wait_time The time that is required for the leader in a group to wait for the followers to join the group in the process of request grouping. Unit: microseconds. Default value: 100 μs.
    hotspot_lock_type Specifies whether to use a new type of row locks when the system process requests to update the same data row in groups and batches. Valid values: ON and OFF. Default value: OFF.
    Note If this parameter is set to ON, the requests to update the same hot row do not need to wait for acquiring row locks. This improves the performance of the database system.
    Note
    • To use the hot row optimization feature, you must enable the binary logging feature. Therefore, if you try to set the hotspot parameter to ON when the binary logging feature is disabled, the system returns an error.
      If you try to execute the MySQL [(none)]> set global hotspot=ON; statement to set the hotspot parameter to ON when the binary logging feature is disabled, the system returns the following error:
      ERROR 3042 (HY000):
      Variable 'hotspot' cannot be enabled because 'bin logging' is enabled/disabled
    • When the binary logging feature is enabled globally but disabled for sessions, you can set the hotspot parameter to ON. However, hot row optimization does not take effect on the UPDATE statement.
    • If you try to execute the MySQL [(none)]> set global hotspot=ON; statement to set the hotspot parameter to ON when the rds_ic_reduce_hint_enable parameter is set to ON, the system returns the following error:
      ERROR 3042 (HY000): Variable 'hotspot' cannot be enabled because 'rds_ic_reduce_hint_enable' is enabled/disabled
  • You can execute the following statement to check the parameter settings:
    show variables like "hotspot%";

    Sample responses

    +------------------------------+-------+
    |
    Variable_name                | Value |
    +------------------------------+-------+
    |
    hotspot                      | OFF   |
    |
    hotspot_for_autocommit       | OFF   |
    |
    hotspot_lock_type            | OFF   |
    |
    hotspot_update_max_wait_time | 100   |
    +------------------------------+-------+
  • You can execute the following statement to view the statistics of hot row optimization:
    show global status like 'Group_update%';

Performance testing

  • You can execute the following CREATE TABLE and UPDATE statements for performance testing.
    • CREATE TABLE statement:
      CREATE TABLE sbtest(id INT UNSIGNED NOT NULL, c BIGINT UNSIGNED NOT NULL) PRIMARY KEY (id)
    • UPDATE statement:
      UPDATE COMMIT_ON_SUCCESS ROLLBACK_ON_FAIL QUEUE_ON_PK 1 TARGET_AFFECT_ROW 1 sbtest SET c=c+1 WHERE id = 1
  • SysBench is used to test the performance of a database system in the following scenarios:
    • Scenario 1: One hot row and 8-core CPUs

      In this scenario, 8-core CPUs are used and a large number of concurrent requests are sent to update a single hot row. After hot row optimization is enabled, the performance of the database system is improved by about 64 times.

      2
    • Scenario 2: One hot row and 32-core CPUs

      In this scenario, 32-core CPUs are used and concurrent requests are sent to update a single hot row. After hot row optimization is enabled, the maximum of queries per second (QPS) is improved by 63 times. When the number of concurrent requests reaches 8,000, the performance of the database system is improved by 46 times.

      2
    • Scenario 3: Eight hot rows and 32-core CPUs

      In this scenario, 32-core CPUs are used and concurrent requests are sent to update eight hot rows. After hot row optimization is enabled, the maximum of QPS is improved by 20 times.

      When the number of concurrent requests reaches 16,000, database failures occur and no results are returned if hot row optimization is disabled. However, if hot row optimization is enabled, row data is updated as expected and QPS remains stable.

      4