All Products
Search
Document Center

ApsaraDB RDS:Best practices for RTO optimization in AliSQL

Last Updated:Apr 16, 2025

The recovery time objective (RTO) is the amount of time that is required to recover from a failure. It is a core metric that is used to measure the high availability of a database system. AliSQL, developed based on MySQL Community Edition, significantly optimizes the crash recovery process, accelerates instance startup, reduces the RTO, and ensures rapid service recovery.

Optimization overview

Optimization item

Optimized object or process

Optimization effectiveness

Massive table startup optimization

Tablespace check

The time for a normal instance startup is reduced by 73%. The time to recover from downtime is reduced by more than 95%. The memory usage during a startup is reduced by more than 81%.

Buffer pool initialization acceleration

Parallel buffer pool initialization

The time to initialize a buffer pool is reduced by more than 80%.

Parallel redo log application

Redo log scanning and application

The time to apply redo logs is reduced by more than 80%.

Fast transaction recovery

Undo log record structure

The time to recover large transactions with 1 million records is approximately 0 seconds, which can be ignored.

Asynchronous transaction rollback

Uncommitted transaction rollback

The time to roll back large transactions with 1 million records is approximately 0 seconds, which can be ignored.

Optimization of general query log recovery

Corrupted data scanning and locating

The time to recover a general query log table of 10 GB is approximately 0 seconds, which can be ignored.

Massive table startup optimization

Tablespace checking is fundamental to crash recovery. If a large number of tables, such as 1 million tables, exist in your RDS instance, the instance startup and crash recovery processes slow down and a large number of memory resources are occupied. AliSQL has optimized core processes, such as file scanning, data dictionary reading, and table object construction. This significantly reduces the time that is required to scan tables in scenarios in which a large number of tables are involved. The following table describes the optimization effectiveness.

Scenario

MySQL Community Edition (before optimization)

AliSQL (after optimization)

Improvement

Normal startup time

90.7 seconds

24.1 seconds

Reduced by more than 73%

Crash recovery time

521.7 seconds

25 seconds

Reduced by more than 95%

Memory usage during startup

High

Reduced by more than 81%

Significantly optimized

How to enable

  • Minor engine version: If your RDS instance runs MySQL 8.0, the minor engine version must be later than or equal to 20240731.

  • Key parameter: loose_fetch_raw_tablespace_at_startup. This parameter is enabled by default. If this parameter is enabled, only necessary metadata is loaded during the instance startup, and the loading of redundant information is delayed. This reduces memory usage and startup time.

Buffer pool initialization acceleration

The buffer pool is the core memory structure of InnoDB. To initialize the buffer pool, memory allocation and management structure construction are required. MySQL Community Edition supports parallel buffer pool initialization at the instance level. However, concurrency bottlenecks still exist in lock structures and global statistics updates. AliSQL optimizes the parallel buffer pool initialization, which eliminates bottlenecks in the parallel initialization process and significantly reduces the time for the buffer pool initialization of instances with large memory capacity. The following table describes the optimization effectiveness.

Scenario

MySQL Community Edition (before optimization)

AliSQL (after optimization)

Improvement

Time to initialize a buffer pool of 512 GB

19.7 seconds

3.8 seconds

Reduced by more than 80%

How to enable

  • Minor engine version:

    • If your RDS instance runs MySQL 8.0, the minor engine version must be later than or equal to 20220530.

    • If your RDS instance runs MySQL 5.7, the minor engine version must be later than or equal to 20230531.

  • Key parameter: innodb_buffer_pool_init_optimize. This parameter is enabled by default. If this parameter is enabled, the system automatically optimizes the parallel buffer pool initialization to eliminate lock and synchronization bottlenecks.

Parallel redo log application

Redo logs are write-ahead logging (WAL) logs in MySQL databases to ensure data consistency. Redo log scanning and application are critical steps during crash recovery. MySQL Community Edition only supports single-threaded scanning and serialized application. As a result, several minutes are required to recover a large number of redo logs, such as 4 GB of redo logs. AliSQL introduces a pipeline mechanism to parallelize redo log scanning and application, which significantly improves the recovery speed. The following table describes the optimization effectiveness.

Scenario

MySQL Community Edition (before optimization)

AliSQL (after optimization)

Improvement

Time to apply active redo logs of 4 GB

118.7 seconds

22.6 seconds

Reduced by more than 80%

Application speed

Baseline speed

Increased by more than 500%

Significantly improved

How to enable

  • Minor engine version: If your RDS instance runs MySQL 8.0, the minor engine version must be later than or equal to 20241231.

  • Key parameter: innodb_parallel_redo_threads. Default value: 1. This parameter is disabled by default. You must manually set the innodb_parallel_redo_threads parameter to a value greater than 2, such as 4. This parameter specifies the number of threads for parallel redo Log application. You can configure this parameter to accelerate redo log scanning and application.

Fast transaction recovery

During crash recovery, after redo logs are applied, MySQL needs to use undo logs to recover table locks for active transactions. MySQL Community Edition needs to scan all undo log records row by row to obtain information about the tables that are managed by the transactions. As a result, the time that is required to recover large transactions, such as transactions with millions of records, is significantly increased. AliSQL extends the structure of undo log records to implement fast undo log locating and scanning. This significantly reduces the time to recover transactions. The following table describes the optimization effectiveness.

Scenario

MySQL Community Edition (before optimization)

AliSQL (after optimization)

Improvement

Time to recover large transactions with 1 million records

6.8 seconds

Approximately 0 seconds, which can be ignored

Completely optimized

How to enable

  • Minor engine version:

    • If your RDS instance runs MySQL 8.0, the minor engine version must be later than or equal to 20230630.

    • If your RDS instance runs MySQL 5.7, the minor engine version must be later than or equal to 20230831.

  • Key parameter: innodb_trx_resurrect_table_lock_accelerate. Default value: OFF. This parameter is disabled by default. You must manually set the innodb_trx_resurrect_table_lock_accelerate parameter to ON. If this parameter is enabled, the system enables the fast undo log scanning mechanism to accelerate large transaction recovery.

Asynchronous transaction rollback

MySQL uses internal XA transactions to ensure consistency between binary logs and InnoDB. To commit a transaction, the system first executes the InnoDB prepare phase and keeps the transaction prepared, writes the prepared transaction to binary logs, and then commit the transaction. During crash recovery, the commit or rollback of uncommitted transactions is determined by binary logs. Transactions in the prepared state are considered uncommitted transactions. In MySQL Community Edition, uncommitted transaction rollback is a synchronous operation. This causes the time to recover large transactions, such as transactions with millions of records, to be significantly increased. AliSQL changes uncommitted transaction rollback to asynchronous, which does not block instance startup and allows the instance to quickly recover. The following table describes the optimization effectiveness.

Scenario

MySQL Community Edition (before optimization)

AliSQL (after optimization)

Improvement

Time to roll back large transactions with 1 million records

100 seconds

Approximately 0 seconds, which can be ignored

Completely optimized

How to enable

  • Minor engine version:

    • If your RDS instance runs MySQL 8.0, the minor engine version must be later than or equal to 20220530.

    • If your RDS instance runs MySQL 5.7, the minor engine version must be later than or equal to 20230531.

  • Key parameter: async_binlog_recovery. This parameter is enabled by default. If this parameter is enabled, the system enables the mechanism to asynchronously roll back uncommitted transactions to accelerate instance startup.

Optimization of general query log recovery

In TABLE mode, MySQL general query logs record user operations in a CSV table. If corruption is detected on this table during crash recovery, MySQL Community Edition performs a full table scan to locate and rebuild the corrupted data. This causes the time to recover large data volumes, such as 10 GB, to be significantly increased. AliSQL uses techniques such as reverse scanning and file truncation to quickly recover the corrupted log table, which significantly reduces the instance startup delay and accelerates instance recovery. The following table describes the optimization effectiveness.

Scenario

MySQL Community Edition (before optimization)

AliSQL (after optimization)

Improvement

Time to recover a general query log table of 10 GB

270 seconds

Approximately 0 seconds, which can be ignored

Completely optimized

How to enable

  • Minor engine version:

    • If your RDS instance runs MySQL 8.0, the minor engine version must be later than or equal to 20241130.

    • If your RDS instance runs MySQL 5.7, the minor engine version must be later than or equal to 20241130.

  • Key parameter: reverse_repair_general_log_on_boot. This parameter is enabled by default. If this parameter is enabled, the system enables reverse scanning and file truncation to accelerate the recovery of general query log tables.

Other optimizations

AliSQL also optimizes the process to verify data in the doublewrite buffer. We recommend that you update the minor engine version to the latest version.