The recovery time objective (RTO) is the amount of time that is required to recover from a failure. It is a core metric that is used to measure the high availability of a database system. AliSQL, developed based on MySQL Community Edition, significantly optimizes the crash recovery process, accelerates instance startup, reduces the RTO, and ensures rapid service recovery.
Optimization overview
Optimization item | Optimized object or process | Optimization effectiveness |
Tablespace check | The time for a normal instance startup is reduced by 73%. The time to recover from downtime is reduced by more than 95%. The memory usage during a startup is reduced by more than 81%. | |
Parallel buffer pool initialization | The time to initialize a buffer pool is reduced by more than 80%. | |
Redo log scanning and application | The time to apply redo logs is reduced by more than 80%. | |
Undo log record structure | The time to recover large transactions with 1 million records is approximately 0 seconds, which can be ignored. | |
Uncommitted transaction rollback | The time to roll back large transactions with 1 million records is approximately 0 seconds, which can be ignored. | |
Corrupted data scanning and locating | The time to recover a general query log table of 10 GB is approximately 0 seconds, which can be ignored. |
Massive table startup optimization
Tablespace checking is fundamental to crash recovery. If a large number of tables, such as 1 million tables, exist in your RDS instance, the instance startup and crash recovery processes slow down and a large number of memory resources are occupied. AliSQL has optimized core processes, such as file scanning, data dictionary reading, and table object construction. This significantly reduces the time that is required to scan tables in scenarios in which a large number of tables are involved. The following table describes the optimization effectiveness.
Scenario | MySQL Community Edition (before optimization) | AliSQL (after optimization) | Improvement |
Normal startup time | 90.7 seconds | 24.1 seconds | Reduced by more than 73% |
Crash recovery time | 521.7 seconds | 25 seconds | Reduced by more than 95% |
Memory usage during startup | High | Reduced by more than 81% | Significantly optimized |
How to enable
Minor engine version: If your RDS instance runs MySQL 8.0, the minor engine version must be later than or equal to 20240731.
Key parameter:
loose_fetch_raw_tablespace_at_startup. This parameter is enabled by default. If this parameter is enabled, only necessary metadata is loaded during the instance startup, and the loading of redundant information is delayed. This reduces memory usage and startup time.
Buffer pool initialization acceleration
The buffer pool is the core memory structure of InnoDB. To initialize the buffer pool, memory allocation and management structure construction are required. MySQL Community Edition supports parallel buffer pool initialization at the instance level. However, concurrency bottlenecks still exist in lock structures and global statistics updates. AliSQL optimizes the parallel buffer pool initialization, which eliminates bottlenecks in the parallel initialization process and significantly reduces the time for the buffer pool initialization of instances with large memory capacity. The following table describes the optimization effectiveness.
Scenario | MySQL Community Edition (before optimization) | AliSQL (after optimization) | Improvement |
Time to initialize a buffer pool of 512 GB | 19.7 seconds | 3.8 seconds | Reduced by more than 80% |
How to enable
Minor engine version:
If your RDS instance runs MySQL 8.0, the minor engine version must be later than or equal to 20220530.
If your RDS instance runs MySQL 5.7, the minor engine version must be later than or equal to 20230531.
Key parameter:
innodb_buffer_pool_init_optimize. This parameter is enabled by default. If this parameter is enabled, the system automatically optimizes the parallel buffer pool initialization to eliminate lock and synchronization bottlenecks.
Parallel redo log application
Redo logs are write-ahead logging (WAL) logs in MySQL databases to ensure data consistency. Redo log scanning and application are critical steps during crash recovery. MySQL Community Edition only supports single-threaded scanning and serialized application. As a result, several minutes are required to recover a large number of redo logs, such as 4 GB of redo logs. AliSQL introduces a pipeline mechanism to parallelize redo log scanning and application, which significantly improves the recovery speed. The following table describes the optimization effectiveness.
Scenario | MySQL Community Edition (before optimization) | AliSQL (after optimization) | Improvement |
Time to apply active redo logs of 4 GB | 118.7 seconds | 22.6 seconds | Reduced by more than 80% |
Application speed | Baseline speed | Increased by more than 500% | Significantly improved |
How to enable
Minor engine version: If your RDS instance runs MySQL 8.0, the minor engine version must be later than or equal to 20241231.
Key parameter: innodb_parallel_redo_threads. Default value: 1. This parameter is disabled by default. You must manually set the
innodb_parallel_redo_threadsparameter to a value greater than 2, such as4. This parameter specifies the number of threads for parallel redo Log application. You can configure this parameter to accelerate redo log scanning and application.
Fast transaction recovery
During crash recovery, after redo logs are applied, MySQL needs to use undo logs to recover table locks for active transactions. MySQL Community Edition needs to scan all undo log records row by row to obtain information about the tables that are managed by the transactions. As a result, the time that is required to recover large transactions, such as transactions with millions of records, is significantly increased. AliSQL extends the structure of undo log records to implement fast undo log locating and scanning. This significantly reduces the time to recover transactions. The following table describes the optimization effectiveness.
Scenario | MySQL Community Edition (before optimization) | AliSQL (after optimization) | Improvement |
Time to recover large transactions with 1 million records | 6.8 seconds | Approximately 0 seconds, which can be ignored | Completely optimized |
How to enable
Minor engine version:
If your RDS instance runs MySQL 8.0, the minor engine version must be later than or equal to 20230630.
If your RDS instance runs MySQL 5.7, the minor engine version must be later than or equal to 20230831.
Key parameter: innodb_trx_resurrect_table_lock_accelerate. Default value: OFF. This parameter is disabled by default. You must manually set the
innodb_trx_resurrect_table_lock_accelerateparameter to ON. If this parameter is enabled, the system enables the fast undo log scanning mechanism to accelerate large transaction recovery.
Asynchronous transaction rollback
MySQL uses internal XA transactions to ensure consistency between binary logs and InnoDB. To commit a transaction, the system first executes the InnoDB prepare phase and keeps the transaction prepared, writes the prepared transaction to binary logs, and then commit the transaction. During crash recovery, the commit or rollback of uncommitted transactions is determined by binary logs. Transactions in the prepared state are considered uncommitted transactions. In MySQL Community Edition, uncommitted transaction rollback is a synchronous operation. This causes the time to recover large transactions, such as transactions with millions of records, to be significantly increased. AliSQL changes uncommitted transaction rollback to asynchronous, which does not block instance startup and allows the instance to quickly recover. The following table describes the optimization effectiveness.
Scenario | MySQL Community Edition (before optimization) | AliSQL (after optimization) | Improvement |
Time to roll back large transactions with 1 million records | 100 seconds | Approximately 0 seconds, which can be ignored | Completely optimized |
How to enable
Minor engine version:
If your RDS instance runs MySQL 8.0, the minor engine version must be later than or equal to 20220530.
If your RDS instance runs MySQL 5.7, the minor engine version must be later than or equal to 20230531.
Key parameter:
async_binlog_recovery. This parameter is enabled by default. If this parameter is enabled, the system enables the mechanism to asynchronously roll back uncommitted transactions to accelerate instance startup.
Optimization of general query log recovery
In TABLE mode, MySQL general query logs record user operations in a CSV table. If corruption is detected on this table during crash recovery, MySQL Community Edition performs a full table scan to locate and rebuild the corrupted data. This causes the time to recover large data volumes, such as 10 GB, to be significantly increased. AliSQL uses techniques such as reverse scanning and file truncation to quickly recover the corrupted log table, which significantly reduces the instance startup delay and accelerates instance recovery. The following table describes the optimization effectiveness.
Scenario | MySQL Community Edition (before optimization) | AliSQL (after optimization) | Improvement |
Time to recover a general query log table of 10 GB | 270 seconds | Approximately 0 seconds, which can be ignored | Completely optimized |
How to enable
Minor engine version:
If your RDS instance runs MySQL 8.0, the minor engine version must be later than or equal to 20241130.
If your RDS instance runs MySQL 5.7, the minor engine version must be later than or equal to 20241130.
Key parameter:
reverse_repair_general_log_on_boot. This parameter is enabled by default. If this parameter is enabled, the system enables reverse scanning and file truncation to accelerate the recovery of general query log tables.
Other optimizations
AliSQL also optimizes the process to verify data in the doublewrite buffer. We recommend that you update the minor engine version to the latest version.