High I/O on an ApsaraDB RDS for MySQL instance typically stems from five root causes: high throughput, temporary tables, cold data reads, DDL statements, and large-transaction binary log writes. Use this guide to identify the cause and take targeted action.
Common causes at a glance
| Cause | Where to look in the console | Key indicator |
|---|---|---|
| High throughput | Autonomy Service > Dashboard > Performance Trends | Read/write load |
| Temporary tables | Autonomy Service > Dashboard > Performance Trends | tmp directory size |
| Cold data reads | Autonomy Service > Dashboard > Performance Trends | Buffer pool hit ratio |
| DDL statements | Monitoring and Alerts > Standard Monitoring > Standard View | Disk usage and IOPS |
| Large-transaction binary log writes | — | Sudden large increase in binary log volume |
How InnoDB handles I/O
InnoDB uses an independent I/O system to read and write data pages. When an SQL statement requests a data page not in the buffer pool, InnoDB reads it from disk using synchronous I/O. Background threads use asynchronous I/O to flush dirty pages to disk.
Beyond data file I/O, several other operations can significantly increase I/O: redo log writes, undo log writes, binary log writes, temporary table sorting, and DDL tablespace rebuilds. Each of the five causes below maps to one or more of these mechanisms.
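As a starting point for narrowing down which mechanism is active, you can inspect InnoDB's standard status counters. A minimal sketch using built-in MySQL status variables:

```sql
-- Logical reads vs. reads that had to go to disk (cold data)
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';

-- Background flushing of dirty pages and total data written
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_flushed';
SHOW GLOBAL STATUS LIKE 'Innodb_data_written';
```

Sampling these counters at intervals and comparing the deltas shows whether read misses or write flushing dominates the I/O load.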
Troubleshoot high I/O caused by high throughput
Symptom
Frequent UPDATE, DELETE, and INSERT operations on tables with many indexes or large fields drive I/O in two ways. First, RDS reads data pages into the buffer pool. Then, background threads flush the resulting dirty pages to disk. The combined read and write pressure elevates I/O.
To confirm, go to Autonomy Service > Dashboard in the left-side navigation pane of the ApsaraDB RDS console, open the Performance Trends tab, and check read/write load metrics.
Solution
Reduce read/write frequency, upgrade the instance type, or tune the parameters that control dirty page flushing.
| Parameter | Default | Description |
|---|---|---|
| innodb_max_dirty_pages_pct | 75 | Maximum percentage of dirty pages allowed in the buffer pool |
| innodb_max_dirty_pages_pct_lwm | 0 (disabled) | Low water mark for dirty pages; when exceeded, RDS proactively flushes dirty pages to maintain headroom in the buffer pool |
| innodb_io_capacity | 20000 | Maximum I/O operations per second available to InnoDB background tasks; controls the speed of dirty page flushing and buffer pool writes |
| innodb_io_capacity_max | 40000 | Upper bound on I/O operations per second; takes effect only when dirty page flushing falls behind; must be greater than innodb_io_capacity |
innodb_max_dirty_pages_pct_lwm must be less than or equal to innodb_max_dirty_pages_pct. If you set it higher, RDS automatically resets it to the value of innodb_max_dirty_pages_pct.
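As a sketch, these parameters can be set at runtime with SET GLOBAL. The values below are illustrative only, not recommendations; validate any change against your workload (on ApsaraDB RDS, parameters are normally changed through the console's parameter settings rather than directly):

```sql
-- Illustrative values only; tune for your instance
SET GLOBAL innodb_max_dirty_pages_pct = 75;
SET GLOBAL innodb_max_dirty_pages_pct_lwm = 10;  -- must not exceed innodb_max_dirty_pages_pct
SET GLOBAL innodb_io_capacity = 20000;
SET GLOBAL innodb_io_capacity_max = 40000;       -- must be greater than innodb_io_capacity
```

Raising innodb_io_capacity lets background flushing keep up with a heavy write load, at the cost of consuming more of the instance's IOPS budget.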
Troubleshoot high I/O caused by temporary tables
Symptom
Slow SQL statements that involve sorting or deduplication cause InnoDB to create large temporary tables on disk. Both creating and writing to these temporary tables increase I/O. A large tmp directory is a reliable indicator.
To confirm, go to Autonomy Service > Dashboard in the left-side navigation pane of the ApsaraDB RDS console, open the Performance Trends tab, and check the size of the tmp directory.
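You can also confirm disk spills from status counters: a high share of Created_tmp_disk_tables relative to Created_tmp_tables means sorts and deduplication are exceeding the in-memory limits. A minimal sketch:

```sql
-- Disk-based vs. total temporary tables created since startup
SHOW GLOBAL STATUS LIKE 'Created_tmp%tables';

-- In-memory temporary table size limits; tables larger than
-- these thresholds spill to disk
SHOW VARIABLES LIKE 'tmp_table_size';
SHOW VARIABLES LIKE 'max_heap_table_size';
```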
Solution
Optimize the SQL statements that generate temporary tables. Database Autonomy Service (DAS) provides built-in SQL optimization to help identify and fix slow queries. For details, see SQL optimization.
Troubleshoot high I/O caused by cold data reads
Symptom
When the data pages needed by a query or DML statement are not in the buffer pool, InnoDB reads them from disk. A low buffer pool hit ratio means most requests go to disk, significantly increasing I/O.
To confirm, go to Autonomy Service > Dashboard in the left-side navigation pane of the ApsaraDB RDS console, open the Performance Trends tab, and check the buffer pool hit ratio.
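The hit ratio itself can be derived from two status counters: Innodb_buffer_pool_read_requests (logical reads) and Innodb_buffer_pool_reads (reads that missed the pool and went to disk). A minimal sketch of the arithmetic, with illustrative counter values:

```python
def buffer_pool_hit_ratio(read_requests: int, disk_reads: int) -> float:
    """Fraction of logical reads served from the buffer pool.

    read_requests: Innodb_buffer_pool_read_requests (all logical reads)
    disk_reads:    Innodb_buffer_pool_reads (reads that went to disk)
    """
    if read_requests == 0:
        return 1.0  # no reads yet; treat as a perfect hit ratio
    return (read_requests - disk_reads) / read_requests

# Illustrative values: 1,000,000 logical reads, 50,000 of them from disk
print(f"{buffer_pool_hit_ratio(1_000_000, 50_000):.2%}")  # 95.00%
```

A ratio well below ~99% on a read-heavy workload usually means the buffer pool is too small for the working set, or queries are scanning cold data.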
Solution
Redesign your caching strategy based on your workload's access patterns. Alternatively, upgrade your RDS instance.
Troubleshoot high I/O caused by DDL statements
Symptom
Certain DDL statements, such as those that change a column's data type, trigger a full tablespace rebuild. InnoDB scans every row, creates sort indexes, and flushes dirty pages for the new table structure. Deleting large tables also generates significant I/O. These operations can cause sharp spikes in both disk usage and IOPS.
To confirm, go to Monitoring and Alerts in the left-side navigation pane of the ApsaraDB RDS console, click Standard View on the Standard Monitoring tab, and check disk usage and IOPS.
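As an illustration, standard MySQL Online DDL syntax lets you state explicitly whether a rebuild is acceptable; the table and column names below are hypothetical:

```sql
-- Forces a full table copy: reads every row, sorts, writes a new tablespace
ALTER TABLE orders MODIFY note TEXT, ALGORITHM=COPY;

-- Performed in place where the operation supports it, avoiding a full rebuild
ALTER TABLE orders ADD INDEX idx_created (created_at), ALGORITHM=INPLACE, LOCK=NONE;
```

If the requested algorithm is not supported for a given operation, MySQL returns an error instead of silently falling back to a more expensive rebuild, which makes the I/O cost of a DDL change predictable.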
Solution
Use the Purge Large File Asynchronously feature to delete large files without blocking I/O. This feature is part of AliSQL, Alibaba Cloud's MySQL branch. For details, see Purge Large File Asynchronously.
Troubleshoot high I/O caused by binary log writes from large transactions
Symptom
MySQL flushes binary log events to disk only at transaction commit. A single large transaction, for example a DELETE that removes a large number of rows, can generate tens of gigabytes of binary log data. Flushing that volume to disk in one burst causes a sharp I/O spike.
Solution
Split large transactions into smaller batches. Smaller commits spread binary log flushes over time, smooth dirty-page flushing, and prevent the burst I/O that a single large commit produces.
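For example, a single large DELETE can be rewritten as repeated bounded batches; the table name, filter, and batch size below are illustrative:

```sql
-- Run with autocommit enabled, or COMMIT after each batch.
-- Repeat until ROW_COUNT() returns 0; each commit flushes a small
-- binary log segment instead of one multi-gigabyte burst.
DELETE FROM orders_archive
WHERE created_at < '2024-01-01'
LIMIT 10000;
```

Pausing briefly between batches also gives background threads time to flush dirty pages, which keeps IOPS consumption level instead of spiky.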