High I/O on an ApsaraDB RDS for MySQL instance typically stems from five root causes: high throughput, temporary tables, cold data reads, DDL statements, and large-transaction binary log writes. Use this guide to identify the cause and take targeted action.
Common causes at a glance
| Cause | Where to look in the console | Key indicator |
|---|---|---|
| High throughput | Autonomy Service > Dashboard > Performance Trends | Read/write load |
| Temporary tables | Autonomy Service > Dashboard > Performance Trends | tmp directory size |
| Cold data reads | Autonomy Service > Dashboard > Performance Trends | Buffer pool hit ratio |
| DDL statements | Monitoring and Alerts > Standard Monitoring > Standard View | Disk usage and IOPS |
| Large-transaction binary log writes | — | Sudden large increase in binary log volume |
How InnoDB handles I/O
InnoDB uses an independent I/O system to read and write data pages. When an SQL statement requests a data page not in the buffer pool, InnoDB reads it from disk using synchronous I/O. Background threads use asynchronous I/O to flush dirty pages to disk.
Beyond data file I/O, several other operations can significantly increase I/O: redo log writes, undo log writes, binary log writes, temporary table sorting, and DDL tablespace rebuilds. Each of the five causes below maps to one or more of these mechanisms.
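As a starting point for narrowing down which mechanism is active, you can inspect InnoDB's standard status counters. A minimal sketch using built-in MySQL status variables:

```sql
-- Logical reads vs. reads that had to go to disk (cold data)
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';

-- Background flushing of dirty pages and total data written
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_flushed';
SHOW GLOBAL STATUS LIKE 'Innodb_data_written';
```

Sampling these counters at intervals and comparing the deltas shows whether read misses or write flushing dominates the I/O load.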
Troubleshoot high I/O caused by high throughput
Symptom
Frequent UPDATE, DELETE, and INSERT operations on tables with many indexes or large fields drive I/O in two ways. First, RDS reads data pages into the buffer pool. Then, background threads flush the resulting dirty pages to disk. The combined read and write pressure elevates I/O.
To confirm, go to Autonomy Service > Dashboard in the left-side navigation pane of the ApsaraDB RDS console, open the Performance Trends tab, and check read/write load metrics.
Solution
Reduce read/write frequency, upgrade the instance type, or tune the parameters that control dirty page flushing.
| Parameter | Default | Description |
|---|---|---|
| innodb_max_dirty_pages_pct | 75 | Maximum percentage of dirty pages allowed in the buffer pool |
| innodb_max_dirty_pages_pct_lwm | 0 (disabled) | Low water mark for dirty pages; when exceeded, RDS proactively flushes dirty pages to maintain headroom in the buffer pool |
| innodb_io_capacity | 20000 | Maximum I/O operations per second available to InnoDB background tasks; controls the speed of dirty page flushing and buffer pool writes |
| innodb_io_capacity_max | 40000 | Upper bound on I/O operations per second; takes effect only when dirty page flushing falls behind; must be greater than innodb_io_capacity |
innodb_max_dirty_pages_pct_lwm must be less than or equal to innodb_max_dirty_pages_pct. If you set it higher, RDS automatically resets it to the value of innodb_max_dirty_pages_pct.
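As a sketch, these parameters can be set at runtime with SET GLOBAL. The values below are illustrative only, not recommendations; validate any change against your workload (on ApsaraDB RDS, parameters are normally changed through the console's parameter settings rather than directly):

```sql
-- Illustrative values only; tune for your instance
SET GLOBAL innodb_max_dirty_pages_pct = 75;
SET GLOBAL innodb_max_dirty_pages_pct_lwm = 10;  -- must not exceed innodb_max_dirty_pages_pct
SET GLOBAL innodb_io_capacity = 20000;
SET GLOBAL innodb_io_capacity_max = 40000;       -- must be greater than innodb_io_capacity
```

Raising innodb_io_capacity lets background flushing keep up with a heavy write load, at the cost of consuming more of the instance's IOPS budget.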
Troubleshoot high I/O caused by temporary tables
Symptom
Slow SQL statements that involve sorting or deduplication cause InnoDB to create large temporary tables on disk. Both creating and writing to these temporary tables increase I/O. A large tmp directory is a reliable indicator.
To confirm, go to Autonomy Service > Dashboard in the left-side navigation pane of the ApsaraDB RDS console, open the Performance Trends tab, and check the size of the tmp directory.
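You can also confirm disk spills from status counters: a high share of Created_tmp_disk_tables relative to Created_tmp_tables means sorts and deduplication are exceeding the in-memory limits. A minimal sketch:

```sql
-- Disk-based vs. total temporary tables created since startup
SHOW GLOBAL STATUS LIKE 'Created_tmp%tables';

-- In-memory temporary table size limits; tables larger than
-- these thresholds spill to disk
SHOW VARIABLES LIKE 'tmp_table_size';
SHOW VARIABLES LIKE 'max_heap_table_size';
```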
Solution
Optimize the SQL statements that generate temporary tables. Database Autonomy Service (DAS) provides built-in SQL optimization to help identify and fix slow queries. For details, see SQL optimization.
Troubleshoot high I/O caused by cold data reads
Symptom
When the data pages needed by a query or DML statement are not in the buffer pool, InnoDB reads them from disk. A low buffer pool hit ratio means most requests go to disk, significantly increasing I/O.
To confirm, go to Autonomy Service > Dashboard in the left-side navigation pane of the ApsaraDB RDS console, open the Performance Trends tab, and check the buffer pool hit ratio.
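The hit ratio itself can be derived from two status counters: Innodb_buffer_pool_read_requests (logical reads) and Innodb_buffer_pool_reads (reads that missed the pool and went to disk). A minimal sketch of the arithmetic, with illustrative counter values:

```python
def buffer_pool_hit_ratio(read_requests: int, disk_reads: int) -> float:
    """Fraction of logical reads served from the buffer pool.

    read_requests: Innodb_buffer_pool_read_requests (all logical reads)
    disk_reads:    Innodb_buffer_pool_reads (reads that went to disk)
    """
    if read_requests == 0:
        return 1.0  # no reads yet; treat as a perfect hit ratio
    return (read_requests - disk_reads) / read_requests

# Illustrative values: 1,000,000 logical reads, 50,000 of them from disk
print(f"{buffer_pool_hit_ratio(1_000_000, 50_000):.2%}")  # 95.00%
```

A ratio well below ~99% on a read-heavy workload usually means the buffer pool is too small for the working set, or queries are scanning cold data.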
Solution
Redesign your caching strategy based on your workload's access patterns. Alternatively, upgrade your RDS instance.
Troubleshoot high I/O caused by DDL statements
Symptom
Certain DDL statements, such as those that change a column's data type, trigger a full tablespace rebuild. InnoDB scans every row, creates sort indexes, and flushes dirty pages for the new table structure. Deleting large tables also generates significant I/O. These operations can cause sharp spikes in both disk usage and IOPS.
To confirm, go to Monitoring and Alerts in the left-side navigation pane of the ApsaraDB RDS console, click Standard View on the Standard Monitoring tab, and check disk usage and IOPS.
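As an illustration, standard MySQL Online DDL syntax lets you state explicitly whether a rebuild is acceptable; the table and column names below are hypothetical:

```sql
-- Forces a full table copy: reads every row, sorts, writes a new tablespace
ALTER TABLE orders MODIFY note TEXT, ALGORITHM=COPY;

-- Performed in place where the operation supports it, avoiding a full rebuild
ALTER TABLE orders ADD INDEX idx_created (created_at), ALGORITHM=INPLACE, LOCK=NONE;
```

If the requested algorithm is not supported for a given operation, MySQL returns an error instead of silently falling back to a more expensive rebuild, which makes the I/O cost of a DDL change predictable.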
Solution
Use the Purge Large File Asynchronously feature to delete large files without blocking I/O. This feature is part of AliSQL, Alibaba Cloud's MySQL branch. For details, see Purge Large File Asynchronously.
Troubleshoot high I/O caused by binary log writes from large transactions
Symptom
MySQL flushes binary log events to disk only at transaction commit. A single large transaction, for example a DELETE that removes a large number of rows, can generate tens of gigabytes of binary log data. Flushing that volume to disk in one burst causes a sharp I/O spike.
Solution
Split large transactions into smaller batches. Smaller commits spread binary log flushes over time, smooth dirty-page flushing, and prevent the burst I/O that a single large commit produces.
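For example, a single large DELETE can be rewritten as repeated bounded batches; the table name, filter, and batch size below are illustrative:

```sql
-- Run with autocommit enabled, or COMMIT after each batch.
-- Repeat until ROW_COUNT() returns 0; each commit flushes a small
-- binary log segment instead of one multi-gigabyte burst.
DELETE FROM orders_archive
WHERE created_at < '2024-01-01'
LIMIT 10000;
```

Pausing briefly between batches also gives background threads time to flush dirty pages, which keeps IOPS consumption level instead of spiky.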