This topic describes how to troubleshoot the issues that cause high I/O on an ApsaraDB RDS for MySQL instance. The I/O performance of your RDS instance varies based on three factors. These factors are the storage media, the database engine architecture, and the SQL statements that are executed to scan or modify a specific amount of data.

Storage media

ApsaraDB RDS for MySQL supports the following types of storage media:

  • Local SSD

    Local SSDs deliver the lowest I/O latency among the supported types of storage media. However, the storage capacity that is provided by local SSDs is limited. If the local SSDs that are configured for your RDS instance cannot accommodate an increasing amount of data, you must migrate the instance to another suitable host. The migration is time-consuming. In addition, the migration triggers a switchover of your workloads, and the switchover causes a transient connection error.

  • Cloud SSDs

    Cloud SSDs include standard SSDs and enhanced SSDs. Cloud SSDs use a distributed storage mechanism and deliver higher cost-effectiveness and larger storage capacity than local SSDs. In addition, cloud SSDs allow you to scale the storage capacity without the need to migrate data.

For more information about the different types of storage media, see Storage types.

Troubleshoot high I/O caused by high throughput

  • Symptom

    If your application frequently initiates requests to update, delete, and insert data on tables, the I/O of your RDS instance significantly increases due to the data reads and the flushes of dirty pages. This applies if the tables contain a large number of indexes or large fields.

    You can log on to the ApsaraDB RDS console and choose Autonomy Service > Dashboard in the left-side navigation pane. Then, on the Performance Trends tab, you can view the read and write loads on your RDS instance.

  • Solution

    We recommend that you reduce the read and write frequency, upgrade your RDS instance, or optimize the settings of the parameters that are used to flush dirty pages. The following parameters are used to flush dirty pages:

    • innodb_max_dirty_pages_pct: the percentage of dirty pages that are allowed in the buffer pool. Default value: 75.
    • innodb_max_dirty_pages_pct_lwm: the low water mark for the percentage of dirty pages that are allowed in the buffer pool. If the percentage of dirty pages in the buffer pool exceeds the low water mark, ApsaraDB RDS flushes dirty pages to the disk. This ensures a proper percentage of dirty pages in the buffer pool. The default value 0 specifies to disable the low water mark.
      Note The value of the innodb_max_dirty_pages_pct_lwm parameter must be less than or equal to the value of the innodb_max_dirty_pages_pct parameter. Otherwise, ApsaraDB RDS sets the innodb_max_dirty_pages_pct_lwm parameter to the value of the innodb_max_dirty_pages_pct parameter.
    • innodb_io_capacity: the maximum number of I/O operations that are allowed by InnoDB per second for each background task. The value of this parameter affects the speed at which ApsaraDB RDS flushes dirty pages to the disk. The value of this parameter also affects the speed at which ApsaraDB RDS writes data to the buffer pool. The default value of this parameter is 20000.
    • innodb_io_capacity_max: the maximum number of I/O operations that are allowed by InnoDB per second for each background task. This parameter takes effect only when the flushes of dirty pages are outdated. The value of this parameter is greater than the value of the innodb_io_capacity parameter. The default value of the innodb_io_capacity_max parameter is 40000.

Troubleshoot high I/O caused by temporary tables

  • Symptom

    If the temporary directory is large, ApsaraDB RDS may have created large temporary tables due to operations such as the sorting and deduplication of slow SQL statements. This increases the I/O of your RDS instance. In addition, data writes to temporary tables also increase the I/O of your RDS instance.

    You can log on to the ApsaraDB RDS console and choose Autonomy Service > Dashboard in the left-side navigation pane. Then, on the Performance Trends tab, you can view the size of the tmp or other directory for your RDS instance.

  • Solution

    We recommend that you optimize the SQL statements that you want to execute. This allows you to prevent slow SQL statements. The autonomy service of ApsaraDB RDS supports SQL optimization. For more information, see SQL optimization.

Troubleshoot high I/O caused by cold data reads

  • Symptom

    If the data that is queried or modified by using SQL statements cannot be hit in the buffer pool, ApsaraDB RDS needs to read the data from the disk. This may significantly increase the I/O of your RDS instance.

    You can log on to the ApsaraDB RDS console and choose Autonomy Service > Dashboard in the left-side navigation pane. Then, on the Performance Trends tab, you can view the buffer pool hit ratio of your RDS instance.

  • Solution

    Redesign the cache policy based on your business scenario. Otherwise, upgrade your RDS instance.

Troubleshoot high I/O caused by DDL statements

  • Symptom

    If your application initiates data definition language (DDL) statements, ApsaraDB RDS may rebuild the tablespaces of your RDS instance. During the rebuild process, ApsaraDB RDS scans each row of each table in the tablespaces, creates indexes that are used to sort data, and flushes the dirty pages generated from new tables. All these operations significantly increase the I/O of your RDS instance. If your application initiates requests to delete large tables, the I/O of your RDS instance may also increase.

    You can log on to the ApsaraDB RDS console and choose Monitoring and Alerts in the left-side navigation pane. Then, on the Standard Monitoring tab, you can click Resource Monitoring to view the disk usage and input/output operations per second (IOPS) of your RDS instance.

  • Solution

    Use the Purge Large File Asynchronously feature to delete large files. This feature is provided by AliSQL. AliSQL is a MySQL branch that is developed by Alibaba Cloud. For more information, see Purge Large File Asynchronously.

Troubleshoot high I/O caused by binary log writes from large transactions

  • Symptom

    A transaction writes log records into binary log files only when it is committed. If your application runs a large transaction, the transaction may write a few dozen GB of data into binary log files. For example, the transaction contains a DELETE statement that is used to delete a large number of rows. When these binary log files are flushed to the disk, the I/O of your RDS instance significantly increases.

  • Solution

    We recommend that you split each large transaction that you want to run. This allows you to reduce the flushes of dirty pages to the disk.

Appendix: Introduction to the InnoDB I/O system

InnoDB uses an independent I/O system to read and write data pages. If the data page that is requested by an SQL statement cannot be hit in the buffer pool, physical I/O operations are performed to read and write data to the disk.

  • Operations to read data pages

    The underlying read interface is called based on synchronous I/O to read data pages.

  • Operations to write data pages

    Use the flushes of dirty pages as an example. Background I/O threads are called based on asynchronous I/O to asynchronously flush dirty pages to the disk.

In addition to I/O operations on common data files, a number of other operations may also significantly increase the I/O of your RDS instance. These operations include the operations to write redo logs, undo logs, and binary logs, the operations to sort temporary tables, and the operations to rebuild tablespaces due to DDL statements.