ApsaraDB RDS for MySQL provides various performance metrics and supports custom views. You can select metrics to create custom views based on your business requirements. ApsaraDB RDS for MySQL provides powerful diagnostic capabilities to detect events on your RDS instance at the earliest opportunity and automatically diagnose the events to analyze root causes and provide suggestions. ApsaraDB RDS for MySQL also allows you to select a time period during which you want to perform manual diagnostics. This helps you understand the status of your RDS instance.
Description
The standard monitoring feature of ApsaraDB RDS for MySQL is upgraded and integrated with the performance trend feature of Database Autonomy Service (DAS) to provide more performance monitoring metrics and support custom views. You can select metrics to create custom views based on your business requirements. For more information, see Performance trends. ApsaraDB RDS for MySQL provides powerful diagnostic capabilities to detect events on your RDS instance at the earliest opportunity and automatically diagnose the events to analyze root causes and provide suggestions. ApsaraDB RDS for MySQL also allows you to select a time period during which you want to perform manual diagnostics.
- For more information about the performance parameters that are related to the metrics, see Performance parameters.
- DAS is a stable, secure, and efficient cloud service that is designed based on machine learning and expert experience to implement automated perception, recovery, optimization, O&M, and security assurance for databases. DAS helps prevent service failures that are caused by manual operations. For more information, see Overview of DAS.
Procedure
View standard monitoring information
- Access RDS Instances, select a region at the top, and then click the ID of the target RDS instance.
- In the left-side navigation pane, click Monitoring and Alerts.
- On the Standard Monitoring tab, click the Standard View or Custom View tab and perform the following operations based on your business requirements:
- On the Standard View tab, select a time range to view the trend of metrics and statistics about various events. You can click Add Trend Comparison to view the comparison of performance trends in a specific time range based on the metrics.Note When you specify a time range, the end time must be later than the start time, and the interval between the start time and the end time cannot exceed 30 days.
- Event statistics section: You can click View Details to go to the Performance Events tab. On this tab, you can view the details about exceptions, optimization events, and auto scaling events. The details include the events that are scheduled, being executed, and executed.
- Trend charts section:
- In addition to the classic view, the system provides diagnostic views for common issues. This allows you to quickly identify the causes of issues based on the trends of key metrics in the diagnostic views.
The following diagnostic views are provided: Memory OOM diagnosis, Read-only instance delay diagnosis, Diagnosis of Space Full Problem, CPU jitter diagnosis, and Large transaction recognition diagnosis. You can select a diagnostic view based on your business requirements. For more information, see Use diagnostic views.
Note You can click theicon to the right of a monitoring item to view the metrics of the monitoring item.
- If you select Classic View, you can click More Metrics to select the monitoring items that you want the system to display.
- You can select the severity levels of events. If events of the selected severity levels are detected, the system displays the detected events in the MySQL CPU Utilization/Memory Usage and Session trend charts.
If you select Classic View, you can click the event in a trend chart and view the diagnostic result in the event list.
- You can specify a time range and click Diagnose in the trend chart of a monitoring item to analyze the metrics in the selected time range.
- You can click Common Causes in the trend chart of a monitoring item to view the common causes of the exceptions in the monitoring item.
- You can click Details in the trend chart of a monitoring item to expand the chart. You can also change the time range to view the change trends of the monitoring item in a specific time range.
- In addition to the classic view, the system provides diagnostic views for common issues. This allows you to quickly identify the causes of issues based on the trends of key metrics in the diagnostic views.
- On the Custom View tab, click Add Monitoring Dashboard to create a custom dashboard to display the metrics that you want to monitor.
- You can click Add Node and Metrics to select the RDS instances and metrics for the dashboard.
- You can set the Display Mode parameter to Merge Display or Separate Display.
- If you select Merge Display, multiple metrics are displayed in the same trend chart.
- If you select Separate Display, each metric is displayed in a separate trend chart.
- You can configure the Chart Layout parameter to specify the number of trend charts of metrics in each row.
- You can click Details in the trend chart of a metric to expand the chart. You can also change the time range to view the change trends of the metric in a specific time range.
- On the Standard View tab, select a time range to view the trend of metrics and statistics about various events.
View legacy monitoring information
- Access RDS Instances, select a region at the top, and then click the ID of the target RDS instance.
- In the left-side navigation pane, click Monitoring and Alerts.
- On the Legacy Monitoring tab, click Resource Monitoring, Engine Monitoring, or Deploy Monitoring. Then, specify a time range. If you use RDS Cluster Edition, you can also select an cluster ID or node ID. The following table describes the monitoring items. Note When you specify a time range, the end time must be later than the start time, and the interval between the start time and the end time cannot exceed 30 days.
Tab Monitoring item Description Resource Monitoring CPU Utilization and Memory Usage (%) The CPU utilization and memory usage of the RDS instance. This excludes the CPU utilization and memory usage of the operating system. Disk Space (MB) The disk usage of the RDS instance. The disk usage provides the following information: - Disk Space Used
- Data Space Used
- Log Space Used
- Temporary File Space Used
- System File Space Used
Unit: MB.
IOPS The IOPS of the RDS instance. Total Connections The number of active connections to the RDS instance and the total number of connections to the RDS instance. Network Traffic (KB) The volumes of inbound and outbound traffic of the RDS instance per second. Unit: KB. Engine Monitoring TPS/QPS The average number of transactions per second (TPS) and the average number of SQL statements that are executed per second. InnoDB Buffer Pool Read Hit Ratio, Usage Ratio, and Dirty Block Ratio (%) The read hit ratio, usage, and dirty ratio of the InnoDB buffer pool. InnoDB Read/Write Volume (KB) The amount of data that InnoDB reads and writes per second. Unit: KB. InnoDB Buffer Pool Read/Write Frequency The number of read and write operations that InnoDB performs per second. InnoDB Log Reads, Writes, and fsync() Operations The number of physical writes to log files by InnoDB per second, the number of log write requests by InnoDB per second, and the number of fsync writes to log files by InnoDB per second. Temporary Tables Automatically Created on Hard Disk when MySQL Statements Are Executed The number of temporary tables that are automatically created on the hard disk when SQL statements are executed. MySQL_COMDML The number of SQL statements that are executed per second. ApsaraDB RDS supports the following SQL statements: - Insert
- Delete
- Insert_Select
- Replace
- Replace_Select
- Select
- Update
MySQL_RowDML The number of operations that InnoDB performs per second. The following items are included: - The number of physical writes to log files per second
- The number of rows on which InnoDB performs operations per second. This includes the number of rows that are read from InnoDB tables per second, the number of rows that are updated in InnoDB tables per second, the number of rows that are deleted from InnoDB tables per second, and the number of rows that are inserted into InnoDB tables per second.
MyISAM Read/Write Frequency The number of reads from the buffer pool by MyISAM per second, the number of writes to the buffer pool by MyISAM per second, the number of reads from the hard disk by MyISAM per second, and the number of writes to the hard disk by MyISAM per second. MyISAM Key Buffer Read/Write/Usage Ratio (%) The read hit ratio, write hit ratio, and usage of the MyISAM key buffer per second. MySQL Thread Status The number of active threads and the number of connected threads. Note The number of active threads may be greater than the number of connected threads. This is because threads may be used for other workloads than connection establishments.Redo Log Size Written by InnoDB per Second (Bytes/s) The amount of log data that is written to the redo log files per second. Unit: bytes. MySQL_ROW_LOCK The information about row-level locks in InnoDB. The monitoring item consists of the following metrics: - innodb_row_lock_waits: the average number of times the row-level locks have been waited for.
- innodb_row_lock_time: the maximum time of a row-level lock that has been waited for. Unit: milliseconds.
- innodb_row_lock_avg: the average time of row-level locks that have been waited for. Unit: milliseconds.
MySQL_SelectScan The number of full table scans. Deploy Monitoring Replication Thread Status of Secondary Instances The status of the threads that are used to replicate data to the secondary RDS instance. - I/O Thread Status of Secondary Instances: The value 1 indicates that the thread is normal, and the value 0 indicates that the thread is lost.
- SQL Thread Status of Secondary Instances: The value 1 indicates that the thread is normal, and the value 0 indicates that the thread is lost.
Replication Latency of Secondary Instances (s) The latency of data replication to the secondary RDS instance. Unit: seconds.
Use diagnostic views
Memory OOM diagnosis

- Memory Usage:
- If the memory usage slowly and continuously increases for a long period of time, such as more than seven days, and the usage of the InnoDB buffer pool remains unchanged, a memory leak may occur.
- If the memory usage suddenly increases and the usage of the InnoDB buffer pool remains unchanged, your services may encounter traffic spikes.
- If both the memory usage and the usage of the InnoDB buffer pool increase, InnoDB caches as it is accessed and works as expected.
- Resident Memory: shows the memory capacity.
- Open files, Temp File Size, Temp Disk Tables, and Sort Rows: show memory consumption.
- Check business logs to identify the cause of the increase in memory usage.
- Increase the memory capacity and enable the SQL Explorer and Audit feature. This way, you can identify the cause of the increase in memory usage based on the points in time at which SQL statements are executed. For more information, see Use the SQL Explorer and Audit feature on an ApsaraDB RDS for MySQL instance.
Read-only instance delay diagnosis

- Active Session: shows whether metadata locks exist and cause congestions.
In most cases, if a large amount of data is queried, metadata locks cannot be obtained when DDL statements are executed. As a result, the DDL statements block other sessions and cause an accumulation of connections.
- Resident Memory: shows the memory capacity.
- DML Rows Processed, Pages Requested, DML/DDL Operations, and Temp Disk Space Used: show common business metrics.
- Replication Delay: shows the replication latency.
Diagnosis of Space Full Problem

You can analyze and resolve exhausted storage issues based on the trend charts of monitoring items that are displayed in the Diagnosis of Space Full Problem view.
- Data files (user_data_size): You can use the storage analysis feature to check the storage usage of each database and table. Then, you can expand the storage capacity or delete unnecessary data based on your business requirements. For more information about the storage analysis feature, see Use the storage analysis feature for an ApsaraDB RDS for MySQL instance. For more information about how to handle data files, see What do I do if an ApsaraDB RDS for MySQL instance is in the Locked state because its storage capacity is exhausted by data files?
- Temporary files (temp_file_size): Temporary tables may be generated when you execute SQL statements to sort and group data or associate tables. Binary log cache files are generated before large transactions are committed. These tables and files occupy storage. For more information about how to handle temporary files, see What do I do if an ApsaraDB RDS for MySQL instance is in the Locked state because its storage capacity is exhausted by temporary files?
- Binary logs (binlog_size): Binary logs are quickly generated by large transactions. These logs occupy storage. For more information about how to handle binary logs, see What do I do if the storage capacity of an ApsaraDB RDS for MySQL instance is exhausted by binary log files?. Note If binary logs are subscribed, the binary logs may not be deleted at the earliest opportunity and occupy storage.
- Undo logs (undo_log_size): In most cases, undo logs cannot be cleared due to long-running queries. You need to check whether queries that are executed for a long period of time and are not completed exist. Note In MySQL 5.6 and earlier versions, no separate tablespaces are provided for undo logs.
- Slow query logs (slowlog_size): If slow query logs occupy a large amount of storage, we recommend that you execute the
TRUNCATE
statement to clear slow query logs during off-peak hours.Note TheTRUNCATE
statement is supported in MySQL 5.7 with a minor engine version of 20210630 or later and in MySQL 8.0 with a minor engine version of 20210930 or later. - General query logs (general_log_size): By default, the general_log parameter is disabled in MySQL. If you enable the general_log parameter, extra storage is occupied. We recommend that you execute the
TRUNCATE
statement to clear general query logs during off-peak hours.Note TheTRUNCATE
statement is supported in MySQL 5.7 with a minor engine version of 20210630 or later and in MySQL 8.0 with a minor engine version of 20210930 or later.
CPU jitter diagnosis

- Business monitoring items:
- Page Request: In most cases, the trend of the number of requests in the buffer pool fluctuates based on CPU utilization.
- Rows Processed: shows the relationship between CPU utilization and the number of rows that are processed by the system. You can check whether the number of processed rows suddenly increases when CPU utilization changes based on this monitoring item.
- Queries: shows the types of SQL statements that are executed when CPU utilization changes.
- Connection-related monitoring items:
Thread Running: High concurrency leads to high CPU utilization. Metadata locks or row locks cause an accumulation of connections, which affects CPU utilization.
- The Page Request or Rows Processed monitoring item changes. In this case, you can select the period of time during which CPU utilization changes and click Diagnose to obtain details about the issue.
- The number of active connections increases. In this case, you need to resolve the issue from the business side.
Large transaction recognition diagnosis

- Threads Connected, Temp File Size, and Binlog space: show whether large transactions exist. If one of the following events occurs, a large transaction exists in the database:
- Active sessions accumulate.
- The storage that is occupied by temporary files increases and then decreases.
- The storage that is occupied by temporary files decreases, but the storage that is occupied by binary logs increases.
- Rows Processed, Logical Page Write, and Queries per Second: show the type of a large transaction.
For example, if a few queries are executed but a large number of rows are deleted, a large transaction that deletes data exists.
- When large transactions exist, the storage that is occupied by temporary tables (binlog cache) gradually increases and then remains stable.
- When the storage that is occupied by temporary tables remains stable, the storage that is occupied by binary logs increases. Binary logs are globally written in serial mode. As a result, other transactions are blocked, which causes the accumulation of connections.
- If the RDS instance runs RDS High-availability Edition, the statements that are executed to check the status of primary and secondary RDS instances for primary/secondary switchovers are blocked. As a result, a primary/second switchover is directly performed on the RDS instance.
We recommend that you split large transactions into small transactions and separately execute the small transactions. For example, you can add a WHERE
clause to the DELETE
statement to limit the volume of data that is deleted at a time and split the deletion operation into smaller operations.