Problem
The CPU utilization of an ApsaraDB RDS for MySQL/MariaDB instance is high, sometimes reaching 100%.
Causes
This topic describes two common causes of 100% CPU utilization and their solutions: high application load (high QPS) and high query costs due to slow SQL queries. When an application submits query or data modification operations, the system performs a large number of logical read operations. Logical I/O refers to the number of data rows that must be accessed in tables to execute a query. As a result, the system consumes significant CPU resources to maintain the consistency of data read from storage into memory. In MySQL, high CPU utilization is often caused by costly slow SQL queries that access many table rows.
This topic does not cover cases where high CPU utilization is caused by a large number of row lock conflicts, lock waits, or background tasks, as these scenarios are rare.
-
High application load (high QPS):
-
Characteristics: The instance experiences high QPS with simple, efficient queries that offer little room for optimization.
-
Symptoms: No slow SQL queries are observed, or they are not the primary cause. The QPS and CPU utilization trends are consistent.
-
Common scenarios: This issue is common in optimized online transaction processing (OLTP) systems, such as order systems, popular web applications with high read rates, and third-party stress tests, such as with Sysbench.
-
-
High query cost due to slow SQL queries (accessing many rows):
-
Characteristics: The instance QPS is not high, but query execution is inefficient and requires scans of large amounts of table data. There is significant potential for optimization.
-
Symptoms: Slow SQL queries are present. The QPS and CPU utilization trends do not match.
-
Cause analysis: Low query efficiency forces the system to access large amounts of data to return results, leading to high average logical I/O. Therefore, even when QPS is not high (for example, on a website with low traffic), the CPU utilization of the instance can still be high.
-
Solutions
Choose a solution based on your specific situation.
High application load (high QPS)
When high CPU utilization is caused by a high application load, there is often little room for SQL optimization. Address this issue by adjusting the application architecture or instance configuration:
-
Upgrade the instance type to increase CPU resources. For more information, see Change configuration.
-
Add a read-only instance and route queries that are not sensitive to data consistency, such as product category or train schedule lookups, to the read-only instance. This reduces the load on the primary instance. For more information, see Create a MySQL read-only instance.
-
Use PolarDB-X, a cloud-native distributed database, to automatically shard data and distribute the query load across multiple RDS instances.
-
Use ApsaraDB for Memcache or Tair to serve frequently accessed query results from the cache, which reduces the load on the RDS instance.
-
For applications with relatively static data, high query repetition, and result sets smaller than 1 MB, consider enabling the query cache.
NoteYou must test whether enabling the query cache benefits your application. For more information about the settings, see Set and use the query cache for ApsaraDB RDS for MySQL.
-
Periodically archive historical data. Use sharding or partitioning to reduce the amount of data accessed by queries. Optimize your queries to reduce execution costs and improve application scalability.
High query cost due to slow SQL queries
To resolve this issue, identify inefficient queries, improve their execution efficiency, and reduce their execution costs.
-
Identify inefficient queries by using one of the following methods:
-
Run the following SQL statements to view currently running queries.
show processlist; show full processlist;The system returns output similar to the following:
mysql> show processlist; +----------+-------+-------------------+---------+---------+-------+--------------+----------------------------------------------+ | Id | User | Host | db | Command | Time | State | Info | +----------+-------+-------------------+---------+---------+-------+--------------+----------------------------------------------+ |101031643 | jacky | xxx.xxx.xxx.xx:xx | my_test | Query | 10760 | Sending data | select b.* from perf_test_no_idx_01 a, ... | |117731567 | jacky | xxx.xxx.xxx.xx:xx | my_test | Query | 10760 | Sending data | select b.* from perf_test_no_idx_01 a, ... | |134298793 | jacky | xxx.xxx.xxx.xx:xx | my_test | Query | 10760 | Sending data | select b.* from perf_test_no_idx_01 a, ... | |134384670 | jacky | xxx.xxx.xxx.xx:xx | my_test | Query | 0 | Init | show processlist | |234891284 | jacky | xxx.xxx.xxx.xx:xx | my_test | Query | 10760 | Sending data | select b.* from perf_test_no_idx_01 a, ... | |235125098 | jacky | xxx.xxx.xxx.xx:xx | my_test | Query | 10760 | Sending data | select b.* from perf_test_no_idx_01 a, ... | |235200576 | jacky | xxx.xxx.xxx.xx:xx | my_test | Query | 10760 | Sending data | select b.* from perf_test_no_idx_01 a, ... | |235633985 | jacky | xxx.xxx.xxx.xx:xx | my_test | Query | 10760 | Sending data | select b.* from perf_test_no_idx_01 a, ... | |235887773 | jacky | xxx.xxx.xxx.xx:xx | my_test | Query | 10760 | Sending data | select b.* from perf_test_no_idx_01 a, ... | |251990394 | jacky | xxx.xxx.xxx.xx:xx | my_test | Query | 10760 | Sending data | select b.* from perf_test_no_idx_01 a, ... | |252662718 | jacky | xxx.xxx.xxx.xx:xx | my_test | Query | 10760 | Sending data | select b.* from perf_test_no_idx_01 a, ... | +----------+-------+-------------------+---------+---------+-------+--------------+----------------------------------------------+ 11 rows in set (0.00 sec)Query sessions that have been running for a long time and are in states such as Sending data, Copying to tmp table, Copying to tmp table on disk, Sorting result, or Using filesort may have performance issues.
-
In scenarios where high QPS causes high CPU utilization, queries usually execute quickly. It may be difficult to capture currently running queries by using the
show processlist;command or by viewing instance sessions. However, you can run the following SQL statement to analyze a query.explain [$SQL]Note[$SQL] represents the SQL query with performance issues.
-
You can run a command like
kill [$ID];to terminate long-running sessions. For more information, see Terminate a session on an ApsaraDB RDS for MySQL instance.Note[$ID] represents the session ID corresponding to the query.
-
-
Use Database Autonomy Service (DAS) to view currently running queries:
Log on to the DAS console.
In the navigation pane on the left, click .
Find the target instance and click the instance ID to open the instance details page.
In the left-side navigation pane, click Instance Sessions.
-
Click the query text in the SQL column to display the full query and its execution plan.
-
Continuously monitor SQL execution performance by using SQL Insight and Auditing. In the left-side navigation pane of the instance details page, choose Autonomy Service > SQL Insight and Auditing. You can view detailed information such as SQL execution time and the number of rows scanned to locate CPU-intensive SQL statements.
-
-
After identifying queries that need optimization, use SQL Diagnostics in the Data Management Service (DMS) console to get optimization suggestions. To troubleshoot historical issues, diagnostic reports are also invaluable:
-
Log on to a database by using the DMS console.
-
At the top of the page, click SQL Window, and select the desired database.
-
Paste the query statement into the SQL window and click SQL Diagnostics to get optimization suggestions. The diagnostic result includes the SQL statement, the execution plan (which shows the query type, tables involved, and number of rows scanned), and index diagnostic suggestions that provide DDL statements that you can run directly.
-
-
Apply the recommended optimizations. For example, after adding a suggested index, verify that the query execution cost is significantly lower.
Short-term mitigation measures
If CPU utilization remains high and you cannot optimize slow SQL queries quickly, use the following measures to relieve CPU pressure.
Control the concurrency of slow SQL queries by using SQL throttling
-
Log on to the ApsaraDB RDS console. In the Instances list, click the target instance.
-
In the left-side navigation pane, choose Autonomy Service > One-Click Diagnosis.
-
Click the Session Management tab.
-
Click the SQL Throttling button.
-
Click Create Throttling Rule, and then configure the throttling mode, rule, database, maximum concurrency, and throttling duration.
Temporarily upgrade the instance type
If killing sessions and SQL throttling fail to relieve CPU pressure, you can temporarily upgrade the instance type to increase CPU resources. On the Basic Information page of the instance, in the Configuration Information section, click Change configuration to upgrade the instance. For more information, see Change configuration.
More information
Features for troubleshooting performance issues
Data Management Service (DMS) provides several features to help you troubleshoot and resolve instance performance issues. The diagnostic report is the most effective tool for troubleshooting performance problems on ApsaraDB RDS for MySQL and ApsaraDB RDS for MariaDB instances. Whatever the cause of a performance issue, always start by reviewing the diagnostic report, especially the SQL optimization, session list, and slow SQL query summary sections.
Preventing 100% CPU utilization
The following guidelines can help you prevent CPU utilization from reaching 100%:
-
Set up CPU utilization alarms to ensure you have sufficient CPU headroom on your instance.
-
During application design and development, consider query optimization. Follow general MySQL optimization principles to reduce query logical I/O and improve application scalability.
-
Before you launch new features or modules, perform stress testing with production data.
-
Before you launch new features or modules, run regression tests with production data.
System resource algorithm
The following simplified model illustrates the relationship between system resources, the execution cost of SQL statements, and queries per second (QPS):
-
Condition: The application model is constant, which means the application code is not modified.
-
avg_lgc_io: The average logical I/O required to execute each query.
-
total_lgc_io: The total amount of logical I/O that the instance's CPU resources can handle per unit of time.
-
Formula:
total_lgc_io = avg_lgc_io × QPS, which meansTotal CPU resources per unit of time = Average logical I/O per query × Number of queries per unit of time.
Related documents
Use Database Autonomy Service to resolve high CPU utilization on ApsaraDB RDS for MySQL instances
Applicable versions
-
ApsaraDB RDS for MySQL
-
ApsaraDB RDS for MariaDB