Problem description

When I use my ApsaraDB RDS for MySQL instance or ApsaraDB RDS for MariaDB instance, the CPU utilization is excessively high and even reaches 100%.

Causes

When an application submits query or modification requests, the system performs a large number of logical read operations, which indicates high logical I/O on a large number of rows in the requested table. As a result, the system consumes a large number of CPU resources to maintain the consistency of data that is read from storage to memory. This topic describes the causes and solutions to high queries per second (QPS) and slow SQL queries that cause high CPU utilization of 100%. In most cases, slow SQL queries cause high query execution costs because the requested table contains a large number of rows. As a result, CPU utilization becomes high.

Note High CPU utilization may also be caused by a large number of row lock conflicts, row lock waits, or backend tasks. This topic does not cover these cases.
  • Cause 1: Heavy application load, which is indicated by a high QPS
    • Performance: The instance QPS is high, query operations are simple, and the query execution efficiency is high. This leaves low potential for optimization.
    • Characteristics: No slow SQL queries occur or slow SQL queries are not the root cause of heavy loads. The QPS and CPU utilization curves match.
    • Common scenarios: optimized online transaction systems, such as an order system, popular web applications that have high read rates, and third-party stress testing tools, such as sysbench.
  • Cause 2: High query execution costs due to slow SQL queries
    • Performance: The instance QPS is low, the query execution efficiency is low, and a large amount of data needs to be scanned during queries. This leaves a high potential for optimization.
    • Characteristics: Slow SQL queries occur. The QPS and CPU utilization curves do not match.
    • Analysis: A large amount of data needs to be scanned to obtain expected results because SQL queries are inefficient. This leads to a high average logical I/O. As a result, the CPU utilization is high even if the QPS is low, such as in the case of a website that is not frequently visited.

Solutions

You can select a solution based on your business requirements.

Solution to Cause 1

We recommend that you optimize the application architecture or upgrade instance types.

  • Upgrade the instance type of your RDS instance to increase CPU resources. For more information, see Change the specifications of an ApsaraDB RDS for MySQL instance.
  • Add read-only RDS instances to offload queries that are not sensitive to data consistency from the primary RDS instance. Examples of the queries include commodity type queries and train number queries. For more information, see Create a read-only ApsaraDB RDS for MySQL instance.
  • Use PolarDB-X to implement automatic sharding of databases and tables to offload the queries from your RDS instance to multiple RDS instances.
  • Use ApsaraDB for Memcache or ApsaraDB for Redis to offload queries from your RDS instance. In this case, frequently queried data is read from cache.
  • Enable query cache for applications that query static data, run identical queries, and receive responses with result sets that are less than 1 MB.
    Note You need to test whether query cache offloads your queries. For more information, see Query Cache.
  • Archive historical data at regular intervals. Use sharding or partitioning to reduce the amount of data that is accessed by queries. Optimize queries to reduce query execution costs and improve application scalability.

Solution to Cause 2

Identify and optimize inefficient queries to improve the execution efficiency and reduce execution costs.

  1. Identify inefficient queries.
    • Execute SQL statements to query running queries.
      SHOW PROCESSLIST;
      SHOW FULL PROCESSLIST;

      The following output is returned.

      result

      Sessions in which slow SQL queries occur and the value of the State parameter is Sending data, Copying to tmp table, Copying to tmp table on disk, Sorting result, or Using filesort may have inefficient queries.

      • If the high CPU utilization is caused by a high QPS, queries are quickly executed. You cannot view running queries by executing the SHOW PROCESSLIST; statement or from sessions. In this case, you can execute the following statement:
        explain [$SQL]
        Note [$SQL] indicates an inefficient SQL statement.
      • You can execute the kill [$ID]; statement to terminate a long-running session. For more information about how to terminate a session, see How do I terminate sessions on an ApsaraDB RDS for MySQL instance? For more information about how to manage long-running sessions, see How do I manage long-running queries in an ApsaraDB RDS for MySQL instance?
        Note [$ID] indicates the session ID that corresponds to the SQL statement.
    • Query running queries in the Database Autonomy Service (DAS) console.
      1. Log on to the DAS console.
      2. Find the required instance and choose Performance > Instance Sessions in the Actions column.
      3. Click the query text in the SQL column to view the complete query settings and the execution plan.
  2. After you determine the queries that need to be optimized, use SQL Diagnostics to obtain optimization suggestions. You can also troubleshoot historical issues of high CPU utilization that occurred on instances based on diagnostic results.
    1. Log on to your RDS instance in the Data Management (DMS) console. For more information, see Log on to a database instance.
    2. In the upper part, click SQL Console and select a database.
    3. Paste the SQL statement to the SQLConsole tab and click SQL Diagnostics. Optimization suggestions are displayed.
  3. You can optimize the SQL statements based on the optimization suggestions. For example, you can add an index to significantly reduce query execution costs.

Additional information

Performance issue troubleshooting

DMS provides various features to troubleshoot instance performance issues. The instance diagnostics report feature is optimal for troubleshooting the performance issues of your RDS instance. If a performance issue occurs, we recommend that you first view the instance diagnostics report, particularly the SQL optimization suggestions, sessions, and slow query logs in the report. DMS provides the following features:

Guidelines on how to avoid 100% CPU utilization

You can use the following methods:

  • Configure CPU utilization alerts to help you monitor CPU utilization and plan workloads based on your business requirements.
  • During application design and development, optimize MySQL queries based on general rules and techniques to reduce the logical I/O of queries and improve application scalability.
  • Before you launch a new feature or module, use production data to perform stress testing in a production environment.
  • Before you launch a new feature or module, we recommend that you use production data to perform regression testing.

System resource algorithms

In this section, a simplified model is used to show the relationship among system resources, query execution costs of SQL statements, and QPS.

  • Condition: The application configuration does not change.
  • avg_lgc_io: the average logical I/O that is required to run each query.
  • total_lgc_io: the total logical I/O that can be processed by using the CPU resources of an instance per unit of time.
  • Formula: total_lgc_io = avg_lgc_io × QPS. CPU resources consumed per unit of time = Average execution cost for each query × Number of queries per unit of time

References

How do I use CloudDBA to reduce the CPU utilization of my ApsaraDB RDS for MySQL instance?

Application scope

ApsaraDB RDS for MySQL and ApsaraDB RDS for MariaDB