All Products
Search
Document Center

ApsaraDB RDS:Troubleshoot the issues that cause high CPU utilization of an ApsaraDB RDS for SQL Server instance

Last Updated:Jun 14, 2023

This topic describes how to view the CPU utilization and troubleshoot the issues that cause high CPU utilization of an ApsaraDB RDS for SQL Server instance. High CPU utilization may affect query performance.

View CPU utilization

You can use one of the following features to view the CPU utilization of your RDS instance in the ApsaraDB RDS console:

  • Monitoring and alerting

    Go to the Monitoring and Alerts page. Click the Legacy Monitoring tab. Then, click Resource Monitoring to view the CPU utilization of your RDS instance.

    监控
  • Autonomy service

    Your RDS instance does not run SQL Server 2008 R2 with cloud disks.

    In the left-side navigation pane, choose Autonomy Service > Performance Optimization. On the Performance Insight tab of the page that appears, view the CPU utilization of your RDS instance.

    CloudDBA
Note

The shared instance family supports the reuse of CPU resources. When you select the shared instance family, the performance of your RDS instance may be limited due to the reuse of CPU resources even if the CPU utilization of the instance is not high. Therefore, we recommend that you select the dedicated instance family or the dedicated host instance family. This way, you can ensure the stable and high performance of your RDS instance. For more information about the shared instance family, see Instance families.

Analyze CPU metrics

  • Cause

    In most cases, a sudden increase in CPU utilization is caused by the following issues:

    • The number of query requests suddenly increases. For example, the number of query requests increases due to a sudden increase in workloads or due to the cache penetration at the data caching layer.

    • The CPU overhead for query requests suddenly increases. For example, if new query requests are processed by using an inefficient method or if the execution plans of some query statements change, the CPU overhead increases.

    • The frequency at which the system compiles execution plans for query statements significantly increases. For example, if the pressure on the cache increases, the number of execution plans that are cached and the cache hit ratio significantly decrease. In this case, the frequency at which the system compiles execution plans for SQL statements significantly increases. As a result, the overall CPU overhead of your RDS instance significantly increases.

  • Analysis

    Analyze the following metrics to troubleshoot an increase in CPU utilization.

    • QPS

      If the value of the QPS metric increases at the same rate as CPU utilization, the increase in CPU utilization is caused by an increase in the number of query requests. This means that the reason for the increase in CPU utilization does not lie in your RDS instance. You must analyze your application to troubleshoot the increase in CPU utilization.

    • Page_Lookups/sec

      The Page_Lookups/sec metric indicates the cumulative number of pages that are logically read per second to process query requests. In most cases, the value of this metric becomes high because the system uses an inefficient method to execute query statements. If the value of this metric is high, the CPU overhead for query requests is high. If the value of the Page_Lookups/sec metric increases at the same rate as CPU utilization but the value of the QPS metric remains relatively stable, the CPU overhead for query statements increases. In this case, you must identify the types of query statements that contribute to the increase in CPU utilization. Then, you can optimize these types of query statements to reduce CPU utilization.

    • Sqlcompliations

      The Sqlcompliations metric indicates the number of compile operations per second for query requests. If the value of this metric increases at the same rate as CPU utilization but the value of the QPS metric remains relatively stable, the increase in CPU utilization may be caused by the CPU overhead that is required to compile execution plans for query requests. You can further check the Cache_Object_Counts and Cache_Pages metrics, which are related to the number of execution plans that are cached. If the values of these metrics significantly decrease, the increase in CPU utilization may be caused by significantly high pressure on the cache. In these cases, an effective solution is to increase the memory capacity of your RDS instance.

  • Case

    The following figure shows a sample case.

    案例

    The CPU utilization statistics show that the increase in CPU utilization occurs from 09:10 to 09:20 and from 09:30 to 09:40. However, the value of the QPS metric does not increase during these periods of time. The value of the QPS metric increases after 09:40. Therefore, the increase in CPU utilization is not caused by an increase in the number of query requests.

    The value of the Sqlcompliations metric does not increase during these periods of time, and the absolute value of this metric is low. Therefore, the increase in CPU utilization is not caused by the CPU overhead that is required to compile execution plans for query statements.

    The value of the Page_Lookups/sec metric increases at the same rate as CPU utilization during these periods of time. Therefore, the increase in CPU utilization may be caused by the high CPU overhead that is required to process some query requests during these periods of time.

    You must identify the query statements that require high CPU overhead during these periods of time. In addition, if the value of the Page_Lookups/sec metric increases, CPU utilization increases. Some query statements may require high CPU overhead to be executed. However, these query statements may require medium CPU overhead for logical read operations. Therefore, you must analyze the query statements that are executed during these periods to troubleshoot the increase in CPU utilization.

Analyze active sessions

  • Cause

    The most common cause of a sudden increase in CPU utilization is the inefficiency of the method that is used to execute query statements. You can use the Average Active Sessions (AAS) metric of the autonomy service to identify and analyze the query statements that are executed by using an inefficient method.

  • Analysis

    The system checks active sessions every 10 seconds and records the SQL statements, query hash values, execution plans, and wait events of active query requests. In most cases, when a query statement that requires high CPU overhead is running, the value in the Wait Category column of the query statement on the Waits tab is CPU.

    The SQL Hash column on the SQL tab displays the hash values that are generated after SQL statements are structured based on parameters. The hash values are used to mark SQL statements that have identical structures. This way, the system can classify and aggregate SQL statements based on the structures of the SQL statements. You can query the latest statistics of an SQL statement from the sys.dm_exec_query_stats system view based on the value in the query_hash column of the SQL statement.

    We recommend that you perform the following operations:

    • Click the hyperlink in the SQL Hash column on the SQL tab to view the AAS statistics of the SQL statement.

    • Click Analyze in the Execution Plan column on the SQL tab to view the execution plan of the SQL statement. You can also view the optimization suggestions that are generated by autonomy services.

    The preceding optimization suggestions are suitable for SQL statements with simple structures. If the SQL statements on your RDS instance have complex structures, we recommend that you further analyze and test the execution plans of these SQL statements based on the preceding optimization suggestions.

    For more information about the AAS metric, see Use the performance insight feature for an ApsaraDB RDS for SQL Server instance.

Analyze top N SQL statements

  • Cause

    You can use the ASS metric of the autonomy service to identify the SQL statements that cause an increase in CPU utilization during a specific period of time. This feature does not provide information such as the execution frequency, average CPU overhead, and overall CPU overhead of various SQL statements. If you want to optimize the overall CPU resource efficiency of your RDS instance, we recommend that you obtain the details about the SQL statements that consume the most CPU resources.

  • Analysis

    SQL Server can automatically aggregate information about different objects, such as SQL statements and stored procedures. SQL Server also provides system views such as sys.dm_exec_query_stats and sys.dm_exec_procedure_stats. You can query the information about different objects from the system views. Then, you can identify the SQL statements that consume the most resources. The resources can be of various types.

    Note

    The TOP SQL report and TOP Objects report in CloudDBA and the top N query reports in SQL Server Management Studio (SSMS) are also based on system views. These reports are easy to use but are less flexible than system views.

Optimize parameter settings

The maximum degree of parallelism (MAXDOP) is used to limit the maximum number of active threads that can be simultaneously used by a single query request. The number of active threads represents the number of CPU cores. If execution plans with high degrees of parallelism are compiled for SQL statements that require high CPU overhead, the time that is required to execute these SQL statements may significantly decrease. However, the CPU overhead per unit of time significantly increases. Therefore, we recommend that you specify a proper MAXDOP to balance the query speed and the CPU utilization based on the following suggestions:

  • If the concurrency of query requests is high and the CPU overhead of most of the supported SQL statements is low, set the MAXDOP to a small value. The MAXDOP can be as low as 1, which specifies a zero degree of parallelism.

  • If the concurrency of query requests is low and the CPU overhead of some of the supported SQL statements is high, set the MAXDOP to a large value. We recommend that the MAXDOP do not exceed 50% or 25% of the maximum number of CPU cores that are available for your RDS instance.

The default MAXDOP is 2, which is a medium value that balances the query speed and the CPU overhead. You can call the sp_rds_configure stored procedure to reconfigure the MAXDOP. After you reconfigure the MAXDOP, the new setting immediately takes effect without the need to restart your RDS instance.