This topic provides answers to some frequently asked questions about performance optimization in AnalyticDB for MySQL.
If no edition is specified in a question, the question is applicable only to AnalyticDB for MySQL Data Warehouse Edition (V3.0) clusters.
Why does the CPU utilization remain high even when the peak write throughput decreases?
AnalyticDB for MySQL creates indexes for written data in real time to accelerate queries. Index creation consumes large amounts of system resources, especially after a write peak causes a surge in the volume of written data. Because index creation can lag behind the writes themselves, CPU utilization may remain high for a period of time after the peak write throughput decreases.
In which scenarios does AnalyticDB for MySQL provide low query performance?
AnalyticDB for MySQL is designed to concurrently process large amounts of data on a distributed architecture. For queries that involve only small amounts of data, the overhead of the distributed architecture itself can make query performance relatively low. In these scenarios, AnalyticDB for MySQL Basic Edition clusters can use storage indexes more efficiently to improve query performance.
What do I do if the query memory usage exceeds the limit?
To ensure the stability of AnalyticDB for MySQL clusters and prevent slow queries from affecting cluster performance, a limit is imposed on the amount of memory that a query can use. For more information about how to analyze slow queries, see Typical slow queries. Pay particular attention to the peak memory usage and the amount of scanned data.
The following causes and solutions correspond to the errors that are returned when the query memory usage exceeds the limit.
Cause: Large amounts of memory are consumed throughout the entire cluster. To ensure system stability and prevent impacts on other query connections, the system closes the query connections that consume large amounts of memory.
Cause: The memory usage of the current query exceeds the limit.
Solution: Check which operators in the SQL statement consume large amounts of memory.
Cause: The runtime of the current query exceeds the limit for internal memory pool computation.
Solution: Check for queries that reach the peak memory usage and scan large amounts of data during the time range in which the error occurs, and analyze the cause of the high memory usage.
What do I do if the disk usage exceeds the limit during a query?
AnalyticDB for MySQL clusters in elastic mode for Cluster Edition may execute queries in batch mode. In this mode, intermediate query results are written to disks. If the intermediate results contain large amounts of data, the disk usage may exceed the limit.
The disk usage exceeds the limit, and space is insufficient to store data on disks.
When the bulk synchronous parallel (BSP) model is used to query data in batch mode, large amounts of shuffled data and operator states are stored on disks. As a result, disk space becomes insufficient. Take note of the peak memory usage, amount of scanned data, and amount of shuffled data for the queries that are executed in batch mode, and reduce the number of concurrent queries. If a cluster has more than 32 compute nodes, you can change the value of the
How do I identify the causes of a query slowdown?
Queries that are executed by using the same SQL pattern may slow down due to the following reasons:
Query conditions remain unchanged but data is added to tables. As a result, more data needs to be processed.
Query conditions are changed. For example, more partitions need to be scanned or the query range is expanded.
Overall system resource consumption is high. For example, an increase in data writes or slow queries may consume large amounts of system resources.
You can find an SQL pattern and click View Details in the Actions column to view resource consumption statistics for different SQL patterns, such as the number of executions, query duration, execution duration, amount of scanned data, and peak memory usage. For more information, see SQL pattern.
How do I identify which queries consume large amounts of memory and CPU resources?
For information about how to identify these queries, see Typical slow queries.
Why are ANALYZE statements misdiagnosed as slow queries?
ANALYZE statements that are automatically initiated during the maintenance window are executed with I/O throttling and low CPU priority. The statements may be diagnosed as slow queries because they are executed for extended periods of time. However, this does not affect services. If the CPU load is not high or CPU overload is not closely associated with the maintenance window, you can ignore this issue. If the CPU is continuously overloaded, refer to the "What do I do if the query response time is affected by CPU overload that is caused by statistics collection?" section of this topic to resolve this issue.
Why does a CPU overload occur when I use the statistics feature?
A CPU may become overloaded due to the following reasons:
During the default maintenance window from 04:00 to 05:00, the system performs a full scan on each table to collect column statistics. During this period, the CPU may become overloaded.
In most cases, statistics are incrementally collected, which does not consume large amounts of resources. By default, the statistics feature is enabled for AnalyticDB for MySQL Data Warehouse Edition (V3.0) clusters that run V3.1.6 or later. When a cluster is updated from an earlier minor version to V3.1.6 or later, full statistics are collected. This may cause CPU overloads during the first days after the minor version update. The CPU load decreases after the full data scan is complete.
When the CPU is overloaded, check whether the query response time is affected. If the average query response time does not significantly change, the query response time is not affected. The value of the CPU utilization metric may be high during statistics collection, but when queries are executed, resources are preferentially allocated to them. This is because ANALYZE statements are executed with I/O throttling and low CPU priority.
What do I do if the query response time is affected by CPU overload that is caused by statistics collection?
You can resolve this issue by using the following solutions:
Change the maintenance window to off-peak hours.
set adb_config O_CBO_MAINTENANCE_WINDOW_DURATION = [04:00-05:00];
If you cannot determine appropriate off-peak hours, we recommend that you reduce the I/O limit for system queries. The value must be greater than or equal to 16 MB. The default value is 50 MB.
set adb_config CSTORE_IO_LIMIT_SYSTEM_QUERY_BPS = 52428800;
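As the example value suggests, the CSTORE_IO_LIMIT_SYSTEM_QUERY_BPS parameter appears to be specified in bytes. A quick sketch of the conversion (the helper function is illustrative, not part of AnalyticDB):

```python
def mb_to_bytes(mb: int) -> int:
    # 1 MB = 1024 * 1024 bytes
    return mb * 1024 * 1024

# 50 MB, the default I/O limit for system queries, in bytes:
print(mb_to_bytes(50))  # 52428800
# 16 MB, the minimum recommended value, in bytes:
print(mb_to_bytes(16))  # 16777216
```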
Assign statistics collection to a low-priority resource group to isolate loads. For more information, see the "Automatic statistics collection" section of the Statistics topic.
set adb_config O_CBO_AUTONOMOUS_STATS_ACCOUNT = [user_name];
Increase the expiration ratio for columns to reduce the amount of data to be collected. The default value is 0.1, and valid values range from 0 to 1. We recommend that you do not set the expiration ratio to a value greater than 0.5.
set adb_config O_CBO_STATS_EXPIRED_RATIO = 0.1;
If none of the preceding solutions resolve this issue, execute the set adb_config O_CBO_AUTONOMOUS_STATS_ENABLED=false; statement to disable automatic statistics collection. However, query performance may degrade because the optimizer no longer has up-to-date statistics. If you want to collect statistics in the future, you must do so manually. For more information, see the "Manually collect statistics" section of the Statistics topic.
The execution result of the SELECT * FROM INFORMATION_SCHEMA.COLUMN_STATISTICS statement shows that statistics have not been updated for several days. Why?
This issue may occur for the following reasons:
The statistics have not expired.
Statistics expire when the proportion of data that is updated, inserted, or replaced reaches the expiration ratio. The default expiration ratio is 0.1 (10%). If only a small amount of your data has changed, you can continue to use your cluster as expected and check again after a week.
Many columns and tables contain large amounts of data.
By default, the system spends only one hour per day collecting statistics that are not covered by incremental updates. If a large number of columns and tables are involved, for example, more than 1,000 columns, the system may not be able to complete an update within one day, and the update may require up to a week to complete. In this case, the absence of statistics updates for several days is normal. You can continue to use the cluster as expected and monitor the issue.
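The expiration rule described above can be sketched as a simple model. This is illustrative only; the function name and the exact comparison semantics are assumptions, not AnalyticDB internals:

```python
def stats_expired(changed_rows: int, total_rows: int, expired_ratio: float = 0.1) -> bool:
    """Model of statistics expiration: statistics are considered expired
    once the fraction of updated, inserted, or replaced rows reaches
    the expiration ratio (default 0.1, that is, 10%)."""
    return changed_rows / total_rows >= expired_ratio

print(stats_expired(5_000, 100_000))   # 5% of rows changed -> False, stats still fresh
print(stats_expired(12_000, 100_000))  # 12% of rows changed -> True, stats expired
```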
Are statistics automatically updated after data is imported to a new table?
If data is batch imported by using the INSERT OVERWRITE statement, basic statistics are automatically collected after the data import is complete. If data is imported in real time by using the INSERT INTO or REPLACE INTO statement, statistics are collected during the next maintenance window, or an incremental collection task is triggered during the incremental collection period after each BUILD task is complete. We recommend that you manually collect basic statistics after data is imported. For more information, see the "Manually collect statistics" section of the Statistics topic.
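For example, to manually collect statistics on a newly imported table, you can execute an ANALYZE statement similar to the following one. The table name is a placeholder; see the "Manually collect statistics" section of the Statistics topic for the exact syntax that your cluster version supports.

```sql
ANALYZE TABLE my_new_table;
```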