Diagnostic reports help you evaluate the operational conditions of an ApsaraDB for Redis instance, such as performance level, skewed request distribution, and slow queries. Diagnostic reports also help you identify anomalies on the instance.
Components of a diagnostic report
- Basic instance information: displays the basic information about an instance such as the instance ID, specification, type, and the zone in which the instance is deployed.
- Summary: displays the score of the health status and describes the reasons why points are deducted.
- Performance level: displays the statistics and status of key performance metrics related to the instance.
- TOP 10 nodes that receive the greatest number of slow queries: displays the top 10 data nodes that receive the greatest number of slow queries and provides information about the slow queries.
Basic instance information
This section displays the instance ID, specifications, type, and the zone in which the instance is deployed.
Summary
This section displays the diagnostic results and the health score of the instance. The highest score is 100. If your instance scores below 100, check the diagnostic items and details to see why points were deducted.
Performance level
This section displays the statistics and status of key performance metrics related to the instance. Pay close attention to performance metrics that are in the Hazard state.
|Performance metric||Threshold||Impact||Possible cause and troubleshooting method|
|cpuUsage||60%||When the CPU utilization of an ApsaraDB for Redis instance is high, the throughput of the instance and the response time to clients are affected. In some cases, clients may become unresponsive.||For more information about how to troubleshoot these issues, see Troubleshoot high CPU utilization on an ApsaraDB for Redis instance.|
|memoryUsage||80%||When the memory usage of an ApsaraDB for Redis instance continuously increases, the response time increases, queries per second (QPS) become unstable, and keys may be frequently evicted. This affects your business.||For more information about possible causes and how to troubleshoot these issues, see Troubleshoot the high memory usage of an ApsaraDB for Redis instance.|
|connectionUsage of data nodes||80%||When the number of connections to a data node reaches the upper limit, new connection requests time out or fail.||For more information about how to troubleshoot these issues, see Instance sessions.|
|inFlow||80%||When the inbound or outbound traffic exceeds the maximum bandwidth of the instance specifications, the performance of applications is affected.||For more information about how to troubleshoot these issues, see Troubleshoot high data usage on an ApsaraDB for Redis instance.|
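The thresholds in the preceding table can be checked against your own monitoring data. The following Python snippet is a minimal sketch, not part of the diagnostic report feature. The function name `find_hazards`, the metric key `connectionUsage`, and the sample values are hypothetical, and the sketch assumes that a metric enters the Hazard state when its value reaches or exceeds the threshold.

```python
# Thresholds from the table above, expressed as percentages.
THRESHOLDS = {
    "cpuUsage": 60.0,
    "memoryUsage": 80.0,
    "connectionUsage": 80.0,
    "inFlow": 80.0,
}


def find_hazards(metrics):
    """Return the metrics whose value reaches or exceeds its threshold.

    Assumption: a metric is in the Hazard state when value >= threshold.
    Metrics that have no threshold in THRESHOLDS are ignored.
    """
    hazards = {}
    for name, value in metrics.items():
        limit = THRESHOLDS.get(name)
        if limit is not None and value >= limit:
            hazards[name] = (value, limit)
    return hazards


# Hypothetical sample values, in percent.
sample = {"cpuUsage": 72.5, "memoryUsage": 41.0, "connectionUsage": 80.0, "inFlow": 12.3}
for name, (value, limit) in find_hazards(sample).items():
    print(f"{name}: {value}% (threshold {limit}%)")
```

In this sample, only cpuUsage and connectionUsage are reported because the other two metrics stay below their thresholds.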
If your instance uses the cluster architecture or the read/write splitting architecture, the system analyzes the preceding performance metrics to measure the overall access performance of the instance and displays the result in the diagnostic report. The following table describes the criteria used to identify skewed requests, along with possible causes and troubleshooting methods.
|Criteria||Possible cause||Troubleshooting method|
The following conditions are met:
TOP 10 nodes that receive the greatest number of slow queries
This section displays the top 10 data nodes that receive the greatest number of slow queries and information about the corresponding slow queries. The statistics include the following data:
- The slow log data of data nodes that is stored in the system audit log. This data is retained for only four days.
- The slow log data that is stored on the data node. Only the most recent 1,024 log entries are retained. You can use redis-cli to connect to the instance and run the SLOWLOG GET command to view the slow log.
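Slow log entries retrieved from a data node can be grouped by command to see which commands account for most of the slow time. The following Python snippet is a minimal sketch. It assumes that each entry is a dict with a `command` value (string or bytes) and a `duration` value in microseconds, which is the shape that the redis-py `slowlog_get()` method is assumed to return. The function name `summarize_slowlog` and the sample entries are hypothetical.

```python
from collections import defaultdict


def summarize_slowlog(entries):
    """Group slow log entries by command name.

    Assumption: each entry is a dict with "command" (str or bytes)
    and "duration" (microseconds).
    Returns (name, stats) pairs sorted by total duration, largest first.
    """
    stats = defaultdict(lambda: {"count": 0, "total_us": 0, "max_us": 0})
    for entry in entries:
        command = entry["command"]
        if isinstance(command, bytes):
            command = command.decode("utf-8", errors="replace")
        parts = command.split()
        # Use the first token, upper-cased, as the command name.
        name = parts[0].upper() if parts else "(empty)"
        record = stats[name]
        record["count"] += 1
        record["total_us"] += entry["duration"]
        record["max_us"] = max(record["max_us"], entry["duration"])
    return sorted(stats.items(), key=lambda item: item[1]["total_us"], reverse=True)


# Hypothetical entries that stand in for real SLOWLOG GET output.
entries = [
    {"command": "KEYS *", "duration": 120000},
    {"command": b"hgetall user:1", "duration": 30000},
    {"command": "keys user:*", "duration": 50000},
]
for name, record in summarize_slowlog(entries):
    print(name, record["count"], record["total_us"], record["max_us"])
```

Sorting by total duration rather than by count puts a few very slow commands ahead of many moderately slow ones. Depending on your workload, sorting by count may be more useful.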
You can analyze the slow queries to determine whether improper commands are in use, and then choose a solution for each issue.
|Commands that have a time complexity of O(N) or that consume a large amount of resources, such as KEYS *.||Evaluate and disable high-risk commands that consume a large amount of resources, such as FLUSHALL, KEYS, and HGETALL. For more information, see Disable high-risk commands.|
|Big keys that are frequently read from and written to the data nodes.||Analyze and evaluate the big keys. For more information, see Use the cache analytics feature to find big keys. Then, split the big keys based on your business requirements.|
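Before you disable commands, it can help to check which of the high-risk commands named above actually appear in your slow queries. The following Python snippet is a minimal sketch. The set contains only the three commands listed in this topic, and the function name `flag_high_risk` and the sample commands are hypothetical.

```python
# Commands named as high-risk in the text above.
HIGH_RISK = {"FLUSHALL", "KEYS", "HGETALL"}


def flag_high_risk(commands):
    """Return (command_name, full_command) pairs for high-risk commands.

    The command name is the first token of the command, upper-cased.
    """
    flagged = []
    for command in commands:
        parts = command.split()
        if parts and parts[0].upper() in HIGH_RISK:
            flagged.append((parts[0].upper(), command))
    return flagged


# Hypothetical commands taken from a slow log.
observed = ["GET session:42", "keys user:*", "HGETALL cart:7", "SET a 1"]
for name, full in flag_high_risk(observed):
    print(f"{name}: {full}")
```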