AnalyticDB for MySQL Data Warehouse Edition (V3.0) clusters in elastic mode for Cluster Edition and Data Lakehouse Edition (V3.0) clusters provide various metrics, including data queries and writes, resource group information, and cluster running status. You can view cluster metrics within a time range in the last month in the AnalyticDB for MySQL console or by calling API operations. This helps you identify and resolve issues based on the performance and running status of a cluster.
Usage notes
You can view monitoring information for a time range of up to two days within the last month.
View monitoring information about a Data Warehouse Edition (V3.0) cluster
Procedure
Log on to the AnalyticDB for MySQL console. In the upper-left corner of the console, select a region. In the left-side navigation pane, click Clusters. On the Data Warehouse Edition (V3.0) tab, find the cluster that you want to manage and click the cluster ID.
In the left-side navigation pane, click Monitoring Information.
On the Monitoring Information page, click the Cluster Resource Monitoring or Resource Group Monitoring tab to view the corresponding monitoring information.
Metrics
Health status metrics
Important: You can view health status information only for AnalyticDB for MySQL clusters of V3.1.6 or later.
If the value of a health status metric is Risky or Unavailable, contact technical support.
Metric
Description
Cluster Access Node Status
The access layer of AnalyticDB for MySQL is composed of multiple cluster access nodes and provides features such as protocol layer access, SQL parsing and optimization, real-time sharding of written data, data scheduling, and query scheduling.
Valid values:
Healthy: the number of available cluster access nodes.
Unavailable: the number of unavailable cluster access nodes.
Health Status of Compute Node Groups
The compute engine of AnalyticDB for MySQL is composed of compute node groups and supports the integrated execution of distributed massively parallel processing (MPP) and directed acyclic graph (DAG) architectures. The compute engine can work with intelligent optimizers to support high concurrency and hybrid loads of complex SQL statements. Additionally, the cloud native infrastructure allows compute nodes to be elastically scaled out within seconds based on business requirements. This way, resources can be used in an efficient manner.
Valid values:
Healthy: the number of available compute nodes.
Unavailable: the number of unavailable compute nodes.
Health Status of Storage Node Groups
The storage engine of AnalyticDB for MySQL is composed of storage nodes and supports real-time data writes with strong consistency and high availability in compliance with the Raft consensus protocol. The storage engine uses data sharding and Multi-Raft to support parallel processing, tiered storage to separate hot and cold data at lower costs, and hybrid row-column storage and intelligent indexing to provide ultra-high performance.
Valid values:
Healthy: the number of available storage nodes.
Risky: the number of at-risk storage nodes.
Unavailable: the number of unavailable storage nodes.
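The rule stated for the health status metrics (contact technical support when a metric reports Risky or Unavailable nodes) can be sketched as a small check. This is only an illustration: the dictionary layout and the function name needs_support are hypothetical and do not represent the response format of any AnalyticDB for MySQL API operation.

```python
from typing import Dict, List

# Hypothetical snapshot: metric name -> {status: node count}.
# The layout is an assumption for illustration, not an API response format.
HealthSnapshot = Dict[str, Dict[str, int]]

# Statuses for which this topic recommends contacting technical support.
ALERT_STATUSES = ("Risky", "Unavailable")


def needs_support(snapshot: HealthSnapshot) -> List[str]:
    """Return the metrics that report at least one Risky or Unavailable node."""
    flagged = []
    for metric, counts in snapshot.items():
        if any(counts.get(status, 0) > 0 for status in ALERT_STATUSES):
            flagged.append(metric)
    return flagged


snapshot = {
    "Cluster Access Node Status": {"Healthy": 2, "Unavailable": 0},
    "Health Status of Compute Node Groups": {"Healthy": 4, "Unavailable": 0},
    "Health Status of Storage Node Groups": {"Healthy": 2, "Risky": 1, "Unavailable": 0},
}
print(needs_support(snapshot))
```

In this made-up snapshot, only the storage node group metric is flagged because it reports one at-risk node.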
Cluster Resource Monitoring metrics
Metric
Unit
Description
CPU Utilization
%
Displays the following monitoring information:
Maximum CPU Utilization of Storage Node
Average CPU Utilization of Storage Node
Maximum CPU Utilization of Compute Node
CPU Utilization of Compute Node
Note: After you change a C32 cluster from reserved mode to elastic mode, the average CPU utilization increases. For more information, see the "FAQ" section of this topic.
Disk I/O Throughput
MB
Displays the following monitoring information:
Disk Read Throughput of Storage Node
Disk Write Throughput of Storage Node
BUILD Jobs
N/A
Displays the following monitoring information:
Average BUILD Jobs: the average number of BUILD jobs that run on storage nodes.
Maximum BUILD Jobs: the maximum number of BUILD jobs that run on a single storage node.
Disk IOPS
N/A
Displays the following monitoring information:
Average Reads per Second from Storage Node
Average Write per Second to Storage Node
Disk I/O Usage
%
Displays the disk I/O usage of storage nodes.
Disk I/O Wait Time
ms
Displays the disk I/O wait time of storage nodes.
Cluster Connections
N/A
Displays the number of successful connections.
Disk Space Used
MB
Displays the maximum disk space used by a cluster.
Hot Data Space Used
MB
Displays the disk space used by hot data in a cluster.
Cold Data Space Used
MB
Displays the disk space used by cold data in a cluster.
Unavailable Nodes
N/A
Displays the following monitoring information:
Unavailable Compute Nodes
Unavailable Storage Nodes
Compute Memory Usage
%
Displays the following monitoring information:
Maximum Compute Memory Usage of Storage Nodes
Average Compute Memory Usage of Storage Nodes
Average Compute Memory Usage of Compute Nodes
Maximum Compute Memory Usage of Compute Nodes
Queries
QPS
N/A
Displays the queries per second (QPS).
Query Response Time
ms
Displays the following monitoring information:
Average Query Response Time
Maximum Query Response Time
Query Wait Time
ms
Displays the following monitoring information:
Average Wait Time for Query
Maximum Wait Time for Query
Query Failure Rate
%
The failure rate of queries.
If you select a time range within 24 hours, the query failure rate per minute is displayed, which is calculated by using the following formula:
Query failure rate = (Number of failed SQL queries in 1 minute/Total number of SQL queries in 1 minute) × 100%
If you select a time range that exceeds 24 hours, the query failure rate for every 5 minutes is displayed, which is calculated by using the following formula:
Query failure rate = (Number of failed SQL queries within 5 minutes/Total number of SQL queries within 5 minutes) × 100%
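The two query failure rate formulas differ only in the window over which queries are counted. The following minimal sketch shows that calculation. The function names are hypothetical, and reporting 0% for a window without queries is an assumption that the description of this metric does not confirm.

```python
from datetime import timedelta


def failure_rate_window(time_range: timedelta) -> timedelta:
    """Window used for each data point: 1 minute for ranges within 24 hours,
    5 minutes for ranges that exceed 24 hours."""
    if time_range <= timedelta(hours=24):
        return timedelta(minutes=1)
    return timedelta(minutes=5)


def query_failure_rate(failed: int, total: int) -> float:
    """Failure rate in percent for one window."""
    if total == 0:
        return 0.0  # assumption: a window without queries is shown as 0%
    return 100 * failed / total


# A 6-hour range uses 1-minute windows; a 2-day range uses 5-minute windows.
print(failure_rate_window(timedelta(hours=6)))
print(failure_rate_window(timedelta(days=2)))

# 3 failed queries out of 120 in one window.
print(query_failure_rate(3, 120))
```

For example, 3 failed queries out of 120 in a window give a failure rate of 2.5%.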
Writes
Write Response Time
ms
Displays the following monitoring information:
Average Write Response Time
Maximum Write Response Time
Delete Response Time
ms
Displays the following monitoring information:
Average Deletion Response Time
Maximum Deletion Response Time
Update Response Time
ms
Displays the following monitoring information:
Average Update Response Time
Maximum Update Response Time
Write Throughput
MB
Displays the average write throughput of a cluster.
TPS
N/A
Displays the following monitoring information:
Total transactions per second (TPS), including the write TPS, delete TPS, update TPS, and load TPS
Write TPS
Delete TPS
Update TPS
Load TPS
Resource Group Monitoring metrics
Important: You can view the Resource Group Monitoring information only for Data Warehouse Edition (V3.0) clusters that meet the following requirements:
The cluster is in elastic mode for Cluster Edition.
The cluster has 32 cores or more.
Metric
Unit
Description
Average CPU Utilization
%
Displays the CPU utilization of each resource group.
Query Response Time
ms
Displays the average response time of queries processed by each resource group.
QPS
N/A
Displays the number of queries processed per second by each resource group.
Query Wait Time
ms
Displays the average wait time of queries processed by each resource group.
Scheduled Nodes Actually Scaled Out in Resource Group
N/A
Displays the number of nodes added to each resource group in a scheduled scaling plan.
Scheduled Nodes to Be Scaled Out in Resource Group
N/A
Displays the number of nodes that need to be added to each resource group in a scheduled scaling plan.
For information about how to create a scaling plan for a resource group, see Create a resource scaling plan.
Total Nodes in Resource Group
N/A
Displays the total number of nodes in a resource group. The total number of nodes in a resource group is calculated by using the following formula: Total number of nodes = Number of basic nodes + Number of effective nodes in scheduled scaling plans.
Basic Nodes in Resource Group
N/A
Displays the number of basic nodes in a resource group.
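The formula for Total Nodes in Resource Group can be illustrated with a short sketch. The class names and the in_effect flag are hypothetical; the sketch assumes that "effective nodes" means the nodes added by scheduled scaling plans that are currently in effect.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ScalingPlan:
    nodes: int       # number of nodes that the plan adds
    in_effect: bool  # hypothetical flag: the plan is currently active


@dataclass
class ResourceGroup:
    name: str
    basic_nodes: int
    plans: List[ScalingPlan] = field(default_factory=list)

    def effective_scheduled_nodes(self) -> int:
        """Nodes contributed by scaling plans that are currently in effect."""
        return sum(plan.nodes for plan in self.plans if plan.in_effect)

    def total_nodes(self) -> int:
        """Total nodes = basic nodes + effective nodes in scheduled scaling plans."""
        return self.basic_nodes + self.effective_scheduled_nodes()


group = ResourceGroup(
    name="example_group",  # made-up resource group name
    basic_nodes=2,
    plans=[ScalingPlan(nodes=4, in_effect=True), ScalingPlan(nodes=2, in_effect=False)],
)
print(group.total_nodes())
```

In this example, the group has 2 basic nodes and one active plan that adds 4 nodes, so the total is 6. The inactive plan does not count.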
View monitoring information about a Data Lakehouse Edition (V3.0) cluster
Procedure
Log on to the AnalyticDB for MySQL console. In the upper-left corner of the console, select a region. In the left-side navigation pane, click Clusters. On the Data Lakehouse Edition (V3.0) tab, find the cluster that you want to manage and click the cluster ID.
In the left-side navigation pane, go to the Monitoring Information page.
On the Monitoring Information page, click the Cluster or Resource Group tab to view the corresponding monitoring information.
Metrics
The cluster metrics and resource group metrics are displayed for an AnalyticDB for MySQL Data Lakehouse Edition (V3.0) cluster.
Cluster metrics
Important: If the value of a health status metric is Risky or Unavailable, contact technical support.
Metric
Unit
Description
Cluster Monitoring
Cluster Access Node Status
N/A
The access layer of AnalyticDB for MySQL is composed of multiple cluster access nodes and provides features such as protocol layer access, SQL parsing and optimization, real-time sharding of written data, data scheduling, and query scheduling.
Valid values:
Healthy: the number of available cluster access nodes.
Unavailable: the number of unavailable cluster access nodes.
Health Status of Compute Node Groups
N/A
The compute engine of AnalyticDB for MySQL is composed of compute node groups and supports the integrated execution of distributed massively parallel processing (MPP) and directed acyclic graph (DAG) architectures. The compute engine can work with intelligent optimizers to support high concurrency and hybrid loads of complex SQL statements. Additionally, the cloud native infrastructure allows compute nodes to be elastically scaled out within seconds based on business requirements. This way, resources can be used in an efficient manner.
Valid values:
Healthy: the number of available compute nodes.
Unavailable: the number of unavailable compute nodes.
Health Status of Storage Node Groups
N/A
The storage engine of AnalyticDB for MySQL is composed of storage nodes and supports real-time data writes with strong consistency and high availability in compliance with the Raft consensus protocol. The storage engine uses data sharding and Multi-Raft to support parallel processing, tiered storage to separate hot and cold data at lower costs, and hybrid row-column storage and intelligent indexing to provide ultra-high performance.
Valid values:
Healthy: the number of available storage nodes.
Risky: the number of at-risk storage nodes.
Unavailable: the number of unavailable storage nodes.
Performance and Load
CPU Utilization
%
Displays the following monitoring information:
Maximum CPU Utilization of Storage Node
Average CPU Utilization of Storage Nodes
Maximum CPU Utilization at Access Layer
Average CPU Utilization at Access Layer
Maximum CPU Utilization of Compute Node
Average CPU Utilization of Compute Nodes
Cluster Connections
N/A
Displays the number of successful connections.
BUILD Jobs
N/A
Displays the following monitoring information:
Average BUILD Jobs: the average number of BUILD jobs that run on storage nodes.
Maximum BUILD Jobs: the maximum number of BUILD jobs that run on a single storage node.
Write Response Time
ms
Displays the following monitoring information:
Maximum Write Response Time
Average Write Response Time
Query Response Time
ms
Displays the following monitoring information:
Maximum Query Response Time
Average Query Response Time
Query Failure Rate
%
The failure rate of queries.
If you select a time range within 24 hours, the query failure rate per minute is displayed, which is calculated by using the following formula:
Query failure rate = (Number of failed SQL queries in 1 minute/Total number of SQL queries in 1 minute) × 100%
If you select a time range that exceeds 24 hours, the query failure rate for every 5 minutes is displayed, which is calculated by using the following formula:
Query failure rate = (Number of failed SQL queries within 5 minutes/Total number of SQL queries within 5 minutes) × 100%
Disk I/O Throughput
MB
Displays the following monitoring information:
Write Throughput of Compute Node
Read Throughput of Compute Node
Write Throughput of Storage Node
Read Throughput of Storage Node
Disk IOPS
N/A
Displays the following monitoring information:
Disk Write IOPS of Compute Node
Disk Read IOPS of Compute Node
Disk Write IOPS of Storage Node
Disk Read IOPS of Storage Node
Disk I/O Usage of Storage Node
%
Displays the average disk I/O usage.
Disk I/O Wait Time of Storage Node
ms
Displays the average disk I/O wait time.
Total Disk Space Used
MB
Displays the following monitoring information:
Disk Space Used on Compute Node
Disk Space Used on Storage Node
Cold Data Space Used
MB
Displays the disk space used by cold data in a cluster.
Hot Data Space Used
MB
Displays the disk space used by hot data in a cluster.
Unavailable Nodes
N/A
Displays the following monitoring information:
Unavailable Compute Nodes
Unavailable Storage Nodes
Compute Memory Usage
%
Displays the following monitoring information:
Average Compute Memory Usage of Compute Nodes
Maximum Compute Memory Usage of Compute Nodes
Average Compute Memory Usage of Storage Nodes
Maximum Compute Memory Usage of Storage Nodes
Resource group metrics
Metric
Unit
Description
CPU Utilization
%
Displays the CPU utilization of each resource group, including the user_default resource group.
For more information about resource group monitoring, see View monitoring information about resource groups.
FAQ
Q: Why does the average CPU utilization increase after I change a cluster from reserved mode to elastic mode?
A: After you change a C32 cluster from reserved mode to elastic mode, the specifications of a single node decrease to 8 cores. By default, BUILD jobs occupy 3 cores. In this case, the average CPU utilization increases. If the increased average CPU utilization does not affect your business, ignore this change. If your business is affected, upgrade your cluster or submit a ticket. For more information about BUILD jobs, see BUILD.
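The effect described in this answer can be made concrete with simple arithmetic. The sketch below uses only the two figures given in the answer (8 cores per node and 3 cores for BUILD jobs). It is a rough illustration and does not model how the cluster actually schedules work.

```python
NODE_CORES = 8   # cores per node after the change to elastic mode (from the answer above)
BUILD_CORES = 3  # cores that BUILD jobs occupy by default (from the answer above)


def build_core_share(node_cores: int, build_cores: int) -> float:
    """Percentage of a node's cores that BUILD jobs occupy."""
    return 100 * build_cores / node_cores


share = build_core_share(NODE_CORES, BUILD_CORES)
remaining = NODE_CORES - BUILD_CORES
print(f"BUILD jobs can occupy {share}% of a node, leaving {remaining} cores for other work")
```

With these figures, BUILD jobs can occupy 37.5% of the cores on a node. The same 3 cores make up a smaller share on a node with more cores, which is why the average CPU utilization rises after the change.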
Q: Why are the values of Regular Index and Primary Key Index metrics large?
A: The preceding metrics may have large values due to the following reasons:
Indexes and primary key indexes are created for a large number of columns.
The length of a value in index columns is large, or the total length of all values in an index column is large. For example, the value of an index column is a long string.
The number of distinct values in index columns is large, which results in a low index compression ratio. For example, if index column A contains the distinct values A1, A2, A3, and A4, the data is difficult to compress.
The length of a value in the primary key is large or multiple columns comprise a composite primary key.
Q: A large response time is displayed on the Monitoring Information page, but no corresponding time-consuming SQL statements are found on the Diagnostics and Optimization page. Why?
A: If a query returns a large amount of data, caching the result set takes a long time. The total duration displayed on the Diagnostics and Optimization page consists of the queuing time, execution plan duration, and execution duration, but excludes the time spent caching the result set. We recommend that you look for the corresponding time-consuming SQL statements on the SQL Audit page.
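The difference between the two pages can be sketched as follows. The class and field names are hypothetical. The sketch assumes that the response time on the Monitoring Information page also covers the time spent caching the result set, which the answer implies but does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class QueryTiming:
    queuing_ms: float
    plan_ms: float
    execution_ms: float
    result_cache_ms: float  # time spent caching the result set

    def diagnostics_total_ms(self) -> float:
        """Total duration as described for the Diagnostics and Optimization page."""
        return self.queuing_ms + self.plan_ms + self.execution_ms

    def response_time_ms(self) -> float:
        """Assumed end-to-end response time, including result set caching."""
        return self.diagnostics_total_ms() + self.result_cache_ms


# Made-up numbers: a query that runs quickly but returns a large result set.
timing = QueryTiming(queuing_ms=20, plan_ms=30, execution_ms=450, result_cache_ms=4000)
print(timing.diagnostics_total_ms())
print(timing.response_time_ms())
```

With these made-up numbers, the query looks like a 500 ms query on the Diagnostics and Optimization page, while its end-to-end response time is 4,500 ms.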
Related operations
Related operations for Data Warehouse Edition
Operation | Description |
Queries the performance data of an AnalyticDB for MySQL Data Warehouse Edition (V3.0) cluster. | |
Queries the monitoring information about resource groups within an AnalyticDB for MySQL Data Warehouse Edition (V3.0) cluster. | |
Queries the health status of an AnalyticDB for MySQL Data Warehouse Edition (V3.0) cluster. |
Related operations for Data Lakehouse Edition
Operation | Description |
Queries the monitoring information about resource groups within an AnalyticDB for MySQL Data Lakehouse Edition (V3.0) cluster. |