This page lists all monitoring metrics for AnalyticDB for MySQL, organized by category and edition.
Cluster health status
Health status metrics report the number of nodes in each state (Healthy, At-risk, or Unavailable) for each node type in your cluster.
Enterprise Edition and Basic Edition
| Metric | Description | References |
|---|---|---|
| Cluster access node status | Reports the number of available and unavailable access nodes. The access layer handles protocol-layer access, SQL parsing and optimization, real-time write sharding, data scheduling, and query scheduling. | View monitoring information (console) · DescribeDBClusterHealthStatus (API) |
| Elastic compute node health status | Reports the number of available and unavailable elastic compute nodes. Elastic compute nodes are temporary resources that are scaled out within seconds or minutes through scheduled or on-demand scaling. | |
| Reserved resource node health status | Reports the number of available, at-risk, and unavailable reserved resource nodes. Reserved resource nodes are pre-purchased resources that use a storage-compute coupled architecture. | |
Data Lakehouse Edition and Data Warehouse Edition
| Metric | Description | References |
|---|---|---|
| Cluster access node status | Reports the number of available and unavailable instance access nodes. The access layer handles protocol-layer access, SQL parsing and optimization, real-time write sharding, data scheduling, and query scheduling. | View monitoring information (console) · API: Data Lakehouse Edition · Data Warehouse Edition |
| Compute node health status | Reports the number of available and unavailable compute nodes. The compute engine uses distributed Massively Parallel Processing (MPP) and directed acyclic graph (DAG) architectures with elastic scheduling. | |
| Data node health status | Reports the number of available, at-risk, and unavailable data nodes. The storage engine is a distributed, high-availability (HA) engine based on the Raft protocol, with data sharding, Multi-Raft, tiered storage, and hybrid row-column storage. | |
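
As a minimal sketch of how these per-state counts might be consumed, the following Python snippet reduces the counts for each node type to a single cluster verdict. The `NodeCounts` class, the `overall_status` function, and the input shape are hypothetical names chosen for illustration; they are not the response format of the health status API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeCounts:
    """Node counts per state for one node type (hypothetical shape)."""
    healthy: int
    at_risk: int = 0
    unavailable: int = 0


def overall_status(counts_by_type: dict[str, NodeCounts]) -> str:
    """Return the worst state that contains at least one node."""
    if any(c.unavailable > 0 for c in counts_by_type.values()):
        return "Unavailable"
    if any(c.at_risk > 0 for c in counts_by_type.values()):
        return "At-risk"
    return "Healthy"


# Example: one data node is at risk, nothing is unavailable.
cluster = {
    "access": NodeCounts(healthy=2),
    "compute": NodeCounts(healthy=4),
    "data": NodeCounts(healthy=2, at_risk=1),
}
print(overall_status(cluster))  # At-risk
```

Reporting the worst state first is a design choice for alerting; a dashboard might instead show all three counts side by side.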
Cluster performance monitoring
Node monitoring
Enterprise Edition and Basic Edition
| Metric | Metric key | Metric value name | Unit | References |
|---|---|---|---|---|
| CPU utilization | AnalyticDB_CPU | worker_max_cpu_used — Max CPU utilization of reserved resource nodes | % | View monitoring information (console) · DescribeDBClusterPerformance (API) |
| | | worker_p95_cpu_used — P95 CPU utilization of reserved resource nodes | | |
| | | worker_avg_cpu_used — Average CPU utilization of reserved resource nodes | | |
| | | executor_max_cpu_used — Max CPU utilization of elastic compute nodes | | |
| | | executor_p95_cpu_used — P95 CPU utilization of elastic compute nodes | | |
| | | executor_avg_cpu_used — Average CPU utilization of elastic compute nodes | | |
| BUILD jobs | AnalyticDB_BuildTaskCount | avg_build_task_count — Average number of BUILD jobs across all reserved resource nodes | count | |
| | | max_build_task_count — Maximum number of BUILD jobs across all reserved resource nodes | | |
| Compute memory usage | AnalyticDB_ComputeMemoryUsedRatio | max_worker_compute_memory_used_ratio — Max compute memory usage of reserved resource nodes | % | |
| | | p95_worker_compute_memory_used_ratio — P95 compute memory usage of reserved resource nodes | | |
| | | avg_worker_compute_memory_used_ratio — Average compute memory usage of reserved resource nodes | | |
| | | max_executor_compute_memory_used_ratio — Max compute memory usage of elastic compute nodes | | |
| | | p95_executor_compute_memory_used_ratio — P95 compute memory usage of elastic compute nodes | | |
| | | avg_executor_compute_memory_used_ratio — Average compute memory usage of elastic compute nodes | | |
| Unavailable nodes | AnalyticDB_UnavailableNodeCount | worker_unavailable_node_count — Unavailable reserved resource nodes | count | |
| | | executor_unavailable_node_count — Unavailable elastic compute nodes | | |
| Amount of read table data | AnalyticDB_Table_Read_Result_Size | table_max_read_result_size — Max amount of read table data | MB | |
| | | table_avg_read_result_size — Average amount of read table data | | |
| CPU utilization of access nodes | AnalyticDB_RC_CPU | rc_max_cpu_used — Max CPU utilization of access nodes | % | |
| | | rc_p95_cpu_used — P95 CPU utilization of access nodes | | |
| | | rc_controller_avg_cpu_used — Average CPU utilization of access nodes | | |
| Disk I/O throughput | AnalyticDB_IO | worker_max_read_bytes_ratio — Max disk read throughput of reserved resource nodes | MB/s | |
| | | worker_p95_read_bytes_ratio — P95 disk read throughput of reserved resource nodes | | |
| | | worker_avg_read_bytes_ratio — Average disk read throughput of reserved resource nodes | | |
| | | worker_max_write_bytes_ratio — Max disk write throughput of reserved resource nodes | | |
| | | worker_p95_write_bytes_ratio — P95 disk write throughput of reserved resource nodes | | |
| | | worker_avg_write_bytes_ratio — Average disk write throughput of reserved resource nodes | | |
| Disk IOPS | AnalyticDB_IOPS | worker_max_read_ratio — Max disk read operations on reserved resource nodes | io/s | |
| | | worker_p95_read_ratio — P95 disk read operations on reserved resource nodes | | |
| | | worker_avg_read_ratio — Average disk read operations on reserved resource nodes | | |
| | | worker_max_write_ratio — Max disk write operations on reserved resource nodes | | |
| | | worker_p95_write_ratio — P95 disk write operations on reserved resource nodes | | |
| | | worker_avg_write_ratio — Average disk write operations on reserved resource nodes | | |
| Disk I/O usage | AnalyticDB_IO_UTIL | worker_max_io_util — Max disk I/O usage of reserved resource nodes | % | |
| | | worker_p95_io_util — P95 disk I/O usage of reserved resource nodes | | |
| | | worker_avg_io_util — Average disk I/O usage of reserved resource nodes | | |
| Disk I/O wait time | AnalyticDB_IO_WAIT | worker_max_io_await — Max disk I/O wait time of reserved resource nodes | ms | |
| | | worker_p95_io_await — P95 disk I/O wait time of reserved resource nodes | | |
| | | worker_avg_io_await — Average disk I/O wait time of reserved resource nodes | | |
| Memory usage of access nodes | AnalyticDB_RC_MemoryUsedRatio | rc_max_memory_used_ratio — Max memory usage of access nodes | % | |
| | | rc_p95_memory_used_ratio — P95 memory usage of access nodes | | |
| | | rc_avg_memory_used_ratio — Average memory usage of access nodes | | |
| Disk I/O throughput of access nodes | AnalyticDB_RC_IO | rc_max_read_mebibytes — Max read throughput of access nodes | MB/s | |
| | | rc_p95_read_mebibytes — P95 read throughput of access nodes | | |
| | | rc_avg_read_mebibytes — Average read throughput of access nodes | | |
| | | rc_max_write_mebibytes — Max write throughput of access nodes | | |
| | | rc_p95_write_mebibytes — P95 write throughput of access nodes | | |
| | | rc_avg_write_mebibytes — Average write throughput of access nodes | | |
| Disk IOPS of access nodes | AnalyticDB_RC_IOPS | rc_max_read_iops — Max read operations on access nodes | io/s | |
| | | rc_p95_read_iops — P95 read operations on access nodes | | |
| | | rc_avg_read_iops — Average read operations on access nodes | | |
| | | rc_max_write_iops — Max write operations on access nodes | | |
| | | rc_p95_write_iops — P95 write operations on access nodes | | |
| | | rc_avg_write_iops — Average write operations on access nodes | | |
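
The metric keys in this table are the values you pass when querying DescribeDBClusterPerformance. The sketch below only assembles a request parameter dictionary and makes no network call. The parameter names (`DBClusterId`, `Key`, `StartTime`, `EndTime`), the comma-separated key list, and the UTC timestamp format are assumptions to verify against the API reference; `build_performance_request` and the cluster ID `am-example` are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def build_performance_request(cluster_id, metric_keys, end=None, minutes=60):
    """Assemble request parameters for a performance query (assumed names).

    `end` must be a timezone-aware datetime; it defaults to the current time.
    """
    end = (end or datetime.now(timezone.utc)).astimezone(timezone.utc)
    start = end - timedelta(minutes=minutes)
    fmt = "%Y-%m-%dT%H:%MZ"  # assumed UTC timestamp format
    return {
        "DBClusterId": cluster_id,
        "Key": ",".join(metric_keys),  # assumed: keys separated by commas
        "StartTime": start.strftime(fmt),
        "EndTime": end.strftime(fmt),
    }


# Example: CPU and disk throughput for the 30 minutes before a fixed instant.
request = build_performance_request(
    "am-example",
    ["AnalyticDB_CPU", "AnalyticDB_IO"],
    end=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
    minutes=30,
)
print(request)
```

Pass the resulting dictionary to whichever SDK or HTTP client you use; that call is deliberately left out because its exact signature depends on the SDK version.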
Data Lakehouse Edition and Data Warehouse Edition
After switching a Data Warehouse Edition cluster from reserved mode (C32) to elastic mode, average CPU utilization increases. For details, see FAQ.
| Metric | Metric key | Metric value name | Unit | References |
|---|---|---|---|---|
| CPU utilization | AnalyticDB_CPU | executor_max_cpu_used — Max CPU utilization of compute nodes | % | View monitoring information (console) · API: Data Warehouse Edition · Data Lakehouse Edition |
| | | executor_p95_cpu_used — P95 CPU utilization of compute nodes | | |
| | | executor_avg_cpu_used — Average CPU utilization of compute nodes | | |
| | | worker_max_cpu_used — Max CPU utilization of data nodes | | |
| | | worker_p95_cpu_used — P95 CPU utilization of data nodes | | |
| | | worker_avg_cpu_used — Average CPU utilization of data nodes | | |
| BUILD jobs | AnalyticDB_BuildTaskCount | avg_build_task_count — Average number of BUILD jobs across all reserved resource nodes | count | |
| | | max_build_task_count — Maximum number of BUILD jobs across all reserved resource nodes | | |
| Compute memory usage | AnalyticDB_ComputeMemoryUsedRatio | max_executor_compute_memory_used_ratio — Max compute memory usage | % | |
| | | p95_executor_compute_memory_used_ratio — P95 compute memory usage | | |
| | | avg_executor_compute_memory_used_ratio — Average compute memory usage | | |
| Unavailable nodes | AnalyticDB_UnavailableNodeCount | worker_unavailable_node_count — Unavailable data nodes | count | |
| | | executor_unavailable_node_count — Unavailable compute nodes | | |
| Amount of read table data | AnalyticDB_Table_Read_Result_Size | table_max_read_result_size — Max amount of read table data | MB | |
| | | table_avg_read_result_size — Average amount of read table data | | |
| CPU utilization of access nodes | AnalyticDB_RC_CPU | rc_max_cpu_used — Max CPU utilization of access nodes | % | |
| | | rc_p95_cpu_used — P95 CPU utilization of access nodes | | |
| | | rc_controller_avg_cpu_used — Average CPU utilization of access nodes | | |
| Disk I/O throughput | AnalyticDB_IO | worker_max_read_bytes_ratio — Max disk read throughput of data nodes | MB/s | |
| | | worker_p95_read_bytes_ratio — P95 disk read throughput of data nodes | | |
| | | worker_avg_read_bytes_ratio — Average disk read throughput of data nodes | | |
| | | worker_max_write_bytes_ratio — Max disk write throughput of data nodes | | |
| | | worker_p95_write_bytes_ratio — P95 disk write throughput of data nodes | | |
| | | worker_avg_write_bytes_ratio — Average disk write throughput of data nodes | | |
| Disk IOPS | AnalyticDB_IOPS | worker_max_read_ratio — Max disk read operations on data nodes | io/s | |
| | | worker_p95_read_ratio — P95 disk read operations on data nodes | | |
| | | worker_avg_read_ratio — Average disk read operations on data nodes | | |
| | | worker_max_write_ratio — Max disk write operations on data nodes | | |
| | | worker_p95_write_ratio — P95 disk write operations on data nodes | | |
| | | worker_avg_write_ratio — Average disk write operations on data nodes | | |
| Disk I/O usage | AnalyticDB_IO_UTIL | worker_max_io_util — Max disk I/O usage of data nodes | % | |
| | | worker_p95_io_util — P95 disk I/O usage of data nodes | | |
| | | worker_avg_io_util — Average disk I/O usage of data nodes | | |
| Disk I/O wait time | AnalyticDB_IO_WAIT | worker_max_io_await — Max disk I/O wait time of data nodes | ms | |
| | | worker_p95_io_await — P95 disk I/O wait time of data nodes | | |
| | | worker_avg_io_await — Average disk I/O wait time of data nodes | | |
| Memory usage of access nodes | AnalyticDB_RC_MemoryUsedRatio | rc_max_memory_used_ratio — Max memory usage of access nodes | % | |
| | | rc_p95_memory_used_ratio — P95 memory usage of access nodes | | |
| | | rc_avg_memory_used_ratio — Average memory usage of access nodes | | |
| Disk I/O throughput of access nodes | AnalyticDB_RC_IO | rc_max_read_mebibytes — Max read throughput of access nodes | MB/s | |
| | | rc_p95_read_mebibytes — P95 read throughput of access nodes | | |
| | | rc_avg_read_mebibytes — Average read throughput of access nodes | | |
| | | rc_max_write_mebibytes — Max write throughput of access nodes | | |
| | | rc_p95_write_mebibytes — P95 write throughput of access nodes | | |
| | | rc_avg_write_mebibytes — Average write throughput of access nodes | | |
| Disk IOPS of access nodes | AnalyticDB_RC_IOPS | rc_max_read_iops — Max read operations on access nodes | io/s | |
| | | rc_p95_read_iops — P95 read operations on access nodes | | |
| | | rc_avg_read_iops — Average read operations on access nodes | | |
| | | rc_max_write_iops — Max write operations on access nodes | | |
| | | rc_p95_write_iops — P95 write operations on access nodes | | |
| | | rc_avg_write_iops — Average write operations on access nodes | | |
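
Most node-level metrics come as a max, P95, and average value, and a large gap between the max and the average can hint that load is concentrated on a few nodes. The following is a rough sketch of that comparison for one sample of AnalyticDB_CPU values. The function names and the 1.5 threshold are arbitrary illustrations, not thresholds recommended by the service.

```python
def skew_ratio(max_value: float, avg_value: float) -> float:
    """Return max/avg: 1.0 means even load, larger values mean more skew."""
    if avg_value <= 0:
        return 1.0 if max_value <= 0 else float("inf")
    return max_value / avg_value


def is_skewed(sample: dict[str, float], prefix: str = "worker",
              threshold: float = 1.5) -> bool:
    """Flag a sample whose max CPU exceeds the average by the given factor."""
    ratio = skew_ratio(sample[f"{prefix}_max_cpu_used"],
                       sample[f"{prefix}_avg_cpu_used"])
    return ratio > threshold


# Example sample keyed by metric value names from the table above.
sample = {
    "worker_max_cpu_used": 90.0,
    "worker_p95_cpu_used": 85.0,
    "worker_avg_cpu_used": 40.0,
}
print(is_skewed(sample))  # True
```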
Data size monitoring
Enterprise Edition and Basic Edition
| Metric | Metric key | Metric value name | Unit | References |
|---|---|---|---|---|
| Disk usage | AnalyticDB_DiskUsedRatio | disk_used_ratio — Average disk usage | % | View monitoring information (console) · DescribeDBClusterPerformance (API) |
| | | worker_max_node_disk_used_ratio — Max disk usage | | |
| Disk space used | AnalyticDB_DiskUsedSize | cold_disk_used — Size of cold data | Byte | |
| | | hot_disk_used — Size of hot data | | |
| | | user_used_disk_max — Max hot data size per node | | |
| | | user_used_disk_avg — Average hot data size per node | | |
Data Lakehouse Edition and Data Warehouse Edition
| Metric | Metric key | Metric value name | Unit | References |
|---|---|---|---|---|
| Disk usage | AnalyticDB_DiskUsedRatio | disk_used_ratio — Average disk usage | % | View monitoring information (console) · API: Data Lakehouse Edition · Data Warehouse Edition |
| | | worker_max_node_disk_used_ratio — Max disk usage | | |
| Disk space used | AnalyticDB_DiskUsedSize | cold_disk_used — Size of cold data | Byte | |
| | | hot_disk_used — Size of hot data | | |
| | | user_used_disk_max — Max hot data size per node | | |
| | | user_used_disk_avg — Average hot data size per node | | |
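
Because AnalyticDB_DiskUsedSize values are reported in bytes, a small conversion helper can make them easier to read. This is a minimal sketch: `summarize_disk` is a hypothetical helper, and treating hot plus cold data as the total data size is an assumption made for illustration.

```python
GIB = 1024 ** 3  # bytes per gibibyte


def summarize_disk(values: dict[str, int]) -> dict[str, float]:
    """Convert hot and cold data sizes from bytes to GiB and compute the hot share."""
    hot = values["hot_disk_used"]
    cold = values["cold_disk_used"]
    total = hot + cold  # assumption: total data = hot data + cold data
    return {
        "hot_gib": hot / GIB,
        "cold_gib": cold / GIB,
        "hot_fraction": hot / total if total else 0.0,
    }


# Example: 3 GiB of hot data and 1 GiB of cold data.
summary = summarize_disk({"hot_disk_used": 3 * GIB, "cold_disk_used": 1 * GIB})
print(summary)
```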
Workload monitoring
Enterprise Edition and Basic Edition
| Metric | Metric key | Metric value name | Unit | References |
|---|---|---|---|---|
| Cluster connections | AnalyticDB_Connections | connections — Successful connections | count | View monitoring information (console) · DescribeDBClusterPerformance (API) |
| Query failure rate¹ | AnalyticDB_QueryFailedRatio | query_failed_ratio — Query failure rate | % | |
| Query QPS | AnalyticDB_QPS | qps — Queries per second | op/s | |
| | | etl_qps — Extract, transform, and load (ETL) QPS | | |
| Query response time | AnalyticDB_QueryRT | query_avg_rt — Average query response time | ms | |
| | | query_max_rt — Max query response time | | |
| Query wait time | AnalyticDB_QueryWaitTime | query_avg_wait_time — Average query wait time | ms | |
| | | query_max_wait_time — Max query wait time | | |
| Write TPS | AnalyticDB_InsertTPS | insert_tps — Write transactions per second | op/s | |
| Write response time | AnalyticDB_InsertRT | insert_avg_rt — Average write response time | ms | |
| | | insert_max_rt — Max write response time | | |
| Write throughput | AnalyticDB_InsertBytes | insert_in_bytes — Average write throughput | MB | |
| Update TPS | AnalyticDB_UpdateTPS | update_tps — Update TPS | op/s | |
| Update response time | AnalyticDB_UpdateRT | updateinto_avg_rt — Average update response time | ms | |
| | | updateinto_max_rt — Max update response time | | |
| Delete TPS | AnalyticDB_DeleteTPS | delete_tps — Delete TPS | op/s | |
| Delete response time | AnalyticDB_DeleteRT | delete_avg_rt — Average delete response time | ms | |
| | | delete_max_rt — Max delete response time | | |
| Import TPS | AnalyticDB_LoadTPS | load_tps — Load TPS | op/s | |
Data Lakehouse Edition and Data Warehouse Edition
| Metric | Metric key | Metric value name | Unit | References |
|---|---|---|---|---|
| Cluster connections | AnalyticDB_Connections | connections — Successful connections | count | View monitoring information (console) · API: Data Lakehouse Edition · Data Warehouse Edition |
| Query failure rate¹ | AnalyticDB_QueryFailedRatio | query_failed_ratio — Query failure rate | % | |
| Query QPS | AnalyticDB_QPS | qps — Queries per second | op/s | |
| | | etl_qps — ETL QPS | | |
| Query response time | AnalyticDB_QueryRT | query_avg_rt — Average query response time | ms | |
| | | query_max_rt — Max query response time | | |
| Query wait time | AnalyticDB_QueryWaitTime | query_avg_wait_time — Average query wait time | ms | |
| | | query_max_wait_time — Max query wait time | | |
| Write TPS | AnalyticDB_InsertTPS | insert_tps — Write TPS | op/s | |
| Write response time | AnalyticDB_InsertRT | insert_avg_rt — Average write response time | ms | |
| | | insert_max_rt — Max write response time | | |
| Write throughput | AnalyticDB_InsertBytes | insert_in_bytes — Average write throughput | MB | |
| Update TPS | AnalyticDB_UpdateTPS | update_tps — Update TPS | op/s | |
| Update response time | AnalyticDB_UpdateRT | updateinto_avg_rt — Average update response time | ms | |
| | | updateinto_max_rt — Max update response time | | |
| Delete TPS | AnalyticDB_DeleteTPS | delete_tps — Delete TPS | op/s | |
| Delete response time | AnalyticDB_DeleteRT | delete_avg_rt — Average delete response time | ms | |
| | | delete_max_rt — Max delete response time | | |
| Import TPS | AnalyticDB_LoadTPS | load_tps — Load TPS | op/s | |
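
Footnote 1 defines the query failure rate as failed queries divided by total queries per sampling bucket, with the bucket size depending on the queried time range. The sketch below restates that arithmetic in Python. The function names are hypothetical, and two details are assumptions: a range of exactly 24 hours uses 1-minute buckets, and a bucket with no queries is reported as 0%.

```python
from datetime import timedelta


def sampling_bucket(time_range: timedelta) -> timedelta:
    """Pick the bucket size: 1 minute up to 24 hours, 5 minutes beyond that."""
    if time_range <= timedelta(hours=24):  # assumption: 24 hours is inclusive
        return timedelta(minutes=1)
    return timedelta(minutes=5)


def query_failure_rate(failed: int, total: int) -> float:
    """Return failed / total queries in one bucket as a percentage."""
    if total == 0:
        return 0.0  # assumption: an empty bucket counts as 0%
    return 100 * failed / total


# Example: 3 failed queries out of 120 in one bucket.
print(query_failure_rate(3, 120))  # 2.5
```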
¹ Query failure rate is calculated as follows:
Time range within 24 hours: Query failure rate = (Failed SQL queries in 1 minute / Total SQL queries in 1 minute) × 100%
Time range exceeding 24 hours: Query failure rate = (Failed SQL queries in 5 minutes / Total SQL queries in 5 minutes) × 100%
Resource group monitoring
Enterprise Edition, Basic Edition, and Data Lakehouse Edition
| Metric | Metric key | Metric value name | Unit | References |
|---|---|---|---|---|
| CPU utilization | AnalyticDB_RP_CPU | AnalyticDB_RP_CPU — Average CPU utilization of the resource group | % | View monitoring information (console) · DescribeDBClusterPerformance (API) |
| Query QPS | AnalyticDB_RP_QPS | AnalyticDB_RP_QPS — Query QPS of the resource group | op/s | |
| Query response time | AnalyticDB_RP_RT | AnalyticDB_RP_RT — Average response time of queries in the resource group | ms | |
| Query wait time | AnalyticDB_RP_WaitTime | AnalyticDB_RP_WaitTime — Total average wait time of queries in the resource group | ms | |
| (Xihe) Running SQL queries | AnalyticDB_RP_RunningQueries_Count | AnalyticDB_RP_RunningQueries_Count — Running SQL queries in the resource group | count | |
| Queued SQL queries | AnalyticDB_RP_QueuedQueries_Count | AnalyticDB_RP_QueuedQueries_Count — Queued SQL queries in the resource group | count | |
| Computing resource usage² | None | TotalAcuNumber — Total computing resources | ACU | View the computing and storage resource usage of a cluster (console) · DescribeClusterResourceUsage (API) |
| | | ReservedAcuNumber — Reserved computing resources | | |
| Storage resource usage² | None | TotalAcuNumber — Total storage resources | ACU | View the computing and storage resource usage of a cluster (console) · DescribeStorageResourceUsage (API) |
| | | ReservedAcuNumber — Reserved storage resources | | |
| Resource usage | None | TotalAcuNumber — Total computing resources | ACU | View the computing and storage resource usage of a cluster (console) · DescribeStorageResourceUsage (API) |
| | | ReservedAcuNumber — Reserved resources | | |
| Interactive resource group | None | ReservedAcuNumber — Min computing resources | ACU | View the computing resource usage of a resource group (console) |
| | | MaxAcuNumber — Max computing resources | | |
| | | CurrentAcuNumber — Current computing resource usage | | |
| Job resource group | None | ReservedAcuNumber — Min computing resources | ACU | View the computing resource usage of a resource group (console) · DescribeJobResourceUsage (API) |
| | | MaxAcuNumber — Max computing resources | | |
| | | CurrentAcuNumber — Current computing resource usage | | |
| | | SpotAcuNumber — Spot instance resource usage | | |
| Total ACU-hours used by a job | None | TotalAcuNumber — Average ACU-hours used by a job | ACU | View the computing resource usage of a job (console) |
| Reserved ACU-hours | None | ReservedAcuNumber — Reserved ACU-hours out of total job ACU-hours | ACU | |
| Elastic ACU-hours | None | ElasticAcuNumber — Elastic ACU-hours out of total job ACU-hours | ACU | |
² Computing resource usage and storage resource usage metrics are supported only by Data Lakehouse Edition.
Data Warehouse Edition
| Metric | Metric key | Metric value name | Unit | References |
|---|---|---|---|---|
| CPU utilization | AnalyticDB_RP_CPU | AnalyticDB_RP_CPU — Average CPU utilization of the resource group | % | View monitoring information (console) · DescribeDBClusterPerformance (API) |
| Query QPS | AnalyticDB_RP_QPS | AnalyticDB_RP_QPS — Query QPS of the resource group | op/s | |
| Query response time | AnalyticDB_RP_RT | AnalyticDB_RP_RT — Average response time of queries in the resource group | ms | |
| Query wait time | AnalyticDB_RP_WaitTime | AnalyticDB_RP_WaitTime — Total average wait time of queries in the resource group | ms | |
| Actual scaled-out nodes | AnalyticDB_RP_ActualNode | AnalyticDB_RP_ActualNode — Nodes actually added when a scale-out plan executes | count | |
| Planned scaled-out nodes | AnalyticDB_RP_PlanNode | AnalyticDB_RP_PlanNode — Nodes planned to be added based on a scheduled scaling plan | count | |
| Total nodes | AnalyticDB_RP_TotalNode | AnalyticDB_RP_TotalNode — Total nodes in the resource group (basic nodes + actual scaled-out nodes from scheduled scaling) | count | |
| Basic nodes | AnalyticDB_RP_OriginalNode | AnalyticDB_RP_OriginalNode — Basic nodes in the resource group | count | |
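
The table describes total nodes as basic nodes plus the nodes actually added by scheduled scaling, so the four node metrics can be cross-checked against each other. The following is a small sketch of that arithmetic. The function names are hypothetical, and reading a gap between planned and actual nodes as a scale-out shortfall is an interpretation rather than documented behavior.

```python
def expected_total_nodes(original_nodes: int, actual_nodes: int) -> int:
    """Total nodes = basic nodes + nodes actually added by scheduled scaling."""
    return original_nodes + actual_nodes


def scaling_shortfall(plan_nodes: int, actual_nodes: int) -> int:
    """Nodes that a scaling plan requested but that were not added."""
    return max(plan_nodes - actual_nodes, 0)


# Example: 4 basic nodes, a plan to add 6 nodes, of which 5 were added.
print(expected_total_nodes(4, 5))  # 9
print(scaling_shortfall(6, 5))     # 1
```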
Spark monitoring
Spark monitoring metrics are not available in the AnalyticDB for MySQL console. To view them, go to the CloudMonitor console.
| Metric | Description | MetricName | Unit | References |
|---|---|---|---|---|
| Spark CPU utilization (%) | CPU utilization of Spark | SparkCpuUtilizationEci, SparkCpuUtilizationShenlong | % | View Spark monitoring information (console) · DescribeMetricList (API) |
| Spark memory utilization (%) | Memory usage of Spark | SparkMemoryUtilizationEci, SparkMemoryUtilizationShenlong | % | |
| Peak on-heap execution memory usage (B) | Max JVM heap memory used while a Spark job runs | SparkExecutorOnHeapExecutionMemoryBytes | Byte | |
| Peak off-heap execution memory usage (B) | Max memory used outside the JVM heap while a Spark job runs | SparkExecutorOffHeapExecutionMemoryBytes | Byte | |
| Peak on-heap storage memory usage (B) | Max JVM heap memory used to store Spark data such as cached Resilient Distributed Datasets (RDDs) | SparkExecutorOnHeapStorageMemoryBytes | Byte | |
| Peak off-heap storage memory usage (B) | Max JVM off-heap memory used to store Spark data such as cached RDDs | SparkExecutorOffHeapStorageMemoryBytes | Byte | |
| RDD storage disk usage (B) | Disk space used by RDDs in Spark | SparkExecutorDiskUsedBytes | Byte | |
| Major GC count (count) | Number of major garbage collections (GCs) performed by the JVM while a Spark job runs | SparkExecutorMajorGCCount | count | |
| Minor GC count (count) | Number of minor GCs performed by the JVM while a Spark job runs | SparkExecutorMinorGCCount | count | |
| Spark GC time (s) | Total time consumed by Spark garbage collection | SparkExecutorTotalGCTimeSeconds | s | |
| Spark shuffle read data size (B) | Size of data read during a Spark shuffle | SparkExecutorTotalShuffleReadBytes | Byte | |
| Spark shuffle write data size (B) | Size of data written during a Spark shuffle | SparkExecutorTotalShuffleWriteBytes | Byte | |
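
CloudMonitor's DescribeMetricList returns its data points as a JSON-encoded string, so a client typically decodes that string before reading values. The sketch below extracts the peak of one statistic from such a string. It assumes each data point carries a `timestamp` and statistic fields such as `Maximum` and `Average`; the statistics actually available for the Spark metrics above may differ. The `peak_value` helper and the sample payload are hypothetical.

```python
import json
from typing import Optional


def peak_value(datapoints_json: str, statistic: str = "Maximum") -> Optional[float]:
    """Return the largest value of one statistic, or None if it is absent."""
    points = json.loads(datapoints_json)
    values = [p[statistic] for p in points if statistic in p]
    return max(values) if values else None


# Hypothetical payload shaped like a decoded Datapoints string.
sample_datapoints = json.dumps([
    {"timestamp": 1700000000000, "Average": 41.5, "Maximum": 63.0},
    {"timestamp": 1700000060000, "Average": 47.0, "Maximum": 71.5},
])
print(peak_value(sample_datapoints))  # 71.5
```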
References
Optimize cluster performance based on monitoring information — describes performance-related metrics, explains how to diagnose abnormal metric values, and provides troubleshooting and optimization guidance.