Monitor Cluster Health with Key Metrics & Dimensions - AnalyticDB for MySQL

This page lists all monitoring metrics for AnalyticDB for MySQL, organized by category and edition.

Cluster health status

Health status metrics report the number of nodes in each state (Healthy, At-risk, or Unavailable) for each node type in your cluster.

Enterprise Edition and Basic Edition

Metric	Description	References
Cluster access node status	Reports the number of available and unavailable access nodes. The access layer handles protocol-layer access, SQL parsing and optimization, real-time write sharding, data scheduling, and query scheduling.	View monitoring information (console) · DescribeDBClusterHealthStatus (API)
Elastic compute node health status	Reports the number of available and unavailable elastic compute nodes. Elastic compute nodes are temporary resources that scheduled or on-demand scaling adds within seconds or minutes.
Reserved resource node health status	Reports the number of available, at-risk, and unavailable reserved resource nodes. Reserved resource nodes are pre-purchased resources that use a storage-compute coupled architecture.

Data Lakehouse Edition and Data Warehouse Edition

Metric	Description	References
Cluster access node status	Reports the number of available and unavailable access nodes. The access layer handles protocol-layer access, SQL parsing and optimization, real-time write sharding, data scheduling, and query scheduling.	View monitoring information (console) · API: Data Lakehouse Edition · Data Warehouse Edition
Compute node health status	Reports the number of available and unavailable compute nodes. The compute engine uses distributed Massively Parallel Processing (MPP) and directed acyclic graph (DAG) architectures with elastic scheduling.
Data node health status	Reports the number of available, at-risk, and unavailable data nodes. The storage engine is a distributed, high-availability (HA) engine based on the Raft protocol, with data sharding, Multi-Raft, tiered storage, and hybrid row-column storage.

Cluster performance monitoring

Node monitoring

Enterprise Edition and Basic Edition

Metric	Metric key	Metric value name	Unit	References
CPU utilization	`AnalyticDB_CPU`	`worker_max_cpu_used` — Max CPU utilization of reserved resource nodes	%	View monitoring information (console) · DescribeDBClusterPerformance (API)
		`worker_p95_cpu_used` — P95 CPU utilization of reserved resource nodes
		`worker_avg_cpu_used` — Average CPU utilization of reserved resource nodes
		`executor_max_cpu_used` — Max CPU utilization of elastic compute nodes
		`executor_p95_cpu_used` — P95 CPU utilization of elastic compute nodes
		`executor_avg_cpu_used` — Average CPU utilization of elastic compute nodes
BUILD jobs	`AnalyticDB_BuildTaskCount`	`avg_build_task_count` — Average number of BUILD jobs across all reserved resource nodes	count
		`max_build_task_count` — Maximum number of BUILD jobs across all reserved resource nodes
Compute memory usage	`AnalyticDB_ComputeMemoryUsedRatio`	`max_worker_compute_memory_used_ratio` — Max compute memory usage of reserved resource nodes	%
		`p95_worker_compute_memory_used_ratio` — P95 compute memory usage of reserved resource nodes
		`avg_worker_compute_memory_used_ratio` — Average compute memory usage of reserved resource nodes
		`max_executor_compute_memory_used_ratio` — Max compute memory usage of elastic compute nodes
		`p95_executor_compute_memory_used_ratio` — P95 compute memory usage of elastic compute nodes
		`avg_executor_compute_memory_used_ratio` — Average compute memory usage of elastic compute nodes
Unavailable nodes	`AnalyticDB_UnavailableNodeCount`	`worker_unavailable_node_count` — Unavailable reserved resource nodes	count
		`executor_unavailable_node_count` — Unavailable elastic compute nodes
Amount of read table data	`AnalyticDB_Table_Read_Result_Size`	`table_max_read_result_size` — Max amount of read table data	MB
		`table_avg_read_result_size` — Average amount of read table data
CPU utilization of access nodes	`AnalyticDB_RC_CPU`	`rc_max_cpu_used` — Max CPU utilization of access nodes	%
		`rc_p95_cpu_used` — P95 CPU utilization of access nodes
		`rc_controller_avg_cpu_used` — Average CPU utilization of access nodes
Disk I/O throughput	`AnalyticDB_IO`	`worker_max_read_bytes_ratio` — Max disk read throughput of reserved resource nodes	MB/s
		`worker_p95_read_bytes_ratio` — P95 disk read throughput of reserved resource nodes
		`worker_avg_read_bytes_ratio` — Average disk read throughput of reserved resource nodes
		`worker_max_write_bytes_ratio` — Max disk write throughput of reserved resource nodes
		`worker_p95_write_bytes_ratio` — P95 disk write throughput of reserved resource nodes
		`worker_avg_write_bytes_ratio` — Average disk write throughput of reserved resource nodes
Disk IOPS	`AnalyticDB_IOPS`	`worker_max_read_ratio` — Max disk read operations on reserved resource nodes	io/s
		`worker_p95_read_ratio` — P95 disk read operations on reserved resource nodes
		`worker_avg_read_ratio` — Average disk read operations on reserved resource nodes
		`worker_max_write_ratio` — Max disk write operations on reserved resource nodes
		`worker_p95_write_ratio` — P95 disk write operations on reserved resource nodes
		`worker_avg_write_ratio` — Average disk write operations on reserved resource nodes
Disk I/O usage	`AnalyticDB_IO_UTIL`	`worker_max_io_util` — Max disk I/O usage of reserved resource nodes	%
		`worker_p95_io_util` — P95 disk I/O usage of reserved resource nodes
		`worker_avg_io_util` — Average disk I/O usage of reserved resource nodes
Disk I/O wait time	`AnalyticDB_IO_WAIT`	`worker_max_io_await` — Max disk I/O wait time of reserved resource nodes	ms
		`worker_p95_io_await` — P95 disk I/O wait time of reserved resource nodes
		`worker_avg_io_await` — Average disk I/O wait time of reserved resource nodes
Memory usage of access nodes	`AnalyticDB_RC_MemoryUsedRatio`	`rc_max_memory_used_ratio` — Max memory usage of access nodes	%
		`rc_p95_memory_used_ratio` — P95 memory usage of access nodes
		`rc_avg_memory_used_ratio` — Average memory usage of access nodes
Disk I/O throughput of access nodes	`AnalyticDB_RC_IO`	`rc_max_read_mebibytes` — Max read throughput of access nodes	MB/s
		`rc_p95_read_mebibytes` — P95 read throughput of access nodes
		`rc_avg_read_mebibytes` — Average read throughput of access nodes
		`rc_max_write_mebibytes` — Max write throughput of access nodes
		`rc_p95_write_mebibytes` — P95 write throughput of access nodes
		`rc_avg_write_mebibytes` — Average write throughput of access nodes
Disk IOPS of access nodes	`AnalyticDB_RC_IOPS`	`rc_max_read_iops` — Max read operations on access nodes	io/s
		`rc_p95_read_iops` — P95 read operations on access nodes
		`rc_avg_read_iops` — Average read operations on access nodes
		`rc_max_write_iops` — Max write operations on access nodes
		`rc_p95_write_iops` — P95 write operations on access nodes
		`rc_avg_write_iops` — Average write operations on access nodes

Data Lakehouse Edition and Data Warehouse Edition

After switching a Data Warehouse Edition cluster from reserved mode (C32) to elastic mode, average CPU utilization increases. For details, see FAQ.

Metric	Metric key	Metric value name	Unit	References
CPU utilization	`AnalyticDB_CPU`	`executor_max_cpu_used` — Max CPU utilization of compute nodes	%	View monitoring information (console) · API: Data Warehouse Edition · Data Lakehouse Edition
		`executor_p95_cpu_used` — P95 CPU utilization of compute nodes
		`executor_avg_cpu_used` — Average CPU utilization of compute nodes
		`worker_max_cpu_used` — Max CPU utilization of data nodes
		`worker_p95_cpu_used` — P95 CPU utilization of data nodes
		`worker_avg_cpu_used` — Average CPU utilization of data nodes
BUILD jobs	`AnalyticDB_BuildTaskCount`	`avg_build_task_count` — Average number of BUILD jobs across all reserved resource nodes	count
		`max_build_task_count` — Maximum number of BUILD jobs across all reserved resource nodes
Compute memory usage	`AnalyticDB_ComputeMemoryUsedRatio`	`max_executor_compute_memory_used_ratio` — Max compute memory usage	%
		`p95_executor_compute_memory_used_ratio` — P95 compute memory usage
		`avg_executor_compute_memory_used_ratio` — Average compute memory usage
Unavailable nodes	`AnalyticDB_UnavailableNodeCount`	`worker_unavailable_node_count` — Unavailable data nodes	count
		`executor_unavailable_node_count` — Unavailable compute nodes
Amount of read table data	`AnalyticDB_Table_Read_Result_Size`	`table_max_read_result_size` — Max amount of read table data	MB
		`table_avg_read_result_size` — Average amount of read table data
CPU utilization of access nodes	`AnalyticDB_RC_CPU`	`rc_max_cpu_used` — Max CPU utilization of access nodes	%
		`rc_p95_cpu_used` — P95 CPU utilization of access nodes
		`rc_controller_avg_cpu_used` — Average CPU utilization of access nodes
Disk I/O throughput	`AnalyticDB_IO`	`worker_max_read_bytes_ratio` — Max disk read throughput of data nodes	MB/s
		`worker_p95_read_bytes_ratio` — P95 disk read throughput of data nodes
		`worker_avg_read_bytes_ratio` — Average disk read throughput of data nodes
		`worker_max_write_bytes_ratio` — Max disk write throughput of data nodes
		`worker_p95_write_bytes_ratio` — P95 disk write throughput of data nodes
		`worker_avg_write_bytes_ratio` — Average disk write throughput of data nodes
Disk IOPS	`AnalyticDB_IOPS`	`worker_max_read_ratio` — Max disk read operations on data nodes	io/s
		`worker_p95_read_ratio` — P95 disk read operations on data nodes
		`worker_avg_read_ratio` — Average disk read operations on data nodes
		`worker_max_write_ratio` — Max disk write operations on data nodes
		`worker_p95_write_ratio` — P95 disk write operations on data nodes
		`worker_avg_write_ratio` — Average disk write operations on data nodes
Disk I/O usage	`AnalyticDB_IO_UTIL`	`worker_max_io_util` — Max disk I/O usage of data nodes	%
		`worker_p95_io_util` — P95 disk I/O usage of data nodes
		`worker_avg_io_util` — Average disk I/O usage of data nodes
Disk I/O wait time	`AnalyticDB_IO_WAIT`	`worker_max_io_await` — Max disk I/O wait time of data nodes	ms
		`worker_p95_io_await` — P95 disk I/O wait time of data nodes
		`worker_avg_io_await` — Average disk I/O wait time of data nodes
Memory usage of access nodes	`AnalyticDB_RC_MemoryUsedRatio`	`rc_max_memory_used_ratio` — Max memory usage of access nodes	%
		`rc_p95_memory_used_ratio` — P95 memory usage of access nodes
		`rc_avg_memory_used_ratio` — Average memory usage of access nodes
Disk I/O throughput of access nodes	`AnalyticDB_RC_IO`	`rc_max_read_mebibytes` — Max read throughput of access nodes	MB/s
		`rc_p95_read_mebibytes` — P95 read throughput of access nodes
		`rc_avg_read_mebibytes` — Average read throughput of access nodes
		`rc_max_write_mebibytes` — Max write throughput of access nodes
		`rc_p95_write_mebibytes` — P95 write throughput of access nodes
		`rc_avg_write_mebibytes` — Average write throughput of access nodes
Disk IOPS of access nodes	`AnalyticDB_RC_IOPS`	`rc_max_read_iops` — Max read operations on access nodes	io/s
		`rc_p95_read_iops` — P95 read operations on access nodes
		`rc_avg_read_iops` — Average read operations on access nodes
		`rc_max_write_iops` — Max write operations on access nodes
		`rc_p95_write_iops` — P95 write operations on access nodes
		`rc_avg_write_iops` — Average write operations on access nodes
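
The value names in the node monitoring tables combine a node role token (`worker`, `executor`, or `rc`) with a statistic token (`max`, `p95`, or `avg`), although the order of the tokens varies between metrics. The following is a minimal sketch of how a script could recover both parts from a value name and map the role to the node type used by each edition. The helper names `classify` and `describe` and the edition keys are hypothetical and exist only for this illustration.

```python
# Tokens taken from the value names in the tables above.
ROLES = ("worker", "executor", "rc")
STATS = ("max", "p95", "avg")

# Edition keys are hypothetical labels used only in this sketch.
ROLE_LABELS = {
    "enterprise_basic": {
        "worker": "reserved resource nodes",
        "executor": "elastic compute nodes",
        "rc": "access nodes",
    },
    "lakehouse_warehouse": {
        "worker": "data nodes",
        "executor": "compute nodes",
        "rc": "access nodes",
    },
}


def classify(value_name):
    """Return (role, statistic); either part is None when the name lacks it."""
    tokens = value_name.split("_")
    role = next((t for t in tokens if t in ROLES), None)
    stat = next((t for t in tokens if t in STATS), None)
    return role, stat


def describe(value_name, edition):
    """Return (statistic, node type label) for a value name in a given edition."""
    role, stat = classify(value_name)
    return stat, ROLE_LABELS[edition].get(role)
```

Searching for the tokens rather than relying on their position keeps the sketch tolerant of names such as `max_executor_compute_memory_used_ratio` and `rc_controller_avg_cpu_used`, where the statistic does not sit in the second position.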

Data size monitoring

Enterprise Edition and Basic Edition

Metric	Metric key	Metric value name	Unit	References
Disk usage	`AnalyticDB_DiskUsedRatio`	`disk_used_ratio` — Average disk usage	%	View monitoring information (console) · DescribeDBClusterPerformance (API)
		`worker_max_node_disk_used_ratio` — Max disk usage
Disk space used	`AnalyticDB_DiskUsedSize`	`cold_disk_used` — Size of cold data	Byte
		`hot_disk_used` — Size of hot data
		`user_used_disk_max` — Max hot data size per node
		`user_used_disk_avg` — Average hot data size per node

Data Lakehouse Edition and Data Warehouse Edition

Metric	Metric key	Metric value name	Unit	References
Disk usage	`AnalyticDB_DiskUsedRatio`	`disk_used_ratio` — Average disk usage	%	View monitoring information (console) · API: Data Lakehouse Edition · Data Warehouse Edition
		`worker_max_node_disk_used_ratio` — Max disk usage
Disk space used	`AnalyticDB_DiskUsedSize`	`cold_disk_used` — Size of cold data	Byte
		`hot_disk_used` — Size of hot data
		`user_used_disk_max` — Max hot data size per node
		`user_used_disk_avg` — Average hot data size per node

Workload monitoring

Enterprise Edition and Basic Edition

Metric	Metric key	Metric value name	Unit	References
Cluster connections	`AnalyticDB_Connections`	`connections` — Successful connections	count	View monitoring information (console) · DescribeDBClusterPerformance (API)
Query failure rate¹	`AnalyticDB_QueryFailedRatio`	`query_failed_ratio` — Query failure rate	%
Query QPS	`AnalyticDB_QPS`	`qps` — Queries per second	op/s
		`etl_qps` — Extract, transform, and load (ETL) QPS
Query response time	`AnalyticDB_QueryRT`	`query_avg_rt` — Average query response time	ms
		`query_max_rt` — Max query response time
Query wait time	`AnalyticDB_QueryWaitTime`	`query_avg_wait_time` — Average query wait time	ms
		`query_max_wait_time` — Max query wait time
Write TPS	`AnalyticDB_InsertTPS`	`insert_tps` — Write transactions per second	op/s
Write response time	`AnalyticDB_InsertRT`	`insert_avg_rt` — Average write response time	ms
		`insert_max_rt` — Max write response time
Write throughput	`AnalyticDB_InsertBytes`	`insert_in_bytes` — Average write throughput	MB
Update TPS	`AnalyticDB_UpdateTPS`	`update_tps` — Update TPS	op/s
Update response time	`AnalyticDB_UpdateRT`	`updateinto_avg_rt` — Average update response time	ms
		`updateinto_max_rt` — Max update response time
Delete TPS	`AnalyticDB_DeleteTPS`	`delete_tps` — Delete TPS	op/s
Delete response time	`AnalyticDB_DeleteRT`	`delete_avg_rt` — Average delete response time	ms
		`delete_max_rt` — Max delete response time
Import TPS	`AnalyticDB_LoadTPS`	`load_tps` — Import TPS	op/s

Data Lakehouse Edition and Data Warehouse Edition

Metric	Metric key	Metric value name	Unit	References
Cluster connections	`AnalyticDB_Connections`	`connections` — Successful connections	count	View monitoring information (console) · API: Data Lakehouse Edition · Data Warehouse Edition
Query failure rate¹	`AnalyticDB_QueryFailedRatio`	`query_failed_ratio` — Query failure rate	%
Query QPS	`AnalyticDB_QPS`	`qps` — Queries per second	op/s
		`etl_qps` — ETL QPS
Query response time	`AnalyticDB_QueryRT`	`query_avg_rt` — Average query response time	ms
		`query_max_rt` — Max query response time
Query wait time	`AnalyticDB_QueryWaitTime`	`query_avg_wait_time` — Average query wait time	ms
		`query_max_wait_time` — Max query wait time
Write TPS	`AnalyticDB_InsertTPS`	`insert_tps` — Write TPS	op/s
Write response time	`AnalyticDB_InsertRT`	`insert_avg_rt` — Average write response time	ms
		`insert_max_rt` — Max write response time
Write throughput	`AnalyticDB_InsertBytes`	`insert_in_bytes` — Average write throughput	MB
Update TPS	`AnalyticDB_UpdateTPS`	`update_tps` — Update TPS	op/s
Update response time	`AnalyticDB_UpdateRT`	`updateinto_avg_rt` — Average update response time	ms
		`updateinto_max_rt` — Max update response time
Delete TPS	`AnalyticDB_DeleteTPS`	`delete_tps` — Delete TPS	op/s
Delete response time	`AnalyticDB_DeleteRT`	`delete_avg_rt` — Average delete response time	ms
		`delete_max_rt` — Max delete response time
Import TPS	`AnalyticDB_LoadTPS`	`load_tps` — Import TPS	op/s

¹ Query failure rate is calculated as follows:

Time range within 24 hours: Query failure rate = (Failed SQL queries in 1 minute / Total SQL queries in 1 minute) × 100%

Time range exceeding 24 hours: Query failure rate = (Failed SQL queries in 5 minutes / Total SQL queries in 5 minutes) × 100%
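
The two formulas differ only in the aggregation window. As a rough sketch, the calculation could be expressed as follows. The function names are hypothetical, and two details are assumptions that the formulas above do not state: a range of exactly 24 hours is treated as "within 24 hours", and a window with no queries yields 0.

```python
from datetime import timedelta


def aggregation_window(time_range: timedelta) -> timedelta:
    """Pick the window: 1 minute up to 24 hours, 5 minutes beyond that."""
    # Assumption: a range of exactly 24 hours counts as "within 24 hours".
    if time_range <= timedelta(hours=24):
        return timedelta(minutes=1)
    return timedelta(minutes=5)


def query_failure_rate(failed: int, total: int) -> float:
    """Failure rate in percent for one aggregation window."""
    # Assumption: report 0.0 when no queries ran in the window.
    if total == 0:
        return 0.0
    return failed / total * 100
```

For example, a window with 1 failed query out of 8 gives `query_failure_rate(1, 8)`, which evaluates to 12.5.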

Resource group monitoring

Enterprise Edition, Basic Edition, and Data Lakehouse Edition

Metric	Metric key	Metric value name	Unit	References
CPU utilization	`AnalyticDB_RP_CPU`	`AnalyticDB_RP_CPU` — Average CPU utilization of the resource group	%	View monitoring information (console) · DescribeDBClusterPerformance (API)
Query QPS	`AnalyticDB_RP_QPS`	`AnalyticDB_RP_QPS` — Query QPS of the resource group	op/s
Query response time	`AnalyticDB_RP_RT`	`AnalyticDB_RP_RT` — Average response time of queries in the resource group	ms
Query wait time	`AnalyticDB_RP_WaitTime`	`AnalyticDB_RP_WaitTime` — Total average wait time of queries in the resource group	ms
(Xihe) Running SQL queries	`AnalyticDB_RP_RunningQueries_Count`	`AnalyticDB_RP_RunningQueries_Count` — Running SQL queries in the resource group	count
Queued SQL queries	`AnalyticDB_RP_QueuedQueries_Count`	`AnalyticDB_RP_QueuedQueries_Count` — Queued SQL queries in the resource group	count
Computing resource usage²	None	`TotalAcuNumber` — Total computing resources	ACU	View the computing and storage resource usage of a cluster (console) · DescribeClusterResourceUsage (API)
		`ReservedAcuNumber` — Reserved computing resources
Storage resource usage²	None	`TotalAcuNumber` — Total storage resources	ACU	View the computing and storage resource usage of a cluster (console) · DescribeStorageResourceUsage (API)
		`ReservedAcuNumber` — Reserved storage resources
Resource usage	None	`TotalAcuNumber` — Total computing resources	ACU	View the computing and storage resource usage of a cluster (console) · DescribeStorageResourceUsage (API)
		`ReservedAcuNumber` — Reserved resources
Interactive resource group	None	`ReservedAcuNumber` — Min computing resources	ACU	View the computing resource usage of a resource group (console)
		`MaxAcuNumber` — Max computing resources
		`CurrentAcuNumber` — Current computing resource usage
Job resource group	None	`ReservedAcuNumber` — Min computing resources	ACU	View the computing resource usage of a resource group (console) · DescribeJobResourceUsage (API)
		`MaxAcuNumber` — Max computing resources
		`CurrentAcuNumber` — Current computing resource usage
		`SpotAcuNumber` — Spot instance resource usage
Total ACU-hours used by a job	None	`TotalAcuNumber` — Average ACU-hours used by a job	ACU	View the computing resource usage of a job (console)
Reserved ACU-hours	None	`ReservedAcuNumber` — Reserved ACU-hours out of total job ACU-hours	ACU
Elastic ACU-hours	None	`ElasticAcuNumber` — Elastic ACU-hours out of total job ACU-hours	ACU

² Computing resource usage and storage resource usage metrics are supported only by Data Lakehouse Edition.

Data Warehouse Edition

Metric	Metric key	Metric value name	Unit	References
CPU utilization	`AnalyticDB_RP_CPU`	`AnalyticDB_RP_CPU` — Average CPU utilization of the resource group	%	View monitoring information (console) · DescribeDBClusterPerformance (API)
Query QPS	`AnalyticDB_RP_QPS`	`AnalyticDB_RP_QPS` — Query QPS of the resource group	op/s
Query response time	`AnalyticDB_RP_RT`	`AnalyticDB_RP_RT` — Average response time of queries in the resource group	ms
Query wait time	`AnalyticDB_RP_WaitTime`	`AnalyticDB_RP_WaitTime` — Total average wait time of queries in the resource group	ms
Actual scaled-out nodes	`AnalyticDB_RP_ActualNode`	`AnalyticDB_RP_ActualNode` — Nodes actually added when a scale-out plan executes	count
Planned scaled-out nodes	`AnalyticDB_RP_PlanNode`	`AnalyticDB_RP_PlanNode` — Nodes planned to be added based on a scheduled scaling plan	count
Total nodes	`AnalyticDB_RP_TotalNode`	`AnalyticDB_RP_TotalNode` — Total nodes in the resource group (basic nodes + actual scaled-out nodes from scheduled scaling)	count
Basic nodes	`AnalyticDB_RP_OriginalNode`	`AnalyticDB_RP_OriginalNode` — Basic nodes in the resource group	count
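
The node metrics in this table are related: the total is the number of basic nodes plus the nodes that scheduled scaling actually added. The sketch below restates that relationship and, as an illustration only, compares planned and actual scale-out. The function names are hypothetical, and the shortfall is a derived figure, not a metric that the service reports.

```python
def total_nodes(original_nodes: int, actual_nodes: int) -> int:
    """AnalyticDB_RP_TotalNode = AnalyticDB_RP_OriginalNode + AnalyticDB_RP_ActualNode."""
    return original_nodes + actual_nodes


def scale_out_shortfall(plan_nodes: int, actual_nodes: int) -> int:
    """Planned nodes that were not added (never negative). Illustrative only."""
    return max(plan_nodes - actual_nodes, 0)
```

A non-zero shortfall means that a scale-out plan added fewer nodes than `AnalyticDB_RP_PlanNode` requested.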

Spark monitoring

Spark monitoring metrics are not available in the AnalyticDB for MySQL console. To view them, go to the CloudMonitor console.

Metric	Description	MetricName	Unit	References
Spark CPU utilization (%)	CPU utilization of Spark	`SparkCpuUtilizationEci`, `SparkCpuUtilizationShenlong`	%	View Spark monitoring information (console) · DescribeMetricList (API)
Spark memory utilization (%)	Memory usage of Spark	`SparkMemoryUtilizationEci`, `SparkMemoryUtilizationShenlong`	%
Peak on-heap execution memory usage (B)	Max JVM heap memory used while a Spark job runs	`SparkExecutorOnHeapExecutionMemoryBytes`	Byte
Peak off-heap execution memory usage (B)	Max memory used outside the JVM heap while a Spark job runs	`SparkExecutorOffHeapExecutionMemoryBytes`	Byte
Peak on-heap storage memory usage (B)	Max JVM heap memory used to store Spark data such as cached Resilient Distributed Datasets (RDDs)	`SparkExecutorOnHeapStorageMemoryBytes`	Byte
Peak off-heap storage memory usage (B)	Max JVM off-heap memory used to store Spark data such as cached RDDs	`SparkExecutorOffHeapStorageMemoryBytes`	Byte
RDD storage disk usage (B)	Disk space used by RDDs in Spark	`SparkExecutorDiskUsedBytes`	Byte
Major GC count (count)	Number of major garbage collections (GCs) performed by the JVM while a Spark job runs	`SparkExecutorMajorGCCount`	count
Minor GC count (count)	Number of minor GCs performed by the JVM while a Spark job runs	`SparkExecutorMinorGCCount`	count
Spark GC time (s)	Total time consumed by Spark garbage collection	`SparkExecutorTotalGCTimeSeconds`	s
Spark shuffle read data size (B)	Size of data read during a Spark shuffle	`SparkExecutorTotalShuffleReadBytes`	Byte
Spark shuffle write data size (B)	Size of data written during a Spark shuffle	`SparkExecutorTotalShuffleWriteBytes`	Byte

References

Optimize cluster performance based on monitoring information — describes performance-related metrics, explains how to diagnose abnormal metric values, and provides troubleshooting and optimization guidance.