This topic lists all the monitoring metrics for AnalyticDB for MySQL.
Cluster health status
Enterprise Edition and Basic Edition
Metric | Description | References |
Cluster access node status | The access layer of AnalyticDB for MySQL consists of multiple access nodes. It is responsible for protocol-layer access, SQL parsing and optimization, real-time write sharding, data scheduling, and query scheduling. The health check status of cluster access nodes includes:
|
|
Elastic compute node health status | Elastic compute nodes are computing resources that are temporarily scaled up for scheduled or on-demand scaling. This allows resources to be scaled out within seconds or minutes, which ensures efficient resource utilization. The health check status of elastic compute nodes includes:
| |
Reserved resource node health status | Reserved resource nodes are pre-purchased resources in a cluster. You can change the node specifications and number of nodes for reserved resources by performing an upgrade or downgrade or using scheduled scaling. The reserved resource nodes of Enterprise Edition and Basic Edition clusters use a storage-compute coupled architecture and run both compute and storage engines. The health check status of reserved resource node groups includes:
|
Data Lakehouse Edition and Data Warehouse Edition
Metric | Description | References |
Cluster access node status | The access layer of AnalyticDB for MySQL consists of multiple instance access nodes. It is responsible for protocol-layer access, SQL parsing and optimization, real-time write sharding, data scheduling, and query scheduling. The health check status of instance access nodes includes:
|
|
Compute node health status | The compute engine of AnalyticDB for MySQL consists of compute nodes. It supports integrated execution of distributed Massively Parallel Processing (MPP) and directed acyclic graph (DAG) architectures. The compute engine works with an intelligent optimizer to support high concurrency and hybrid loads of complex SQL statements. In addition, the cloud-native infrastructure allows compute nodes to implement elastic scheduling within minutes or even seconds based on business requirements. This ensures efficient resource utilization. The health check status of compute node groups includes:
| |
Data node health status | The storage engine of AnalyticDB for MySQL consists of data nodes. It is a distributed, real-time, and high-availability (HA) storage engine that ensures strong consistency based on the Raft protocol. The storage engine uses data sharding and Multi-Raft to support parallel storage, tiered storage to implement hot and cold data separation at lower costs, and hybrid row-column storage and intelligent indexes to deliver high performance. The health check status of data node groups includes:
|
Cluster performance monitoring
Node monitoring
Enterprise Edition and Basic Edition
Metric | Metric key | Description | Metric value name | Unit | References |
CPU utilization | AnalyticDB_CPU | The maximum CPU utilization of reserved resource nodes. | worker_max_cpu_used | % |
|
The P95 CPU utilization of reserved resource nodes. | worker_p95_cpu_used | ||||
The average CPU utilization of reserved resource nodes. | worker_avg_cpu_used | ||||
The maximum CPU utilization of elastic compute nodes. | executor_max_cpu_used | ||||
The P95 CPU utilization of elastic compute nodes. | executor_p95_cpu_used | ||||
The average CPU utilization of elastic compute nodes. | executor_avg_cpu_used | ||||
BUILD jobs | AnalyticDB_BuildTaskCount | The average number of BUILD jobs. Note This metric indicates the average number of BUILD jobs that run across all reserved resource nodes. | avg_build_task_count | count | |
The maximum number of BUILD jobs. Note This metric indicates the maximum number of BUILD jobs that run across all reserved resource nodes. | max_build_task_count | ||||
Unavailable nodes | AnalyticDB_UnavailableNodeCount | The number of unavailable reserved resource nodes. | worker_unavailable_node_count | count | |
The number of unavailable elastic compute nodes. | executor_unavailable_node_count | ||||
Amount of read table data | AnalyticDB_Table_Read_Result_Size | The maximum amount of read table data. | table_max_read_result_size | MB | |
The average amount of read table data. | table_avg_read_result_size | ||||
CPU utilization of access nodes | AnalyticDB_RC_CPU | The maximum CPU utilization of access nodes. | rc_max_cpu_used | % | |
The P95 CPU utilization of access nodes. | rc_p95_cpu_used | ||||
The average CPU utilization of access nodes. | rc_controller_avg_cpu_used | ||||
Disk I/O throughput | AnalyticDB_IO | The maximum disk read throughput of reserved resource nodes. | worker_max_read_bytes_ratio | MB/s | |
The P95 disk read throughput of reserved resource nodes. | worker_p95_read_bytes_ratio | ||||
The average disk read throughput of reserved resource nodes. | worker_avg_read_bytes_ratio | ||||
The maximum disk write throughput of reserved resource nodes. | worker_max_write_bytes_ratio | ||||
The P95 disk write throughput of reserved resource nodes. | worker_p95_write_bytes_ratio | ||||
The average disk write throughput of reserved resource nodes. | worker_avg_write_bytes_ratio | ||||
Disk IOPS | AnalyticDB_IOPS | The maximum number of disk read operations on reserved resource nodes. | worker_max_read_ratio | io/s | |
The P95 number of disk read operations on reserved resource nodes. | worker_p95_read_ratio | ||||
The average number of disk read operations on reserved resource nodes. | worker_avg_read_ratio | ||||
The maximum number of disk write operations on reserved resource nodes. | worker_max_write_ratio | ||||
The P95 number of disk write operations on reserved resource nodes. | worker_p95_write_ratio | ||||
The average number of disk write operations on reserved resource nodes. | worker_avg_write_ratio | ||||
Disk I/O usage | AnalyticDB_IO_UTIL | The maximum disk I/O usage of reserved resource nodes. | worker_max_io_util | % | |
The P95 disk I/O usage of reserved resource nodes. | worker_p95_io_util | ||||
The average disk I/O usage of reserved resource nodes. | worker_avg_io_util | ||||
Disk I/O wait time | AnalyticDB_IO_WAIT | The maximum disk I/O wait time of reserved resource nodes. | worker_max_io_await | ms | |
The P95 disk I/O wait time of reserved resource nodes. | worker_p95_io_await | ||||
The average disk I/O wait time of reserved resource nodes. | worker_avg_io_await | ||||
Memory usage of access nodes | AnalyticDB_RC_MemoryUsedRatio | The maximum memory usage of access nodes. | rc_max_memory_used_ratio | % | |
The P95 memory usage of access nodes. | rc_p95_memory_used_ratio | ||||
The average memory usage of access nodes. | rc_avg_memory_used_ratio | ||||
Disk I/O throughput of access nodes | AnalyticDB_RC_IO | The maximum read throughput of access nodes. | rc_max_read_mebibytes | MB/s | |
The P95 read throughput of access nodes. | rc_p95_read_mebibytes | ||||
The average read throughput of access nodes. | rc_avg_read_mebibytes | ||||
The maximum write throughput of access nodes. | rc_max_write_mebibytes | ||||
The P95 write throughput of access nodes. | rc_p95_write_mebibytes | ||||
The average write throughput of access nodes. | rc_avg_write_mebibytes | ||||
Disk IOPS of access nodes | AnalyticDB_RC_IOPS | The maximum number of read operations on access nodes. | rc_max_read_iops | io/s | |
The P95 number of read operations on access nodes. | rc_p95_read_iops | ||||
The average number of read operations on access nodes. | rc_avg_read_iops | ||||
The maximum number of write operations on access nodes. | rc_max_write_iops | ||||
The P95 number of write operations on access nodes. | rc_p95_write_iops | ||||
The average number of write operations on access nodes. | rc_avg_write_iops |
Data Lakehouse Edition and Data Warehouse Edition
Metric | Metric key | Description | Metric value name | Unit | References |
CPU utilization Note After you change a C32 cluster of Data Warehouse Edition in reserved mode to elastic mode, the average CPU utilization increases. For more information, see FAQ. | AnalyticDB_CPU | The maximum CPU utilization of compute nodes. | executor_max_cpu_used | % |
|
The P95 CPU utilization of compute nodes. | executor_p95_cpu_used | ||||
The average CPU utilization of compute nodes. | executor_avg_cpu_used | ||||
The maximum CPU utilization of data nodes. | worker_max_cpu_used | ||||
The P95 CPU utilization of data nodes. | worker_p95_cpu_used | ||||
The average CPU utilization of data nodes. | worker_avg_cpu_used | ||||
BUILD jobs | AnalyticDB_BuildTaskCount | The average number of BUILD jobs. Note This metric indicates the average number of BUILD jobs that run across all reserved resource nodes. | avg_build_task_count | count | |
The maximum number of BUILD jobs. Note This metric indicates the maximum number of BUILD jobs that run across all reserved resource nodes. | max_build_task_count | ||||
Unavailable nodes | AnalyticDB_UnavailableNodeCount | The number of unavailable data nodes. | worker_unavailable_node_count | count | |
The number of unavailable compute nodes. | executor_unavailable_node_count | ||||
Amount of read table data | AnalyticDB_Table_Read_Result_Size | The maximum amount of read table data. | table_max_read_result_size | MB | |
The average amount of read table data. | table_avg_read_result_size | ||||
CPU utilization of access nodes | AnalyticDB_RC_CPU | The maximum CPU utilization of access nodes. | rc_max_cpu_used | % | |
The P95 CPU utilization of access nodes. | rc_p95_cpu_used | ||||
The average CPU utilization of access nodes. | rc_controller_avg_cpu_used | ||||
Disk I/O throughput | AnalyticDB_IO | The maximum disk read throughput of data nodes. | worker_max_read_bytes_ratio | MB/s | |
The P95 disk read throughput of data nodes. | worker_p95_read_bytes_ratio | ||||
The average disk read throughput of data nodes. | worker_avg_read_bytes_ratio | ||||
The maximum disk write throughput of data nodes. | worker_max_write_bytes_ratio | ||||
The P95 disk write throughput of data nodes. | worker_p95_write_bytes_ratio | ||||
The average disk write throughput of data nodes. | worker_avg_write_bytes_ratio | ||||
Disk IOPS | AnalyticDB_IOPS | The maximum number of disk read operations on data nodes. | worker_max_read_ratio | io/s | |
The P95 number of disk read operations on data nodes. | worker_p95_read_ratio | ||||
The average number of disk read operations on data nodes. | worker_avg_read_ratio | ||||
The maximum number of disk write operations on data nodes. | worker_max_write_ratio | ||||
The P95 number of disk write operations on data nodes. | worker_p95_write_ratio | ||||
The average number of disk write operations on data nodes. | worker_avg_write_ratio | ||||
Disk I/O usage | AnalyticDB_IO_UTIL | The maximum disk I/O usage of data nodes. | worker_max_io_util | % | |
The P95 disk I/O usage of data nodes. | worker_p95_io_util | ||||
The average disk I/O usage of data nodes. | worker_avg_io_util | ||||
Disk I/O wait time | AnalyticDB_IO_WAIT | The maximum disk I/O wait time of data nodes. | worker_max_io_await | ms | |
The P95 disk I/O wait time of data nodes. | worker_p95_io_await | ||||
The average disk I/O wait time of data nodes. | worker_avg_io_await | ||||
Memory usage of access nodes | AnalyticDB_RC_MemoryUsedRatio | The maximum memory usage of access nodes. | rc_max_memory_used_ratio | % | |
The P95 memory usage of access nodes. | rc_p95_memory_used_ratio | ||||
The average memory usage of access nodes. | rc_avg_memory_used_ratio | ||||
Disk I/O throughput of access nodes | AnalyticDB_RC_IO | The maximum read throughput of access nodes. | rc_max_read_mebibytes | MB/s | |
The P95 read throughput of access nodes. | rc_p95_read_mebibytes | ||||
The average read throughput of access nodes. | rc_avg_read_mebibytes | ||||
The maximum write throughput of access nodes. | rc_max_write_mebibytes | ||||
The P95 write throughput of access nodes. | rc_p95_write_mebibytes | ||||
The average write throughput of access nodes. | rc_avg_write_mebibytes | ||||
Disk IOPS of access nodes | AnalyticDB_RC_IOPS | The maximum number of read operations on access nodes. | rc_max_read_iops | io/s | |
The P95 number of read operations on access nodes. | rc_p95_read_iops | ||||
The average number of read operations on access nodes. | rc_avg_read_iops | ||||
The maximum number of write operations on access nodes. | rc_max_write_iops | ||||
The P95 number of write operations on access nodes. | rc_p95_write_iops | ||||
The average number of write operations on access nodes. | rc_avg_write_iops |
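Most node metrics above come in three variants: maximum, P95, and average. The following minimal sketch shows how the three summaries relate for one set of per-node samples. It is illustrative only: the `summarize` function and the sample values are hypothetical, and the nearest-rank percentile method is an assumption, because this topic does not state how the service computes P95.

```python
def summarize(samples):
    """Return the max, P95, and average of a list of per-node samples.

    Assumption: P95 uses the nearest-rank method; the service may use
    a different percentile definition.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    n = len(ordered)
    # Nearest-rank P95: the smallest rank r such that r / n >= 0.95.
    # Integer arithmetic avoids floating-point rounding at the boundary.
    rank = (95 * n + 99) // 100
    return {
        "max": ordered[-1],
        "p95": ordered[rank - 1],
        "avg": sum(ordered) / n,
    }


if __name__ == "__main__":
    # Hypothetical CPU utilization (%) sampled from 20 nodes.
    cpu_by_node = list(range(1, 21))
    print(summarize(cpu_by_node))
```

A large gap between the maximum and the average usually points to skew on a few nodes, whereas a high P95 suggests that most nodes are busy.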
Data size monitoring
Enterprise Edition and Basic Edition
Metric | Metric key | Description | Metric value name | Unit | References |
Disk usage | AnalyticDB_DiskUsedRatio | The average disk usage. | disk_used_ratio | % |
|
The maximum disk usage. | worker_max_node_disk_used_ratio | ||||
Disk space used | AnalyticDB_DiskUsedSize | The size of cold data. | cold_disk_used | Byte | |
The size of hot data. | hot_disk_used | ||||
The maximum size of hot data for a node. | user_used_disk_max | ||||
The average size of hot data for a node. | user_used_disk_avg |
Data Lakehouse Edition and Data Warehouse Edition
Metric | Metric key | Description | Metric value name | Unit | References |
Disk usage | AnalyticDB_DiskUsedRatio | The average disk usage. | disk_used_ratio | % |
|
The maximum disk usage. | worker_max_node_disk_used_ratio | ||||
Disk space used | AnalyticDB_DiskUsedSize | The size of cold data. | cold_disk_used | Byte | |
The size of hot data. | hot_disk_used | ||||
The maximum size of hot data for a node. | user_used_disk_max | ||||
The average size of hot data for a node. | user_used_disk_avg |
Workload monitoring
Enterprise Edition and Basic Edition
Metric | Metric key | Description | Metric value name | Unit | References |
Cluster connections | AnalyticDB_Connections | The number of successful connections. | connections | count |
|
Query failure rate¹ | AnalyticDB_QueryFailedRatio | The failure rate of queries. | query_failed_ratio | % | |
Query QPS | AnalyticDB_QPS | The queries per second (QPS). | qps | op/s | |
The extract, transform, and load (ETL) QPS. | etl_qps | ||||
Query response time | AnalyticDB_QueryRT | The average query response time. | query_avg_rt | ms | |
The maximum query response time. | query_max_rt | ||||
Query wait time | AnalyticDB_QueryWaitTime | The average query wait time. | query_avg_wait_time | ms | |
The maximum query wait time. | query_max_wait_time | ||||
Write TPS | AnalyticDB_InsertTPS | The write transactions per second (TPS) of a cluster. | insert_tps | op/s | |
Write response time | AnalyticDB_InsertRT | The average write response time. | insert_avg_rt | ms | |
The maximum write response time. | insert_max_rt | ||||
Write throughput | AnalyticDB_InsertBytes | The average write throughput of a cluster. | insert_in_bytes | MB | |
Update TPS | AnalyticDB_UpdateTPS | The update TPS of a cluster. | update_tps | op/s | |
Update response time | AnalyticDB_UpdateRT | The average update response time. | updateinto_avg_rt | ms | |
The maximum update response time. | updateinto_max_rt | ||||
Delete TPS | AnalyticDB_DeleteTPS | The delete TPS of a cluster. | delete_tps | op/s | |
Delete response time | AnalyticDB_DeleteRT | The average delete response time. | delete_avg_rt | ms | |
The maximum delete response time. | delete_max_rt | ||||
Import TPS | AnalyticDB_LoadTPS | The load TPS of a cluster. | load_tps | op/s |
Data Lakehouse Edition and Data Warehouse Edition
Metric | Metric key | Description | Metric value name | Unit | References |
Cluster connections | AnalyticDB_Connections | The number of successful connections. | connections | count |
|
Query failure rate¹ | AnalyticDB_QueryFailedRatio | The failure rate of queries. | query_failed_ratio | % | |
Query QPS | AnalyticDB_QPS | The QPS. | qps | op/s | |
The ETL QPS. | etl_qps | ||||
Query response time | AnalyticDB_QueryRT | The average query response time. | query_avg_rt | ms | |
The maximum query response time. | query_max_rt | ||||
Query wait time | AnalyticDB_QueryWaitTime | The average query wait time. | query_avg_wait_time | ms | |
The maximum query wait time. | query_max_wait_time | ||||
Write TPS | AnalyticDB_InsertTPS | The write TPS of a cluster. | insert_tps | op/s | |
Write response time | AnalyticDB_InsertRT | The average write response time. | insert_avg_rt | ms | |
The maximum write response time. | insert_max_rt | ||||
Write throughput | AnalyticDB_InsertBytes | The average write throughput of a cluster. | insert_in_bytes | MB | |
Update TPS | AnalyticDB_UpdateTPS | The update TPS of a cluster. | update_tps | op/s | |
Update response time | AnalyticDB_UpdateRT | The average update response time. | updateinto_avg_rt | ms | |
The maximum update response time. | updateinto_max_rt | ||||
Delete TPS | AnalyticDB_DeleteTPS | The delete TPS of a cluster. | delete_tps | op/s | |
Delete response time | AnalyticDB_DeleteRT | The average delete response time. | delete_avg_rt | ms | |
The maximum delete response time. | delete_max_rt | ||||
Import TPS | AnalyticDB_LoadTPS | The load TPS of a cluster. | load_tps | op/s |
Query failure rate¹:
If you select a time range within 24 hours, the query failure rate is calculated using the following formula:
Query failure rate = (Number of failed SQL queries in 1 minute/Total number of SQL queries in 1 minute) × 100%.
If you select a time range that exceeds 24 hours, the query failure rate is calculated using the following formula:
Query failure rate = (Number of failed SQL queries in 5 minutes/Total number of SQL queries in 5 minutes) × 100%.
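The two formulas differ only in the length of the window over which queries are counted. A minimal sketch of the calculation follows. The function names are hypothetical, and returning 0 when no queries ran in the window is an assumption that this topic does not specify.

```python
def sampling_window_minutes(range_hours: float) -> int:
    """Return the counting window implied by the selected time range.

    A range within 24 hours uses 1-minute windows; a longer range
    uses 5-minute windows.
    """
    return 1 if range_hours <= 24 else 5


def query_failure_rate(failed: int, total: int) -> float:
    """Return the query failure rate of one window as a percentage."""
    if total == 0:
        # Assumption: a window without queries is reported as 0%.
        return 0.0
    return failed * 100 / total


if __name__ == "__main__":
    window = sampling_window_minutes(6)
    rate = query_failure_rate(failed=3, total=120)
    print(f"{window}-minute window: {rate:.2f}%")
```

For example, 3 failed queries out of 120 in a 1-minute window give a failure rate of 2.5%.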
Resource group monitoring
Enterprise Edition, Basic Edition, and Data Lakehouse Edition
Metric | Metric key | Description | Metric value name | Unit | References |
CPU utilization | AnalyticDB_RP_CPU | The average CPU utilization of a resource group. | AnalyticDB_RP_CPU | % |
|
Query QPS | AnalyticDB_RP_QPS | The query QPS of a resource group. | AnalyticDB_RP_QPS | op/s | |
Query response time | AnalyticDB_RP_RT | The average response time of queries in a resource group. | AnalyticDB_RP_RT | ms | |
Query wait time | AnalyticDB_RP_WaitTime | The total average wait time of queries in a resource group. | AnalyticDB_RP_WaitTime | ms | |
(Xihe) Running SQL queries | AnalyticDB_RP_RunningQueries_Count | The number of running SQL queries in a resource group. | AnalyticDB_RP_RunningQueries_Count | count | |
Queued SQL queries | AnalyticDB_RP_QueuedQueries_Count | The number of queued SQL queries in a resource group. | AnalyticDB_RP_QueuedQueries_Count | count | |
Computing resource usage Note This metric is supported only by Data Lakehouse Edition. | None | The total amount of computing resources. | TotalAcuNumber | ACU |
|
The amount of reserved computing resources. | ReservedAcuNumber | ||||
Storage resource usage Note This metric is supported only by Data Lakehouse Edition. | None | The total amount of storage resources. | TotalAcuNumber | ACU |
|
The amount of reserved storage resources. | ReservedAcuNumber | ||||
Resource usage | None | The total amount of computing resources. | TotalAcuNumber | ACU |
|
The amount of reserved resources. | ReservedAcuNumber | ||||
Interactive resource group | None | The minimum amount of computing resources. | ReservedAcuNumber | ACU | Operation document: View the computing resource usage of a resource group |
The maximum amount of computing resources. | MaxAcuNumber | ||||
The current computing resource usage. | CurrentAcuNumber | ||||
Job resource group | None | The minimum amount of computing resources. | ReservedAcuNumber | ACU |
|
The maximum amount of computing resources. | MaxAcuNumber | ||||
The current computing resource usage. | CurrentAcuNumber | ||||
The spot instance resource usage. | SpotAcuNumber | ||||
Total ACU-hours used by a job | None | The average number of ACU-hours used by a job. | TotalAcuNumber | ACU | Operation document: View the computing resource usage of a job |
Reserved ACU-hours | None | The number of reserved ACU-hours among the total ACU-hours used by a job. | ReservedAcuNumber | ACU | |
Elastic ACU-hours | None | The number of elastic ACU-hours among the total ACU-hours used by a job. | ElasticAcuNumber | ACU |
Data Warehouse Edition
Metric | Metric key | Description | Metric value name | Unit | References |
CPU utilization | AnalyticDB_RP_CPU | The average CPU utilization of a resource group. | AnalyticDB_RP_CPU | % |
|
Query QPS | AnalyticDB_RP_QPS | The query QPS of a resource group. | AnalyticDB_RP_QPS | op/s | |
Query response time | AnalyticDB_RP_RT | The average response time of queries in a resource group. | AnalyticDB_RP_RT | ms | |
Query wait time | AnalyticDB_RP_WaitTime | The total average wait time of queries in a resource group. | AnalyticDB_RP_WaitTime | ms | |
Actual scaled-out nodes | AnalyticDB_RP_ActualNode | The number of nodes that are actually added to a resource group when a scheduled scale-out plan is executed. | AnalyticDB_RP_ActualNode | count | |
Planned scaled-out nodes | AnalyticDB_RP_PlanNode | The number of nodes that you want to add to a resource group based on a scheduled scaling plan. For more information about how to create a resource scaling plan, see Create a resource scaling plan. | AnalyticDB_RP_PlanNode | count | |
Total nodes | AnalyticDB_RP_TotalNode | The total number of nodes in a resource group. Total nodes = Basic nodes + Actual scaled-out nodes from scheduled scaling. | AnalyticDB_RP_TotalNode | count | |
Basic nodes | AnalyticDB_RP_OriginalNode | The number of basic nodes in a resource group. | AnalyticDB_RP_OriginalNode | count |
Spark monitoring
You cannot view Spark monitoring information directly in the AnalyticDB for MySQL console. To view this information, you must go to the CloudMonitor console.
Metric | Description | MetricName | Unit | References |
Spark CPU utilization (%) | The CPU utilization of Spark. |
| % |
|
Spark memory utilization (%) | The memory usage of Spark. |
| % | |
Peak on-heap execution memory usage (B) | The maximum amount of JVM heap memory that is used when a Spark job is running. | SparkExecutorOnHeapExecutionMemoryBytes | Byte | |
Peak off-heap execution memory usage (B) | The maximum amount of memory that is used in addition to the JVM heap memory when a Spark job is running. | SparkExecutorOffHeapExecutionMemoryBytes | Byte | |
Peak on-heap storage memory usage (B) | The maximum amount of JVM heap memory that is used to store Spark data, such as cached RDDs. | SparkExecutorOnHeapStorageMemoryBytes | Byte | |
Peak off-heap storage memory usage (B) | The maximum amount of JVM off-heap memory that is used to store Spark data, such as cached RDDs. | SparkExecutorOffHeapStorageMemoryBytes | Byte | |
RDD storage disk usage (B) | The disk space that is used by Resilient Distributed Datasets (RDDs) in Spark. | SparkExecutorDiskUsedBytes | Byte | |
Major GC count (count) | The number of major garbage collections (Major GCs) that are performed by the JVM garbage collection mechanism when a Spark job is running. | SparkExecutorMajorGCCount | count | |
Minor GC count (count) | The number of minor garbage collections (Minor GCs) that are performed by the JVM garbage collection mechanism when a Spark job is running. | SparkExecutorMinorGCCount | count | |
Spark GC time (s) | The time consumed by Spark GC. | SparkExecutorTotalGCTimeSeconds | s | |
Spark shuffle read data size (B) | The size of data read during a Spark shuffle. | SparkExecutorTotalShuffleReadBytes | Byte | |
Spark shuffle write data size (B) | The size of data written during a Spark shuffle. | SparkExecutorTotalShuffleWriteBytes | Byte |
References
Optimize cluster performance based on monitoring information: describes metrics related to cluster performance and running status, explains how to analyze the causes of abnormal metrics, and provides methods for troubleshooting and optimization.