AnalyticDB for PostgreSQL exposes two sets of metric parameters: health metrics that show the overall status of your instance, and performance metrics that track resource utilization over time. Use these parameters to monitor instance health, connection usage, storage capacity, query concurrency, and node-level resource consumption.
Retrieve health metrics by calling DescribeHealthStatus, and performance metrics by calling DescribeDBClusterPerformance.
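The sketch below shows how the metric parameter names from the tables could be packed into request parameters for the two API operations. It rests on several assumptions that you should verify against the API reference before use:

- Both operations take the instance ID as `DBInstanceId` and a comma-separated list of metric parameters as `Key`.
- DescribeDBClusterPerformance also takes a UTC time range in `StartTime`/`EndTime`, formatted as `yyyy-MM-ddTHH:mmZ`.

The helper names and the instance ID are hypothetical. The sketch does not sign or send the request.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical helpers. Parameter names (Action, DBInstanceId, Key,
# StartTime, EndTime) and the time format are assumptions to verify
# against the API reference. Signing and sending the request are not shown.


def _utc_minute(ts: datetime) -> str:
    """Format a timezone-aware datetime as yyyy-MM-ddTHH:mmZ in UTC."""
    if ts.tzinfo is None:
        raise ValueError("use a timezone-aware datetime")
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%MZ")


def health_status_params(instance_id: str, keys: list[str]) -> dict[str, str]:
    """Query parameters for DescribeHealthStatus (point-in-time status)."""
    return {
        "Action": "DescribeHealthStatus",
        "DBInstanceId": instance_id,
        "Key": ",".join(keys),  # assumed: metric names, comma-separated
    }


def performance_params(
    instance_id: str, keys: list[str], start: datetime, end: datetime
) -> dict[str, str]:
    """Query parameters for DescribeDBClusterPerformance (time series)."""
    if end <= start:
        raise ValueError("end must be later than start")
    return {
        "Action": "DescribeDBClusterPerformance",
        "DBInstanceId": instance_id,
        "Key": ",".join(keys),
        "StartTime": _utc_minute(start),
        "EndTime": _utc_minute(end),
    }


if __name__ == "__main__":
    end = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    params = performance_params(
        "gp-example",  # hypothetical instance ID
        ["node_cpu_used_percent", "node_mem_used_percent"],
        end - timedelta(hours=1),
        end,
    )
    print(params)
```

Health metrics describe the current state, so the health helper takes no time range. Performance metrics are time series, so that helper requires one.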
Elastic storage mode
Health metrics (DescribeHealthStatus)
All status values are color-coded in the console: critical appears in red, warning in yellow, and healthy in green.
| Parameter | Metric | Unit | Description |
|---|---|---|---|
adbpg_status | Instance health status | N/A | Overall instance health. critical: a coordinator node or compute node is unavailable. healthy: all nodes are available. |
adbpg_connection_status | Instance connection status | N/A | Connection health based on the highest connection usage across all coordinator and compute nodes. critical: usage > 95%. warning: usage > 90% and ≤ 95%. healthy: usage ≤ 90%. |
adbpg_disk_status | Instance storage status | N/A | Storage health based on the average storage usage of all compute nodes. critical: usage ≥ 90% (instance is locked). warning: usage ≥ 70% and < 90%. healthy: usage < 70%. |
adbpg_disk_usage_percent | Instance storage usage | % | Storage usage percentage based on the average usage of all compute nodes. Same thresholds as adbpg_disk_status. |
adbpg_master_disk_usage_percent_max | Maximum storage usage of coordinator nodes | % | Highest storage usage across all coordinator nodes, which also determines the status. critical: usage ≥ 90% (instance is locked). warning: usage ≥ 70% and < 90%. healthy: usage < 70%. |
adbgp_segment_disk_usage_percent_max | Maximum storage usage of compute nodes | % | Highest storage usage across all compute nodes, which also determines the status. critical: usage ≥ 90% (instance is locked). warning: usage ≥ 80% and < 90%. healthy: usage < 80%. |
node_master_status | Coordinator node health status | N/A | Health of coordinator nodes. critical: a coordinator node is unavailable. healthy: all coordinator nodes are available. |
node_segment_disk_status | Compute node storage status | N/A | Storage status based on the highest storage usage across all compute nodes. critical: usage ≥ 90% (instance is locked). warning: usage ≥ 80% and < 90%. healthy: usage < 80%. |
node_master_connection_status | Coordinator node connection status | N/A | Connection status based on the highest connection usage across all coordinator nodes. critical: usage ≥ 95%. warning: usage ≥ 90% and < 95%. healthy: usage < 90%. |
node_segment_connection_status | Compute node connection status | N/A | Connection status based on the highest connection usage across all compute nodes. critical: usage ≥ 95%. warning: usage ≥ 90% and < 95%. healthy: usage < 90%. |
adbpg_instance_total_data_gb | Total storage | GB | Total volume of instance storage. Available in the console for instances running V6.3.11.3 or later. |
adbpg_instance_hot_data_gb | Hot storage | GB | Total volume of hot storage. Available in the console for instances running V6.3.11.3 or later. |
adbpg_instance_cold_data_gb | Cold storage | GB | Total volume of cold storage. Available in the console for instances running V6.3.11.3 or later. |
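The following sketch reproduces the status thresholds from the table as code. It also shows why a single skewed compute node can raise a warning on the maximum-based metric while the average-based instance metric stays healthy.

Only metrics whose boundaries use `≥` are included. adbpg_connection_status is listed with strict `>` boundaries, so it is left out. The function names are hypothetical.

```python
# Thresholds copied from the health metrics table: (warning, critical) in %.
# All of these rules use ">=" boundaries.
THRESHOLDS: dict[str, tuple[float, float]] = {
    "adbpg_disk_status": (70.0, 90.0),
    "adbpg_disk_usage_percent": (70.0, 90.0),
    "adbpg_master_disk_usage_percent_max": (70.0, 90.0),
    "adbgp_segment_disk_usage_percent_max": (80.0, 90.0),
    "node_segment_disk_status": (80.0, 90.0),
    "node_master_connection_status": (90.0, 95.0),
    "node_segment_connection_status": (90.0, 95.0),
}


def classify(metric: str, usage_percent: float) -> str:
    """Map a usage percentage to critical / warning / healthy for a metric."""
    warning, critical = THRESHOLDS[metric]
    if usage_percent >= critical:
        return "critical"
    if usage_percent >= warning:
        return "warning"
    return "healthy"


def storage_status(node_usage_percents: list[float]) -> dict[str, str]:
    """Instance status uses the average; node status uses the maximum."""
    if not node_usage_percents:
        raise ValueError("need at least one compute node")
    average = sum(node_usage_percents) / len(node_usage_percents)
    peak = max(node_usage_percents)
    return {
        "adbpg_disk_status": classify("adbpg_disk_status", average),
        "node_segment_disk_status": classify("node_segment_disk_status", peak),
    }
```

Consider three compute nodes at 50%, 55%, and 86% usage. `storage_status([50.0, 55.0, 86.0])` reports adbpg_disk_status as healthy, because the average is about 63.7%. It reports node_segment_disk_status as warning, because the peak is 86%.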
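Storage status thresholds differ by metric. Instance storage (adbpg_disk_status, adbpg_disk_usage_percent) and coordinator node storage trigger warning at 70% and critical at 90%. Compute node storage (adbgp_segment_disk_usage_percent_max, node_segment_disk_status) triggers warning at 80% and critical at 90%.
Performance metrics (DescribeDBClusterPerformance)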
The Scope column indicates whether the metric applies to the entire instance or to individual nodes.
| Parameter | Metric | Unit | Scope | Description |
|---|---|---|---|---|
adbpg_segment_cnt | Number of compute nodes | N/A | Instance | Number of compute nodes, sampled once per hour. |
adbpg_instance_disk_used_mb | Instance storage used | MB | Instance | Storage used across all compute nodes. |
adbpg_instance_disk_usage_percent | Instance storage usage | % | Instance | Storage usage calculated as: storage used by all compute nodes ÷ reserved storage of all compute nodes. |
node_master_connection_count | Coordinator node connections | N/A | Node | Number of active connections to coordinator nodes. |
node_segment_connection_count | Compute node connections | N/A | Node | Number of active connections to compute nodes. |
node_segment_workfile_used_mb | Temporary disk file size | MB | Node | Size of temporary files written to disk. |
node_cpu_used_percent | CPU utilization | % | Node | CPU utilization of coordinator or compute nodes. |
node_mem_used_percent | Memory usage | % | Node | Memory usage of coordinator or compute nodes. |
node_disk_iops_percent | I/O usage | % | Node | I/O usage of coordinator or compute nodes. |
node_disk_used_mb | Storage used | MB | Node | Storage used by each compute node. |
node_disk_usage_percent | Storage usage | % | Node | Storage usage of each compute node, calculated as: storage used ÷ reserved storage. |
adbpg_rsq_cost | Resource queue query cost | N/A | Instance | Estimated total cost of all queries in the resource queue. rsqCostLimit: the cost limit (-1 means no limit). rsqCostValue: the total cost of queries currently running. |
adbpg_rsq_count | Resource queue concurrent queries | N/A | Instance | Number of queries running concurrently in the resource queue. rsqCountLimit: the maximum number of concurrent queries allowed (-1 means no limit). rsqCountValue: the number of queries currently running. |
adbpg_rsq_memory | Resource queue memory | Byte | Instance | Memory used by all queries in the resource queue. rsqMemoryLimit: the memory limit (-1 means no limit). rsqMemoryValue: the memory used by queries currently running. |
adbpg_rsq_waiters | Queries waiting for a resource queue slot | N/A | Instance | Number of queries waiting because the resource queue has reached its concurrency or resource limit. These queries appear in the pg_stat_activity view with a waiting state. |
adbpg_rsq_holders | Queries holding a resource queue slot | N/A | Instance | Number of queries that have obtained a resource queue slot. These queries are not necessarily active: a query that is waiting for other resources, such as row locks, keeps its slot without releasing it. |
adbpg_db_qps | Instance QPS (queries per second) | N/A | Instance | Read queries processed per second, including SELECT, SELECT INTO, SELECT FOR UPDATE, and FETCH. |
adbpg_db_tps | Instance TPS (transactions per second) | N/A | Instance | Write operations processed per second, including INSERT, UPDATE, DELETE, and INSERT INTO SELECT. |
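The storage usage formulas in the table can be sketched as follows. adbpg_instance_disk_usage_percent divides total storage used by total reserved storage across all compute nodes. node_disk_usage_percent applies the same ratio to each node.

Reserved storage is not one of the listed metrics, so it is treated here as an assumed input. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ComputeNode:
    used_mb: float      # storage used, as in node_disk_used_mb
    reserved_mb: float  # reserved storage (assumed known; not a listed metric)


def node_disk_usage_percent(node: ComputeNode) -> float:
    """Per-node usage: storage used / reserved storage."""
    if node.reserved_mb <= 0:
        raise ValueError("reserved storage must be positive")
    return 100.0 * node.used_mb / node.reserved_mb


def instance_disk_usage_percent(nodes: list[ComputeNode]) -> float:
    """Instance usage: total used / total reserved across all compute nodes."""
    total_reserved = sum(n.reserved_mb for n in nodes)
    if total_reserved <= 0:
        raise ValueError("reserved storage must be positive")
    total_used = sum(n.used_mb for n in nodes)
    return 100.0 * total_used / total_reserved
```

Consider two nodes with 1,000 MB reserved each, one using 400 MB and the other 900 MB. Per-node usage is 40% and 90%, while instance usage is 65%. The instance-level value can therefore hide a node that is close to the 90% lock threshold.

When every node has the same reserved storage, the ratio of sums equals the average of the per-node percentages.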
Diagnosing resource queue bottlenecks
Monitor adbpg_rsq_waiters and adbpg_rsq_holders together to diagnose resource queue bottlenecks:
- If adbpg_rsq_waiters is non-zero, the queue has reached its concurrency limit (rsqCountLimit) or a resource limit (rsqCostLimit or rsqMemoryLimit). Check adbpg_rsq_count, and for resource limits adbpg_rsq_cost and adbpg_rsq_memory, to see whether the limits are set appropriately.
- In most cases, rsqCountValue (from adbpg_rsq_count) equals adbpg_rsq_holders. In rare cases, adbpg_rsq_holders may be greater than rsqCountValue. A persistent and significant gap between the two values may indicate that resource queue locks were not released after queries completed because of a system exception. If the gap persists, contact Alibaba Cloud technical support.
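The two checks above can be sketched as a small function over a series of metric samples. The function assumes that waiting queries are caused by the concurrency limit when rsqCountValue has reached rsqCountLimit, and otherwise points to the cost and memory limits.

The documentation does not quantify "persistent and significant". The `min_gap` and `min_samples` parameters are therefore hypothetical tuning knobs, and the class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class RsqSample:
    """One sampling point of the resource queue metrics."""
    waiters: int      # adbpg_rsq_waiters
    holders: int      # adbpg_rsq_holders
    count_value: int  # rsqCountValue from adbpg_rsq_count
    count_limit: int  # rsqCountLimit from adbpg_rsq_count (-1 = no limit)


def diagnose(
    samples: list[RsqSample], min_gap: int = 1, min_samples: int = 3
) -> list[str]:
    """Return findings; min_gap and min_samples are hypothetical thresholds."""
    if not samples:
        raise ValueError("need at least one sample")
    findings = []
    latest = samples[-1]
    if latest.waiters > 0:
        at_count_limit = (
            latest.count_limit != -1 and latest.count_value >= latest.count_limit
        )
        if at_count_limit:
            findings.append("waiting queries: concurrency limit (rsqCountLimit) reached")
        else:
            findings.append("waiting queries: check cost and memory limits")
    # A gap counts as persistent only if it appears in every recent sample.
    recent = samples[-min_samples:]
    if len(recent) == min_samples and all(
        s.holders - s.count_value >= min_gap for s in recent
    ):
        findings.append(
            "holders exceed rsqCountValue in every recent sample: "
            "possible unreleased resource queue locks"
        )
    return findings
```

Requiring the gap in every recent sample, rather than in a single reading, reflects the documented point that small, short-lived gaps can occur normally.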
Serverless mode
Health metrics (DescribeHealthStatus)
All status values are color-coded in the console: critical appears in red, warning in yellow, and healthy in green.
| Parameter | Metric | Unit | Description |
|---|---|---|---|
adbpg_status | Instance health status | N/A | Overall instance health. critical: a coordinator node or compute node is unavailable. healthy: all nodes are available. |
adbpg_connection_status | Instance connection status | N/A | Connection health based on the highest connection usage across all coordinator and compute nodes. critical: usage > 95%. warning: usage > 90% and ≤ 95%. healthy: usage ≤ 90%. |
node_master_status | Coordinator node health status | N/A | Health of coordinator nodes. critical: a coordinator node is unavailable. healthy: all coordinator nodes are available. |
node_master_connection_status | Coordinator node connection status | N/A | Connection status based on the highest connection usage across all coordinator nodes. critical: usage ≥ 95%. warning: usage ≥ 90% and < 95%. healthy: usage < 90%. |
node_segment_connection_status | Compute node connection status | N/A | Connection status based on the highest connection usage across all compute nodes. critical: usage ≥ 95%. warning: usage ≥ 90% and < 95%. healthy: usage < 90%. |
adbpg_master_disk_usage_percent_max | Maximum storage usage of coordinator nodes | % | Highest storage usage across all coordinator nodes, which also determines the status. critical: usage ≥ 90% (instance is locked). warning: usage ≥ 70% and < 90%. healthy: usage < 70%. |
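Serverless mode exposes a subset of the elastic storage mode health metrics. For example, adbpg_disk_status and node_segment_disk_status are not listed. A client that monitors instances in both modes could filter its metric list before querying, as in the following sketch.

The parameter sets are copied from the two health metric tables in this document. The mode labels and the function name are hypothetical.

```python
# Health metric parameters listed in this document for each mode.
ELASTIC_HEALTH_KEYS = frozenset({
    "adbpg_status", "adbpg_connection_status", "adbpg_disk_status",
    "adbpg_disk_usage_percent", "adbpg_master_disk_usage_percent_max",
    "adbgp_segment_disk_usage_percent_max", "node_master_status",
    "node_segment_disk_status", "node_master_connection_status",
    "node_segment_connection_status", "adbpg_instance_total_data_gb",
    "adbpg_instance_hot_data_gb", "adbpg_instance_cold_data_gb",
})
SERVERLESS_HEALTH_KEYS = frozenset({
    "adbpg_status", "adbpg_connection_status", "node_master_status",
    "node_master_connection_status", "node_segment_connection_status",
    "adbpg_master_disk_usage_percent_max",
})
# Hypothetical mode labels.
HEALTH_KEYS_BY_MODE = {
    "elastic": ELASTIC_HEALTH_KEYS,
    "serverless": SERVERLESS_HEALTH_KEYS,
}


def split_supported(keys: list[str], mode: str) -> tuple[list[str], list[str]]:
    """Split requested health metrics into (supported, unsupported) for a mode."""
    try:
        available = HEALTH_KEYS_BY_MODE[mode]
    except KeyError:
        raise ValueError(f"unknown mode: {mode!r}") from None
    supported = [k for k in keys if k in available]
    unsupported = [k for k in keys if k not in available]
    return supported, unsupported
```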
Performance metrics (DescribeDBClusterPerformance)
The Scope column indicates whether the metric applies to the entire instance or to individual nodes.
| Parameter | Metric | Unit | Scope | Description |
|---|---|---|---|---|
adbpg_acu_used | ACU usage | AnalyticDB compute unit (ACU) | Instance | Computing resources currently used by the instance. |
adbpg_segment_cnt | Number of compute nodes | N/A | Instance | Number of compute nodes, sampled once per hour. |
adbpg_instance_disk_used_mb | Instance storage used | MB | Instance | Object Storage Service (OSS) storage space used by the instance. |
node_master_connection_count | Coordinator node connections | N/A | Node | Number of active connections to coordinator nodes. |
node_segment_connection_count | Compute node connections | N/A | Node | Number of active connections to compute nodes. |
node_segment_workfile_used_mb | Temporary disk file size | MB | Node | Size of temporary files written to disk. |
node_cpu_used_percent | CPU utilization | % | Node | CPU utilization of coordinator or compute nodes. |
node_mem_used_percent | Memory usage | % | Node | Memory usage of coordinator or compute nodes. |