Managed Service for Prometheus provides two services: a managed Prometheus service and a container monitoring service. The container monitoring service incurs container monitoring fees, which consist of a monitoring cluster scale fee and Prometheus instance fees. The container monitoring service is available in two versions: basic and Pro. This topic describes how to use the container monitoring Pro version and covers its billing details, features, supported dashboards, and default alert rules.
Limits
Only ACK Pro clusters can activate the container monitoring Pro version. By default, all other types of container clusters are limited to the container monitoring basic version.
Prerequisites
The container monitoring Pro version requires Managed Service for Prometheus. First activate Managed Service for Prometheus, using either pay-as-you-go billing by write volume or pay-as-you-go billing by reporting volume, and then activate the container monitoring Pro version.
Container monitoring Pro version billing details
Billing item | Billing description | Billing method | Billing cycle |
Monitoring cluster scale fee | OCU usage is calculated from the number of nodes in the container cluster: every 10 cluster nodes count as 1 OCU. Note: An Observability Capacity Unit (OCU) is a billing unit introduced by Alibaba Cloud Native Observability. OCU usage is calculated automatically from hourly resource usage. The OCU unit price is USD 0.023. | Pay-as-you-go. Daily container cluster scale fee = Sum of hourly OCU units × OCU unit price. Note: Hourly OCU units = Maximum number of nodes in that hour ÷ 10, rounded up. | The billing cycle is hourly, and the fee is generated daily. After 00:00, Managed Service for Prometheus determines the maximum number of cluster nodes for each hour of the previous day, converts each hourly maximum to OCU units according to the billing rule, sums the hourly OCU units to get the previous day's total, and multiplies that total by the OCU unit price to generate the daily container cluster scale monitoring fee. |
Prometheus instance fee | For more information, see Prometheus instance billing. | | |
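The cluster scale fee rule in the table above can be sketched as a short calculation. This is an illustrative estimate only, not the billing system's implementation: the function names and the sample node counts are hypothetical, and the unit price is the USD 0.023 figure quoted in the table.

```python
from typing import Iterable

OCU_UNIT_PRICE_USD = 0.023  # OCU unit price quoted in the billing table
NODES_PER_OCU = 10          # every 10 cluster nodes count as 1 OCU


def hourly_ocu(max_nodes: int) -> int:
    """OCU units for one hour: that hour's peak node count / 10, rounded up."""
    if max_nodes < 0:
        raise ValueError("node count cannot be negative")
    # Integer ceiling division avoids floating-point rounding surprises.
    return (max_nodes + NODES_PER_OCU - 1) // NODES_PER_OCU


def daily_cluster_scale_fee(hourly_max_nodes: Iterable[int]) -> float:
    """Daily fee = sum of hourly OCU units x OCU unit price."""
    total_ocu = sum(hourly_ocu(n) for n in hourly_max_nodes)
    return total_ocu * OCU_UNIT_PRICE_USD


# Hypothetical day: a peak of 25 nodes for 20 hours, then 31 nodes for 4 hours.
sample_day = [25] * 20 + [31] * 4
print(f"Estimated daily fee: USD {daily_cluster_scale_fee(sample_day):.3f}")
```

In this hypothetical day, 25 nodes round up to 3 OCU per hour and 31 nodes round up to 4 OCU per hour, so the day accumulates 20 × 3 + 4 × 4 = 76 OCU. Prometheus instance fees are billed separately and are not part of this estimate.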
How to use container monitoring Pro version
Method 1: Select container monitoring Pro version during integration
On the Integration Center page, select Container Cluster Monitoring.
In the Container Cluster Monitoring panel, select the container service cluster to be integrated, then select Container Monitoring Pro Version as the version, and click OK.

Method 2: Upgrade from basic version to container monitoring Pro version
After upgrading to container monitoring Pro version, you cannot downgrade to the basic version.
On the Provisioning page, select Integrated Environments > Container Environment.
In the Operation column of the container monitoring entry that you want to upgrade, click Upgrade. In the dialog box, click Confirm.

Differences between basic version and Pro version
Category | Basic version | Pro version |
Storage period for basic container cluster metrics | 7 days | 90 days |
Prometheus collector | The Agent is deployed in the user's cluster and must be self-managed. By default, a single replica consumes 3 CPU cores and 4 GB of memory from the cluster. | A managed collection Agent is provided, so users no longer bear the Agent's resource cost. A production-grade SLA of 99.95% is provided. |
Monitoring dashboard | Built-in basic monitoring dashboards. | Built-in comprehensive monitoring dashboards. |
Container monitoring Pro version supported dashboards
Type | Dashboard name |
Monitoring overview | Cluster monitoring overview |
Monitoring overview | Cluster namespace dashboard |
Cluster core components | ACK Pro API server |
Cluster core components | ACK Pro ETCD |
Cluster core components | ACK Pro Scheduler |
Cluster core components | ACK Pro Cloud Controller Manager |
Cluster core components | ACK Pro Kube Controller Manager |
Node monitoring | Node pool overview |
Node monitoring | Cluster node monitoring details |
Application monitoring | Stateless application monitoring |
Application monitoring | Stateful application monitoring |
Application monitoring | DaemonSet application monitoring |
Application monitoring | Cluster Pod monitoring |
Network monitoring | CoreDNS component monitoring |
Network monitoring | Cluster Ingress traffic monitoring |
Storage monitoring | CSI storage component monitoring-cluster dimension |
Storage monitoring | CSI storage component monitoring-node dimension |
Storage monitoring | Pod IO Monitoring (Pod Level) |
Storage monitoring | Frontend Storage IO Monitoring (Cluster Level) |
GPU monitoring | Cluster GPU monitoring-cluster dimension |
GPU monitoring | Cluster GPU monitoring-node dimension |
GPU monitoring | Cluster GPU monitoring-application Pod dimension |
Cost analysis/resource optimization | Resource profile |
Others | Backend Storage IO Monitoring (Cluster Level) |
Others | k8s-reclaimed-resource |
Others | Cluster Prometheus self-monitoring |
Others | Virtual Node(ECI) Overview |
Default alert rules
Alert rule name/ID | Alert group | Template |
Node CPU usage greater than 75% | Node | Node {{ $labels.instance }} CPU usage greater than 75%, current CPU usage {{ printf "%.2f" $value }}% |
Node CPU usage greater than 85% | Node | Node {{ $labels.instance }} CPU usage greater than 85%, current CPU usage {{ printf "%.2f" $value }}% |
Node memory usage greater than 75% | Node | Node {{ $labels.instance }} memory usage greater than 75%, current memory usage {{ printf "%.2f" $value }}% |
Node memory usage greater than 85% | Node | Node {{ $labels.instance }} memory usage greater than 85%, current memory usage {{ printf "%.2f" $value }}% |
Node anomalies | Node | Node {{$labels.node}} has been in unavailable status for more than 10 minutes |
Disk usage greater than 95% | Node | Node {{ $labels.instance }} disk {{ $labels.device }} usage exceeds 95%, current disk usage {{ printf "%.2f" $value }}% |
Deployment Pod availability less than 50% | Workload | Namespace: {{$labels.namespace}} / Deployment: {{$labels.deployment}} Pod availability less than 50%, current unavailable Pod count {{ $value }} |
Job execution failed | Workload | Namespace: {{$labels.namespace}}/Job: {{$labels.job_name}} execution failed |
Pod startup timeout failure | Workload | Namespace: {{$labels.namespace}}/Pod: {{$labels.pod_name}} has not started successfully for more than 15 minutes, waiting reason {{$labels.reason}} |
Pod status abnormal | Workload | Namespace: {{$labels.namespace}}/Pod: {{$labels.pod_name}} has been in {{$labels.phase}} status for more than 10 minutes |
Pod frequent restart | Workload | Namespace: {{$labels.namespace}}/Pod: {{$labels.pod_name}} restarted more than {{ $labels.metrics_params_value}} times within {{$labels.metrics_params_time}} minutes, current restart count {{ $value }} |
Container CPU usage exceeds 85% | Workload | Namespace: {{$labels.namespace}} / Pod: {{$labels.pod_name}} / Container: {{$labels.container}} CPU usage greater than 85%, current value {{ printf "%.2f" $value }}% |
Container CPU usage exceeds 75% | Workload | Namespace: {{$labels.namespace}} / Pod: {{$labels.pod_name}} / Container: {{$labels.container}} CPU usage greater than 75%, current value {{ printf "%.2f" $value }}% |
Container memory usage exceeds 75% | Workload | Namespace: {{$labels.namespace}} / Pod: {{$labels.pod_name}} / Container: {{$labels.container}} memory usage greater than 75%, current value {{ printf "%.2f" $value }}% |
Container memory usage exceeds 85% | Workload | Namespace: {{$labels.namespace}} / Pod: {{$labels.pod_name}} / Container: {{$labels.container}} memory usage greater than 85%, current value {{ printf "%.2f" $value }}% |
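The templates in the table above use Prometheus-style Go template placeholders: `{{ $labels.<name> }}` for a label of the firing series and `$value` for the evaluated result. To show roughly what a rendered notification looks like, the following sketch substitutes those placeholders with a few regular expressions. It is a simplified stand-in and not a Go template engine; the function name `render_alert`, the label values, and the sample value are hypothetical.

```python
import re
from typing import Mapping


def render_alert(template: str, labels: Mapping[str, str], value: float) -> str:
    """Fill in the placeholder forms that appear in the default alert templates.

    Simplified approximation of Go template rendering, for illustration only.
    """
    # {{ printf "%.2f" $value }} -> value formatted with the quoted printf verb
    out = re.sub(
        r'\{\{\s*printf\s+"([^"]*)"\s+\$value\s*\}\}',
        lambda m: m.group(1) % value,
        template,
    )
    # {{ $value }} -> value in compact form (3.0 is shown as 3)
    out = re.sub(r"\{\{\s*\$value\s*\}\}", lambda m: format(value, "g"), out)
    # {{ $labels.name }} -> label value, or an empty string if the label is absent
    out = re.sub(
        r"\{\{\s*\$labels\.(\w+)\s*\}\}",
        lambda m: labels.get(m.group(1), ""),
        out,
    )
    return out


template = (
    "Node {{ $labels.instance }} CPU usage greater than 75%, "
    'current CPU usage {{ printf "%.2f" $value }}%'
)
# Hypothetical label set and sample value, for illustration only.
message = render_alert(template, {"instance": "192.168.0.10:9100"}, 81.456)
print(message)
```

With these hypothetical inputs, the node CPU template renders as "Node 192.168.0.10:9100 CPU usage greater than 75%, current CPU usage 81.46%".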