Managed Service for Prometheus provides a managed Prometheus service and a container monitoring service. Container Monitoring incurs container monitoring fees, which consist of a monitoring cluster scale fee and a Prometheus instance fee. Container Monitoring is available in two editions: Basic and Pro. This topic describes how to enable Container Monitoring Pro Edition and covers its billing, features, supported dashboards, and default alert rules.
Cluster types that support Pro Edition
ACK managed Pro cluster
ACK Lingjun cluster
ACK dedicated cluster
Prerequisites
Container Monitoring Pro Edition requires Managed Service for Prometheus. First activate Managed Service for Prometheus (pay-as-you-go, billed by ingested data volume or by ingested metric sample count), and then activate Container Monitoring Pro Edition.
Billing
Billing item | Description | Billing method | Billing cycle |
Monitoring cluster scale fee | Observability Capacity Unit (OCU) usage is calculated from the number of nodes in the container cluster: every 10 cluster nodes are converted to 1 OCU. Note: The OCU is a billing unit introduced by Alibaba Cloud Native Observability. OCU usage is calculated automatically from hourly resource usage. Unit price: 0.023 USD per OCU. | Pay-as-you-go. Daily container cluster scale fee = Sum of hourly OCUs × OCU unit price. Note: Hourly OCUs = Maximum number of nodes in the current billing cycle ÷ 10, rounded up. | Hourly. After 00:00 each day, Managed Service for Prometheus determines the maximum number of cluster nodes for each hour of the previous day, converts each hourly maximum to OCUs according to the billing rules, sums the hourly OCUs to get the total OCU usage for the previous day, and multiplies that total by the OCU unit price. The container cluster scale monitoring fee is therefore generated once a day. |
Prometheus instance fee | For more information, see Prometheus instance billing. | | |
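The monitoring cluster scale fee rule above can be sketched in a few lines of Python. This is only an illustration of the arithmetic in the table, not an official calculator: the function names and the sample node counts are hypothetical, and the sketch assumes that the billing cycle used for the maximum node count is one hour, as the Billing cycle column suggests. It does not cover the Prometheus instance fee.

```python
OCU_UNIT_PRICE_USD = 0.023  # unit price per OCU, taken from the billing table
NODES_PER_OCU = 10          # every 10 cluster nodes are converted to 1 OCU


def hourly_ocus(max_nodes: int) -> int:
    """OCUs for one hour: the hour's maximum node count divided by 10, rounded up."""
    return (max_nodes + NODES_PER_OCU - 1) // NODES_PER_OCU


def daily_cluster_scale_fee(hourly_max_nodes: list[int]) -> float:
    """Daily fee = sum of hourly OCUs x OCU unit price."""
    total_ocus = sum(hourly_ocus(n) for n in hourly_max_nodes)
    return total_ocus * OCU_UNIT_PRICE_USD


# Hypothetical day: the cluster peaks at 25 nodes for 20 hours and at 31 nodes for 4 hours.
sample_day = [25] * 20 + [31] * 4
total = sum(hourly_ocus(n) for n in sample_day)  # 20 * 3 + 4 * 4 = 76 OCUs
fee = daily_cluster_scale_fee(sample_day)        # 76 * 0.023 = 1.748 USD
print(f"{total} OCUs, {fee:.3f} USD")
```

Because each hour is rounded up separately, a cluster that briefly grows from 30 to 31 nodes is charged 4 OCUs instead of 3 for that hour only.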
Use Container Monitoring Pro Edition
Method 1: Select this edition during integration
On the Integration Center page, select Kubernetes Cluster Monitoring.
In the Kubernetes Cluster Monitoring panel, select the Container Service cluster that you want to integrate, select Container Monitoring Pro Edition, and click OK.

Method 2: Upgrade from Basic Edition to Pro Edition
After upgrading to Container Monitoring Pro Edition, you cannot downgrade to Basic Edition.
On the Integration Management page, choose Integrated Environments > Container Service.
Click Upgrade to Pro Edition in the Actions column. In the dialog box, click OK.

Differences between Basic Edition and Pro Edition
Category | Basic Edition | Pro Edition |
Storage period for basic metrics of container service clusters | 7 days | 90 days |
Prometheus collector | The agent is deployed in your cluster and you manage it yourself. By default, a single replica occupies 3 CPU cores and 4 GB of memory in the cluster. | A managed agent is provided. You no longer bear the resource cost of the agent, and the agent is covered by a production-grade SLA of 99.95%. |
Dashboard | Built-in basic monitoring dashboards. | Built-in comprehensive monitoring dashboards. |
Supported dashboards
Type | Dashboard |
Monitoring overview | Cluster monitoring overview |
Monitoring overview | Cluster namespace dashboard |
Cluster core components | ACK Pro API server |
Cluster core components | ACK Pro ETCD |
Cluster core components | ACK Pro Scheduler |
Cluster core components | ACK Pro Cloud Controller Manager |
Cluster core components | ACK Pro Kube Controller Manager |
Node monitoring | Node pool overview |
Node monitoring | Cluster node monitoring details |
Application monitoring | StatefulSet monitoring |
Application monitoring | Deployment monitoring |
Application monitoring | DaemonSet application monitoring |
Application monitoring | Cluster Pod monitoring |
Network monitoring | CoreDNS component monitoring |
Network monitoring | Cluster Ingress traffic monitoring |
Storage monitoring | CSI storage component monitoring-cluster level |
Storage monitoring | CSI storage component monitoring-node level |
Storage monitoring | Pod IO Monitoring (Pod Level) |
Storage monitoring | Frontend Storage IO Monitoring (Cluster Level) |
GPU monitoring | Cluster GPU monitoring-cluster level |
GPU monitoring | Cluster GPU monitoring-node level |
GPU monitoring | Cluster GPU monitoring-application Pod dimension |
Cost analysis/Resource optimization | Resource profile |
Others | Backend Storage IO Monitoring (Cluster Level) |
Others | k8s-reclaimed-resource |
Others | Cluster Prometheus self-monitoring |
Others | Virtual Node (ECI) Overview |
Default alert rules
Alert rule name/ID | Alert group | Template |
Node CPU usage greater than 75% | Node | Node {{ $labels.instance }} CPU usage greater than 75%, current CPU usage {{ printf "%.2f" $value }}% |
Node CPU usage greater than 85% | Node | Node {{ $labels.instance }} CPU usage greater than 85%, current CPU usage {{ printf "%.2f" $value }}% |
Node memory usage greater than 75% | Node | Node {{ $labels.instance }} memory usage greater than 75%, current memory usage {{ printf "%.2f" $value }}% |
Node memory usage greater than 85% | Node | Node {{ $labels.instance }} memory usage greater than 85%, current memory usage {{ printf "%.2f" $value }}% |
Node anomalies | Node | Node {{$labels.node}} has been in unavailable status for more than 10 minutes |
Disk usage greater than 95% | Node | Node {{ $labels.instance }} disk {{ $labels.device }} usage exceeds 95%, current disk usage {{ printf "%.2f" $value }}% |
Deployment Pod availability less than 50% | Workload | Namespace: {{$labels.namespace}} / Deployment: {{$labels.deployment}} Pod availability less than 50%, current unavailable Pod count {{ $value }} |
Job execution failed | Workload | Namespace: {{$labels.namespace}}/Job: {{$labels.job_name}} execution failed |
Pod startup timeout failure | Workload | Namespace: {{$labels.namespace}}/Pod: {{$labels.pod_name}} has not started successfully for more than 15 minutes, waiting reason {{$labels.reason}} |
Pod status abnormal | Workload | Namespace: {{$labels.namespace}}/Pod: {{$labels.pod_name}} has been in {{$labels.phase}} status for more than 10 minutes |
Pod frequent restart | Workload | Namespace: {{$labels.namespace}}/Pod: {{$labels.pod_name}} restarted more than {{ $labels.metrics_params_value}} times within {{$labels.metrics_params_time}} minutes, current restart count {{ $value }} |
Container CPU usage exceeds 85% | Workload | Namespace: {{$labels.namespace}} / Pod: {{$labels.pod_name}} / Container: {{$labels.container}} CPU usage greater than 85%, current value {{ printf "%.2f" $value }}% |
Container CPU usage exceeds 75% | Workload | Namespace: {{$labels.namespace}} / Pod: {{$labels.pod_name}} / Container: {{$labels.container}} CPU usage greater than 75%, current value {{ printf "%.2f" $value }}% |
Container memory usage exceeds 75% | Workload | Namespace: {{$labels.namespace}} / Pod: {{$labels.pod_name}} / Container: {{$labels.container}} memory usage greater than 75%, current value {{ printf "%.2f" $value }}% |
Container memory usage exceeds 85% | Workload | Namespace: {{$labels.namespace}} / Pod: {{$labels.pod_name}} / Container: {{$labels.container}} memory usage greater than 85%, current value {{ printf "%.2f" $value }}% |
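The Template column uses Prometheus-style placeholders: `{{ $labels.<name> }}` is replaced with a label of the alerting time series, and `{{ printf "%.2f" $value }}` formats the current value to two decimal places. To show what a rendered notification looks like, here is a minimal Python sketch. It is a simplified stand-in written for illustration, not the Go template engine that Prometheus uses, and the `render` function, the label value, and the sample metric value are all hypothetical.

```python
import re


def render(template: str, labels: dict[str, str], value: float) -> str:
    """Fill {{ $labels.x }}, {{ $value }} and {{ printf "fmt" $value }} placeholders."""

    def substitute(match: re.Match) -> str:
        expr = match.group(1).strip()
        if expr.startswith("$labels."):
            # Missing labels render as an empty string in this sketch.
            return labels.get(expr[len("$labels."):], "")
        if expr == "$value":
            return str(value)
        printf = re.fullmatch(r'printf\s+"([^"]*)"\s+\$value', expr)
        if printf:
            return printf.group(1) % value
        return match.group(0)  # leave unrecognized expressions untouched

    return re.sub(r"\{\{(.*?)\}\}", substitute, template)


template = (
    "Node {{ $labels.instance }} CPU usage greater than 75%, "
    'current CPU usage {{ printf "%.2f" $value }}%'
)
message = render(template, {"instance": "192.168.0.10:9100"}, 81.234)
print(message)
# Node 192.168.0.10:9100 CPU usage greater than 75%, current CPU usage 81.23%
```

The sketch only handles the three placeholder forms that appear in the table above. Real templates can use additional functions that it does not cover.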