Managed Service for Prometheus: Using container monitoring Pro version

Last Updated: Jun 18, 2025

Managed Service for Prometheus provides a managed Prometheus service and a container monitoring service. The container monitoring service incurs container monitoring fees, which consist of a monitoring cluster scale fee and a Prometheus instance fee. The container monitoring service is available in two versions: basic and Pro. This topic describes how to use container monitoring Pro version, along with its billing details, features, supported dashboards, and default alert rules.

Limits

Only ACK Pro clusters can activate container monitoring Pro version. Other types of container clusters can use only the basic version.

Prerequisites

Container monitoring Pro version requires Managed Service for Prometheus. First activate Managed Service for Prometheus under one of its pay-as-you-go billing methods (by write volume or by reporting volume), and then activate container monitoring Pro version.

Container monitoring Pro version billing details

| Billing item | Billing description | Billing method | Billing cycle |
| --- | --- | --- | --- |
| Monitoring cluster scale fee | OCU usage is calculated from the node count of the container cluster: every 10 cluster nodes convert to 1 OCU. Note: The Observability Capacity Unit (OCU) is a billing unit introduced by Alibaba Cloud Native Observability. OCU usage is calculated automatically from hourly resource usage. The price is 0.023 USD per OCU. | Pay-as-you-go: Daily container cluster scale fee = Sum of hourly OCUs × OCU unit price. Note: Hourly OCUs = Maximum number of nodes in the current billing cycle divided by 10, rounded up. | The billing cycle is hourly. After 00:00 each day, Managed Service for Prometheus determines the maximum number of cluster nodes for each hour of the previous day, converts each hourly maximum to OCUs according to the billing rules, sums the hourly OCUs into the previous day's total, and multiplies that total by the OCU unit price. The container cluster scale monitoring fee is therefore generated daily. |
| Prometheus instance fee | For more information, see Prometheus instance billing. | | |
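
The monitoring cluster scale fee rules above can be worked through with a small calculation. The following is a minimal sketch, not an official billing tool: the function names and the sample node counts are hypothetical, and only the 10-nodes-per-OCU conversion, the round-up rule, and the 0.023 USD unit price come from this topic. Prometheus instance fees are billed separately and are not included.

```python
from decimal import Decimal

OCU_UNIT_PRICE_USD = Decimal("0.023")  # price per OCU stated in this topic
NODES_PER_OCU = 10                     # every 10 cluster nodes convert to 1 OCU


def hourly_ocu(max_nodes: int) -> int:
    """OCUs for one hour: that hour's maximum node count divided by 10, rounded up."""
    if max_nodes < 0:
        raise ValueError("node count cannot be negative")
    return -(-max_nodes // NODES_PER_OCU)  # integer ceiling division


def daily_cluster_scale_fee(hourly_max_nodes: list[int]) -> Decimal:
    """Daily fee = sum of hourly OCUs x OCU unit price."""
    total_ocu = sum(hourly_ocu(n) for n in hourly_max_nodes)
    return total_ocu * OCU_UNIT_PRICE_USD


# Hypothetical day: the cluster peaks at 25 nodes in each of the 24 hours.
sample_day = [25] * 24
print(hourly_ocu(25))                       # 3
print(daily_cluster_scale_fee(sample_day))  # 1.656
```

In this hypothetical case, 25 nodes round up to 3 OCUs per hour, so the day accumulates 72 OCUs and the fee is 72 × 0.023 = 1.656 USD. Decimal is used instead of float so that the price multiplication stays exact.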

How to use container monitoring Pro version

Method 1: Select container monitoring Pro version during integration

  1. On the Integration Center page, select Container Cluster Monitoring.

  2. In the Container Cluster Monitoring panel, select the container service cluster to be integrated, then select Container Monitoring Pro Version as the version, and click OK.

Method 2: Upgrade from basic version to container monitoring Pro version

Important

After upgrading to container monitoring Pro version, you cannot downgrade to the basic version.

  1. On the Provisioning page, select Integrated Environments > Container Environment.

  2. In the Operation column of the container monitoring entry to be upgraded, click Upgrade. In the dialog box, click Confirm.

Differences between basic version and Pro version

| Category | Basic version | Pro version |
| --- | --- | --- |
| Storage period for basic container cluster metrics | 7 days | 90 days |
| Prometheus collector | The agent is deployed in the user's cluster and must be self-managed. By default, a single replica occupies 3 cores and 4 GB of cluster resources. | A managed collection agent is provided, so users no longer bear the agent's resource cost. A production-level SLA of 99.95% is provided. |
| Monitoring dashboard | Built-in basic monitoring dashboards. | Built-in comprehensive monitoring dashboards. |

Container monitoring Pro version supported dashboards

| Type | Dashboard name |
| --- | --- |
| Monitoring overview | Cluster monitoring overview |
| Monitoring overview | Cluster namespace dashboard |
| Cluster core components | ACK Pro API server |
| Cluster core components | ACK Pro ETCD |
| Cluster core components | ACK Pro Scheduler |
| Cluster core components | ACK Pro Cloud Controller Manager |
| Cluster core components | ACK Pro Kube Controller Manager |
| Node monitoring | Node pool overview |
| Node monitoring | Cluster node monitoring details |
| Application monitoring | Stateless application monitoring |
| Application monitoring | Stateful application monitoring |
| Application monitoring | DaemonSet application monitoring |
| Application monitoring | Cluster Pod monitoring |
| Network monitoring | CoreDNS component monitoring |
| Network monitoring | Cluster Ingress traffic monitoring |
| Storage monitoring | CSI storage component monitoring-cluster dimension |
| Storage monitoring | CSI storage component monitoring-node dimension |
| Storage monitoring | Pod IO Monitoring (Pod Level) |
| Storage monitoring | Frontend Storage IO Monitoring (Cluster Level) |
| GPU monitoring | Cluster GPU monitoring-cluster dimension |
| GPU monitoring | Cluster GPU monitoring-node dimension |
| GPU monitoring | Cluster GPU monitoring-application Pod dimension |
| Cost analysis/resource optimization | Resource profile |
| Others | Backend Storage IO Monitoring (Cluster Level) |
| Others | k8s-reclaimed-resource |
| Others | Cluster Prometheus self-monitoring |
| Others | Virtual Node(ECI) Overview |

Default alert rules

| Alert rule name/ID | Alert group | Template |
| --- | --- | --- |
| Node CPU usage greater than 75% | Node | Node {{ $labels.instance }} CPU usage greater than 75%, current CPU usage {{ printf "%.2f" $value }}% |
| Node CPU usage greater than 85% | Node | Node {{ $labels.instance }} CPU usage greater than 85%, current CPU usage {{ printf "%.2f" $value }}% |
| Node memory usage greater than 75% | Node | Node {{ $labels.instance }} memory usage greater than 75%, current memory usage {{ printf "%.2f" $value }}% |
| Node memory usage greater than 85% | Node | Node {{ $labels.instance }} memory usage greater than 85%, current memory usage {{ printf "%.2f" $value }}% |
| Node anomalies | Node | Node {{$labels.node}} has been in unavailable status for more than 10 minutes |
| Disk usage greater than 95% | Node | Node {{ $labels.instance }} disk {{ $labels.device }} usage exceeds 95%, current disk usage {{ printf "%.2f" $value }}% |
| Deployment Pod availability less than 50% | Workload | Namespace: {{$labels.namespace}} / Deployment: {{$labels.deployment}} Pod availability less than 50%, current unavailable Pod count {{ $value }} |
| Job execution failed | Workload | Namespace: {{$labels.namespace}}/Job: {{$labels.job_name}} execution failed |
| Pod startup timeout failure | Workload | Namespace: {{$labels.namespace}}/Pod: {{$labels.pod_name}} has not started successfully for more than 15 minutes, waiting reason {{$labels.reason}} |
| Pod status abnormal | Workload | Namespace: {{$labels.namespace}}/Pod: {{$labels.pod_name}} has been in {{$labels.phase}} status for more than 10 minutes |
| Pod frequent restart | Workload | Namespace: {{$labels.namespace}}/Pod: {{$labels.pod_name}} restarted more than {{ $labels.metrics_params_value}} times within {{$labels.metrics_params_time}} minutes, current restart count {{ $value }} |
| Container CPU usage exceeds 85% | Workload | Namespace: {{$labels.namespace}} / Pod: {{$labels.pod_name}} / Container: {{$labels.container}} CPU usage greater than 85%, current value {{ printf "%.2f" $value }}% |
| Container CPU usage exceeds 75% | Workload | Namespace: {{$labels.namespace}} / Pod: {{$labels.pod_name}} / Container: {{$labels.container}} CPU usage greater than 75%, current value {{ printf "%.2f" $value }}% |
| Container memory usage exceeds 75% | Workload | Namespace: {{$labels.namespace}} / Pod: {{$labels.pod_name}} / Container: {{$labels.container}} memory usage greater than 75%, current value {{ printf "%.2f" $value }}% |
| Container memory usage exceeds 85% | Workload | Namespace: {{$labels.namespace}} / Pod: {{$labels.pod_name}} / Container: {{$labels.container}} memory usage greater than 85%, current value {{ printf "%.2f" $value }}% |
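
To illustrate how the placeholders in these templates expand into a notification message, here is a minimal sketch. It is a simplified stand-in written for illustration only, not the template engine the service actually uses: the function `render_alert_template` is hypothetical, it handles only the three placeholder forms that appear in the templates above ({{ $labels.<name> }}, {{ $value }}, and {{ printf "<format>" $value }}), and the label values and metric value are made-up sample data.

```python
import re


def render_alert_template(template: str, labels: dict, value: float) -> str:
    """Expand the placeholder forms that appear in the default alert templates."""

    def expand(match: re.Match) -> str:
        expr = match.group(1).strip()
        if expr.startswith("$labels."):
            # Label lookup; a missing label renders as an empty string here.
            return str(labels.get(expr[len("$labels."):], ""))
        if expr == "$value":
            return str(value)
        printf = re.fullmatch(r'printf\s+"([^"]*)"\s+\$value', expr)
        if printf:
            # Apply the printf-style format (for example %.2f) to the value.
            return printf.group(1) % value
        return match.group(0)  # leave unrecognized expressions untouched

    return re.sub(r"\{\{(.*?)\}\}", expand, template)


# Template copied from the "Node CPU usage greater than 75%" rule.
template = (
    "Node {{ $labels.instance }} CPU usage greater than 75%, "
    'current CPU usage {{ printf "%.2f" $value }}%'
)
message = render_alert_template(template, {"instance": "node-1"}, 81.5)
print(message)
# Node node-1 CPU usage greater than 75%, current CPU usage 81.50%
```

The sketch shows why the templates end with a literal % after the printf placeholder: the format string only controls the number of decimal places, and the percent sign is plain text in the message.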