Use Container Monitoring Pro Edition

Managed Service for Prometheus provides two services: managed Prometheus and container monitoring. Container monitoring fees consist of a monitoring cluster scale fee and a Prometheus instance fee. Container Monitoring is available in two editions: Basic Edition and Pro Edition. This topic describes how to enable Container Monitoring Pro Edition and covers its billing, features, supported dashboards, and default alert rules.

Cluster types that support Pro Edition

ACK managed Pro cluster
ACK Lingjun cluster
ACK dedicated cluster

Prerequisites

Container Monitoring Pro Edition depends on Managed Service for Prometheus. Before you activate Container Monitoring Pro Edition, activate Managed Service for Prometheus with pay-as-you-go billing, either by ingested data volume or by number of ingested metric samples.

Billing

Billing item	Description	Billing method	Billing cycle
Monitoring cluster scale fee	Observability Capacity Unit (OCU) usage is calculated from the number of nodes in the container cluster: every 10 nodes are converted to 1 OCU. Note: OCU is a billing unit introduced by Alibaba Cloud Native Observability. OCU usage is calculated automatically from hourly resource usage. Unit price: USD 0.023 per OCU.	Pay-as-you-go. Daily container cluster scale fee = Sum of hourly OCUs × OCU unit price. Note: Hourly OCUs = Maximum number of nodes in that hour ÷ 10, rounded up.	Hourly. After 00:00 each day, Managed Service for Prometheus determines the maximum number of cluster nodes in each hour of the previous day, converts each hourly maximum into OCUs according to the billing rules, sums the hourly OCUs into the previous day's total, multiplies the total by the OCU unit price, and generates the container cluster scale monitoring fee as a daily bill.
Prometheus instance fee	For more information, see Prometheus instance billing.
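To make the monitoring cluster scale fee arithmetic concrete, the following Python sketch applies the rules in the billing table: each hour's maximum node count is divided by 10 and rounded up, the hourly OCUs are summed for the day, and the total is multiplied by the USD 0.023 unit price. The function names are hypothetical, and the sketch assumes you already know the maximum node count for each hour. It does not model taxes, discounts, or how the actual bill rounds amounts.

```python
from decimal import Decimal
from typing import Sequence, Tuple

# Unit price from the billing table (USD per OCU). Decimal avoids float rounding noise.
OCU_UNIT_PRICE_USD = Decimal("0.023")
NODES_PER_OCU = 10


def hourly_ocus(max_nodes_in_hour: int) -> int:
    """Convert one hour's maximum node count into OCUs (10 nodes = 1 OCU, rounded up)."""
    if max_nodes_in_hour < 0:
        raise ValueError("node count cannot be negative")
    # Integer ceiling division: ceil(n / 10) without floating point.
    return -(-max_nodes_in_hour // NODES_PER_OCU)


def daily_cluster_scale_fee(hourly_max_nodes: Sequence[int]) -> Tuple[int, Decimal]:
    """Return (total OCUs, fee in USD) for one day, given each hour's maximum node count."""
    total_ocus = sum(hourly_ocus(n) for n in hourly_max_nodes)
    return total_ocus, total_ocus * OCU_UNIT_PRICE_USD


if __name__ == "__main__":
    # Hypothetical day: 25 nodes for 16 hours, then scaled out to 42 nodes for 8 hours.
    day = [25] * 16 + [42] * 8
    ocus, fee = daily_cluster_scale_fee(day)
    print(ocus, fee)  # 88 2.024
```

In this example, 25 nodes round up to 3 OCUs per hour and 42 nodes round up to 5 OCUs per hour, so the day totals 16 × 3 + 8 × 5 = 88 OCUs, or USD 2.024. Because the maximum node count is taken per hour, a short scale-out is billed only for the hours in which it occurs.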

Method 1: Select this edition during integration

On the Integration Center page, select Kubernetes Cluster Monitoring.
In the Kubernetes Cluster Monitoring panel, select the Container Service for Kubernetes (ACK) cluster that you want to integrate, select Container Monitoring Pro Edition, and click OK.

Method 2: Upgrade from Basic Edition to Pro Edition

Important

After upgrading to Container Monitoring Pro Edition, you cannot downgrade to Basic Edition.

On the Integration Management page, choose Integrated Environments > Container Service.
Find the cluster that you want to upgrade and click Upgrade to Pro Edition in the Actions column. In the dialog box that appears, click OK.

Differences between Basic Edition and Pro Edition

Category	Basic Edition	Pro Edition
Retention period of basic metrics for container service clusters	7 days	90 days
Prometheus collector	An agent is deployed in your cluster and must be maintained by you. By default, a single agent replica consumes 3 CPU cores and 4 GB of memory from cluster resources.	A managed agent is provided, so you no longer pay for the cluster resources that the agent consumes. A production-grade SLA of 99.95% is provided.
Dashboards	Built-in basic monitoring dashboards.	Built-in comprehensive monitoring dashboards.

Supported dashboards

Type	Dashboard
Monitoring overview	Cluster monitoring overview
Monitoring overview	Cluster namespace dashboard
Cluster core components	ACK Pro API server
Cluster core components	ACK Pro ETCD
Cluster core components	ACK Pro Scheduler
Cluster core components	ACK Pro Cloud Controller Manager
Cluster core components	ACK Pro Kube Controller Manager
Node monitoring	Node pool overview
Node monitoring	Cluster node monitoring details
Application monitoring	StatefulSet monitoring
Application monitoring	Deployment monitoring
Application monitoring	DaemonSet monitoring
Application monitoring	Cluster Pod monitoring
Network monitoring	CoreDNS component monitoring
Network monitoring	Cluster Ingress traffic monitoring
Storage monitoring	CSI storage component monitoring-cluster level
Storage monitoring	CSI storage component monitoring-node level
Storage monitoring	Pod IO Monitoring (Pod Level)
Storage monitoring	Frontend Storage IO Monitoring (Cluster Level)
GPU monitoring	Cluster GPU monitoring-cluster level
GPU monitoring	Cluster GPU monitoring-node level
GPU monitoring	Cluster GPU monitoring-application Pod dimension
Cost analysis/Resource optimization	Resource profile
Others	Backend Storage IO Monitoring (Cluster Level)
Others	k8s-reclaimed-resource
Others	Cluster Prometheus self-monitoring
Others	Virtual Node (ECI) Overview

Default alert rules

Alert rule name/ID	Alert group	Alert message template
Node CPU usage greater than 75%	Node	Node {{ $labels.instance }} CPU usage greater than 75%, current CPU usage {{ printf "%.2f" $value }}%
Node CPU usage greater than 85%	Node	Node {{ $labels.instance }} CPU usage greater than 85%, current CPU usage {{ printf "%.2f" $value }}%
Node memory usage greater than 75%	Node	Node {{ $labels.instance }} memory usage greater than 75%, current memory usage {{ printf "%.2f" $value }}%
Node memory usage greater than 85%	Node	Node {{ $labels.instance }} memory usage greater than 85%, current memory usage {{ printf "%.2f" $value }}%
Node anomalies	Node	Node {{$labels.node}} has been in unavailable status for more than 10 minutes
Disk usage greater than 95%	Node	Node {{ $labels.instance }} disk {{ $labels.device }} usage exceeds 95%, current disk usage {{ printf "%.2f" $value }}%
Deployment Pod availability less than 50%	Workload	Namespace: {{$labels.namespace}} / Deployment: {{$labels.deployment}} Pod availability less than 50%, current unavailable Pod count {{ $value }}
Job execution failed	Workload	Namespace: {{$labels.namespace}}/Job: {{$labels.job_name}} execution failed
Pod startup timeout failure	Workload	Namespace: {{$labels.namespace}}/Pod: {{$labels.pod_name}} has not started successfully for more than 15 minutes, waiting reason {{$labels.reason}}
Pod status abnormal	Workload	Namespace: {{$labels.namespace}}/Pod: {{$labels.pod_name}} has been in {{$labels.phase}} status for more than 10 minutes
Pod frequent restart	Workload	Namespace: {{$labels.namespace}}/Pod: {{$labels.pod_name}} restarted more than {{ $labels.metrics_params_value}} times within {{$labels.metrics_params_time}} minutes, current restart count {{ $value }}
Container CPU usage exceeds 75%	Workload	Namespace: {{$labels.namespace}} / Pod: {{$labels.pod_name}} / Container: {{$labels.container}} CPU usage greater than 75%, current value {{ printf "%.2f" $value }}%
Container CPU usage exceeds 85%	Workload	Namespace: {{$labels.namespace}} / Pod: {{$labels.pod_name}} / Container: {{$labels.container}} CPU usage greater than 85%, current value {{ printf "%.2f" $value }}%
Container memory usage exceeds 75%	Workload	Namespace: {{$labels.namespace}} / Pod: {{$labels.pod_name}} / Container: {{$labels.container}} memory usage greater than 75%, current value {{ printf "%.2f" $value }}%
Container memory usage exceeds 85%	Workload	Namespace: {{$labels.namespace}} / Pod: {{$labels.pod_name}} / Container: {{$labels.container}} memory usage greater than 85%, current value {{ printf "%.2f" $value }}%
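Several default rules come in 75% and 85% pairs, so a value above 85% meets the conditions of both rules in the pair. The following Python sketch illustrates this for the two node CPU rules and reproduces their message text, formatting the value to two decimal places as `printf "%.2f"` does in the template. The helper name and the instance value are hypothetical. The sketch assumes the CPU usage percentage is already computed and does not model the PromQL expressions, evaluation interval, or pending duration of the actual rules, which this topic does not list.

```python
from typing import List

# Thresholds (percent) of the two default node CPU rules in the table above.
NODE_CPU_THRESHOLDS = (75, 85)


def node_cpu_alert_messages(instance: str, cpu_usage_pct: float) -> List[str]:
    """Return the messages of the node CPU rules whose threshold the value exceeds."""
    messages = []
    for threshold in NODE_CPU_THRESHOLDS:
        # The rules fire on "greater than", so a value exactly at the threshold does not fire.
        if cpu_usage_pct > threshold:
            messages.append(
                f"Node {instance} CPU usage greater than {threshold}%, "
                f"current CPU usage {cpu_usage_pct:.2f}%"
            )
    return messages


if __name__ == "__main__":
    # Hypothetical node at 90.5% CPU: both the 75% and the 85% rule match.
    for message in node_cpu_alert_messages("10.0.0.1:9100", 90.5):
        print(message)
```

Treat the two rules as severity tiers: the 75% rule gives early warning, and the 85% rule indicates that action is more urgent.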