Connect to and configure Managed Service for Prometheus

Choose an edition

Managed Service for Prometheus fully integrates with the open-source Prometheus ecosystem and provides a fully managed monitoring service, handling data storage, visualization, and O&M automatically.

Feature	Pro Edition (recommended)	Basic Edition
Metric retention	90 days	7 days
Collector	Fully managed	Self-managed
SLA	99.95%	Not specified
Grafana dashboards	Customizable	Pre-configured only
Pre-configured alert rules	Container Service components included	—
Monitoring fee	Pay-as-you-go, by node count	Free

Choose Pro Edition for production clusters that require 99.95% Service-Level Agreement (SLA) guarantees, 90-day metric retention, or customizable Grafana dashboards. Choose Basic Edition for development clusters or cost-sensitive workloads where 7-day retention is sufficient.

To upgrade from Basic Edition to Pro Edition after enablement, see Upgrade Alibaba Cloud Prometheus Monitoring from Basic Edition to Pro Edition.

Prerequisites

Before you begin, make sure you have:

An ACK managed cluster, ACK dedicated cluster, ACK Serverless cluster, or ACS cluster
(ACK dedicated clusters only) Monitoring policy authorization configured — see Grant monitoring permissions for an ACK dedicated cluster

Enable Prometheus monitoring

The steps differ slightly depending on whether you are enabling monitoring on an existing cluster or at cluster creation. The configuration options are the same for both paths.

After monitoring is enabled, default basic metrics are collected automatically. Preset dashboards — Cluster Overview, Node Monitoring, Application Monitoring, Network Monitoring, and Storage Monitoring — are available on the Prometheus Monitoring page. To collect custom metrics, see Collect custom metrics.

Enable monitoring for an existing cluster

On the Clusters page, click the name of the target cluster. In the left navigation pane of the cluster details page, choose Operations > Prometheus Monitoring.
On the Prometheus Monitoring page, select a container monitoring version and click Install.
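After the installation completes, the result can also be checked from the command line. The following is a minimal sketch, assuming kubectl is already configured for the cluster and that the component runs in the arms-prom namespace used elsewhere in this topic. The function name and the KUBECTL override variable are illustrative and not part of the product.

```shell
# Assumption: kubectl is configured for the target cluster.
# KUBECTL can be overridden, for example with "echo" for a dry run.
KUBECTL="${KUBECTL:-kubectl}"

# List the monitoring pods in the arms-prom namespace,
# including the node each pod is scheduled on.
check_prometheus_pods() {
  "$KUBECTL" get pods -n arms-prom -o wide
}
```

Run `check_prometheus_pods` after the console reports a successful installation. Pods that stay in a state other than Running are a reasonable starting point for the troubleshooting steps in the FAQ.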

Enable monitoring when creating a cluster

The option location varies by cluster type:

ACK managed cluster Pro Edition — On the Component Configuration page, in the Container Monitoring section, select Container Cluster Monitoring Pro Edition or Container Cluster Monitoring Basic Edition. For more information, see Create an ACK managed cluster.

Note: If Auto Mode (smart hosting) is enabled, Container Monitoring Basic Edition is selected by default.
ACK managed cluster Basic Edition, ACS clusters, and ACK Serverless clusters — On the Component Configurations page, in the Monitor containers section, select Enable Managed Service for Prometheus to install Container Monitoring Basic Edition.

Configure alert notifications

Set up alert rules for key metrics so that notifications are sent automatically via email, SMS, or DingTalk when anomalies occur.

Log on to the ARMS console. In the left navigation pane, choose Alert Management > Notification Objects.
On the Notification Objects page, select a notification method and create an alert notification recipient.
In the left navigation pane, choose Managed Service for Prometheus > Prometheus Alert Rules.
On the Prometheus Alert Rules page, click Create Prometheus Alert Rule.

For full configuration details, see Configure Prometheus alerting rules.

Collect custom metrics

Prometheus monitoring supports collecting custom metrics such as request QPS (Queries Per Second) and processing latency. For configuration details, see Manage custom collection rules for container environments.

Disable Prometheus monitoring

Important

Disabling monitoring removes the monitoring component from the cluster but does not delete residual Kubernetes resources (ClusterRoles, ClusterRoleBindings, and the arms-prom namespace). If reinstallation fails afterward, manually delete all ARMS-Prometheus resources.

On the cluster details page, in the left navigation pane, click Add-ons.
On the Add-ons page, click the Logs and Monitoring tab. Find the ack-arms-prometheus component and click Uninstall. In the dialog box, click OK.
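To see whether residual resources remain after the uninstallation, something like the following can be used. This is a sketch under assumptions: kubectl is configured for the cluster, and the grep pattern is derived from the resource names listed in the manual deletion FAQ in this topic, so it may also match unrelated resources whose names contain "arms". The function names and the KUBECTL variable are illustrative.

```shell
# Assumption: kubectl is configured for the target cluster.
KUBECTL="${KUBECTL:-kubectl}"

# Keep only names that look like ARMS-Prometheus resources.
# "|| true" keeps the exit status at 0 when nothing matches.
filter_arms_names() {
  grep -E 'arms|o11y:addon-controller|gpu-prometheus-exporter' || true
}

# Print leftover cluster-scoped RBAC objects and the arms-prom namespace, if any.
list_residual_arms_resources() {
  "$KUBECTL" get clusterrole,clusterrolebinding -o name | filter_arms_names
  "$KUBECTL" get namespace arms-prom --ignore-not-found -o name
}
```

If `list_residual_arms_resources` prints nothing, no residual resources matching these names were found.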

Billing

Monitoring fees — Basic Edition is free. Pro Edition is billed on a pay-as-you-go basis by node count.
Prometheus instance fees — Basic metrics collection is free. Custom metrics are billed on a pay-as-you-go basis based on data writes, data reports, storage volume, and retention period.

For pricing details, see Container Monitoring Billing.

Default basic metrics

After Prometheus monitoring is enabled, the following metrics are collected automatically. For descriptions of each metric, see Metric descriptions.

Basic resource monitoring for containers (kubelet)
Application state monitoring for clusters (kube-state-metrics)
Basic resource monitoring for nodes (node-exporter)
GPU monitoring for nodes (ack-gpu-exporter)
Control plane component monitoring for managed clusters — covers API Server, etcd, kube-scheduler, kube-controller-manager, and cloud-controller-manager
Basic monitoring for CoreDNS
Basic monitoring for Ingress Controller

The following metrics are reported automatically when specific features are enabled:

Container Storage Monitoring Overview: reports metrics for the csi-plugin component
Cost Insight: reports metrics for the ack-cost-exporter component
Colocation monitoring for multi-type workloads and resource profiling: reports metrics for the ack-koordinator component

FAQ

The Prometheus Monitoring page shows "No related monitoring dashboard found"

This typically means the Prometheus instance lost its connection to the cluster after installation. Reinstall the component first, then check the agent connection if the issue persists.

Reinstall the Prometheus monitoring component.
1. Disable Prometheus monitoring.
2. After confirming uninstallation is complete, click Install, then click OK in the dialog box.
3. After installation completes, return to the Prometheus Monitoring page to check whether the issue is resolved. If the issue persists, continue to the next step.
Check the Prometheus instance connection.
1. In the left navigation pane of the ARMS console, click Integration Management.
2. On the Integrated Environments tab, check the Container Service list for a container environment with the same name as your cluster.
  - No matching environment found — See Connect using the ARMS or Prometheus console.
  - Matching environment found — Click Configure Agent in the Actions column to open the Configure Agent page and verify that the installed agents are running as expected.

How do I adjust the metric storage duration?

See Adjust metric storage duration.

How do I view the version of the ack-arms-prometheus component?

On the Clusters page, click the name of the target cluster. In the left navigation pane, click Add-ons.
On the Add-ons page, click the Logs and Monitoring tab and find the ack-arms-prometheus component. The current version is displayed below the component name. If a newer version is available, click Upgrade next to the version number.

The Upgrade option appears only if the installed version is not the latest.

Why can't I deploy GPU monitoring?

GPU monitoring may fail to deploy if a GPU node has taints. Run the following command to check for taints on the node.

kubectl describe node cn-beijing.47.100.***.***

If the node has custom taints, the output includes entries for them. For example, a taint with key test-key, value test-value, and effect NoSchedule appears as:

Taints: test-key=test-value:NoSchedule
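The full `kubectl describe` output is long, so it can help to print only the taints. The following is a small sketch, assuming kubectl is configured for the cluster. The function name and the KUBECTL variable are illustrative, and the node name is passed as the first argument.

```shell
# Assumption: kubectl is configured for the target cluster.
KUBECTL="${KUBECTL:-kubectl}"

# Print only the taints of the given node.
# Empty output means the node has no taints.
show_node_taints() {
  "$KUBECTL" get node "$1" -o jsonpath='{.spec.taints}'
}
```

For example, `show_node_taints <node-name>` prints the taint list for that node. The exact output format depends on the kubectl version.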

To resolve the issue, use one of the following approaches:

Remove the taint from the GPU node:

kubectl taint node cn-beijing.47.100.***.*** test-key=test-value:NoSchedule-

Add a toleration to the ack-prometheus-gpu-exporter DaemonSet so pods can be scheduled to the node:

# Edit the ack-prometheus-gpu-exporter DaemonSet
kubectl edit daemonset -n arms-prom ack-prometheus-gpu-exporter

Add the following tolerations field at the same level as containers:

# Add above the containers field, at the same indentation level
tolerations:
- key: "test-key"
  operator: "Equal"
  value: "test-value"
  effect: "NoSchedule"
containers:
 # Other fields omitted
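As a non-interactive alternative to `kubectl edit`, the same toleration can be applied with `kubectl patch`. This is a sketch, assuming the DaemonSet name and namespace shown above and the example taint test-key=test-value:NoSchedule. The function and variable names are illustrative. Note that a JSON Patch "add" on this path replaces any tolerations the DaemonSet already defines, so include existing entries in the list if there are any.

```shell
# Assumption: kubectl is configured for the target cluster.
KUBECTL="${KUBECTL:-kubectl}"

# JSON Patch that sets the pod template's tolerations list.
# Caution: "add" on an existing path overwrites the current value.
TOLERATION_PATCH='[{"op":"add","path":"/spec/template/spec/tolerations","value":[{"key":"test-key","operator":"Equal","value":"test-value","effect":"NoSchedule"}]}]'

# Apply the patch to the GPU exporter DaemonSet in the arms-prom namespace.
add_gpu_exporter_toleration() {
  "$KUBECTL" patch daemonset ack-prometheus-gpu-exporter -n arms-prom \
    --type=json -p "$TOLERATION_PATCH"
}
```

Replace the key, value, and effect in `TOLERATION_PATCH` with those of the actual taint before calling `add_gpu_exporter_toleration`.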

How do I completely and manually delete ARMS-Prometheus?

Deleting only the arms-prom namespace leaves residual configurations that can cause reinstallation to fail. To fully remove all ARMS-Prometheus resources, run the following commands.

Delete the arms-prom namespace:
```
kubectl delete namespace arms-prom
```

Delete ClusterRoles:

kubectl delete ClusterRole arms-kube-state-metrics
kubectl delete ClusterRole arms-node-exporter
kubectl delete ClusterRole arms-prom-ack-arms-prometheus-role
kubectl delete ClusterRole arms-prometheus-oper3
kubectl delete ClusterRole arms-prometheus-ack-arms-prometheus-role
kubectl delete ClusterRole arms-pilot-prom-k8s
kubectl delete ClusterRole gpu-prometheus-exporter
kubectl delete ClusterRole o11y:addon-controller:role
kubectl delete ClusterRole arms-aliyunserviceroleforarms-clusterrole

Delete ClusterRoleBindings:

kubectl delete ClusterRoleBinding arms-node-exporter
kubectl delete ClusterRoleBinding arms-prom-ack-arms-prometheus-role-binding
kubectl delete ClusterRoleBinding arms-prometheus-oper-bind2
kubectl delete ClusterRoleBinding arms-kube-state-metrics
kubectl delete ClusterRoleBinding arms-pilot-prom-k8s
kubectl delete ClusterRoleBinding arms-prometheus-ack-arms-prometheus-role-binding
kubectl delete ClusterRoleBinding gpu-prometheus-exporter
kubectl delete ClusterRoleBinding o11y:addon-controller:rolebinding
kubectl delete ClusterRoleBinding arms-kube-state-metrics-agent
kubectl delete ClusterRoleBinding arms-node-exporter-agent
kubectl delete ClusterRoleBinding arms-aliyunserviceroleforarms-clusterrolebinding

Delete Roles and RoleBindings:

kubectl delete Role arms-pilot-prom-spec-ns-k8s
kubectl delete Role arms-pilot-prom-spec-ns-k8s -n kube-system
kubectl delete RoleBinding arms-pilot-prom-spec-ns-k8s
kubectl delete RoleBinding arms-pilot-prom-spec-ns-k8s -n kube-system
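The commands above can be collected into one script so that the cleanup is repeatable. The following is a sketch, assuming kubectl is configured for the cluster. It uses the same resource names as the commands above and adds `--ignore-not-found` so that resources that are already gone do not produce errors. The function name and the KUBECTL variable are illustrative.

```shell
# Assumption: kubectl is configured for the target cluster.
# Set KUBECTL=echo to print the commands instead of running them.
KUBECTL="${KUBECTL:-kubectl}"

cleanup_arms_prometheus() {
  "$KUBECTL" delete namespace arms-prom --ignore-not-found

  for name in \
    arms-kube-state-metrics \
    arms-node-exporter \
    arms-prom-ack-arms-prometheus-role \
    arms-prometheus-oper3 \
    arms-prometheus-ack-arms-prometheus-role \
    arms-pilot-prom-k8s \
    gpu-prometheus-exporter \
    o11y:addon-controller:role \
    arms-aliyunserviceroleforarms-clusterrole
  do
    "$KUBECTL" delete clusterrole "$name" --ignore-not-found
  done

  for name in \
    arms-node-exporter \
    arms-prom-ack-arms-prometheus-role-binding \
    arms-prometheus-oper-bind2 \
    arms-kube-state-metrics \
    arms-pilot-prom-k8s \
    arms-prometheus-ack-arms-prometheus-role-binding \
    gpu-prometheus-exporter \
    o11y:addon-controller:rolebinding \
    arms-kube-state-metrics-agent \
    arms-node-exporter-agent \
    arms-aliyunserviceroleforarms-clusterrolebinding
  do
    "$KUBECTL" delete clusterrolebinding "$name" --ignore-not-found
  done

  # Namespaced Role and RoleBinding: current namespace, then kube-system.
  for kind in role rolebinding; do
    "$KUBECTL" delete "$kind" arms-pilot-prom-spec-ns-k8s --ignore-not-found
    "$KUBECTL" delete "$kind" arms-pilot-prom-spec-ns-k8s -n kube-system --ignore-not-found
  done
}
```

Running `KUBECTL=echo cleanup_arms_prometheus` first prints the commands without executing them, which is a cheap way to review what will be deleted.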

How do I uninstall Managed Service for Prometheus using Helm?

Use this method if you deployed the service manually with Helm, or if residual resources remain due to environment or Helm version issues.

On the Clusters page, click the name of the target cluster. In the left navigation pane, choose Applications > Helm.
On the Helm page, find the arms-prometheus component, click Delete in the Actions column, select Clear Release Records, and follow the prompts.
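For clusters managed from a terminal, a rough command-line equivalent is sketched below. It assumes the Helm 3 CLI is installed and can reach the cluster. The release name and namespace must be taken from the output of the list command, because they are not guaranteed to match the names shown in the console. The function names and the HELM variable are illustrative.

```shell
# Assumption: the Helm 3 CLI is installed and configured for the target cluster.
HELM="${HELM:-helm}"

# List releases in all namespaces whose name matches "arms-prometheus".
find_arms_releases() {
  "$HELM" list --all-namespaces --filter 'arms-prometheus'
}

# Uninstall one release: $1 is the release name, $2 is its namespace.
# Helm 3 removes the release record unless --keep-history is passed.
uninstall_release() {
  "$HELM" uninstall "$1" --namespace "$2"
}
```

Call `find_arms_releases` first, then pass the name and namespace it reports to `uninstall_release`.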

An "xxx in use" error occurs when installing ack-arms-prometheus

On the Clusters page, click the name of the target cluster. In the left navigation pane, choose Applications > Helm.
On the Helm page, check whether ack-arms-prometheus exists.
- Found — Delete ack-arms-prometheus from the Helm page, then reinstall it on the Add-ons page. For details, see Manage components.
- Not found — Residual resources remain from a previous deletion of the ack-arms-prometheus Helm release. Manually delete all ARMS-Prometheus resources, then re-enable Prometheus monitoring.

Installation of ack-arms-prometheus fails after a "Component Not Installed" message

Check each of the following in order:

Verify whether ack-arms-prometheus is already installed.
1. On the Clusters page, click the name of the target cluster. In the left navigation pane, choose Applications > Helm.
2. On the Helm page, check whether ack-arms-prometheus exists.
  - Found — Delete ack-arms-prometheus from the Helm page, then reinstall it on the Add-ons page. For details, see Manage components.
  - Not found — Manually delete all ARMS-Prometheus resources, then re-enable Prometheus monitoring.
Check ack-arms-prometheus logs for errors.
1. In the cluster details left navigation pane, choose Workloads > Deployments.
2. At the top of the Deployments page, set Namespace to arms-prom and click arms-prometheus-ack-arms-prometheus.
3. Click the Logs tab and check for errors.
Check whether an error occurred during agent installation.
1. Log on to the ARMS console. In the left navigation pane, click Integration Management.
2. On the Integrated Environments tab, find the target container environment in the Container Service list. In the Actions column, click Configure Agent to open the Configure Agent page, and check whether any agent reports an installation error.
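The log check described above can also be done with kubectl. The following is a sketch, assuming kubectl is configured for the cluster and that the Deployment name and namespace match those shown in the console steps. The function name, the KUBECTL variable, and the 200-line tail are illustrative choices.

```shell
# Assumption: kubectl is configured for the target cluster.
KUBECTL="${KUBECTL:-kubectl}"

# Print the most recent log lines of the ack-arms-prometheus Deployment.
show_prometheus_agent_logs() {
  "$KUBECTL" logs -n arms-prom deployment/arms-prometheus-ack-arms-prometheus --tail=200
}
```

Piping the output through `grep -i error` narrows it to lines that mention errors.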

Grant monitoring permissions for an ACK dedicated cluster

ACK dedicated clusters require explicit monitoring policy authorization before enabling Prometheus monitoring. Follow these steps to grant the required permissions.

On the Clusters page, click the name of the target cluster. In the left navigation pane, click Cluster Information.
On the Basic Information tab, click the KubernetesWorkerRole-*** link next to Worker RAM Role. On the RAM role page, click the Permissions tab. In the Policy column, click k8sWorkerRole****.
On the access policy details page, click the Policy Document tab, then click Edit Policy Document.

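In the JSON editor, add the following authorization rule to the Statement field and click OK. Only the object inside Statement needs to be added to the existing policy; Version and the enclosing Statement array are shown for context.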

{
    "Version": "1",
    "Statement": [
        {
            "Action": [
                "arms:Describe*",
                "arms:List*",
                "arms:Get*",
                "arms:Search*",
                "arms:Check*",
                "arms:Query*",
                "arms:ListEnvironments",
                "arms:DescribeAddonRelease",
                "arms:InstallAddon",
                "arms:DeleteAddonRelease",
                "arms:ListEnvironmentDashboards",
                "arms:ListAddonReleases",
                "arms:CreateEnvironment",
                "arms:UpdateEnvironment",
                "arms:InitEnvironment",
                "arms:DescribeEnvironment",
                "arms:InstallEnvironmentFeature",
                "arms:ListEnvironmentFeatures",
                "cms:CreateIntegrationPolicy",
                "cms:ListAddonReleases",
                "cms:UpdateAddonRelease",
                "cms:CreateAddonRelease",
                "cms:GetPrometheusInstance",
                "cms:ListIntegrationPolicyStorageRequirements"
            ],
            "Resource": "*",
            "Effect": "Allow"
        }
    ]
}