Integrate Prometheus Monitoring to collect metrics for the control plane, nodes, and applications in your ACK cluster, and improve cluster performance management with visual dashboards and real-time alerts.
Choose an edition
Managed Service for Prometheus fully integrates with the open-source Prometheus ecosystem and provides a fully managed monitoring service, handling data storage, visualization, and O&M automatically.
| | Pro Edition (recommended) | Basic Edition |
|---|---|---|
| Metric retention | 90 days | 7 days |
| Collector | Fully managed | Self-managed |
| SLA | 99.95% | Not specified |
| Grafana dashboards | Customizable | Pre-configured only |
| Pre-configured alert rules | Container Service components included | — |
| Monitoring fee | Pay-as-you-go, by node count | Free |
Choose Pro Edition for production clusters that require 99.95% Service-Level Agreement (SLA) guarantees, 90-day metric retention, or customizable Grafana dashboards. Choose Basic Edition for development clusters or cost-sensitive workloads where 7-day retention is sufficient.
To upgrade from Basic Edition to Pro Edition after enablement, see Upgrade Alibaba Cloud Prometheus Monitoring from Basic Edition to Pro Edition.
Prerequisites
Before you begin, make sure you have:
-
An ACK managed cluster, ACK dedicated cluster, ACK Serverless cluster, or ACS cluster
-
(ACK dedicated clusters only) Monitoring policy authorization configured — see Grant monitoring permissions for an ACK dedicated cluster
Enable Prometheus monitoring
The steps differ slightly depending on whether you are enabling monitoring on an existing cluster or at cluster creation. The configuration options are the same for both paths.
After monitoring is enabled, default basic metrics are collected automatically. Preset dashboards — Cluster Overview, Node Monitoring, Application Monitoring, Network Monitoring, and Storage Monitoring — are available on the Prometheus Monitoring page. To collect custom metrics, see Collect custom metrics.
Enable monitoring for an existing cluster
Enable monitoring when creating a cluster
The option location varies by cluster type:
-
ACK managed cluster Pro Edition — On the Component Configuration page, in the Container Monitoring section, select Container Cluster Monitoring Pro Edition or Container Cluster Monitoring Basic Edition. For more information, see Create an ACK managed cluster.
Auto Mode for smart hosting defaults to Container Monitoring Basic Edition.
-
ACK managed cluster Basic Edition, ACS clusters, and ACK Serverless clusters — On the Component Configurations page, in the Monitor containers section, select Enable Managed Service for Prometheus to install Container Monitoring Basic Edition.
Configure alert notifications
Set up alert rules for key metrics so that notifications are sent automatically via email, SMS, or DingTalk when anomalies occur.
-
Log on to the ARMS console. In the left navigation pane, choose Alert Management > Notification Objects.
-
On the Notification Objects page, select a notification method and create an alert notification recipient.
-
In the left navigation pane, choose Managed Service for Prometheus > Prometheus Alert Rules.
-
On the Prometheus Alert Rules page, click Create Prometheus Alert Rule.
For full configuration details, see Configure Prometheus alerting rules.
Collect custom metrics
Prometheus monitoring supports collecting custom metrics such as request QPS (Queries Per Second) and processing latency. For configuration details, see Manage custom collection rules for container environments.
Disable Prometheus monitoring
Disabling monitoring removes the monitoring component from the cluster but does not delete residual Kubernetes resources (ClusterRoles, ClusterRoleBindings, and the arms-prom namespace). If reinstallation fails afterward, manually delete all ARMS-Prometheus resources.
-
On the cluster details page, in the left navigation pane, click Add-ons.
-
On the Add-ons page, click the Logs and Monitoring tab. Find the ack-arms-prometheus component and click Uninstall. In the dialog box, click OK.
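To check from the command line whether any of the residual resources mentioned above are still present after uninstalling, a filter along the following lines may help. This is a minimal sketch: `filter_arms_residuals` is a hypothetical helper, and its name patterns are derived from the resource names in the manual-deletion FAQ in this topic, so they are an approximation rather than an official list.

```shell
# Sketch: reduce `kubectl get ... -o name` output to likely ARMS-Prometheus leftovers.
# Assumption: the patterns come from the names in the manual-deletion FAQ.
# An unrelated resource whose name starts with "arms-" would also match.
filter_arms_residuals() {
  grep -E '/(arms-|gpu-prometheus-exporter$|o11y:addon-controller:)' || true
}

# Usage (requires cluster access):
# kubectl get clusterrole,clusterrolebinding -o name | filter_arms_residuals
# kubectl get namespace arms-prom --ignore-not-found -o name
```

If both commands print nothing, no residual resources matching these patterns remain.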
Billing
-
Monitoring fees — Basic Edition is free. Pro Edition is billed on a pay-as-you-go basis by node count.
-
Prometheus instance fees — Basic metrics collection is free. Custom metrics are billed on a pay-as-you-go basis based on data writes, data reports, storage volume, and retention period.
For pricing details, see Container Monitoring Billing.
Default basic metrics
After Prometheus monitoring is enabled, the following metrics are collected automatically. For descriptions of each metric, see Metric descriptions.
-
Basic resource monitoring for containers (kubelet)
-
Application state monitoring for clusters (kube-state-metrics)
-
Basic resource monitoring for nodes (node-exporter)
-
GPU monitoring for nodes (ack-gpu-exporter)
-
Control plane component monitoring for managed clusters — covers API Server, etcd, kube-scheduler, kube-controller-manager, and cloud-controller-manager
-
Basic monitoring for CoreDNS
-
Basic monitoring for Ingress Controller
The following metrics are reported automatically when specific features are enabled:
-
Container Storage Monitoring Overview — reports metrics for the csi-plugin component
-
Cost Insight — reports metrics for the ack-cost-exporter component
-
Colocation of multi-types workloads monitoring and resource profile — reports metrics for the ack-koordinator component
FAQ
The Prometheus Monitoring page shows "No related monitoring dashboard found"
This typically means the Prometheus instance lost its connection to the cluster after installation. Reinstall the component first, then check the agent connection if the issue persists.
-
Reinstall the Prometheus monitoring component.
-
After confirming uninstallation is complete, click Install, then click OK in the dialog box.
-
After installation completes, return to the Prometheus Monitoring page to check whether the issue is resolved. If the issue persists, continue to the next step.
-
Check the Prometheus instance connection.
-
In the left navigation pane of the ARMS console, click Integration Management.
-
On the Integrated Environments tab, check the Container Service list for a container environment with the same name as your cluster.
-
No matching environment found — See Connect using the ARMS or Prometheus console.
-
Matching environment found — Click Configure Agent in the Actions column to open the Configure Agent page and verify that the installed agents are running as expected.
How do I adjust the metric storage duration?
How do I view the version of the ack-arms-prometheus component?
-
On the Clusters page, click the name of the target cluster. In the left navigation pane, click Add-ons.
-
On the Add-ons page, click the Logs and Monitoring tab and find the ack-arms-prometheus component. The current version is displayed below the component name. If a newer version is available, click Upgrade next to the version number.
The Upgrade option appears only if the installed version is not the latest.
Why can't I deploy GPU monitoring?
GPU monitoring may fail to deploy if a GPU node has taints. Run the following command to check for taints on the node.
kubectl describe node cn-beijing.47.100.***.***
If the node has custom taints, the output includes entries for them. For example, a taint with key test-key, value test-value, and effect NoSchedule appears as:
Taints: test-key=test-value:NoSchedule
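When a node has several taints, `kubectl describe node` prints the first one on the `Taints:` line and the rest on indented continuation lines. The sketch below pulls them out one per line. `list_taints` is a hypothetical helper that reads the describe output from stdin, and it assumes that layout of the `Taints:` field.

```shell
# Sketch: print each taint from `kubectl describe node` output, one per line.
# Prints nothing when the node reports "Taints: <none>".
list_taints() {
  awk '
    /^Taints:/ {
      in_taints = 1
      sub(/^Taints:[ \t]*/, "")
      if ($0 != "" && $0 != "<none>") print
      next
    }
    # Additional taints appear as indented continuation lines.
    in_taints && /^[ \t]+[^ \t]/ { sub(/^[ \t]+/, ""); print; next }
    { in_taints = 0 }
  '
}

# Usage (requires cluster access):
# kubectl describe node cn-beijing.47.100.***.*** | list_taints
```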
To resolve the issue, use one of the following approaches:
-
Remove the taint from the GPU node:
kubectl taint node cn-beijing.47.100.***.*** test-key=test-value:NoSchedule-
-
Add a toleration to the ack-prometheus-gpu-exporter DaemonSet so pods can be scheduled to the node:
# Edit the ack-prometheus-gpu-exporter DaemonSet
kubectl edit daemonset -n arms-prom ack-prometheus-gpu-exporter
Add the following tolerations field at the same level as containers:
# Add above the containers field, at the same indentation level
tolerations:
- key: "test-key"
  operator: "Equal"
  value: "test-value"
  effect: "NoSchedule"
containers:
# Other fields omitted
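As a non-interactive alternative to editing the DaemonSet by hand, the same toleration could be applied with `kubectl patch` and a JSON Patch. The following is a sketch under assumptions, not the documented procedure: `toleration_patch` is a hypothetical helper, it does not escape special characters in its arguments, and the `add` operation replaces any tolerations already set on the DaemonSet.

```shell
# Sketch: build a JSON Patch that sets the pod template's tolerations to one entry.
# Caveat: "add" on an existing member replaces it, so tolerations already present
# on the DaemonSet would be overwritten.
toleration_patch() {
  # $1 = taint key, $2 = taint value, $3 = taint effect
  printf '[{"op":"add","path":"/spec/template/spec/tolerations","value":[{"key":"%s","operator":"Equal","value":"%s","effect":"%s"}]}]' "$1" "$2" "$3"
}

# Usage (requires cluster access):
# kubectl patch daemonset ack-prometheus-gpu-exporter -n arms-prom \
#   --type=json -p "$(toleration_patch test-key test-value NoSchedule)"
```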
How do I completely and manually delete ARMS-Prometheus?
Deleting only the arms-prom namespace leaves residual configurations that can cause reinstallation to fail. To fully remove all ARMS-Prometheus resources, run the following commands.
-
Delete the arms-prom namespace:
kubectl delete namespace arms-prom
-
Delete ClusterRoles:
kubectl delete ClusterRole arms-kube-state-metrics
kubectl delete ClusterRole arms-node-exporter
kubectl delete ClusterRole arms-prom-ack-arms-prometheus-role
kubectl delete ClusterRole arms-prometheus-oper3
kubectl delete ClusterRole arms-prometheus-ack-arms-prometheus-role
kubectl delete ClusterRole arms-pilot-prom-k8s
kubectl delete ClusterRole gpu-prometheus-exporter
kubectl delete ClusterRole o11y:addon-controller:role
kubectl delete ClusterRole arms-aliyunserviceroleforarms-clusterrole
-
Delete ClusterRoleBindings:
kubectl delete ClusterRoleBinding arms-node-exporter
kubectl delete ClusterRoleBinding arms-prom-ack-arms-prometheus-role-binding
kubectl delete ClusterRoleBinding arms-prometheus-oper-bind2
kubectl delete ClusterRoleBinding arms-kube-state-metrics
kubectl delete ClusterRoleBinding arms-pilot-prom-k8s
kubectl delete ClusterRoleBinding arms-prometheus-ack-arms-prometheus-role-binding
kubectl delete ClusterRoleBinding gpu-prometheus-exporter
kubectl delete ClusterRoleBinding o11y:addon-controller:rolebinding
kubectl delete ClusterRoleBinding arms-kube-state-metrics-agent
kubectl delete ClusterRoleBinding arms-node-exporter-agent
kubectl delete ClusterRoleBinding arms-aliyunserviceroleforarms-clusterrolebinding
-
Delete Roles and RoleBindings:
kubectl delete Role arms-pilot-prom-spec-ns-k8s
kubectl delete Role arms-pilot-prom-spec-ns-k8s -n kube-system
kubectl delete RoleBinding arms-pilot-prom-spec-ns-k8s
kubectl delete RoleBinding arms-pilot-prom-spec-ns-k8s -n kube-system
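The individual commands above report a NotFound error for any resource that is already gone. The sketch below wraps the same resource names in loops and adds `--ignore-not-found` so that a partial cleanup can be rerun safely. `cleanup_arms_prometheus` and the `KUBECTL` override variable are hypothetical names introduced for this example.

```shell
# Sketch: delete the ARMS-Prometheus resources listed above in one pass.
# KUBECTL is a hypothetical override hook: set KUBECTL=echo to print the
# commands instead of running them.
KUBECTL="${KUBECTL:-kubectl}"

CLUSTER_ROLES="arms-kube-state-metrics arms-node-exporter
arms-prom-ack-arms-prometheus-role arms-prometheus-oper3
arms-prometheus-ack-arms-prometheus-role arms-pilot-prom-k8s
gpu-prometheus-exporter o11y:addon-controller:role
arms-aliyunserviceroleforarms-clusterrole"

CLUSTER_ROLE_BINDINGS="arms-node-exporter
arms-prom-ack-arms-prometheus-role-binding arms-prometheus-oper-bind2
arms-kube-state-metrics arms-pilot-prom-k8s
arms-prometheus-ack-arms-prometheus-role-binding gpu-prometheus-exporter
o11y:addon-controller:rolebinding arms-kube-state-metrics-agent
arms-node-exporter-agent arms-aliyunserviceroleforarms-clusterrolebinding"

cleanup_arms_prometheus() {
  # --ignore-not-found keeps the run going when a resource is already gone.
  $KUBECTL delete namespace arms-prom --ignore-not-found
  for name in $CLUSTER_ROLES; do
    $KUBECTL delete clusterrole "$name" --ignore-not-found
  done
  for name in $CLUSTER_ROLE_BINDINGS; do
    $KUBECTL delete clusterrolebinding "$name" --ignore-not-found
  done
  # The same Role and RoleBinding name exists in the current namespace and in kube-system.
  for kind in role rolebinding; do
    $KUBECTL delete "$kind" arms-pilot-prom-spec-ns-k8s --ignore-not-found
    $KUBECTL delete "$kind" arms-pilot-prom-spec-ns-k8s -n kube-system --ignore-not-found
  done
}

# Usage: preview first, then run for real.
# KUBECTL=echo; cleanup_arms_prometheus
# KUBECTL=kubectl; cleanup_arms_prometheus
```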
How do I uninstall Managed Service for Prometheus using Helm?
Use this method if you deployed the service manually with Helm, or if residual resources remain due to environment or Helm version issues.
An "xxx in use" error occurs when installing ack-arms-prometheus
-
On the Clusters page, click the name of the target cluster. In the left navigation pane, choose Applications > Helm.
-
On the Helm page, check whether ack-arms-prometheus exists.
-
Found — Delete ack-arms-prometheus from the Helm page, then reinstall it on the Add-ons page. For details, see Manage components.
-
Not found — Residual resources remain from a previous deletion of the ack-arms-prometheus Helm release. Manually delete all ARMS-Prometheus resources, then re-enable Prometheus monitoring.
Installation of ack-arms-prometheus fails after a "Component Not Installed" message
Check each of the following in order:
-
Verify whether ack-arms-prometheus is already installed.
-
On the Clusters page, click the name of the target cluster. In the left navigation pane, choose Applications > Helm.
-
On the Helm page, check whether ack-arms-prometheus exists.
-
Found — Delete ack-arms-prometheus from the Helm page, then reinstall it on the Add-ons page. For details, see Manage components.
-
Not found — Manually delete all ARMS-Prometheus resources, then re-enable Prometheus monitoring.
-
Check ack-arms-prometheus logs for errors.
-
In the cluster details left navigation pane, choose Workloads > Deployments.
-
At the top of the Deployments page, set Namespace to arms-prom and click arms-prometheus-ack-arms-prometheus.
-
Click the Logs tab and check for errors.
-
Check whether an error occurred during agent installation.
-
Log on to the ARMS console. In the left navigation pane, click Integration Management.
-
On the Integration Management tab, find the target container environment in the Container Service list. In the Actions column, click Configure Agent to open the Configure Agent page.
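For the log check in the second step above, the same logs can also be read with kubectl instead of the console. The following is a rough sketch: `filter_error_lines` is a hypothetical helper, and its keyword list is an assumption rather than a documented log format for ack-arms-prometheus.

```shell
# Sketch: keep only log lines that look like errors.
# Assumption: a case-insensitive match on "error", "fail", or "panic" is a
# reasonable first pass. It is not an official log format.
filter_error_lines() {
  grep -iE 'error|fail|panic' || true
}

# Usage (requires cluster access). The namespace and Deployment name are the
# ones given in the console steps above:
# kubectl logs -n arms-prom deployment/arms-prometheus-ack-arms-prometheus --tail=500 | filter_error_lines
```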
Grant monitoring permissions for an ACK dedicated cluster
ACK dedicated clusters require explicit monitoring policy authorization before enabling Prometheus monitoring. Follow these steps to grant the required permissions.
-
On the Clusters page, click the name of the target cluster. In the left navigation pane, click Cluster Information.
-
On the Basic Information tab, click the KubernetesWorkerRole-*** link next to Worker RAM Role. On the RAM role page, click the Permissions tab. In the Policy column, click k8sWorkerRole****.
-
On the access policy details page, click the Policy Document tab, then click Edit Policy Document.
-
In the JSON editor, add the following authorization rule to the Statement field and click OK.
{
  "Version": "1",
  "Statement": [
    {
      "Action": [
        "arms:Describe*",
        "arms:List*",
        "arms:Get*",
        "arms:Search*",
        "arms:Check*",
        "arms:Query*",
        "arms:ListEnvironments",
        "arms:DescribeAddonRelease",
        "arms:InstallAddon",
        "arms:DeleteAddonRelease",
        "arms:ListEnvironmentDashboards",
        "arms:ListAddonReleases",
        "arms:CreateEnvironment",
        "arms:UpdateEnvironment",
        "arms:InitEnvironment",
        "arms:DescribeEnvironment",
        "arms:InstallEnvironmentFeature",
        "arms:ListEnvironmentFeatures",
        "cms:CreateIntegrationPolicy",
        "cms:ListAddonReleases",
        "cms:UpdateAddonRelease",
        "cms:CreateAddonRelease",
        "cms:GetPrometheusInstance",
        "cms:ListIntegrationPolicyStorageRequirements"
      ],
      "Resource": "*",
      "Effect": "Allow"
    }
  ]
}