Integrate your ACK cluster with Managed Service for Prometheus to collect control plane, node, and application metrics, and use visual dashboards and real-time alerts to manage cluster performance more efficiently.
Choose an edition
Managed Service for Prometheus fully integrates with the open-source Prometheus ecosystem and provides a fully managed monitoring service. You do not need to manage the underlying data storage, data visualization, or O&M.
-
Pro Edition (Recommended): Provides a 90-day metric retention period, fully managed collectors, and a production-grade SLA of 99.95%. It offers customizable Grafana dashboards and pre-configured alert rules for Container Service for Kubernetes (ACK) components. For more information, see Use Container Monitoring Pro Edition.
-
Basic Edition: Provides a 7-day metric retention period. You must maintain the collectors, and only basic dashboards are provided.
Enable Prometheus monitoring
Existing cluster
-
(Optional) If you use an ACK dedicated cluster, first grant monitoring permissions to the cluster. For more information, see How to grant monitoring permissions for ACK dedicated clusters.
-
On the Clusters page, click the name of the target cluster. In the left-side navigation pane of the cluster details page, go to the Prometheus Monitoring page.
-
On the Prometheus Monitoring page, select a container monitoring edition and click Install.
After you enable monitoring, default basic metrics are automatically collected. To collect custom metrics, see Collect custom metrics. You can view multiple pre-configured dashboards on the current page, such as Cluster Overview, Node Monitoring, Application Monitoring, Network Monitoring, and Storage Monitoring.
New cluster
-
ACK managed cluster (Pro Edition):
On the Component Configurations step of the cluster creation wizard, in the Container Monitoring section, select Container Cluster Monitoring Pro Edition or Container Cluster Monitoring Basic Edition. For more information, see Create an ACK managed cluster.
If you create the cluster in Auto Mode, Container Cluster Monitoring Basic Edition is selected by default.
-
ACK managed cluster (Basic Edition), ACS clusters, and ACK Serverless clusters:
On the Component Configurations step of the cluster creation wizard, in the Container Monitoring section, select Enable Managed Service for Prometheus. This installs the Container Cluster Monitoring Basic Edition.
After you enable monitoring, default basic metrics are automatically collected. To collect custom metrics, see Collect custom metrics. To view the pre-configured dashboards, such as Cluster Overview, Node Monitoring, Application Monitoring, Network Monitoring, and Storage Monitoring, go to the Prometheus Monitoring page from the left-side navigation pane of the cluster details page.
Configure alert notifications
Configure alert rules for key metrics to automatically send notifications through channels like email, SMS, or DingTalk when anomalies occur.
-
Log on to the ARMS console. In the left-side navigation pane, go to the Notification Objects page.
-
On the Notification Objects page, select a notification method and create a notification policy.
-
In the left-side navigation pane of the ARMS console, go to the Prometheus alert rules page.
-
On the Prometheus alert rules page, click Create Prometheus Alert Rule.
For detailed configuration instructions, see Create a Prometheus alert rule.
Collect custom metrics
Managed Service for Prometheus supports multiple methods to collect custom metrics, such as queries per second (QPS) and processing latency. For more information, see Manage custom collection rules for container environments.
Disable Prometheus monitoring
-
In the left-side navigation pane of the cluster details page, click Add-ons.
-
On the Add-ons page, click the Logs and Monitoring tab, find the ack-arms-prometheus component, and then click Uninstall. In the confirmation dialog box, click OK.
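Before you reinstall the component, you may want to confirm from the command line that the uninstall has finished. The following is a minimal sketch, not an official tool: it assumes that kubectl is configured for the target cluster and that the component's workloads run in the arms-prom namespace, which is the namespace this topic uses elsewhere. The function name arms_prom_uninstalled is hypothetical.

```shell
#!/bin/sh
# Namespace used by the ack-arms-prometheus component in this topic.
ARMS_PROM_NS="arms-prom"

# Hypothetical helper. Exit status:
#   0 - no Deployments or DaemonSets remain in the namespace
#   1 - at least one Deployment or DaemonSet still exists
#   2 - kubectl itself failed (for example, no access to the cluster)
arms_prom_uninstalled() {
  # --no-headers prints one line per object. If nothing (or no namespace) is
  # left, kubectl writes "No resources found" to stderr, which is discarded.
  out=$(kubectl get deployments,daemonsets -n "$ARMS_PROM_NS" --no-headers 2>/dev/null) || return 2
  [ -z "$out" ]
}
```

For example, `until arms_prom_uninstalled; do sleep 10; done` polls until the function succeeds. The check covers only Deployments and DaemonSets, so it does not detect leftover cluster-scoped objects such as ClusterRoles.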
Billing
-
Cluster monitoring fees: The Basic Edition is free. The Pro Edition is billed on a pay-as-you-go basis according to the number of nodes in the cluster.
-
Prometheus instance fees: Collecting basic metrics is free by default. Collecting custom metrics is billed on a pay-as-you-go basis according to factors such as write volume, report volume, storage volume, and retention period.
For detailed billing rules and pricing, see Billing of Container Monitoring.
Default basic metrics
After you enable Prometheus monitoring, basic metrics for container monitoring are automatically collected. For detailed descriptions of these metrics, see Metrics.
-
Basic resource metrics for containers from kubelet.
-
Application state metrics for clusters from kube-state-metrics.
-
Basic resource metrics for cluster nodes from node-exporter.
-
GPU metrics for cluster nodes from ack-gpu-exporter.
-
Metrics for control plane components of managed clusters, including API server, etcd, kube-scheduler, kube-controller-manager, and cloud-controller-manager.
-
Basic monitoring metrics for CoreDNS in the cluster.
-
Basic monitoring metrics for Ingress controllers in the cluster.
-
The following basic metrics are reported automatically after you enable the corresponding features:
-
After you enable Container Storage Monitoring, the csi-plugin component starts reporting metrics.
-
After you enable Cost Insight, the ack-cost-exporter component starts reporting metrics.
-
After you enable workload co-location monitoring and Resource Profiling, the ack-koordinator component starts reporting metrics.
FAQ
Prometheus Monitoring page shows no dashboard
If the message "No related monitoring dashboard found" appears on the page after you enable Prometheus monitoring, follow these steps to resolve the issue.
-
Reinstall the Prometheus monitoring component.
-
Uninstall the component as described in Disable Prometheus monitoring, and then reinstall it:
-
After you confirm that the component is uninstalled, click Install. In the confirmation dialog box, click OK.
-
After the installation is complete, return to the Prometheus Monitoring page to check whether the issue is resolved.
If the issue persists, proceed with the following steps.
-
Check the Prometheus instance integration.
-
In the left-side navigation pane of the ARMS console, click Integration Management.
-
On the Integrations tab, check the Container Service list for a container environment with the same name as your cluster.
-
If no such container environment exists, see Integrate a service in the ARMS or Prometheus console.
-
If such a container environment exists, click Settings in the Actions column of the target container environment to go to the Settings page.
Check whether the installed agent is running as expected.
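To check the agent from the command line as well, you can list the pods in the arms-prom namespace that are not fully running. This is a minimal sketch that assumes kubectl access to the cluster and the default column layout of `kubectl get pods` (NAME, READY, STATUS, RESTARTS, AGE). The function name is hypothetical, and an empty result only suggests, rather than proves, that the agent is healthy.

```shell
#!/bin/sh
ARMS_PROM_NS="arms-prom"

# Hypothetical helper: prints "NAME READY STATUS" for every pod in the namespace
# that is not Running with all containers ready. Completed pods are skipped.
list_unhealthy_arms_prom_pods() {
  kubectl get pods -n "$ARMS_PROM_NS" --no-headers |
    awk '{
      split($2, ready, "/")            # READY looks like "1/1"
      if ($3 == "Completed") next      # finished Job pods are expected
      if ($3 != "Running" || ready[1] != ready[2]) print $1, $2, $3
    }'
}
```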
How to adjust the metric retention period
For more information, see Adjust the retention period of metrics.
How to view the ack-arms-prometheus component version
-
On the Clusters page, click the name of the target cluster. In the left-side navigation pane, click Add-ons.
-
On the Add-ons page, click the Logs and Monitoring tab and find the ack-arms-prometheus component.
The current version is displayed below the component name. If a later version is available and you want to upgrade the component, click Upgrade next to the version number.
Note: The Upgrade option is displayed only if the installed version is not the latest version.
Why does GPU monitoring fail to deploy?
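GPU monitoring deployment can fail if the GPU node has taints, because the GPU exporter pods cannot be scheduled to a tainted node unless they tolerate the taint. To resolve this issue, first check the node's taints.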
Run the following command to check the taints of the target GPU node. If the GPU node has custom taints, you can find the related entries in the output. This example uses a taint with a key of test-key, a value of test-value, and an effect of NoSchedule:
kubectl describe node cn-beijing.47.100.***.***
Expected output:
Taints: test-key=test-value:NoSchedule
Handle the GPU node taints in one of the following two ways:
-
Run the following command to remove the taint from the GPU node.
kubectl taint node cn-beijing.47.100.***.*** test-key=test-value:NoSchedule-
-
Declare a toleration for the taint to allow pods to be scheduled to the node.
# 1. Run the following command to edit the ack-prometheus-gpu-exporter DaemonSet.
kubectl edit daemonset -n arms-prom ack-prometheus-gpu-exporter
# 2. Add the following fields to the YAML file to declare the toleration for the taint.
# Other fields are omitted.
# Add the tolerations field directly above the containers field, at the same level.
tolerations:
- key: "test-key"
  operator: "Equal"
  value: "test-value"
  effect: "NoSchedule"
containers:
# Other fields are omitted.
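If you prefer not to edit the DaemonSet interactively, the taint check and the toleration above can be scripted. This is a minimal sketch under stated assumptions: kubectl is configured for the cluster, the DaemonSet is ack-prometheus-gpu-exporter in the arms-prom namespace as in the example above, and the function names are hypothetical. It uses a JSON patch that appends to an existing tolerations list, or creates the list if none exists. Changes made directly to the DaemonSet may be overwritten when the component is upgraded or reinstalled.

```shell
#!/bin/sh
GPU_EXPORTER_DS="ack-prometheus-gpu-exporter"
GPU_EXPORTER_NS="arms-prom"

# Print each taint on node $1 as key=value:effect, one per line.
show_node_taints() {
  kubectl get node "$1" \
    -o jsonpath='{range .spec.taints[*]}{.key}={.value}:{.effect}{"\n"}{end}'
}

# Add a toleration (operator Equal) for key $1, value $2, and effect $3
# to the GPU exporter DaemonSet by using a JSON patch.
add_gpu_exporter_toleration() {
  tol="{\"key\":\"$1\",\"operator\":\"Equal\",\"value\":\"$2\",\"effect\":\"$3\"}"
  existing=$(kubectl get daemonset "$GPU_EXPORTER_DS" -n "$GPU_EXPORTER_NS" \
    -o jsonpath='{.spec.template.spec.tolerations}')
  if [ -n "$existing" ]; then
    # Append to the existing list ("/-" addresses the end of a JSON array).
    patch="[{\"op\":\"add\",\"path\":\"/spec/template/spec/tolerations/-\",\"value\":$tol}]"
  else
    # No list yet: create it with a single entry.
    patch="[{\"op\":\"add\",\"path\":\"/spec/template/spec/tolerations\",\"value\":[$tol]}]"
  fi
  kubectl patch daemonset "$GPU_EXPORTER_DS" -n "$GPU_EXPORTER_NS" --type=json -p "$patch"
}
```

For the example taint, you would run `show_node_taints cn-beijing.47.100.***.***` and then `add_gpu_exporter_toleration test-key test-value NoSchedule`.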
How to completely uninstall ARMS Prometheus
If you delete only the namespace of Managed Service for Prometheus, residual configurations such as ClusterRoles and ClusterRoleBindings remain and can interfere with reinstallation. To completely remove the residual ARMS Prometheus configurations, perform the following operations manually.
Delete the arms-prom namespace.
kubectl delete namespace arms-prom
Delete the ClusterRoles.
kubectl delete ClusterRole arms-kube-state-metrics
kubectl delete ClusterRole arms-node-exporter
kubectl delete ClusterRole arms-prom-ack-arms-prometheus-role
kubectl delete ClusterRole arms-prometheus-oper3
kubectl delete ClusterRole arms-prometheus-ack-arms-prometheus-role
kubectl delete ClusterRole arms-pilot-prom-k8s
kubectl delete ClusterRole gpu-prometheus-exporter
kubectl delete ClusterRole o11y:addon-controller:role
kubectl delete ClusterRole arms-aliyunserviceroleforarms-clusterrole
Delete the ClusterRoleBindings.
kubectl delete ClusterRoleBinding arms-node-exporter
kubectl delete ClusterRoleBinding arms-prom-ack-arms-prometheus-role-binding
kubectl delete ClusterRoleBinding arms-prometheus-oper-bind2
kubectl delete ClusterRoleBinding arms-kube-state-metrics
kubectl delete ClusterRoleBinding arms-pilot-prom-k8s
kubectl delete ClusterRoleBinding arms-prometheus-ack-arms-prometheus-role-binding
kubectl delete ClusterRoleBinding gpu-prometheus-exporter
kubectl delete ClusterRoleBinding o11y:addon-controller:rolebinding
kubectl delete ClusterRoleBinding arms-kube-state-metrics-agent
kubectl delete ClusterRoleBinding arms-node-exporter-agent
kubectl delete ClusterRoleBinding arms-aliyunserviceroleforarms-clusterrolebinding
Delete the Roles and RoleBindings.
kubectl delete Role arms-pilot-prom-spec-ns-k8s
kubectl delete Role arms-pilot-prom-spec-ns-k8s -n kube-system
kubectl delete RoleBinding arms-pilot-prom-spec-ns-k8s
kubectl delete RoleBinding arms-pilot-prom-spec-ns-k8s -n kube-system
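The commands above can also be collected into a single function that is safe to rerun. This is a sketch that assumes kubectl is configured for the target cluster; the function name is hypothetical and the object names are copied from the commands above. It adds the kubectl flag `--ignore-not-found` so that objects that were already deleted do not cause errors.

```shell
#!/bin/sh
# Hypothetical helper that runs the manual cleanup for ARMS Prometheus.
cleanup_arms_prom() {
  kubectl delete namespace arms-prom --ignore-not-found

  for name in arms-kube-state-metrics arms-node-exporter \
      arms-prom-ack-arms-prometheus-role arms-prometheus-oper3 \
      arms-prometheus-ack-arms-prometheus-role arms-pilot-prom-k8s \
      gpu-prometheus-exporter o11y:addon-controller:role \
      arms-aliyunserviceroleforarms-clusterrole; do
    kubectl delete clusterrole "$name" --ignore-not-found
  done

  for name in arms-node-exporter arms-prom-ack-arms-prometheus-role-binding \
      arms-prometheus-oper-bind2 arms-kube-state-metrics arms-pilot-prom-k8s \
      arms-prometheus-ack-arms-prometheus-role-binding gpu-prometheus-exporter \
      o11y:addon-controller:rolebinding arms-kube-state-metrics-agent \
      arms-node-exporter-agent arms-aliyunserviceroleforarms-clusterrolebinding; do
    kubectl delete clusterrolebinding "$name" --ignore-not-found
  done

  # As in the original commands, the first deletion of each kind uses the
  # current kubeconfig namespace and the second targets kube-system.
  for kind in role rolebinding; do
    kubectl delete "$kind" arms-pilot-prom-spec-ns-k8s --ignore-not-found
    kubectl delete "$kind" arms-pilot-prom-spec-ns-k8s -n kube-system --ignore-not-found
  done
}
```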
How to uninstall by using Helm
Use this method to uninstall the service if you manually deployed it by using Helm, or if residual resources remain due to environment or Helm version issues.
-
On the Clusters page, click the name of the target cluster. In the left-side navigation pane, go to the Helm page.
-
On the Helm page, find the ack-arms-prometheus release, click Delete in the Actions column, select Clear Release Records, and then delete the application as prompted.
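If you have the Helm CLI, the same deletion can be run from the command line. This sketch assumes Helm 3, in which `helm uninstall` removes the release history unless `--keep-history` is specified; that behavior appears to correspond to selecting Clear Release Records in the console. The release namespace is not fixed in this topic, so it is a parameter that you can look up with `helm list --all-namespaces`. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper around the Helm 3 CLI. $1 is the namespace that
# "helm list --all-namespaces" reports for the ack-arms-prometheus release.
uninstall_arms_prom_release() {
  if [ -z "$1" ]; then
    echo "usage: uninstall_arms_prom_release RELEASE_NAMESPACE" >&2
    return 2
  fi
  # Helm 3 removes the release history by default; --keep-history would retain it.
  helm uninstall ack-arms-prometheus --namespace "$1"
}
```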
"xxx in use" error during installation
-
On the Clusters page, click the name of the target cluster. In the left-side navigation pane, go to the Helm page.
-
On the Helm page, check whether ack-arms-prometheus exists.
-
If it exists, delete ack-arms-prometheus from the Helm page and reinstall it on the Add-ons page. For more information about how to install ack-arms-prometheus, see Manage add-ons.
-
If it does not exist:
-
The ack-arms-prometheus Helm release may have left residual resources. Manually and completely uninstall ARMS Prometheus. For more information, see How to completely uninstall ARMS Prometheus.
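The Helm check in the steps above can also be done with the Helm CLI. This is a minimal sketch that assumes Helm 3 and access to the cluster; the function name is hypothetical. The `--all` flag is included so that releases in a failed or pending state, which may be the ones that leave residual resources, are also listed.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a Helm release named exactly
# ack-arms-prometheus exists in any namespace.
arms_prom_release_exists() {
  # --all: include releases in any state (for example, failed or pending-install).
  # --all-namespaces: search every namespace.
  # --filter: regular expression matched against release names.
  # --short: print release names only.
  [ -n "$(helm list --all --all-namespaces --filter '^ack-arms-prometheus$' --short)" ]
}
```

A typical call is `arms_prom_release_exists && echo "release found"`.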
"Component Not Installed" error during installation
-
Check whether the ack-arms-prometheus component is installed.
-
On the Clusters page, click the name of the target cluster. In the left-side navigation pane, go to the Helm page.
-
On the Helm page, check whether ack-arms-prometheus exists.
-
If it exists, delete ack-arms-prometheus from the Helm page and reinstall it on the Add-ons page. For more information about how to install ack-arms-prometheus, see Manage add-ons.
-
If it does not exist:
-
The ack-arms-prometheus Helm release may have left residual resources. Manually and completely uninstall ARMS Prometheus. For more information, see How to completely uninstall ARMS Prometheus.
-
Check the logs of ack-arms-prometheus for errors.
-
In the left-side navigation pane of the cluster details page, go to the Deployments page.
-
At the top of the Deployments page, set Namespace to arms-prom and then click arms-prometheus-ack-arms-prometheus.
-
Click the Logs tab to check for errors in the logs.
-
Check whether an error occurred during agent installation.
-
Log on to the ARMS console. In the left-side navigation pane, click Integration Management.
-
On the Integrations tab, check the Container Service list. In the Actions column of the target container environment, click Settings to go to the Settings page.
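The log check in these steps can also be run from the command line. This sketch assumes kubectl access and uses the arms-prometheus-ack-arms-prometheus Deployment in the arms-prom namespace named above. The function name is hypothetical, and filtering on the words "error" and "fail" is only a heuristic, not a documented log format.

```shell
#!/bin/sh
# Hypothetical helper: prints recent log lines from the component's Deployment
# that contain "error" or "fail" (case-insensitive).
# $1 = number of lines to read from the end of the log (default 500).
arms_prom_error_logs() {
  kubectl logs -n arms-prom deployment/arms-prometheus-ack-arms-prometheus \
    --all-containers=true --tail="${1:-500}" |
    grep -i -E 'error|fail'
}
```

For a Deployment with several replicas, `kubectl logs deployment/...` reads from only one of its pods, so repeat the check per pod if needed.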
How to grant monitoring permissions for ACK dedicated clusters
-
On the Clusters page, click the name of the target cluster. In the left-side navigation pane, click Cluster Information.
-
On the Basic Information tab, click the KubernetesWorkerRole-*** link next to Worker RAM Role. On the RAM role page, on the Permission Management tab, click the k8sWorkerRole**** link in the Policies column.
-
On the policy details page, click the Policy Content tab and then click Edit Policy Document.
-
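In the script editor, add the following authorization rule to the Statement field and click OK. The example shows a complete policy document; if the policy already contains statements, add only the object inside the Statement array.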
{
  "Version": "1",
  "Statement": [
    {
      "Action": [
        "arms:Describe*",
        "arms:List*",
        "arms:Get*",
        "arms:Search*",
        "arms:Check*",
        "arms:Query*",
        "arms:ListEnvironments",
        "arms:DescribeAddonRelease",
        "arms:InstallAddon",
        "arms:DeleteAddonRelease",
        "arms:ListEnvironmentDashboards",
        "arms:ListAddonReleases",
        "arms:CreateEnvironment",
        "arms:UpdateEnvironment",
        "arms:InitEnvironment",
        "arms:DescribeEnvironment",
        "arms:InstallEnvironmentFeature",
        "arms:ListEnvironmentFeatures",
        "cms:CreateIntegrationPolicy",
        "cms:ListAddonReleases",
        "cms:UpdateAddonRelease",
        "cms:CreateAddonRelease",
        "cms:GetPrometheusInstance",
        "cms:ListIntegrationPolicyStorageRequirements"
      ],
      "Resource": "*",
      "Effect": "Allow"
    }
  ]
}
Related topics
Upgrade Managed Service for Prometheus from Basic Edition to Pro Edition