Container Service for Kubernetes: Use open source Prometheus to monitor an ACK cluster

Last Updated: Apr 02, 2025

Prometheus is an open source project that is used to monitor cloud-native applications. This topic describes how to deploy Prometheus in a Container Service for Kubernetes (ACK) cluster.

Background information

This topic describes how to efficiently monitor system components and resource entities in a Kubernetes cluster. A monitoring system monitors the following types of objects:

  • Resource: resource utilization of nodes and applications. In a Kubernetes cluster, the monitoring system monitors the resource usage of nodes, pods, and the cluster.

  • Application: internal metrics of applications. For example, the monitoring system dynamically counts the number of online users who are using an application, collects monitoring metrics from application ports, and enables alerting based on the collected metrics.

In a Kubernetes cluster, the monitoring system monitors the following objects:

  • Cluster components: The components of the Kubernetes cluster, such as the API server, cloud-controller-manager, and etcd. To monitor cluster components, specify the monitoring methods in configuration files.

  • Static resource entities: The status of resources on nodes and kernel events. To monitor static resource entities, specify the monitoring methods in configuration files.

  • Dynamic resource entities: Entities of abstract workloads in Kubernetes, such as Deployments, DaemonSets, and pods. To monitor dynamic resource entities in a Kubernetes cluster, you can deploy Prometheus in the Kubernetes cluster.

  • Custom objects in applications: For applications that require customized monitoring data and metrics, you must set specific configurations to meet their monitoring requirements. You can do this by combining port exposure with the Prometheus monitoring solution, as sketched in the example below.
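
For dynamic resource entities and custom application metrics, the prometheus-operator that you deploy in Step 1 discovers scrape targets through ServiceMonitor objects. The following is a minimal sketch that assumes a hypothetical application named my-app exposing metrics on a Service port named http-metrics; the names, namespaces, labels, and scrape settings are placeholders that you must adapt to your environment.

    # Hedged ServiceMonitor sketch for scraping custom application metrics.
    # All names below (my-app, http-metrics, the namespaces) are hypothetical.
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: my-app-metrics
      namespace: monitoring
      labels:
        # Depending on the chart configuration, Prometheus may only select
        # ServiceMonitors that carry the Helm release label.
        release: ack-prometheus-operator
    spec:
      namespaceSelector:
        matchNames:
          - default                # namespace of the application Service
      selector:
        matchLabels:
          app: my-app              # labels of the application Service
      endpoints:
        - port: http-metrics       # name of the Service port that exposes metrics
          path: /metrics           # metrics path exposed by the application
          interval: 30s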

Step 1: Deploy open source Prometheus

  1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

  2. On the Clusters page, find the cluster that you want to manage and click its name. In the left-side navigation pane, choose Applications > Helm.

  3. On the Helm page, click Deploy. In the Chart section of the Deploy panel, find and select ack-prometheus-operator, use the default values for other parameters, and then click Next.

    • The Confirm message indicates that the component is installed in the monitoring namespace by default and that the default application name is used as the component name. Click Yes.

    • If you want to use a custom application and a custom namespace, configure the Application Name and Namespace parameters in the Basic Information step.

  4. On the Parameters wizard page, select 12.0.0 for the chart version, configure the parameters, and then click OK. Chart 12.0.0 supports alert configuration. You can set monitoring and alert conditions by using the built-in alert features.

    You can customize optional parameters based on your business requirements. The available parameters depend on the chart version; a hedged example of common overrides is shown below.
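
    The chart exposes many parameters, and the exact set depends on the chart version that you select. The following is a hedged sketch of a few commonly adjusted values, assuming that the chart follows kube-prometheus-stack-style sections (grafana, alertmanager, and prometheus); verify the field names against the defaults that are displayed on the Parameters wizard page before you apply them.

    # Hedged example of values that you might override on the Parameters wizard page.
    grafana:
      enabled: true              # deploy the bundled Grafana
    alertmanager:
      enabled: true              # deploy Alertmanager so that alert notifications can be sent
    prometheus:
      prometheusSpec:
        scrapeInterval: 30s      # global scrape interval
        retention: 10d           # how long to keep metric data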

Step 2: View Prometheus collection tasks

  1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

  2. On the Clusters page, find the cluster that you want to manage and click its name. In the left-side pane, choose Network > Services.

  3. On the Services page, select the namespace in which the ack-prometheus-operator is deployed (monitoring by default). Find ack-prometheus-operator-prometheus and click Update in the Actions column.

  4. In the Update Service dialog box, set Service Type to SLB. Select Create Resource and set Access Method to Public Access. Select Pay-as-you-go (Pay-by-CU) for the Billing Method parameter and click OK. For information about the billing details, see CLB billing.

  5. After the update is complete, copy the external IP address of the Service, and then access Prometheus by entering <IP address>:<port number> in the address bar of a browser. Example: 47.XX.XX.12:9090.

  6. In the top navigation bar of the Prometheus page, choose Status > Targets to view all data collection tasks. Tasks in the UP state are running properly.

  7. To view alert rules, click Alerts in the top navigation bar.

Step 3: View Grafana aggregated data

  1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

  2. On the Clusters page, find the cluster that you want to manage and click its name. In the left-side pane, choose Network > Services.

  3. On the Services page, select the namespace in which ack-prometheus-operator is deployed (monitoring by default). Find ack-prometheus-operator-grafana and click Update in the Actions column.

  4. In the Update Service dialog box, set Service Type to SLB. Select Create Resource and set Access Method to Public Access. Select Pay-as-you-go (Pay-by-CU) for the Billing Method parameter and click OK. For information about the billing details, see CLB billing.

  5. After the update is complete, copy the external IP address of the Service, and then access the aggregated data in Grafana by entering <IP address>:<port number> in the address bar of a browser. Example: 47.XX.XX.12:80. By default, the port number is 80.

(Optional) Step 4: Set silent alerts

You can ignore alerts under specific conditions. All alerts that meet the matching conditions are silenced, and no notifications are sent until the silent period ends or the silent rule is manually deleted.

Run the following command to forward the Alertmanager Service to local port 9093. Then, enter localhost:9093 in the address bar of a browser and click Silenced to set silent alerts.

kubectl --address 0.0.0.0 port-forward svc/alertmanager-operated 9093 -n monitoring

Alert configurations

You can configure prometheus-operator to send alert notifications by using DingTalk messages or emails. The following section describes the detailed configuration steps. A hedged configuration sketch is provided at the end of this section.

  1. On the Parameters wizard page, configure the alert parameters based on the following content:

    • Configure DingTalk notifications.

      • Find the dingtalk field in the configuration file and set enabled to true.

      • Enter the webhook URL of your DingTalk chatbot in the token field. For more information about how to obtain the webhook URL, see Scenario 3: Implement Kubernetes monitoring and alerting with DingTalk chatbot.

      • In the alertmanager section, find the receiver parameter in the config field and set it to the name of a receiver that is defined in the receivers field. By default, webhook is used.

        Configuration example of multiple DingTalk chatbots

        If you have two DingTalk chatbots, perform the following steps:

        1. Replace the parameter values in the token field.

          Copy the webhook URLs of your DingTalk chatbots and replace the parameter values of dingtalk1 and dingtalk2 in the token field with the copied URLs. In this example, the placeholder https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxx is replaced with the actual webhook URLs.

        2. Modify receivers.

          In the alertmanager section, find the receiver parameter in the config field and set it to the names of the receivers that are defined in the receivers field. In this example, webhook1 and webhook2 are used.

        3. Modify the value of the url parameter.

          Replace the chatbot names in the url parameters with the names of your DingTalk chatbots. In this example, dingtalk1 and dingtalk2 are used.

        Note

        To add more DingTalk chatbots, add more webhook URLs.

    • Configure email notifications.

      • Enter the information about your email account in the corresponding email fields of the configuration file.

      • In the alertmanager section of the configuration file, find the receiver parameter in the config field and set it to the name of the email receiver that is defined in the receivers field. By default, mail is used.

  • Set alert notification templates.

    You can customize the alert notification template in the templateFiles field of the alertmanager section on the Parameters wizard page.
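
To tie the preceding steps together, the following is a hedged sketch of how the dingtalk and alertmanager sections might look after configuration. The field layout and the webhook URL format depend on the chart version, and every value shown here (the tokens, the receiver names webhook1, webhook2, and mail, the placeholder DingTalk webhook service host, and the recipient address) is a hypothetical placeholder; compare the sketch with the defaults on the Parameters wizard page.

    # Hedged sketch only; field names and URL formats vary with the chart version.
    dingtalk:
      enabled: true
      token:
        dingtalk1: https://oapi.dingtalk.com/robot/send?access_token=<token-1>
        dingtalk2: https://oapi.dingtalk.com/robot/send?access_token=<token-2>

    alertmanager:
      config:
        route:
          receiver: webhook1                         # default receiver
        receivers:
          - name: webhook1                           # first DingTalk chatbot
            webhook_configs:
              - url: http://<dingtalk-webhook-service>:8060/dingtalk/dingtalk1/send
          - name: webhook2                           # second DingTalk chatbot
            webhook_configs:
              - url: http://<dingtalk-webhook-service>:8060/dingtalk/dingtalk2/send
          - name: mail                               # email receiver; see the email steps above
            email_configs:
              - to: ops@example.com                  # hypothetical recipient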

Mount a custom ConfigMap to Prometheus

This section describes how to mount a ConfigMap named special-config to Prometheus. The ConfigMap contains the configuration of Prometheus. This configuration file is passed to Prometheus through the --config.file parameter when the Prometheus pod starts.

  1. Create a ConfigMap.

    The following example shows the content of the ConfigMap:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: special-config
      # Create the ConfigMap in the same namespace as Prometheus (monitoring by default).
      namespace: monitoring
    data:
      config.yaml: |
        global:
          # The scrape interval. Default value: 1 minute.
          scrape_interval: 1m

          # The scrape timeout period. Default value: 10 seconds.
          scrape_timeout: 10s

          # The rule evaluation interval. Default value: 1 minute.
          evaluation_interval: 1m

        # Scrape configurations
        scrape_configs: []

        # Rule configurations
        rule_files: []

        # Alert configurations
        alerting:
          alert_relabel_configs: []
          alertmanagers: []
  2. Mount a ConfigMap.

    On the Parameters page, add the following content to the configMaps field to mount the ConfigMap to the pod. The ConfigMap is mounted to the /etc/prometheus/configmaps/ path.

    The following example shows how to set the configMaps field in the prometheus section.

    ## ConfigMaps is a list of ConfigMaps in the same namespace as the Prometheus.
    ## The ConfigMaps are mounted into /etc/prometheus/configmaps/.
    ##
    configMaps:
      - "special-config"
      - "detail-config"

Configure Grafana

  • Mount the dashboard configuration to Grafana

    You can perform the following steps to mount a ConfigMap that contains the dashboard configuration to the Grafana pod. On the Parameters wizard page, add the configuration to the extraConfigmapMounts section. A hedged example is provided at the end of this section.

    Note
    • Make sure that the dashboard exists in the cluster as a ConfigMap and that the labels of the ConfigMap are in the same format as those of other dashboard ConfigMaps.

    • In the extraConfigmapMounts section of the Grafana configuration, specify the name of the ConfigMap and how to mount the ConfigMap.

      • Set mountPath to /tmp/dashboards/.

      • Set configMap to the name of the ConfigMap.

      • Set name to the name of the JSON file that stores the dashboard configuration.

  • Enable data persistence for dashboards

    You can perform the following steps to enable data persistence for Grafana dashboards:

    1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

    2. On the Clusters page, find the cluster that you want to manage and click its name. In the left-side pane, choose Applications > Helm.

    3. Find ack-prometheus-operator and click Update in the Actions column.

    4. In the Update Release panel, configure the persistence field in the grafana section. A hedged example is provided at the end of this section.

    You can export data on Grafana dashboards in JSON format to your on-premises machine. For more information, see Export a Grafana dashboard.
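
The two operations above map to the grafana section of the chart values. The following is a hedged sketch; the ConfigMap name, the JSON file name, the storage class, and the volume size are placeholders, and the exact fields may vary with the chart version.

    grafana:
      # Mount a dashboard ConfigMap into the Grafana pod.
      extraConfigmapMounts:
        - name: my-dashboard.json              # hypothetical JSON file name
          mountPath: /tmp/dashboards/
          configMap: my-dashboard-configmap    # hypothetical ConfigMap name
          readOnly: true
      # Persist Grafana data so that dashboards survive pod restarts.
      persistence:
        enabled: true
        storageClassName: alicloud-disk-ssd    # hypothetical; use a storage class that exists in your cluster
        accessModes:
          - ReadWriteOnce
        size: 20Gi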

Uninstall the open source Prometheus component

Check the Helm chart version of the open source Prometheus component and then perform the corresponding steps to uninstall it. This helps you prevent residual resources and unexpected exceptions. You must manually delete the related Helm release, namespace, CustomResourceDefinitions (CRDs), and kubelet Service.

If the kubelet Service cannot be automatically deleted when you uninstall the ack-prometheus-operator component, refer to the following sections. For more information about this issue, see #1523.

Chart v12.0.0

Use the ACK console

  1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

  2. On the Clusters page, find the cluster that you want to manage and click its name. In the left-side navigation pane of the cluster details page, perform the following operations:

    • Uninstall the Helm release: Choose Applications > Helm. On the Helm page, select the ack-prometheus-operator release and click Delete in the Actions column. In the Delete dialog box, select Clear Release Records and click OK.

    • Delete the related namespace: Click Namespaces and Quotas. On the Namespace page, select the monitoring namespace and click Delete in the lower part of the page. In the Confirm message, confirm the information and click Confirm Deletion.

    • Delete the related CRDs: Choose Workloads > Custom Resources. On the Custom Resources page, click the CRD tab. Select all CRDs that belong to the monitoring.coreos.com API group and click Delete in the lower part of the page. In the Confirm message, confirm the information and click Confirm Deletion. The following list describes the CRDs in the API group:

      • AlertmanagerConfig

      • Alertmanager

      • PodMonitor

      • Probe

      • Prometheus

      • PrometheusRule

      • ServiceMonitor

      • ThanosRuler

    • Delete the kubelet Service: Choose Network > Services. On the Services page, select the ack-prometheus-operator-kubelet Service in the kube-system namespace and then click Delete in the lower part of the page. In the Confirm message, confirm the information and click Confirm Deletion.

Use kubectl

  • Uninstall the related Helm release.

    helm uninstall ack-prometheus-operator -n monitoring
  • Delete the related namespace.

    kubectl delete namespace monitoring
  • Delete the related CRDs.

    kubectl delete crd alertmanagerconfigs.monitoring.coreos.com
    kubectl delete crd alertmanagers.monitoring.coreos.com
    kubectl delete crd podmonitors.monitoring.coreos.com
    kubectl delete crd probes.monitoring.coreos.com
    kubectl delete crd prometheuses.monitoring.coreos.com
    kubectl delete crd prometheusrules.monitoring.coreos.com
    kubectl delete crd servicemonitors.monitoring.coreos.com
    kubectl delete crd thanosrulers.monitoring.coreos.com
  • Delete the kubelet Service.

    kubectl delete service ack-prometheus-operator-kubelet -n kube-system

Chart v65.1.1

Use the ACK console

  1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

  2. On the Clusters page, find the cluster that you want to manage and click its name. In the left-side navigation pane of the cluster details page, perform the following operations:

    • Uninstall the Helm release: Choose Applications > Helm. On the Helm page, select the ack-prometheus-operator release and click Delete in the Actions column. In the Delete dialog box, select Clear Release Records and click OK.

    • Delete the related namespace: Click Namespaces and Quotas. On the Namespace page, select the monitoring namespace and click Delete in the lower part of the page. In the Confirm message, confirm the information and click Confirm Deletion.

    • Delete the related CRDs: Choose Workloads > Custom Resources. On the Custom Resources page, click the CRD tab. Select all CRDs that belong to the monitoring.coreos.com API group and click Delete in the lower part of the page. In the Confirm message, confirm the information and click Confirm Deletion. The following list describes the CRDs in the API group:

      • AlertmanagerConfig

      • Alertmanager

      • PodMonitor

      • Probe

      • PrometheusAgent

      • Prometheus

      • PrometheusRule

      • ScrapeConfig

      • ServiceMonitor

      • ThanosRuler

    • Delete the kubelet Service: Choose Network > Services. On the Services page, select the ack-prometheus-operator-kubelet Service in the kube-system namespace and then click Delete in the lower part of the page. In the Confirm message, confirm the information and click Confirm Deletion.

Use kubectl

  • Uninstall the related Helm release.

    helm uninstall ack-prometheus-operator -n monitoring
  • Delete the related namespace.

    kubectl delete namespace monitoring
  • Delete the related CRDs.

    kubectl delete crd alertmanagerconfigs.monitoring.coreos.com
    kubectl delete crd alertmanagers.monitoring.coreos.com
    kubectl delete crd podmonitors.monitoring.coreos.com
    kubectl delete crd probes.monitoring.coreos.com
    kubectl delete crd prometheusagents.monitoring.coreos.com
    kubectl delete crd prometheuses.monitoring.coreos.com
    kubectl delete crd prometheusrules.monitoring.coreos.com
    kubectl delete crd scrapeconfigs.monitoring.coreos.com
    kubectl delete crd servicemonitors.monitoring.coreos.com
    kubectl delete crd thanosrulers.monitoring.coreos.com
  • Delete the kubelet Service.

    kubectl delete service ack-prometheus-operator-kubelet -n kube-system

FAQ

  • What do I do if I fail to receive DingTalk alert notifications?

    1. Obtain the webhook URL of your DingTalk chatbot. For more information, see Event monitoring.

    2. On the Parameters wizard page, find the dingtalk section, set enabled to true, and then specify the webhook URL of your DingTalk chatbot in the token field. For more information, see Configure DingTalk alert notifications in Alert configurations.

  • What do I do if an error message appears when I deploy prometheus-operator in a cluster?

    The following error message appears:

    Can't install release with errors: rpc error: code = Unknown desc = object is being deleted: customresourcedefinitions.apiextensions.k8s.io "xxxxxxxx.monitoring.coreos.com" already exists

    The error message indicates that the cluster failed to clear CustomResourceDefinition (CRD) objects of the previous deployment. Run the following commands to delete the CRD objects. Then, deploy prometheus-operator again:

    kubectl delete crd prometheuses.monitoring.coreos.com
    kubectl delete crd prometheusrules.monitoring.coreos.com
    kubectl delete crd servicemonitors.monitoring.coreos.com
    kubectl delete crd alertmanagers.monitoring.coreos.com
  • What do I do if I fail to receive email alert notifications?

    Make sure that the value of smtp_auth_password is the SMTP authorization code instead of the logon password of the email account. Make sure that the SMTP server endpoint includes a port number.
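
    The following is a minimal sketch of the SMTP-related fields, assuming the standard Alertmanager global configuration; the host, account, and authorization code are placeholders.

    alertmanager:
      config:
        global:
          smtp_smarthost: smtp.example.com:465          # the endpoint must include the port number
          smtp_from: alerts@example.com
          smtp_auth_username: alerts@example.com
          smtp_auth_password: <SMTP-authorization-code> # the authorization code, not the logon password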

  • What do I do if the console prompts the following error message after I click Update to update YAML templates: The current cluster is temporarily unavailable. Try again later or submit a ticket?

    If the configuration file of Tiller is too large, the cluster cannot be accessed. To solve this issue, you can delete some annotations in the configuration file and mount the file to a pod as a ConfigMap. You can specify the name of the ConfigMap in the configMaps fields of the prometheus and alertmanager sections, as sketched below. For more information, see Mount a custom ConfigMap to Prometheus.
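
    For reference, the following is a hedged sketch of the configMaps fields, mirroring the example in Mount a custom ConfigMap to Prometheus; depending on the chart version, the field may instead be nested under prometheusSpec and alertmanagerSpec.

    prometheus:
      configMaps:
        - "special-config"
    alertmanager:
      configMaps:
        - "special-config"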

  • How do I enable the features of prometheus-operator after I deploy it in a cluster?

    After prometheus-operator is deployed, you can perform the following steps to enable its features. Go to the cluster details page and choose Applications > Helm in the left-side navigation pane. On the Helm page, find ack-prometheus-operator and click Update in the Actions column. In the Update Release panel, edit the configuration to enable the features that you need. Then, click OK.

  • How do I select data storage: TSDB or disks?

    TSDB storage is available only in specific regions, whereas disk storage is supported in all regions. You can configure the data retention policy in the prometheus section, as sketched below.
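
    If you use disk storage, you can configure the retention period and the persistent volume in the prometheus section. The following is a hedged sketch that follows prometheus-operator conventions; the storage class and the volume size are placeholders.

    prometheus:
      prometheusSpec:
        retention: 10d                             # how long to keep metric data
        storageSpec:
          volumeClaimTemplate:
            spec:
              storageClassName: alicloud-disk-ssd  # hypothetical; use a storage class that exists in your cluster
              accessModes:
                - ReadWriteOnce
              resources:
                requests:
                  storage: 50Gi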

  • What do I do if a Grafana dashboard fails to display data properly?

    Go to the cluster details page and choose Applications > Helm in the left-side navigation pane. On the Helm page, find ack-prometheus-operator and click Update in the Actions column. In the Update Release panel, check whether the value of the clusterVersion field is correct. If the Kubernetes version of your cluster is earlier than 1.16, set clusterVersion to 1.14.8-aliyun.1. If the Kubernetes version of your cluster is 1.16 or later, set clusterVersion to 1.16.6-aliyun.1.
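
    For example, the field might look like the following sketch; its exact location in the release values depends on the chart, so check it on the Update Release panel.

    clusterVersion: 1.16.6-aliyun.1   # use 1.14.8-aliyun.1 for clusters earlier than 1.16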

  • What do I do if I fail to install ack-prometheus after I delete the ack-prometheus namespace?

    After you delete the ack-prometheus namespace, the related resource configurations may be retained. In this case, you may fail to install ack-prometheus again. You can perform the following operations to delete the related resource configurations:

    1. Delete role-based access control (RBAC)-related resource configurations.

      1. Run the following commands to delete the related ClusterRoles:

        kubectl delete ClusterRole ack-prometheus-operator-grafana-clusterrole
        kubectl delete ClusterRole ack-prometheus-operator-kube-state-metrics
        kubectl delete ClusterRole psp-ack-prometheus-operator-kube-state-metrics
        kubectl delete ClusterRole psp-ack-prometheus-operator-prometheus-node-exporter
        kubectl delete ClusterRole ack-prometheus-operator-operator
        kubectl delete ClusterRole ack-prometheus-operator-operator-psp
        kubectl delete ClusterRole ack-prometheus-operator-prometheus
        kubectl delete ClusterRole ack-prometheus-operator-prometheus-psp
      2. Run the following commands to delete the related ClusterRoleBindings:

        kubectl delete ClusterRoleBinding ack-prometheus-operator-grafana-clusterrolebinding
        kubectl delete ClusterRoleBinding ack-prometheus-operator-kube-state-metrics
        kubectl delete ClusterRoleBinding psp-ack-prometheus-operator-kube-state-metrics
        kubectl delete ClusterRoleBinding psp-ack-prometheus-operator-prometheus-node-exporter
        kubectl delete ClusterRoleBinding ack-prometheus-operator-operator
        kubectl delete ClusterRoleBinding ack-prometheus-operator-operator-psp
        kubectl delete ClusterRoleBinding ack-prometheus-operator-prometheus
        kubectl delete ClusterRoleBinding ack-prometheus-operator-prometheus-psp
    2. Run the following command to delete the related CRD objects:

      kubectl delete crd alertmanagerconfigs.monitoring.coreos.com
      kubectl delete crd alertmanagers.monitoring.coreos.com
      kubectl delete crd podmonitors.monitoring.coreos.com
      kubectl delete crd probes.monitoring.coreos.com
      kubectl delete crd prometheuses.monitoring.coreos.com
      kubectl delete crd prometheusrules.monitoring.coreos.com
      kubectl delete crd servicemonitors.monitoring.coreos.com
      kubectl delete crd thanosrulers.monitoring.coreos.com