After you enable Prometheus Service, you can view the dashboards and performance metrics that are preset for Container Service for Kubernetes (ACK). This topic describes how to enable Prometheus Service in ACK, configure alert rules in Prometheus Service, customize monitoring metrics, and display the metrics in Grafana.

Background information

Prometheus Service is a managed monitoring service provided by Alibaba Cloud. It is compatible with the open source Prometheus ecosystem and provides out-of-the-box dashboards for monitoring a wide variety of components. Because the service is managed, you do not need to manage the underlying services for data storage and data presentation, or handle system maintenance.

For more information about Prometheus Service, see What is Prometheus Service?

Enable Prometheus Service

Method 1: Enable Prometheus Service when you create a cluster

On the Component Configurations wizard page, select Enable Prometheus Monitoring. For more information, see Create an ACK managed cluster.
Note
  • By default, Enable Prometheus Monitoring is selected when you create a cluster in the ACK console.
  • After the ACK cluster is created, the system automatically configures Prometheus Service.

Method 2: Enable Prometheus Service in an existing cluster

  1. Log on to the ACK console and click Clusters in the left-side navigation pane.
  2. On the Clusters page, click the name of the cluster that you want to manage. In the left-side navigation pane, choose Operations > Prometheus Monitoring.
  3. In the middle part of the Prometheus Monitoring page, click Install.
    The system automatically installs the component and checks the dashboards. After Prometheus Service is installed, you can click each tab to view monitoring metrics.

View the Grafana dashboards in Prometheus Service

On the Prometheus Monitoring page, click the name of a Grafana dashboard to view the monitoring data.

Configure alert rules in Prometheus Service

Prometheus Service allows you to create alert rules for monitoring jobs. When an alert rule is met, notifications are sent in real time to the contact group that you specified, through emails, Short Message Service (SMS) messages, or DingTalk notifications. This helps you detect errors proactively. Before you can create a contact group, you must create a contact. When you create a contact, you can specify the mobile phone number and email address at which the contact receives notifications. You can also provide a DingTalk chatbot webhook URL that is used to automatically send alert notifications.
Note To add a DingTalk chatbot as a contact, you must first obtain the webhook URL of the chatbot. For more information, see Configure a DingTalk chatbot to send alert notifications.
  1. Log on to the ARMS console.
  2. In the left-side navigation pane, choose Alert Management > Contacts.
  3. On the Contacts tab, click Create Contact in the upper-right corner. Configure the contact and click OK.
  4. Configure an alert rule.
    1. Log on to the ARMS console.
    2. In the left-side navigation pane, choose Prometheus Service > Prometheus Instances.
    3. In the upper-left corner of the Prometheus Service page, select the region where your ACK cluster is deployed and click the Prometheus instance that you want to manage. Then, you are redirected to the instance details page.
    4. In the left-side navigation pane, click Alarm Configuration.
    5. Select the alert rule that you want to manage and click Edit in the Actions column. Modify the PromQL statement and click OK.
      For more information about how to configure PromQL statements, see Create ARMS alerts.
    Note You can also choose Alarms > Alarm Policies in the ARMS console to manage alert rules.

    Verify the result

    Perform a manual test to trigger a DingTalk alert notification. The following figure shows a sample alert notification.

    [Figure: Monitoring and alerting]
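The PromQL statement in an alert rule is the condition that must be met for the alert to fire. As a hedged illustration only, the following sketch writes one such condition in the open source Prometheus rule-file format. The group name, alert name, threshold, and durations are hypothetical, and the metric kube_pod_container_status_restarts_total is assumed to be collected by the kube-state-metrics component that ack-arms-prometheus deploys. In the ARMS console you would typically enter only the expression in the expr field as the PromQL statement; the surrounding structure is shown for context.

```yaml
# Illustrative sketch in open source Prometheus rule-file format.
# All names and thresholds below are hypothetical.
groups:
- name: example-pod-alerts
  rules:
  - alert: PodRestartingFrequently
    # PromQL condition: a container restarted more than 3 times in the last 10 minutes.
    # Assumes kube-state-metrics exposes kube_pod_container_status_restarts_total.
    expr: increase(kube_pod_container_status_restarts_total[10m]) > 3
    # The condition must hold for 5 minutes before the alert fires.
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Container {{ $labels.container }} in pod {{ $labels.namespace }}/{{ $labels.pod }} restarts frequently"
```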

Customize monitoring metrics and use Grafana to display monitoring metrics

Method 1: Use annotations to customize monitoring metrics

You can add annotations to pod configuration templates to define custom monitoring metrics. The application monitoring component of ARMS uses Prometheus Service to automatically obtain these custom monitoring metrics.
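For reference, the Deployment created in the following steps could be written roughly as the manifest below. This is a sketch under stated assumptions: the names custom-metrics-pindex and pindex are hypothetical, and the metrics path /access is taken from the ServiceMonitor example in Method 2 and may differ for your image. The annotations belong to the pod template (spec.template.metadata), not to the Deployment's own metadata, and annotation values must be quoted strings.

```yaml
# Sketch only: resource and container names are hypothetical.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: custom-metrics-pindex
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: custom-metrics-pindex
  template:
    metadata:
      labels:
        app: custom-metrics-pindex
      # Pod annotations that Prometheus Service reads to find the scrape endpoint.
      annotations:
        # Port that Prometheus Service scrapes (annotation values must be strings).
        prometheus.io/port: "5000"
        # Path that Prometheus Service scrapes (assumed to be /access for this image).
        prometheus.io/path: "/access"
    spec:
      containers:
      - name: pindex
        image: yejianhonghong/pindex
        ports:
        - containerPort: 5000
```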

  1. Log on to the ACK console and click Clusters in the left-side navigation pane.
  2. On the Clusters page, click the name of a cluster and choose Workloads > Deployments in the left-side navigation pane.
  3. On the Deployments page, create an application.
    1. Click Create from Image.
    2. On the Basic Information wizard page, set basic parameters and click Next.
    3. Create a web application and open port 5000 for the application.
      In this example, the yejianhonghong/pindex image is used.
    4. Click Next.
    5. Add pod annotations.
      The prometheus.io/port annotation is used to specify the endpoint port that Prometheus Service scrapes. The prometheus.io/path annotation is used to specify the endpoint path that Prometheus Service scrapes.
    6. Click Create to create the application.
  4. Create a Service.
    1. In the left-side navigation pane of the cluster details page, choose Network > Services.
    2. In the upper-right corner of the Services page, click Create.
    3. Select Server Load Balancer and Public Access for the Type parameter.
    4. Select the application that you created in Step 3 for the Backend parameter.
    5. Click Create to create the Service.
    For more information, see Create Services.
  5. Configure custom monitoring metrics.
    1. Log on to the ARMS console.
    2. In the left-side navigation pane, choose Prometheus Service > Prometheus Instances.
    3. In the upper-left corner of the Prometheus Service page, select the region where your ACK cluster is deployed and click the Prometheus instance that you want to manage. Then, you are redirected to the instance details page.
    4. In the left-side navigation pane, click Service Discovery. On the Targets tab, verify that the custom metrics are configured.
  6. Access the public IP address of the Service that you created in Step 4 to increase the value of the custom metric.
    For more information about how to configure metrics, see Data model.
  7. Go to the Dashboards page in the ARMS console and click a dashboard to go to the Grafana page. Click Add panel in the upper-right corner, select the Graph type, and then enter current_person_counts in the Metrics field.
  8. Save the settings to view the Grafana chart of the custom metric.

Method 2: Use ServiceMonitors to customize monitoring metrics

To use ServiceMonitors to customize monitoring metrics, you must add labels to Services. You do not need to add annotations.
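As a rough sketch, a Service that carries such a label could look like the following manifest. The label app: custom-metrics-pindex and the port name web match the ServiceMonitor example later in this section; the Service name, the Service port 80, and the pod selector are assumptions. The ServiceMonitor selects on the Service's own metadata.labels, while spec.selector only determines which pods receive traffic.

```yaml
# Sketch only: the Service name, port 80, and pod selector are hypothetical.
apiVersion: v1
kind: Service
metadata:
  name: custom-metrics-pindex
  namespace: default
  labels:
    # The ServiceMonitor matches this label through spec.selector.matchLabels.
    app: custom-metrics-pindex
spec:
  # Assumed to correspond to Server Load Balancer with public access in the console.
  type: LoadBalancer
  selector:
    # Assumed pod label of the application created from the pindex image.
    app: custom-metrics-pindex
  ports:
  - name: web          # Referenced by endpoints[].port in the ServiceMonitor.
    port: 80
    targetPort: 5000   # The port that the application listens on.
    protocol: TCP
```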

  1. Log on to the ACK console and click Clusters in the left-side navigation pane.
  2. On the Clusters page, click the name of a cluster and choose Workloads > Deployments in the left-side navigation pane.
  3. On the Deployments page, create an application.
    1. Click Create from Image.
    2. On the Basic Information wizard page, set basic parameters and click Next.
    3. Create a web application and open port 5000 for the application.
      In this example, the yejianhonghong/pindex image is used.
    4. Click Create to create the application.
  4. Create a Service.
    1. In the left-side navigation pane of the cluster details page, choose Network > Services.
    2. In the upper-right corner of the Services page, click Create.
    3. Select Server Load Balancer and Public Access for the Type parameter.
    4. Select the application that you created in Step 3 for the Backend parameter.
    5. Add a label to the Service.
      In this example, the label app: custom-metrics-pindex is added. The ServiceMonitor uses this label as a selector.
    6. Click Create to create the Service.
    For more information, see Create Services.
  5. Specify the endpoint that Prometheus Service scrapes.
    1. Log on to the ARMS console.
    2. In the left-side navigation pane, choose Prometheus Service > Prometheus Instances.
    3. In the upper-left corner of the Prometheus Service page, select the region where your ACK cluster is deployed and click the Prometheus instance that you want to manage. Then, you are redirected to the instance details page.
    4. In the left-side navigation pane, click Service Discovery. Then, click the Configure tab.
    5. On the Configure tab, click ServiceMonitor.
    6. On the ServiceMonitor tab, click Add ServiceMonitor.
      In this example, the following template is used to create a ServiceMonitor.
      apiVersion: monitoring.coreos.com/v1
      kind: ServiceMonitor
      metadata:
        # Enter a unique name.
        name: custom-metrics-pindex
        # Specify a namespace.
        namespace: default
      spec:
        endpoints:
        - interval: 30s
          # Enter the name of the port that you specified in the Port Mapping section when you created the Service.
          port: web
          # Enter the path at which the application exposes its metrics.
          path: /access
        namespaceSelector:
          # Discover matching Services in all namespaces.
          any: true
        selector:
          matchLabels:
            # Enter the label that you added to the Service.
            app: custom-metrics-pindex

      Click OK to create the ServiceMonitor.

    7. On the Targets tab, verify that the endpoints that Prometheus Service scrapes are displayed.
      Note A ServiceMonitor definition carries more information than an annotation, including the namespace and name of the Service.
  6. Access the public IP address of the Service that you created in Step 4 to increase the value of the custom metric.
    For more information about how to configure metrics, see Data model.
  7. Go to the Dashboards page of the ARMS console and click a dashboard to go to the Grafana page. Click Add panel in the upper-right corner, select the Graph type, and then enter current_person_counts in the Metrics field.
  8. Save the settings to view the Grafana chart of the custom metric.
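The sample ServiceMonitor sets namespaceSelector.any to true, so it matches labeled Services in every namespace. If you want to limit discovery to specific namespaces, the open source Prometheus Operator ServiceMonitor API also supports a matchNames list, as in the following variant. This sketch assumes that the ServiceMonitor tab in the ARMS console accepts the same field and that the Service runs in the default namespace.

```yaml
# Variant of the sample ServiceMonitor that only discovers Services in listed namespaces.
# Assumes ARMS accepts the Prometheus Operator namespaceSelector.matchNames field.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: custom-metrics-pindex
  namespace: default
spec:
  endpoints:
  - interval: 30s
    port: web
    path: /access
  namespaceSelector:
    # Only Services in these namespaces are matched.
    matchNames:
    - default
  selector:
    matchLabels:
      app: custom-metrics-pindex
```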

FAQ

How do I check the version of the ack-arms-prometheus component?

  1. Log on to the ACK console and click Clusters in the left-side navigation pane.
  2. On the Clusters page, click the name of a cluster and choose Operations > Add-ons in the left-side navigation pane.
  3. On the Add-ons page, click the Logs and Monitoring tab and find ack-arms-prometheus.
    The version number is displayed in the lower part of the component card. If a new version is available, click Upgrade on the right side to update the component.
    Note The Upgrade button is displayed only if the component is not updated to the latest version.

Why can't ARMS Prometheus monitor GPU-accelerated nodes?

ARMS Prometheus may not be able to monitor GPU-accelerated nodes that have taints, because the ack-prometheus-gpu-exporter pods cannot be scheduled to a tainted node unless they tolerate its taints. Perform the following steps to check and handle the taints of a GPU-accelerated node.

  1. Run the following command to view the taints of a GPU-accelerated node:
    If you added custom taints to the GPU-accelerated node, they are displayed in the output. In this example, a taint whose key is test-key, value is test-value, and effect is NoSchedule has been added to the node.
    kubectl describe node cn-beijing.47.100.***.***

    Expected output:

    Taints:             test-key=test-value:NoSchedule
  2. Use one of the following methods to handle the taint:
    • Run the following command to delete the taint from the GPU-accelerated node:
      kubectl taint node cn-beijing.47.100.***.*** test-key=test-value:NoSchedule-
    • Add a toleration to the ack-prometheus-gpu-exporter DaemonSet so that its pods can be scheduled to the GPU-accelerated node with the taint.
      # 1. Run the following command to edit ack-prometheus-gpu-exporter:
      kubectl edit daemonset -n arms-prom ack-prometheus-gpu-exporter

      # 2. Add the following fields to the YAML file to tolerate the taint.
      # Irrelevant fields are not shown.
      # Add the tolerations field at the same level as the containers field, for example directly above it.
      tolerations:
      - key: "test-key"
        operator: "Equal"
        value: "test-value"
        effect: "NoSchedule"
      containers:
      # Irrelevant fields are not shown.
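To make the required placement explicit, the following excerpt shows where the tolerations field sits in the ack-prometheus-gpu-exporter DaemonSet: directly under spec.template.spec, as a sibling of containers. It is a sketch with unrelated fields omitted, not the complete manifest. The commented-out entry shows an alternative that uses operator: Exists, which tolerates the test-key taint regardless of its value.

```yaml
# Excerpt of the ack-prometheus-gpu-exporter DaemonSet; unrelated fields are omitted.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: ack-prometheus-gpu-exporter
  namespace: arms-prom
spec:
  template:
    spec:
      # tolerations and containers are siblings under spec.template.spec.
      tolerations:
      - key: "test-key"
        operator: "Equal"
        value: "test-value"
        effect: "NoSchedule"
      # Alternative: tolerate the test-key taint with any value (value must be omitted).
      # - key: "test-key"
      #   operator: "Exists"
      #   effect: "NoSchedule"
      containers:
      # Existing container definitions stay unchanged.
```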

What do I do if I fail to reinstall ARMS Prometheus after I delete the arms-prom namespace?

If you delete only the arms-prom namespace, resources outside that namespace are retained, such as ClusterRoles, ClusterRoleBindings, and Roles and RoleBindings in other namespaces. In this case, you may fail to reinstall ARMS Prometheus. Run the following commands to delete the residual resources:

  • Run the following commands to delete the related ClusterRoles:
    kubectl delete ClusterRole arms-kube-state-metrics
    kubectl delete ClusterRole arms-node-exporter
    kubectl delete ClusterRole arms-prom-ack-arms-prometheus-role
    kubectl delete ClusterRole arms-prometheus-oper3
    kubectl delete ClusterRole arms-prometheus-ack-arms-prometheus-role
    kubectl delete ClusterRole arms-pilot-prom-k8s
  • Run the following commands to delete the related ClusterRoleBindings:
    kubectl delete ClusterRoleBinding arms-node-exporter
    kubectl delete ClusterRoleBinding arms-prom-ack-arms-prometheus-role-binding
    kubectl delete ClusterRoleBinding arms-prometheus-oper-bind2
    kubectl delete ClusterRoleBinding kube-state-metrics
    kubectl delete ClusterRoleBinding arms-pilot-prom-k8s
    kubectl delete ClusterRoleBinding arms-prometheus-ack-arms-prometheus-role-binding
  • Run the following commands to delete the related Roles and RoleBindings:
    kubectl delete Role arms-pilot-prom-spec-ns-k8s
    kubectl delete Role arms-pilot-prom-spec-ns-k8s -n kube-system
    kubectl delete RoleBinding arms-pilot-prom-spec-ns-k8s
    kubectl delete RoleBinding arms-pilot-prom-spec-ns-k8s -n kube-system