Prometheus is an open source project that is used to monitor cloud-native applications. This topic describes how to deploy Prometheus in a Container Service for Kubernetes (ACK) cluster.

Prerequisites

Background information

Typically, a monitoring system monitors the following types of objects:
  • Resource: resource utilization of nodes and applications. In a Kubernetes cluster, the monitoring system monitors the resource usage of nodes, pods, and the cluster.
  • Application: internal metrics of applications. For example, the monitoring system dynamically counts the number of online users who are using an application, collects monitoring metrics from application ports, and enables alerting based on the collected metrics.
In a Kubernetes cluster, the monitoring system monitors the following objects:
  • Cluster components: the components of the Kubernetes cluster, such as kube-apiserver, kube-controller-manager, and etcd.
  • Static resource entities: the status of resources on nodes and kernel events.
  • Dynamic resource entities: entities of abstract workloads in Kubernetes, such as Deployments, DaemonSets, and pods.
  • Custom objects in applications: custom data and metrics that are used to monitor applications.

To monitor cluster components and static resource entities, specify the monitoring methods in configuration files.

To monitor dynamic resource entities in a Kubernetes cluster, you can deploy Prometheus in the Kubernetes cluster.

Procedure

  1. Deploy Prometheus.
    1. Log on to the ACK console.
    2. In the left-side navigation pane of the ACK console, choose Marketplace > App Catalog.
    3. On the Marketplace page, click the App Catalog tab. Then, find and click ack-prometheus-operator.
    4. On the ack-prometheus-operator page, click Deploy.
    5. In the Deploy wizard, select a cluster and a namespace, and then click Next.
    6. On the Parameters wizard page, set the parameters and click Next.
      Check the deployment result.
      1. Run the following command to map Prometheus in the cluster to local port 9090:
        kubectl port-forward svc/ack-prometheus-operator-prometheus 9090:9090 -n monitoring
      2. Enter localhost:9090 in the address bar of a browser to visit the Prometheus page.
      3. In the top navigation bar, choose Status > Targets to view all data collection tasks. Tasks in the UP state are running as normal.
        Targets
  2. View the aggregated data.
    1. Run the following command to map Grafana in the cluster to local port 3000:
      kubectl -n monitoring port-forward svc/ack-prometheus-operator-grafana 3000:80
    2. To view the aggregated data, enter localhost:3000 in the address bar of a browser, and then select a dashboard.
      Dashboard
  3. View alert rules and silence alerts.
    • View alert rules
      To view alert rules, enter localhost:9090 in the address bar of a browser, and then click Alerts in the top navigation bar.
      • Red: The alert rule is firing. Alerts are being triggered based on this rule.
      • Green: The alert rule is inactive. No alerts are being triggered based on this rule.
      Alerts
    • Silence alerts
      Run the following command to map Alertmanager in the cluster to local port 9093. Then, enter localhost:9093 in the address bar of a browser and click Silence to create a silence.
      kubectl --namespace monitoring port-forward svc/alertmanager-operated 9093
      Silence
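The checks in the preceding procedure can also be scripted against the HTTP APIs that the port-forward commands expose. The following Python sketch assumes that Prometheus is reachable at localhost:9090 and Alertmanager at localhost:9093, as set up above. It reads the standard Prometheus endpoints /api/v1/targets and /api/v1/alerts and builds a request for the Alertmanager v2 endpoint /api/v2/silences. The function names are illustrative and are not part of ack-prometheus-operator.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone

# Local endpoints created by the kubectl port-forward commands in the procedure.
PROMETHEUS_URL = "http://localhost:9090"
ALERTMANAGER_URL = "http://localhost:9093"


def get_json(url: str) -> dict:
    """Send a GET request and decode the JSON response body."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def unhealthy_targets(targets: dict) -> list:
    """List (scrapePool, scrapeUrl, health) for active targets that are not up.

    `targets` is the body returned by GET /api/v1/targets.
    """
    return [
        (t.get("scrapePool"), t.get("scrapeUrl"), t.get("health"))
        for t in targets["data"]["activeTargets"]
        if t.get("health") != "up"
    ]


def firing_alerts(alerts: dict) -> list:
    """List the alertname label of every firing alert in GET /api/v1/alerts."""
    return [
        a["labels"].get("alertname")
        for a in alerts["data"]["alerts"]
        if a.get("state") == "firing"
    ]


def build_silence(alertname: str, hours: float, created_by: str, comment: str) -> dict:
    """Build a POST /api/v2/silences body that mutes one alert rule for `hours`."""
    now = datetime.now(timezone.utc)
    return {
        "matchers": [{"name": "alertname", "value": alertname, "isRegex": False}],
        "startsAt": now.isoformat(),
        "endsAt": (now + timedelta(hours=hours)).isoformat(),
        "createdBy": created_by,
        "comment": comment,
    }


def create_silence(silence: dict) -> str:
    """Submit a silence to Alertmanager and return the ID that it assigns."""
    req = urllib.request.Request(
        ALERTMANAGER_URL + "/api/v2/silences",
        data=json.dumps(silence).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["silenceID"]


def report() -> None:
    """Print unhealthy targets and firing alerts (requires the port-forwards)."""
    for pool, url, health in unhealthy_targets(get_json(PROMETHEUS_URL + "/api/v1/targets")):
        print(f"{health:8} {pool} {url}")
    print("firing:", firing_alerts(get_json(PROMETHEUS_URL + "/api/v1/alerts")))
```

For example, call report() to print targets that are not in the UP state, or create_silence(build_silence("ExampleAlert", 2, "ops", "maintenance")) to mute a rule for two hours, where ExampleAlert is a placeholder for one of your alert names.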

You can follow the preceding steps to deploy Prometheus in a cluster. The following examples describe how to configure Prometheus in different scenarios.

Alert configurations

To configure alert notification methods or notification templates, modify the alertmanager section on the Parameters wizard page as follows:

  • Configure alert notification methods
    You can set prometheus-operator to send alert notifications by using DingTalk messages or emails. You can perform the following steps to configure the alert notification method:
    • Configure DingTalk notifications

      On the ack-prometheus-operator page, click Deploy. On the Parameters wizard page, set enabled to true in the dingtalk section, set the token field to the webhook URL of your DingTalk chatbot, and set the receiver field of the config parameter in the alertmanager section to the receiver name that is specified in the receivers field. The default receiver name is webhook.

      If you have two DingTalk chatbots, perform the following steps:
      1. Replace the parameter values in the token field with the webhook URLs of your DingTalk chatbots.
        Copy the webhook URLs of your DingTalk chatbots and use them to replace the values of dingtalk1 and dingtalk2 in the token field. In this example, replace https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxx with your webhook URLs.
        Token configuration
      2. Modify the value of the receiver parameter.

        In the alertmanager section, set the receiver fields in the config parameter to the receiver names that are specified in the receivers field. In this example, webhook1 and webhook2 are used.

      3. Modify the url parameter.
        In the url parameter of each receiver, replace the chatbot name with the name of the corresponding DingTalk chatbot in the token field. In this example, dingtalk1 and dingtalk2 are used.
        Webhook configuration
      Note To add more DingTalk chatbots, add a webhook URL to the token field for each chatbot, and then add the corresponding receiver and url settings.
    • Configure email notifications
      On the ack-prometheus-operator page, click Deploy. On the Parameters wizard page, specify your email account settings as shown in the red box of the following figure, and set the receiver field of the config parameter in the alertmanager section to the receiver name that is specified in the receivers field. The default receiver name is mail.
      Email notifications
  • Configure alert notification templates
    You can customize the alert notification template in the templateFiles field of the alertmanager section on the Parameters wizard page, as shown in the following figure.
    Template configuration
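The notification settings above can be generated instead of edited by hand. The following Python sketch builds the dingtalk and alertmanager parameters for any number of chatbots, and an email receiver. The key names dingtalk.enabled, dingtalk.token, and alertmanager.config come from the steps above; the smtp_* and email_configs fields are standard Alertmanager configuration fields. The webhook URL layout (http://webhook-dingtalk/dingtalk/<chatbot name>/send) is a hypothetical placeholder, so copy the real url value from the default parameters of the chart. Sending one alert to several chatbots through child routes with continue: true is one possible design, not necessarily the default of ack-prometheus-operator.

```python
import json


def dingtalk_values(chatbots: dict, webhook_service: str = "webhook-dingtalk") -> dict:
    """Build chart parameters that send every alert to each DingTalk chatbot.

    `chatbots` maps a chatbot name (for example, dingtalk1) to its webhook URL.
    `webhook_service` and the URL layout below are HYPOTHETICAL placeholders.
    """
    if not chatbots:
        raise ValueError("at least one chatbot is required")
    receivers, routes = [], []
    for i, name in enumerate(chatbots, start=1):
        receiver = f"webhook{i}"
        receivers.append({
            "name": receiver,
            "webhook_configs": [{"url": f"http://{webhook_service}/dingtalk/{name}/send"}],
        })
        # A child route without matchers matches every alert, and continue: true
        # lets the same alert reach the next sibling route as well.
        routes.append({"receiver": receiver, "continue": True})
    return {
        "dingtalk": {"enabled": True, "token": dict(chatbots)},
        "alertmanager": {"config": {
            "route": {"receiver": receivers[0]["name"], "routes": routes},
            "receivers": receivers,
        }},
    }


def email_values(smarthost: str, sender: str, username: str, auth_code: str, to: str) -> dict:
    """Build chart parameters for email notifications through the receiver named mail."""
    host, sep, port = smarthost.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError("the SMTP server endpoint must use the host:port format")
    return {"alertmanager": {"config": {
        "global": {
            "smtp_smarthost": smarthost,
            "smtp_from": sender,
            "smtp_auth_username": username,
            # Use the SMTP authorization code, not the logon password of the account.
            "smtp_auth_password": auth_code,
        },
        "route": {"receiver": "mail"},
        "receivers": [{"name": "mail", "email_configs": [{"to": to}]}],
    }}}


# JSON is also valid YAML, so the output can be merged into the Parameters page.
print(json.dumps(dingtalk_values({
    "dingtalk1": "https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxx",
    "dingtalk2": "https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxx",
}), indent=2))
```

The port check in email_values reflects the FAQ below: an SMTP endpoint without a port number is a common reason why email notifications are not delivered.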

Storage configuration

Monitoring data that is generated by Prometheus can be stored in Time Series Database (TSDB) or on disks. You can perform the following steps to configure data storage:
  • Store data in TSDB
    On the ack-prometheus-operator page, click Deploy. On the Parameters wizard page, set enabled to true in the tsdb section, and set the url fields of the remoteRead and remoteWrite parameters.
    Store data in TSDB
  • Store data on disks
    By default, ack-prometheus-operator supports storing data on Alibaba Cloud disks. On the ack-prometheus-operator page, click Deploy. On the Parameters wizard page, set the storage parameter in the alertmanager section or the storageSpec parameter in the prometheus section. Specify the disk type in the storageClassName field, the access mode in the accessModes field, and the disk capacity in the storage field.
    Store data on disks
    Note For example, to store Prometheus data on an SSD, set storageClassName to alicloud-disk-ssd, accessModes to ReadWriteOnce, and storage to 50Gi in the storageSpec parameter, as shown in the following figure.
    Disk configuration

    To check the configuration, go to the Elastic Compute Service (ECS) console and choose Storage & Snapshots > Disks. On the Disks page, you can view the SSD that is used.

    For information about how to reuse a disk, see Disk volume overview.
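The SSD example above can be expressed as data so that the values are checked before you paste them. The following Python sketch assumes the prometheus-operator StorageSpec layout (storageSpec.volumeClaimTemplate.spec with storageClassName, accessModes, and resources.requests.storage); whether ack-prometheus-operator nests it exactly under prometheus.storageSpec is an assumption, so compare the output with the default parameters of the chart. The TSDB parameters are not covered because their exact layout is not shown in this topic.

```python
import json
import re

# Suffixes of Kubernetes resource quantities that are commonly used for disk sizes.
_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
          "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "": 1}
_ACCESS_MODES = {"ReadWriteOnce", "ReadOnlyMany", "ReadWriteMany", "ReadWriteOncePod"}


def quantity_to_bytes(quantity: str) -> int:
    """Convert an integer quantity such as 50Gi to bytes."""
    match = re.fullmatch(r"(\d+)(Ki|Mi|Gi|Ti|k|M|G|T)?", quantity)
    if not match:
        raise ValueError(f"unsupported storage quantity: {quantity!r}")
    return int(match.group(1)) * _UNITS[match.group(2) or ""]


def prometheus_storage(storage_class: str = "alicloud-disk-ssd",
                       access_mode: str = "ReadWriteOnce",
                       size: str = "50Gi") -> dict:
    """Build a storageSpec for the prometheus section (assumed chart layout)."""
    if access_mode not in _ACCESS_MODES:
        raise ValueError(f"unknown access mode: {access_mode}")
    quantity_to_bytes(size)  # Raises ValueError for a malformed size.
    return {"prometheus": {"storageSpec": {"volumeClaimTemplate": {"spec": {
        "storageClassName": storage_class,
        "accessModes": [access_mode],
        "resources": {"requests": {"storage": size}},
    }}}}}


print(json.dumps(prometheus_storage(), indent=2))
```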

Use prometheus-adapter to enable auto scaling

prometheus-adapter allows you to specify custom metrics for pod auto scaling. To enable prometheus-adapter, set enabled to true in the prometheusAdapter section and specify custom metrics. This way, the cluster can automatically scale the number of pods based on the specified metrics, which improves resource utilization.

On the ack-prometheus-operator page, click Deploy. On the Parameters wizard page, find the prometheusAdapter section and set enabled to true.
Enable prometheus-adapter
For information about how to specify custom metrics, see prometheus-adapter. To verify the configuration, run the following command:
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1"
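The verification command returns a JSON discovery document (an APIResourceList) whose resources entries are named like pods/<metric>. The following Python sketch lists those names and builds a HorizontalPodAutoscaler that scales a Deployment on the average per-pod value of one metric. The metric name http_requests and the Deployment name web used in the test are hypothetical. The manifest uses autoscaling/v2, which requires Kubernetes 1.23 or later; older clusters serve the same metrics layout under autoscaling/v2beta2.

```python
import json
import subprocess


def custom_metric_names(api_resource_list: dict) -> list:
    """Return the sorted resource names, such as pods/<metric>, from the discovery document."""
    return sorted(r["name"] for r in api_resource_list.get("resources", []))


def fetch_custom_metric_names() -> list:
    """Run the verification command above and parse its JSON output."""
    out = subprocess.run(
        ["kubectl", "get", "--raw", "/apis/custom.metrics.k8s.io/v1beta1"],
        check=True, capture_output=True, text=True,
    ).stdout
    return custom_metric_names(json.loads(out))


def pods_metric_hpa(name: str, deployment: str, metric: str, average_value: str,
                    min_replicas: int = 1, max_replicas: int = 10) -> dict:
    """Build an HPA that targets an average per-pod custom metric.

    `metric` is the name without the pods/ prefix.
    """
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": deployment},
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [{
                "type": "Pods",
                "pods": {
                    "metric": {"name": metric},
                    "target": {"type": "AverageValue", "averageValue": average_value},
                },
            }],
        },
    }
```

kubectl accepts JSON manifests, so the output of json.dumps(pods_metric_hpa(...)) can be piped to kubectl apply -f -.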

Mount a ConfigMap to Prometheus

This section describes how to mount a ConfigMap to the /etc/prometheus/configmaps/ path of a pod.

If prometheus-operator is not yet deployed in your cluster, deploy it as described in Step 1 of the Procedure section. On the Parameters wizard page, set the configMaps field in the prometheus section to the name of the ConfigMap that you want to mount.
Mount a ConfigMap

If prometheus-operator has been deployed in your cluster, perform the following steps:

  1. Log on to the ACK console.
  2. In the left-side navigation pane of the ACK console, click Clusters.
  3. On the Clusters page, find the cluster that you want to manage and click the name of the cluster or click Details in the Actions column. The details page of the cluster appears.
  4. In the left-side navigation pane of the cluster details page, choose Applications > Helm.
  5. Find the ack-prometheus-operator release and click Update in the Actions column.
  6. In the Update Release panel, set the configMaps fields in the prometheus and alertmanager sections to the name of the ConfigMap that you want to mount. Then, click OK.
    Update configurations

    For example, suppose that you want to mount a ConfigMap named special-config, which contains Prometheus configurations. To use special-config as a configuration file of the Prometheus pod, add it to the configMaps field in the prometheus section. The ConfigMap is then mounted to the /etc/prometheus/configmaps/ path.

    The following figure shows an example of the special-config ConfigMap.

    special-config
    The following figure shows how to set the configMaps field in the prometheus section.
    configmaps
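The ConfigMap and the configMaps parameters can also be generated. The following Python sketch builds a ConfigMap manifest from local files and the matching chart parameters. It assumes the namespace monitoring, as in the port-forward commands above; the file name rules.yaml in the test is hypothetical. The mount path follows the prometheus-operator convention of mounting each ConfigMap under /etc/prometheus/configmaps/<ConfigMap name>/.

```python
import json
from pathlib import Path


def configmap_manifest(name: str, namespace: str, files: dict) -> dict:
    """Build a ConfigMap whose keys are file names and whose values are file contents."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name, "namespace": namespace},
        "data": dict(files),
    }


def configmap_from_paths(name: str, namespace: str, paths: list) -> dict:
    """Read local text files into a ConfigMap manifest."""
    return configmap_manifest(name, namespace, {Path(p).name: Path(p).read_text() for p in paths})


def configmaps_values(names: list, sections=("prometheus", "alertmanager")) -> dict:
    """Chart parameters that list the ConfigMaps to mount in each section."""
    return {section: {"configMaps": list(names)} for section in sections}


def prometheus_mount_path(configmap: str, key: str) -> str:
    """Path of one ConfigMap key inside the Prometheus container."""
    return f"/etc/prometheus/configmaps/{configmap}/{key}"


print(json.dumps(configmaps_values(["special-config"]), indent=2))
```

For example, json.dumps(configmap_from_paths("special-config", "monitoring", ["rules.yaml"])) can be piped to kubectl apply -f - to create the ConfigMap before you update the release.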

Configure Grafana

  • Mount the dashboard configuration to Grafana
    To mount a ConfigMap that contains the dashboard configuration to the Grafana pod, click Deploy on the ack-prometheus-operator page. On the Parameters wizard page, add the following configurations to the extraConfigmapMounts section, as shown in the following figure.
    Mount the dashboard configuration to Grafana
    Note
    • Make sure that you have a ConfigMap that contains the dashboard configuration in your cluster.

      The labels that are added to this ConfigMap must be the same as those added to the other dashboard ConfigMaps.

    • In the extraConfigmapMounts section of the Grafana configuration, specify the name of the ConfigMap and how to mount the ConfigMap.
    • Set mountPath to /tmp/dashboards/.
    • Set configMap to the name of the ConfigMap.
    • Set name to the name of the JSON file that stores the dashboard configuration.
  • Enable data persistence for dashboards

    You can perform the following steps to enable data persistence for Grafana dashboards:

    1. Log on to the ACK console.
    2. In the left-side navigation pane of the ACK console, click Clusters.
    3. On the Clusters page, find the cluster that you want to manage, and click the name of the cluster or click Applications in the Actions column.
    4. In the left-side navigation pane, choose Applications > Helm.
    5. Find ack-prometheus-operator and click Update in the Actions column.
    6. In the Update Release panel, configure the persistence field in the grafana section as shown in the following figure.
    Enable data persistence for Grafana dashboards

    You can export Grafana dashboards as JSON files to your on-premises machine. For more information, see Export a Grafana dashboard.
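Exporting and persisting dashboards can be scripted through the Grafana HTTP API, which the Grafana port-forward above exposes at localhost:3000. The following Python sketch uses the standard endpoints /api/search?type=dash-db and /api/dashboards/uid/<uid> with basic authentication; the user name and password are whatever is configured for your Grafana instance. The keys in grafana_persistence (enabled, storageClassName, accessModes, size) follow the upstream Grafana Helm chart and are an assumption for ack-prometheus-operator.

```python
import base64
import json
import urllib.request
from pathlib import Path

GRAFANA_URL = "http://localhost:3000"  # Created by the Grafana port-forward command.


def grafana_get(path: str, user: str, password: str):
    """GET a Grafana HTTP API path with basic authentication and decode the JSON."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(GRAFANA_URL + path,
                                 headers={"Authorization": "Basic " + token})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def portable_dashboard(api_response: dict) -> dict:
    """Extract the dashboard model and clear its instance-specific numeric id."""
    dashboard = dict(api_response["dashboard"])
    dashboard["id"] = None
    return dashboard


def export_dashboards(user: str, password: str, out_dir: str = "dashboards") -> list:
    """Save every dashboard as <uid>.json in out_dir and return the file paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for hit in grafana_get("/api/search?type=dash-db", user, password):
        body = grafana_get(f"/api/dashboards/uid/{hit['uid']}", user, password)
        path = out / f"{hit['uid']}.json"
        path.write_text(json.dumps(portable_dashboard(body), indent=2))
        written.append(path)
    return written


def grafana_persistence(storage_class: str, size: str) -> dict:
    """Parameters for the persistence field in the grafana section (assumed layout)."""
    return {"grafana": {"persistence": {
        "enabled": True,
        "storageClassName": storage_class,
        "accessModes": ["ReadWriteOnce"],
        "size": size,
    }}}
```

Clearing the numeric id keeps the uid, so an exported file can be imported into another Grafana instance without clashing with its internal IDs.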

FAQ

  • What do I do if I fail to receive DingTalk alert notifications?
    1. Obtain the webhook URL of your DingTalk chatbot. For more information, see Scenario 3: Use DingTalk to raise alerts upon Kubernetes events.
    2. On the Parameters wizard page, find the dingtalk section, set enabled to true, and then specify the webhook URL of your DingTalk chatbot in the token field. For more information, see Configure DingTalk alert notifications in Alert configurations.
  • What do I do if the following error message appears when I deploy prometheus-operator in a cluster?
    The error message is:
    Can't install release with errors: rpc error: code = Unknown desc = object is being deleted: customresourcedefinitions.apiextensions.k8s.io "xxxxxxxx.monitoring.coreos.com" already exists
    The error message indicates that the cluster failed to delete the custom resource definition (CRD) objects of the previous deployment. Run the following commands to delete the CRD objects, and then deploy prometheus-operator again:
    kubectl delete crd prometheuses.monitoring.coreos.com
    kubectl delete crd prometheusrules.monitoring.coreos.com
    kubectl delete crd servicemonitors.monitoring.coreos.com
    kubectl delete crd alertmanagers.monitoring.coreos.com
  • What do I do if I fail to receive email alert notifications?

    Make sure that the value of smtp_auth_password is the SMTP authorization code instead of the logon password of the email account. Also make sure that the SMTP server endpoint includes a port number, in the host:port format.

  • What do I do if the console displays the following error message after I click Update to update YAML templates: The current cluster is temporarily unavailable. Try again later or submit a ticket?

    If the configuration file of Tiller is too large, the cluster cannot be accessed. To solve this issue, delete some annotations from the configuration file and mount the file to a pod as a ConfigMap. Then, specify the name of the ConfigMap in the configMaps fields of the prometheus and alertmanager sections. For more information, see the second method in Mount a ConfigMap to Prometheus.

  • How do I enable the features of prometheus-operator after I deploy it in a cluster?

    After prometheus-operator is deployed, perform the following steps to enable its features: Go to the cluster details page and choose Applications > Helm in the left-side navigation pane. On the Helm page, find ack-prometheus-operator and click Update in the Actions column. In the Update Release panel, modify the parameters to enable the features, and then click OK.

  • How do I choose between TSDB and disks for data storage?
    TSDB storage is available only in some regions, whereas disk storage is supported in all regions. The following figure shows how to configure the data retention policy.
    Data retention policy
  • What do I do if a Grafana dashboard fails to display data properly?

    Go to the cluster details page and choose Applications > Helm in the left-side navigation pane. On the Helm page, find ack-prometheus-operator and click Update in the Actions column. In the Update Release panel, check whether the value of the clusterVersion field is correct. If the Kubernetes version of your cluster is earlier than 1.16, set clusterVersion to 1.14.8-aliyun.1. If the Kubernetes version of your cluster is 1.16 or later, set clusterVersion to 1.16.6-aliyun.1.

  • What do I do if I fail to install ack-prometheus after I delete the ack-prometheus namespace?
    After you delete the ack-prometheus namespace, the related resource configurations may be retained. In this case, you may fail to install ack-prometheus again. You can perform the following operations to delete the related resource configurations:
    1. Delete role-based access control (RBAC)-related resource configurations.
      1. Run the following commands to delete the related ClusterRoles:
        kubectl delete ClusterRole ack-prometheus-operator-grafana-clusterrole
        kubectl delete ClusterRole ack-prometheus-operator-kube-state-metrics
        kubectl delete ClusterRole psp-ack-prometheus-operator-kube-state-metrics
        kubectl delete ClusterRole psp-ack-prometheus-operator-prometheus-node-exporter
        kubectl delete ClusterRole ack-prometheus-operator-operator
        kubectl delete ClusterRole ack-prometheus-operator-operator-psp
        kubectl delete ClusterRole ack-prometheus-operator-prometheus
        kubectl delete ClusterRole ack-prometheus-operator-prometheus-psp
      2. Run the following commands to delete the related ClusterRoleBindings:
        kubectl delete ClusterRoleBinding ack-prometheus-operator-grafana-clusterrolebinding
        kubectl delete ClusterRoleBinding ack-prometheus-operator-kube-state-metrics
        kubectl delete ClusterRoleBinding psp-ack-prometheus-operator-kube-state-metrics
        kubectl delete ClusterRoleBinding psp-ack-prometheus-operator-prometheus-node-exporter
        kubectl delete ClusterRoleBinding ack-prometheus-operator-operator
        kubectl delete ClusterRoleBinding ack-prometheus-operator-operator-psp
        kubectl delete ClusterRoleBinding ack-prometheus-operator-prometheus
        kubectl delete ClusterRoleBinding ack-prometheus-operator-prometheus-psp
    2. Run the following commands to delete the related CRD objects:
      kubectl delete crd alertmanagerconfigs.monitoring.coreos.com
      kubectl delete crd alertmanagers.monitoring.coreos.com
      kubectl delete crd podmonitors.monitoring.coreos.com
      kubectl delete crd probes.monitoring.coreos.com
      kubectl delete crd prometheuses.monitoring.coreos.com
      kubectl delete crd prometheusrules.monitoring.coreos.com
      kubectl delete crd servicemonitors.monitoring.coreos.com
      kubectl delete crd thanosrulers.monitoring.coreos.com
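Two of the FAQ fixes above can be scripted with kubectl. The following Python sketch picks the clusterVersion value from the server version reported by kubectl version -o json, using the 1.16 threshold stated in the Grafana FAQ, and finds leftover ClusterRoles, ClusterRoleBindings, and monitoring.coreos.com CRDs by name instead of typing each delete command. It only prints the commands unless dry_run is False. Be aware that deleting these CRDs also deletes every custom resource of those types in the cluster, including resources that another prometheus-operator installation manages. The function names are illustrative.

```python
import json
import subprocess


def kubectl(*args: str) -> str:
    """Run kubectl with the current kubeconfig and return its standard output."""
    return subprocess.run(["kubectl", *args], check=True,
                          capture_output=True, text=True).stdout


def grafana_cluster_version(git_version: str) -> str:
    """Map a server version such as v1.20.11-aliyun.1 to the clusterVersion value."""
    core = git_version.lstrip("v").split("-")[0]
    major, minor = (int(x) for x in core.split(".")[:2])
    return "1.16.6-aliyun.1" if (major, minor) >= (1, 16) else "1.14.8-aliyun.1"


def recommended_cluster_version() -> str:
    """Read the server version from the cluster and return the clusterVersion value."""
    server = json.loads(kubectl("version", "-o", "json"))["serverVersion"]
    return grafana_cluster_version(server["gitVersion"])


def leftover_resources(names: list) -> list:
    """Keep `kubectl get -o name` entries that belong to ack-prometheus-operator."""
    keep = []
    for entry in names:
        kind, _, name = entry.partition("/")
        if kind.startswith("customresourcedefinition"):
            if name.endswith(".monitoring.coreos.com"):
                keep.append(entry)
        elif "ack-prometheus-operator" in name:
            keep.append(entry)
    return keep


def cleanup(dry_run: bool = True) -> None:
    """Print, and optionally run, the kubectl delete commands from the FAQ."""
    names = kubectl("get", "clusterroles,clusterrolebindings,crd", "-o", "name").split()
    for entry in leftover_resources(names):
        print("kubectl delete", entry)
        if not dry_run:
            kubectl("delete", entry)
```

Run cleanup() first and review the printed commands before you run cleanup(dry_run=False).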