Prometheus is an open source project that is used to monitor cloud-native applications. This topic describes how to deploy Prometheus in a Container Service for Kubernetes (ACK) cluster.
Background information
This topic describes how to efficiently monitor system components and resource entities in a Kubernetes cluster. A monitoring system monitors the following types of objects:
Resource: resource utilization of nodes and applications. In a Kubernetes cluster, the monitoring system monitors the resource usage of nodes, pods, and the cluster.
Application: internal metrics of applications. For example, the monitoring system dynamically counts the number of online users who are using an application, collects monitoring metrics from application ports, and enables alerting based on the collected metrics.
In a Kubernetes cluster, the monitoring system monitors the following objects:
Cluster components: The components of the Kubernetes cluster, such as the API server, cloud-controller-manager, and etcd. To monitor cluster components, specify the monitoring methods in configuration files.
Static resource entities: The status of resources on nodes and kernel events. To monitor static resource entities, specify the monitoring methods in configuration files.
Dynamic resource entities: Entities of abstract workloads in Kubernetes, such as Deployments, DaemonSets, and pods. To monitor dynamic resource entities in a Kubernetes cluster, you can deploy Prometheus in the Kubernetes cluster.
Custom objects in applications: Applications that require customized monitoring of data and metrics need specific configurations. To meet these requirements, expose the metrics through a port of the application and collect them with Prometheus.
Step 1: Deploy open source Prometheus
Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click its name. In the left-side navigation pane, choose .
On the Helm page, click Deploy. In the Chart section of the Deploy panel, find and select ack-prometheus-operator, use the default values for other parameters, and then click Next.
The Confirm message indicates that the component is installed in the monitoring namespace by default and that the default application name is used as the component name. Click Yes.
If you want to use a custom application name and a custom namespace, configure the Application Name and Namespace parameters in the Basic Information step.
On the Parameters wizard page, select 12.0.0 for the chart version, configure the parameters, and then click OK. Chart 12.0.0 supports alert configuration, so you can set monitoring and alert conditions by using the built-in alerting feature.
You can customize the following optional parameters based on your business requirements:
Alert configuration: Alert notifications can be sent by using DingTalk messages or emails.
Mount a custom ConfigMap to Prometheus: You can configure a custom ConfigMap based on your business requirements.
Mount the dashboard configuration to Grafana: You can use custom dashboards to enhance data visualization.
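After the release is deployed, you can check from the command line that its pods have started. The following is a minimal sketch, assuming that kubectl is configured for the cluster and that the default monitoring namespace is used. The function name wait_for_monitoring_pods and the 300-second timeout are illustrative choices, not part of the chart.

```shell
#!/usr/bin/env bash
# Sketch: wait until every pod in the namespace of ack-prometheus-operator reports Ready.
# Assumptions: kubectl points at the target ACK cluster; namespace defaults to "monitoring";
# the 300s timeout is a hypothetical value, adjust it to your cluster.
wait_for_monitoring_pods() {
  local ns="${1:-monitoring}"
  local timeout="${2:-300s}"
  # Show the current pod status first.
  kubectl get pods -n "$ns"
  # Block until all pods have the Ready condition, or fail when the timeout expires.
  kubectl wait --for=condition=Ready pod --all -n "$ns" --timeout="$timeout"
}
```

Pods that run to completion, such as Job pods, never become Ready. If the chart leaves such pods in the namespace, the wait times out even though the release is healthy, so check the kubectl get pods output in that case.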
Step 2: View Prometheus collection tasks
Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click its name. In the left-side pane, choose .
On the Services page, select the namespace in which ack-prometheus-operator is deployed (monitoring by default). Find ack-prometheus-operator-prometheus and click Update in the Actions column.
In the Update Service dialog box, set Service Type to SLB. Select Create Resource and set Access Method to Public Access. Select Pay-as-you-go (Pay-by-CU) for the Billing Method parameter and click OK. For information about the billing details, see CLB billing.
After the update is complete, copy the external IP address of the Service and access Prometheus by entering IP address:port number in the address bar of a browser, for example, 47.XX.XX.12:9090.
In the top navigation bar of the Prometheus page, choose
to view all data collection tasks. Tasks in the UP state are running properly. To view alert rules, click Alerts in the top navigation bar.
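Prometheus also exposes its target list through the HTTP API endpoint /api/v1/targets, where each active target has a health field whose value is up, down, or unknown. The following sketch counts the healthy targets in that JSON response. It assumes that the response is compact JSON without spaces around the colons, and it uses grep instead of a JSON parser, so treat it as a quick check. count_up_targets is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: count scrape targets whose health is "up" in a Prometheus /api/v1/targets response.
# Assumption: the JSON is compact (no whitespace around ':'), as Prometheus normally returns it.
count_up_targets() {
  # -o prints every match on its own line, so the line count equals the number of matches.
  grep -o '"health":"up"' | wc -l | tr -d ' '
}

# Example usage without an SLB instance (Service name and port as described in this step):
#   kubectl -n monitoring port-forward svc/ack-prometheus-operator-prometheus 9090:9090 &
#   curl -s http://localhost:9090/api/v1/targets | count_up_targets
```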
Step 3: View Grafana aggregated data
Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click its name. In the left-side pane, choose .
On the Services page, select the namespace in which ack-prometheus-operator is deployed (monitoring by default). Find ack-prometheus-operator-grafana and click Update in the Actions column.
In the Update Service dialog box, set Service Type to SLB. Select Create Resource and set Access Method to Public Access. Select Pay-as-you-go (Pay-by-CU) for the Billing Method parameter and click OK. For information about the billing details, see CLB billing.
After the update is complete, copy the external IP address of the Service and access Grafana to view the aggregated data by entering IP address:port number in the address bar of a browser, for example, 47.XX.XX.12:80. By default, the port number is 80.
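If you do not want to expose Grafana through a public SLB instance, you can forward a local port to the Grafana Service instead. This is a minimal sketch, assuming that the Service is named ack-prometheus-operator-grafana and listens on port 80 in the monitoring namespace. The helper name forward_grafana and the default local port 3000 are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: reach Grafana through kubectl port-forward instead of a public SLB instance.
# Assumptions: Service "ack-prometheus-operator-grafana" exposes port 80 in namespace "monitoring";
# local port 3000 is a hypothetical default, any free port works.
forward_grafana() {
  local local_port="${1:-3000}"
  local ns="${2:-monitoring}"
  # Runs in the foreground until interrupted; then open http://localhost:<local_port>.
  kubectl port-forward -n "$ns" svc/ack-prometheus-operator-grafana "${local_port}:80"
}
```

This avoids the pay-as-you-go SLB instance, but Grafana is reachable only while the command keeps running.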
(Optional) Step 4: Set silent alerts
Silence alerts under specific conditions. All alerts that match the conditions are muted: no notifications are sent for them until the silence period ends or the silence is manually deleted.
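Run the following command to forward local port 9093 to the Alertmanager Service. Then, enter localhost:9093 in the address bar of a browser and click Silenced to set silent alerts. Because --address 0.0.0.0 makes the forwarded port listen on all network interfaces, you can also run the command on an ECS instance that is associated with an elastic IP address (EIP) and access Alertmanager through the EIP and port 9093.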
kubectl port-forward --address 0.0.0.0 svc/alertmanager-operated 9093 -n monitoring
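While the port-forward above is running, silences can also be created without the web UI through the Alertmanager API v2 endpoint /api/v2/silences. The following sketch builds the request body for a silence that matches a single alert name. silence_payload is a hypothetical helper, the alert name and times in the example are placeholders, and the caller supplies startsAt and endsAt as RFC 3339 timestamps. Values are inserted without JSON escaping, so they must not contain double quotes or backslashes.

```shell
#!/usr/bin/env bash
# Sketch: build an Alertmanager API v2 silence that matches the alertname label exactly.
# Arguments: alert name, start time, end time (RFC 3339, UTC), author, comment.
# Assumption: arguments contain no characters that need JSON escaping (" or \).
silence_payload() {
  local alertname="$1" starts_at="$2" ends_at="$3" author="$4" comment="$5"
  printf '{"matchers":[{"name":"alertname","value":"%s","isRegex":false}],' "$alertname"
  printf '"startsAt":"%s","endsAt":"%s",' "$starts_at" "$ends_at"
  printf '"createdBy":"%s","comment":"%s"}\n' "$author" "$comment"
}

# Example usage (placeholder values):
#   silence_payload KubePodCrashLooping 2024-01-01T00:00:00Z 2024-01-01T02:00:00Z ops maintenance \
#     | curl -s -X POST -H 'Content-Type: application/json' -d @- http://localhost:9093/api/v2/silences
```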
Alert configurations
You can configure prometheus-operator to send alert notifications by using DingTalk messages or emails. The following section describes the configuration steps.
On the Parameters wizard page, configure the alert parameters based on the following content:
Configure DingTalk notifications.
Find the dingtalk field in the configuration file and set enabled to true.
Enter the webhook URL of your DingTalk chatbot in the token field. For more information about how to obtain the webhook URL, see Scenario 3: Implement Kubernetes monitoring and alerting with DingTalk chatbot.
In the alertmanager section, find the receiver parameter in the config field and enter the DingTalk chatbot name that you specified in the receivers field. By default, webhook is used.
Configure email notifications.
Enter the details about your email address in the red box of the following figure.
Find the config field in the alertmanager section of the configuration file, find receiver, and enter the name of the email receiver that you defined in the receivers field. By default, mail is used.
Set alert notification templates.
You can customize the alert notification template in the templateFiles field of the alertmanager section on the Parameters wizard page, as shown in the following figure.
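A common cause of failed email notifications, noted in the FAQ, is an SMTP server endpoint without a port number. In the Alertmanager configuration, this endpoint is set by smtp_smarthost in host:port form. The following sketch checks a value before you enter it in the chart parameters. has_smtp_port is a hypothetical helper, and the host names in the examples are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: succeed only if an SMTP endpoint ends with ":<port>", e.g. smtp.example.com:465.
# Taking the text after the last colon also works for bracketed IPv6 hosts such as [::1]:25.
has_smtp_port() {
  local endpoint="$1"
  local port="${endpoint##*:}"              # text after the last colon
  [ "$port" != "$endpoint" ] || return 1    # no colon at all
  [ -n "$port" ] || return 1                # trailing colon without a port
  case "$port" in
    *[!0-9]*) return 1 ;;                   # the port must consist of digits only
  esac
  return 0
}

# Example usage: has_smtp_port smtp.example.com:465 && echo "endpoint includes a port"
```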
Mount a custom ConfigMap to Prometheus
This section describes how to mount a ConfigMap named special-config to Prometheus. This ConfigMap contains the configuration of Prometheus. This configuration file is passed as a --config.file parameter when the Prometheus pod is started.
Create a ConfigMap.
Mount a ConfigMap.
On the Parameters page, add the following content to the configMaps field to mount the ConfigMap to the pod. The mount path is /etc/prometheus/configmaps/.
The following figure shows how to set the configMaps field in the prometheus section.
## ConfigMaps is a list of ConfigMaps in the same namespace as the Prometheus.
## The ConfigMaps are mounted into /etc/prometheus/configmaps/.
##
configMaps:
  - "special-config"
  - "detail-config"
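The Create a ConfigMap step can be performed with kubectl, and after the release is updated you can check that the files appear under /etc/prometheus/configmaps/. This is a minimal sketch, assuming that the configuration is stored in a local file named prometheus.yml, that the release runs in the monitoring namespace, and that the Prometheus pod is named prometheus-ack-prometheus-operator-prometheus-0 with a container named prometheus. The pod name is an assumption; check the actual name with kubectl get pods -n monitoring. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: create (or update) the special-config ConfigMap and check that Prometheus mounted it.
# Assumptions: ./prometheus.yml exists; namespace "monitoring";
# pod "prometheus-ack-prometheus-operator-prometheus-0" is hypothetical, verify it first.
create_special_config() {
  local file="${1:-prometheus.yml}"
  # Render the ConfigMap locally and apply it, so that re-running the command updates it in place.
  kubectl create configmap special-config --from-file="$file" -n monitoring \
    --dry-run=client -o yaml | kubectl apply -f -
}

check_special_config_mount() {
  local pod="${1:-prometheus-ack-prometheus-operator-prometheus-0}"
  # Each ConfigMap listed in configMaps is mounted at /etc/prometheus/configmaps/<name>/.
  kubectl exec -n monitoring "$pod" -c prometheus -- ls /etc/prometheus/configmaps/special-config
}
```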
Configure Grafana
Mount the dashboard configuration to Grafana
You can perform the following steps to mount a ConfigMap that contains the dashboard configuration to the Grafana pod. On the Parameters wizard page, add the following configurations to the extraConfigmapMounts section, as shown in the following figure.
Note: Make sure that the dashboard exists in the cluster as a ConfigMap. In addition, the labels of the ConfigMap must be in the same format as those of other ConfigMaps.
In the extraConfigmapMounts section of the Grafana configuration, specify the name of the ConfigMap and how to mount the ConfigMap.
Set mountPath to /tmp/dashboards/.
Set configMap to the name of the ConfigMap.
Set name to the name of the JSON file that stores the dashboard configuration.
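A dashboard that you exported as a JSON file can be stored in a ConfigMap from the command line before you reference it in extraConfigmapMounts. This is a minimal sketch, assuming the monitoring namespace. The ConfigMap name, the JSON file name, and the label key and value are placeholders; copy the labels used by the other dashboard ConfigMaps in your cluster. create_dashboard_configmap is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: store a Grafana dashboard JSON file in a ConfigMap and label it.
# Arguments: ConfigMap name, dashboard JSON file, label in key=value form (all placeholders).
create_dashboard_configmap() {
  local name="$1" json_file="$2" label="$3"
  # With --from-file, the base name of the file becomes the data key in the ConfigMap.
  kubectl create configmap "$name" --from-file="$json_file" -n monitoring
  # Apply the same label format as the existing dashboard ConfigMaps.
  kubectl label configmap "$name" "$label" -n monitoring
}

# Example usage: create_dashboard_configmap my-dashboard my-dashboard.json app=dashboard
```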
Enable data persistence for dashboards
You can perform the following steps to enable data persistence for Grafana dashboards:
Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click its name. In the left-side pane, choose
. Find ack-prometheus-operator and click Update in the Actions column.
In the Update Release panel, configure the persistence field in the grafana section as shown in the following figure.
You can export data on Grafana dashboards in JSON format to your on-premises machine. For more information, see Export a Grafana dashboard.
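Besides the export function in the Grafana UI, the Grafana HTTP API returns a dashboard as JSON from GET /api/dashboards/uid/<uid>. The following sketch saves that response to a file. It assumes that Grafana is reachable at http://localhost:3000 (for example, through kubectl port-forward) and that a Grafana user name and password are passed in the hypothetical GRAFANA_USER and GRAFANA_PASSWORD environment variables. The dashboard UID is a placeholder. The saved file wraps the dashboard in a dashboard field next to a meta field, so extract the dashboard field before you import it elsewhere.

```shell
#!/usr/bin/env bash
# Sketch: download a dashboard through the Grafana HTTP API and save it as JSON.
# Assumptions: Grafana at $GRAFANA_URL (default http://localhost:3000);
# credentials in GRAFANA_USER / GRAFANA_PASSWORD (hypothetical variable names).
export_dashboard() {
  local uid="$1" out="${2:-$1.json}"
  local url="${GRAFANA_URL:-http://localhost:3000}"
  # -f makes curl fail on HTTP errors instead of saving an error body.
  curl -sf -u "${GRAFANA_USER}:${GRAFANA_PASSWORD}" \
    "${url}/api/dashboards/uid/${uid}" -o "$out"
}

# Example usage: GRAFANA_USER=admin GRAFANA_PASSWORD='...' export_dashboard <dashboard-uid>
```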
Uninstall the open source Prometheus component
Check the Helm chart version of the open source Prometheus component and perform the following steps to uninstall the open source Prometheus component. This helps you prevent residual resources and unexpected exceptions. You need to manually delete the related Helm release, namespace, CustomResourceDefinitions (CRDs), and kubelet Service.
If the kubelet Service cannot be automatically deleted when you uninstall the ack-prometheus-operator component, refer to the following sections. For more information about this issue, see #1523.
Chart v12.0.0
Use the ACK console
Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click its name. In the left-side navigation pane of the cluster details page, perform the following operations:
Uninstall the Helm release: Choose . On the Helm page, select the ack-prometheus-operator release and click Delete in the Actions column. In the Delete dialog box, select Clear Release Records and click OK.
Delete the related namespace: Click Namespaces and Quotas. On the Namespace page, select the monitoring namespace and click Delete in the lower part of the page. In the Confirm message, confirm the information and click Confirm Deletion.
Delete the related CRDs: Choose
. On the Custom Resources page, click the CRD tab. Select all CRDs that belong to the monitoring.coreos.com API group and click Delete in the lower part of the page. In the Confirm message, confirm the information and click Confirm Deletion. The following list describes the CRDs in the API group:
AlertmanagerConfig
Alertmanager
PodMonitor
Probe
Prometheus
PrometheusRule
ServiceMonitor
ThanosRuler
Delete the kubelet Service: Choose Network > Services. On the Services page, select the ack-prometheus-operator-kubelet Service in the kube-system namespace and then click Delete in the lower part of the page. In the Confirm message, confirm the information and click Confirm Deletion.
Use kubectl
Uninstall the related Helm release.
helm uninstall ack-prometheus-operator -n monitoring
Delete the related namespace.
kubectl delete namespace monitoring
Delete the related CRDs.
kubectl delete crd alertmanagerconfigs.monitoring.coreos.com
kubectl delete crd alertmanagers.monitoring.coreos.com
kubectl delete crd podmonitors.monitoring.coreos.com
kubectl delete crd probes.monitoring.coreos.com
kubectl delete crd prometheuses.monitoring.coreos.com
kubectl delete crd prometheusrules.monitoring.coreos.com
kubectl delete crd servicemonitors.monitoring.coreos.com
kubectl delete crd thanosrulers.monitoring.coreos.com
Delete the kubelet Service.
kubectl delete service ack-prometheus-operator-kubelet -n kube-system
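Instead of listing each CRD by name, you can delete every CRD in the monitoring.coreos.com API group, which also covers the additional CRDs that later chart versions install. This is a minimal sketch, assuming kubectl access with permission to delete CRDs; delete_monitoring_crds is a hypothetical helper. Deleting a CRD also deletes all custom resources of that type in every namespace, so run it only if no other Prometheus Operator installation in the cluster uses these CRDs.

```shell
#!/usr/bin/env bash
# Sketch: delete every CRD in the monitoring.coreos.com API group, regardless of chart version.
# Warning: removing a CRD removes all custom resources of that type cluster-wide.
delete_monitoring_crds() {
  # -o name prints entries such as customresourcedefinition.apiextensions.k8s.io/probes.monitoring.coreos.com
  kubectl get crd -o name \
    | grep '\.monitoring\.coreos\.com$' \
    | while read -r crd; do
        kubectl delete "$crd"
      done
}
```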
Chart v65.1.1
Use the ACK console
Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click its name. In the left-side navigation pane of the cluster details page, perform the following operations:
Uninstall the Helm release: Choose . On the Helm page, select the ack-prometheus-operator release and click Delete in the Actions column. In the Delete dialog box, select Clear Release Records and click OK.
Delete the related namespace: Click Namespaces and Quotas. On the Namespace page, select the monitoring namespace and click Delete in the lower part of the page. In the Confirm message, confirm the information and click Confirm Deletion.
Delete the related CRDs: Choose
. On the Custom Resources page, click the CRD tab. Select all CRDs that belong to the monitoring.coreos.com API group and click Delete in the lower part of the page. In the Confirm message, confirm the information and click Confirm Deletion. The following list describes the CRDs in the API group:
AlertmanagerConfig
Alertmanager
PodMonitor
Probe
PrometheusAgent
Prometheus
PrometheusRule
ScrapeConfig
ServiceMonitor
ThanosRuler
Delete the kubelet Service: Choose Network > Services. On the Services page, select the ack-prometheus-operator-kubelet Service in the kube-system namespace and then click Delete in the lower part of the page. In the Confirm message, confirm the information and click Confirm Deletion.
Use kubectl
Uninstall the related Helm release.
helm uninstall ack-prometheus-operator -n monitoring
Delete the related namespace.
kubectl delete namespace monitoring
Delete the related CRDs.
kubectl delete crd alertmanagerconfigs.monitoring.coreos.com
kubectl delete crd alertmanagers.monitoring.coreos.com
kubectl delete crd podmonitors.monitoring.coreos.com
kubectl delete crd probes.monitoring.coreos.com
kubectl delete crd prometheusagents.monitoring.coreos.com
kubectl delete crd prometheuses.monitoring.coreos.com
kubectl delete crd prometheusrules.monitoring.coreos.com
kubectl delete crd scrapeconfigs.monitoring.coreos.com
kubectl delete crd servicemonitors.monitoring.coreos.com
kubectl delete crd thanosrulers.monitoring.coreos.com
Delete the kubelet Service.
kubectl delete service ack-prometheus-operator-kubelet -n kube-system
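After the uninstallation, you can check that nothing was left behind. The following sketch, assuming kubectl access to the cluster, lists any remaining CRDs in the monitoring.coreos.com API group and checks whether the monitoring namespace and the ack-prometheus-operator-kubelet Service still exist. report_leftovers is a hypothetical helper; it only reads resources and deletes nothing.

```shell
#!/usr/bin/env bash
# Sketch: report resources that remain after ack-prometheus-operator is uninstalled.
# Read-only: prints findings and returns 1 if anything is left, 0 otherwise.
report_leftovers() {
  local found=0 crds
  crds="$(kubectl get crd -o name | grep '\.monitoring\.coreos\.com$' || true)"
  if [ -n "$crds" ]; then
    echo "CRDs still present:"
    echo "$crds"
    found=1
  fi
  # A namespace in the Terminating state still counts as present.
  if kubectl get namespace monitoring >/dev/null 2>&1; then
    echo "Namespace monitoring still exists"
    found=1
  fi
  if kubectl get service ack-prometheus-operator-kubelet -n kube-system >/dev/null 2>&1; then
    echo "Service kube-system/ack-prometheus-operator-kubelet still exists"
    found=1
  fi
  return "$found"
}
```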
FAQ
What do I do if I fail to receive DingTalk alert notifications?
Obtain the webhook URL of your DingTalk chatbot. For more information, see Event monitoring.
On the Parameters wizard page, find the dingtalk section, set enabled to true, and then specify the webhook URL of your DingTalk chatbot in the token field. For more information, see Configure DingTalk alert notifications in Alert configurations.
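To check the webhook URL independently of Alertmanager, you can post a test message to it directly. A DingTalk chatbot webhook accepts a JSON body such as {"msgtype":"text","text":{"content":"..."}}. The following sketch builds and sends that body. dingtalk_text_payload and send_dingtalk_test are hypothetical helpers, and the message is inserted without JSON escaping. If the chatbot uses a security setting such as a custom keyword, the message must satisfy it, or DingTalk rejects the request.

```shell
#!/usr/bin/env bash
# Sketch: build and send a plain-text test message to a DingTalk chatbot webhook.
# Assumption: the message contains no characters that need JSON escaping (" or \).
dingtalk_text_payload() {
  printf '{"msgtype":"text","text":{"content":"%s"}}\n' "$1"
}

send_dingtalk_test() {
  local webhook_url="$1" message="$2"
  # webhook_url is the full chatbot webhook URL, the same value that goes into the token field.
  dingtalk_text_payload "$message" \
    | curl -s -H 'Content-Type: application/json' -d @- "$webhook_url"
}
```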
What do I do if an error message appears when I deploy prometheus-operator in a cluster?
The following error message appears:
Can't install release with errors: rpc error: code = Unknown desc = object is being deleted: customresourcedefinitions.apiextensions.k8s.io "xxxxxxxx.monitoring.coreos.com" already exists
The error message indicates that the cluster fails to clear custom resource definition (CRD) objects of the previous deployment. Run the following commands to delete the CRD objects. Then, deploy prometheus-operator again:
kubectl delete crd prometheuses.monitoring.coreos.com
kubectl delete crd prometheusrules.monitoring.coreos.com
kubectl delete crd servicemonitors.monitoring.coreos.com
kubectl delete crd alertmanagers.monitoring.coreos.com
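The error means that a CRD was still being deleted when the chart tried to create it again, so it helps to confirm that the deletion has finished before you redeploy. kubectl delete waits for objects to be removed by default, and --timeout makes it give up instead of waiting indefinitely if a finalizer blocks the deletion. This is a minimal sketch; the 120-second timeout and the helper name delete_stale_crds are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: remove the CRDs named in the error, then confirm they are gone before redeploying.
delete_stale_crds() {
  local crd
  for crd in prometheuses prometheusrules servicemonitors alertmanagers; do
    # --ignore-not-found skips CRDs that are already gone; --timeout bounds the wait.
    kubectl delete crd "${crd}.monitoring.coreos.com" --ignore-not-found --timeout=120s
  done
  for crd in prometheuses prometheusrules servicemonitors alertmanagers; do
    if kubectl get crd "${crd}.monitoring.coreos.com" >/dev/null 2>&1; then
      echo "still present: ${crd}.monitoring.coreos.com" >&2
      return 1
    fi
  done
}
```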
What do I do if I fail to receive email alert notifications?
Make sure that the value of smtp_auth_password is the SMTP authorization code instead of the logon password of the email account. Make sure that the SMTP server endpoint includes a port number.
What do I do if the console prompts the following error message after I click Update to update YAML templates: The current cluster is temporarily unavailable. Try again later or submit a ticket?
If the configuration file of Tiller is too large, the cluster cannot be accessed. To solve this issue, delete some annotations from the configuration file and mount the file to a pod as a ConfigMap. You can specify the name of the ConfigMap in the configMaps fields of the prometheus and alertmanager sections. For more information, see the second method in Mount a ConfigMap to Prometheus.
How do I enable the features of prometheus-operator after I deploy it in a cluster?
After prometheus-operator is deployed, you can perform the following steps to enable the features of prometheus-operator. Go to the cluster details page and choose
in the left-side navigation pane. On the Helm page, find ack-prometheus-operator and click Update in the Actions column. In the Update Release panel, configure the parameters that enable the features, and then click OK.
How do I select data storage: TSDB or disks?
TSDB storage is available only in some regions, whereas disk storage is supported in all regions. The following figure shows how to configure the data retention policy.
What do I do if a Grafana dashboard fails to display data properly?
Go to the cluster details page and choose
in the left-side navigation pane. On the Helm page, find ack-prometheus-operator and click Update in the Actions column. In the Update Release panel, check whether the value of the clusterVersion field is correct. If the Kubernetes version of your cluster is earlier than 1.16, set clusterVersion to 1.14.8-aliyun.1. If the Kubernetes version of your cluster is 1.16 or later, set clusterVersion to 1.16.6-aliyun.1.
What do I do if I fail to install ack-prometheus after I delete the ack-prometheus namespace?
After you delete the ack-prometheus namespace, the related resource configurations may be retained. In this case, you may fail to install ack-prometheus again. You can perform the following operations to delete the related resource configurations:
Delete role-based access control (RBAC)-related resource configurations.
Run the following commands to delete the related ClusterRoles:
kubectl delete ClusterRole ack-prometheus-operator-grafana-clusterrole
kubectl delete ClusterRole ack-prometheus-operator-kube-state-metrics
kubectl delete ClusterRole psp-ack-prometheus-operator-kube-state-metrics
kubectl delete ClusterRole psp-ack-prometheus-operator-prometheus-node-exporter
kubectl delete ClusterRole ack-prometheus-operator-operator
kubectl delete ClusterRole ack-prometheus-operator-operator-psp
kubectl delete ClusterRole ack-prometheus-operator-prometheus
kubectl delete ClusterRole ack-prometheus-operator-prometheus-psp
Run the following commands to delete the related ClusterRoleBindings:
kubectl delete ClusterRoleBinding ack-prometheus-operator-grafana-clusterrolebinding
kubectl delete ClusterRoleBinding ack-prometheus-operator-kube-state-metrics
kubectl delete ClusterRoleBinding psp-ack-prometheus-operator-kube-state-metrics
kubectl delete ClusterRoleBinding psp-ack-prometheus-operator-prometheus-node-exporter
kubectl delete ClusterRoleBinding ack-prometheus-operator-operator
kubectl delete ClusterRoleBinding ack-prometheus-operator-operator-psp
kubectl delete ClusterRoleBinding ack-prometheus-operator-prometheus
kubectl delete ClusterRoleBinding ack-prometheus-operator-prometheus-psp
Run the following commands to delete the related CRD objects:
kubectl delete crd alertmanagerconfigs.monitoring.coreos.com
kubectl delete crd alertmanagers.monitoring.coreos.com
kubectl delete crd podmonitors.monitoring.coreos.com
kubectl delete crd probes.monitoring.coreos.com
kubectl delete crd prometheuses.monitoring.coreos.com
kubectl delete crd prometheusrules.monitoring.coreos.com
kubectl delete crd servicemonitors.monitoring.coreos.com
kubectl delete crd thanosrulers.monitoring.coreos.com
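If the leftover ClusterRoles and ClusterRoleBindings in your cluster differ from the names listed above, for example because of a different chart version, you can select them by name instead. This is a minimal sketch, assuming that every leftover object's name contains ack-prometheus-operator, as the names above do. Any unrelated object whose name contains the same string would also be deleted, so run it with --list first and review the output. cleanup_prometheus_rbac and its --list argument are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: delete ClusterRoles and ClusterRoleBindings whose names contain ack-prometheus-operator.
# Pass --list (an argument of this helper, not a kubectl flag) to only print the matches.
cleanup_prometheus_rbac() {
  local mode="${1:-}" kind obj
  for kind in clusterrole clusterrolebinding; do
    # -o name prints entries such as clusterrole.rbac.authorization.k8s.io/<name>.
    for obj in $(kubectl get "$kind" -o name | grep 'ack-prometheus-operator'); do
      if [ "$mode" = "--list" ]; then
        echo "would delete $obj"
      else
        kubectl delete "$obj" --ignore-not-found
      fi
    done
  done
}
```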