Configure Prometheus alerts - Cloud Monitor - Alibaba Cloud Documentation Center

Managed Service for Prometheus is integrated with Cloud Monitor, where you can create Prometheus alert rules. Alert events from these rules are sent to ARMS alert management for routing to channels such as DingTalk and SMS.

Prerequisites

Your cloud service is connected to Prometheus.
You have activated Application Real-Time Monitoring Service (ARMS).

Create an alert rule

Log on to the Cloud Monitor console.
In the left-side navigation pane, choose Prometheus Monitoring > Alert Rules.
On the Alert Rule page, select a region, and then click Create Alert Rule.

In the Create Alert Rule panel, configure the following parameters.

Parameter	Description	Example
Rule name	A unique name for the alert rule.	Production Cluster - Container CPU Utilization Alert
Data source	Select the Prometheus instance for the alert rule.	Production Cluster
Detection condition definition method	Configure based on a preset metric: Prometheus provides preset alert metrics for quickly creating alert rules. Custom PromQL: Enter a PromQL expression directly.	Configure based on a preset metric
Based on predefined metrics - Metric group	Select a metric group.	Kubernetes Workload
Based on predefined metrics - Metric	Select the metric for the alert rule.	Container CPU Utilization
Based on predefined metrics - Filter condition	Defines which resources the alert rule applies to. An alert is triggered only for resources that match both the filter condition and the detection condition. Available filter conditions: Traverse (default): applies to all resources in the Prometheus instance. Equal To: applies only to the resource with the specified name; accepts a single value. Not Equal To: excludes the resource with the specified name; accepts a single value. Match Regular Expression: applies to resources whose names match the specified regular expression. Do Not Match Regular Expression: excludes resources whose names match the specified regular expression.	Traverse
Based on predefined metrics - Detection condition	The condition that triggers the alert, based on the selected metric.	The alert is triggered when container CPU utilization is `greater than` `80`%.
Based on predefined metrics - PromQL preview	The PromQL expression generated from the settings above. You can reuse it to create a custom PromQL rule or a dashboard panel.	None
Custom PromQL - Custom PromQL	The PromQL expression for the alert rule.	100 - avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[2m])) * 100 > 80
Query result preview	A time-series chart previewing metric values for the current configuration.	None
Severity	Severity level, from lowest to highest: P4, P3, P2, P1. Default: P4.	P4
Duration	The length of time, in minutes, that the alert condition must persist before an alert is triggered.	1
Alert check cycle	The interval, in minutes, between alert rule evaluations.	1
Detect after data completion	Whether to wait for the complete dataset before evaluating the alert condition.	Yes
Alert content	The alert notification content. Supports Go template syntax. For rules created with Configure based on a preset metric, a default template is provided for each metric; modify it as needed. Example: `Node {{ $labels.instance }} CPU utilization{{$labels.metrics_params_opt}} {{$labels.metrics_params_value}}%, current CPU utilization {{ printf "%.2f" $value }}%`. Label variables are populated from the labels of the PromQL query result. For example, `{{$labels.namespace}}` and `{{$labels.pod_name}}` are available for the following expression because its BY clauses keep both labels: `(sum(container_memory_working_set_bytes{id!="/"}) BY (instance, name, container, pod_name, namespace) / sum(container_spec_memory_limit_bytes{id!="/"} > 0) BY (instance, name, container, pod_name, namespace) * 100) > 1`. To use a custom tag, click + Add Tag in Tags and add a key-value pair (for example, key `custom_label_key`, value `custom_label_value`), and then reference it in the alert content as `Custom label: {{$labels.custom_label_key}}`. Annotations are referenced with the same variable syntax.	`Namespace: {{$labels.namespace}} / Pod: {{$labels.pod_name}} / Container: {{$labels.container}} CPU utilization{{$labels.metrics_params_opt_label_value}} {{$labels.metrics_params_value}}%, current value {{ printf "%.2f" $value }}%`
Tags	Custom tags for the alert, usable in notification policy match rules.	None
Annotations	Additional metadata for the alert.	None
Alert notification	Alert events are sent to ARMS alert management by default. Configure a notification policy in ARMS to route notifications.	Yes
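The Alert content template is rendered once for each series returned by the PromQL query. The following Go sketch shows how `$labels` and `$value` in the examples above could resolve. It assumes, as in open-source Prometheus, that both variables are bound by a header prepended to your text and that a missing label renders as an empty string. The `renderAlertContent` helper and the sample label values are hypothetical, so treat this as an illustration rather than Cloud Monitor's actual renderer.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// alertData is what the template executes against: the labels of one
// PromQL result series and that series' current sample value.
type alertData struct {
	Labels map[string]string
	Value  float64
}

// renderAlertContent is a hypothetical helper. Like open-source Prometheus,
// it prepends a header that binds $labels and $value, and it renders missing
// labels as "" (missingkey=zero) instead of "<no value>".
func renderAlertContent(content string, labels map[string]string, value float64) (string, error) {
	const header = "{{$labels := .Labels}}{{$value := .Value}}"
	tmpl, err := template.New("alert").Option("missingkey=zero").Parse(header + content)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, alertData{Labels: labels, Value: value}); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	content := `Node {{ $labels.instance }} current CPU utilization {{ printf "%.2f" $value }}%`
	out, err := renderAlertContent(content, map[string]string{"instance": "node-1"}, 85.4321)
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // Node node-1 current CPU utilization 85.43%
}
```

Because the variables come from the result labels, a template such as `{{$labels.pod_name}}` only produces a value when the PromQL expression keeps `pod_name`, for example through its BY clause.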

Click OK. The new alert rule appears on the Alert Rule page.

Configure alert notifications

Log on to the ARMS console. In the left-side navigation pane, choose Alert Management > Notification Policies.
On the Notification Policies page, click Create Notification Policy. On the Create Notification Policy page, enter a name for the policy.
In the Matching Rule section, configure rules to match alert events.
1. Select ARMS-Prometheus Monitoring as the data source.
  
  This adds "_source_ equals ARMS-Prometheus Monitoring" to the Condition List. To add more conditions, click + Add Condition. Multiple conditions are combined with a logical AND, so an alert event must satisfy all of them to match.
2. Set the match rule expression. Select an existing or custom tag, and then specify a value. If a value is not listed in the drop-down list, enter it manually.
  Existing tags include system tags and labels from the alert rule metric.
  Configuration examples:
  
  Prometheus instance
  
  To match alerts from a specific Prometheus instance, add a condition in which the clustername tag equals the name of that instance.
  
  Metric labels
  
  ARMS routes alerts based on event tags. By default, the labels that remain in the PromQL query result become tags on alert events, so a by clause in the expression determines which labels are available.
  
  For example, if the alert expression is:
  AliyunEcs_CPUUtilization{}>90
  
  All labels of the AliyunEcs_CPUUtilization metric become alert event tags.
  
  If the alert expression is:
  avg(AliyunEcs_CPUUtilization) by (instanceId,instanceName) > 90
  
  Only the instanceId and instanceName labels become alert event tags.
  
  You can route notifications based on these tags. For example, to send alerts whose instanceId is abcdexxxxxx to a specific DingTalk group, create a notification policy with a condition in which instanceId equals abcdexxxxxx, and select that DingTalk group as the notification recipient.
3. Click Next.
In the Event Group section, configure how alert events are grouped, and then click Next.
- Do not group: Each alert event is sent as a separate notification.
- Set grouping fields: Groups alert events that have identical values for the specified fields into a single notification.
In the Notification Objects section, configure the following parameters.
1. Click +Add Notification Recipient to select a notification recipient.
  Notification recipient types:
  - Contact: You must also select a notification method (phone call, text message, or email) for the selected contact.
  - Contact Group: You must also select a notification method (phone call, text message, or email) for the selected contact group.
  - On-call Schedule: You must also select a notification method (phone call, text message, or email) for the selected on-call schedule.
  - DingTalk/Lark/WeCom: Sends notifications to a DingTalk, Lark, or WeCom channel.
  - General Webhook: Sends notifications to a specified webhook URL.
2. Select whether to send a recovery notification after an alert is resolved.
  
  Send recovery notification: When enabled, a recovery notification is sent after all events in an alert are resolved, and the alert is then automatically marked as Resolved.
3. Set a notification template. For more information, see Configure notification and webhook templates.
4. Define a notification period to send alerts only within the specified time window.
5. Optional: Select a ticket system to push alerts to. For details about how to integrate a ticket system, see Integrations.
6. Click Next.
In the Repeat/Escalate/Recover Policy section, configure whether to repeat notifications or use an escalation policy for an alert, and then click Next.
- Repeat notifications for an alert: When enabled, notifications for unresolved alerts are resent at the specified frequency until the alert is resolved.
- Escalation policy:
  
  If you select No escalation policy, the notification is sent only once for an unresolved alert.
  
  If you select Use escalation policy, the notification is sent to other recipients according to the escalation policy.
- Manual recovery: When enabled, alerts will not be automatically resolved, even if no new events are triggered during the integration's automatic recovery period. You must resolve these alerts manually.
In the Action Integration section, configure automated actions that run when an alert is triggered or resolved. For more information, see Execute an alert plan by using an ARMS action integration.
Click Save to create the policy.
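The matching and grouping steps above can be pictured with a short Go sketch. It assumes that an event's tags are the labels left in the PromQL result (every series label when there is no by clause, only the by labels otherwise) plus a `_source_` tag, that each condition is an equals test combined with the others by logical AND, and that grouping merges events with identical values for all grouping fields. The helper names and the `region` label are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// condition is one "tag equals value" row of a policy's Condition List.
type condition struct {
	Tag, Value string
}

// eventTags derives event tags from one PromQL result series. by == nil means
// the expression has no by clause, so every series label is kept.
func eventTags(seriesLabels map[string]string, by []string) map[string]string {
	// Assumption: events from Prometheus alert rules carry this _source_ tag.
	tags := map[string]string{"_source_": "ARMS-Prometheus Monitoring"}
	if by == nil {
		for k, v := range seriesLabels {
			tags[k] = v
		}
		return tags
	}
	for _, k := range by {
		if v, ok := seriesLabels[k]; ok {
			tags[k] = v
		}
	}
	return tags
}

// matches reports whether an event satisfies every condition (logical AND).
func matches(tags map[string]string, conds []condition) bool {
	for _, c := range conds {
		if v, ok := tags[c.Tag]; !ok || v != c.Value {
			return false
		}
	}
	return true
}

// groupEvents splits events into notifications. With no grouping fields
// ("Do not group"), each event is its own notification; otherwise events
// with identical values for all fields share one, in first-seen order.
func groupEvents(events []map[string]string, fields []string) [][]map[string]string {
	var groups [][]map[string]string
	index := map[string]int{}
	for _, e := range events {
		if len(fields) == 0 {
			groups = append(groups, []map[string]string{e})
			continue
		}
		parts := make([]string, len(fields))
		for i, f := range fields {
			parts[i] = f + "=" + e[f]
		}
		key := strings.Join(parts, "\x00") // NUL separator avoids ambiguous joins
		if gi, ok := index[key]; ok {
			groups[gi] = append(groups[gi], e)
		} else {
			index[key] = len(groups)
			groups = append(groups, []map[string]string{e})
		}
	}
	return groups
}

func main() {
	series := map[string]string{"instanceId": "abcdexxxxxx", "instanceName": "web-1", "region": "r1"}
	// Tags for: avg(AliyunEcs_CPUUtilization) by (instanceId,instanceName) > 90
	tags := eventTags(series, []string{"instanceId", "instanceName"})
	policy := []condition{
		{"_source_", "ARMS-Prometheus Monitoring"},
		{"instanceId", "abcdexxxxxx"},
	}
	fmt.Println(matches(tags, policy)) // true
	_, kept := tags["region"]
	fmt.Println(kept) // false: region is not in the by clause
}
```

In this model a policy that matches on `region` would never fire for the aggregated expression, because the by clause has already removed that label from the event.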
