Create and manage an alert rule template - Application Real-Time Monitoring Service

If you want to create the same alert rule for Prometheus instances in different regions, you can use the alert rule template feature of Managed Service for Prometheus. This topic describes how to create and manage alert rule templates.

Background information

You can create the same alert rule separately for each Prometheus instance in each region, but this approach is labor-intensive and makes the rules hard to manage. To address this issue, Alibaba Cloud Managed Service for Prometheus provides the alert rule template feature. You can create an alert rule template, apply the template to multiple Prometheus instances, and then manage the alert rules of these instances in a unified and cost-efficient manner.

Operations related to the alert rule template feature

Create an alert rule template
Modify an alert rule template
Apply an alert rule template
- Prometheus instance selection mode
- Tag controller mode
Delete an alert rule template
View the alert rules that are created from a template
- Enable multiple alert rules at a time
- Disable multiple alert rules at a time
- Delete multiple alert rules at a time
Apply multiple templates at a time
Delete multiple templates at a time

Create an alert rule template

Log on to the ARMS console.
In the left-side navigation pane, choose Prometheus Monitoring > Prometheus Alert Rule Template.
In the upper-right corner of the Prometheus Alert Rule Template page, click Create Prometheus Alert Rule Template.

On the Create Prometheus Alert Rule Template page, set the following parameters.

When you create an alert rule template, you can set Check Type to Static Threshold or Custom PromQL.

If you set Check Type to Static Threshold, you can select a preset metric and create an alert rule template by using the metric.
To monitor a metric other than the preset metrics, you can use a custom PromQL statement to create an alert rule template.

Table 1. Use a preset metric
Parameter	Description	Example
Template Name	Enter a name for the alert rule template.	Production cluster - container CPU usage alert
Template Description	Optional. The description can contain the overview, scenarios, and notes of the template.	No
Check Type	Select Static Threshold.	Static Threshold
Alert Contact Group	Select an alert group.	Kubernetes load
Alert Metric	Select the metric that you want to monitor. Different alert groups provide different metrics.	Container CPU usage
Alert Condition	Specify the condition based on which alert events are generated.	If the CPU usage of the container `is greater than` `80`%, an alert event is generated.
Filter Conditions	Specify the scope of resources to which the alert rule template applies. An alert event is generated only if a resource meets both the filter condition and the alert condition. The following filter conditions are supported. Traverse (default): The alert rule that you create by using the template applies to all resources in the current Prometheus instance. Equal: Enter a resource name. The alert rule applies only to the specified resource. You cannot specify multiple resources at the same time. Not equal: Enter a resource name. The alert rule applies to all resources except the specified resource. You cannot specify multiple resources at the same time. Regex match: Enter a regular expression. The alert rule applies to all resources whose names match the regular expression. Regex not match: Enter a regular expression. The alert rule applies to all resources whose names do not match the regular expression.	Instance IP address: Traverse
Duration	Select when an alert event is generated. The following options are supported. If the alert condition is met, an alert event is generated: An alert event is generated as soon as a data point reaches the threshold. If the alert condition is continuously met for N minutes, an alert event is generated: An alert event is generated only if the threshold is reached continuously for at least N minutes.	1
Alert Level	Specify a severity level for the alert rule template. Default value: Default. Valid values: Default, P4, P3, P2, and P1. The preceding values are listed in ascending order of severity.	Default
Alert Message	Specify the alert message that you want to send to the end users. You can specify custom variables in the alert message based on the Go template syntax.	`Namespace: {{$labels.namespace}} / Pod: {{$labels.pod_name}} / Container: {{$labels.container}} CPU usage{{$labels.metrics_params_opt_label_value}} {{$labels.metrics_params_value}}%, Current value {{ printf "%.2f" $value }}%`
Advanced Settings
Tags	Specify tags for the alert rule template. The specified tags can be used to match notification policies.	No
Annotations	Specify annotations for the alert rule template.	No
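
The filter conditions in Table 1 resemble the label matchers of PromQL. The following Go sketch illustrates that correspondence. It assumes that each filter condition is translated into a single label matcher, which this topic does not state explicitly. The FilterType type, the labelMatcher function, and the instance label are hypothetical names used only for illustration and are not part of any ARMS API.

```go
package main

import (
	"fmt"
	"strconv"
)

// FilterType is a hypothetical enumeration of the filter conditions in Table 1.
type FilterType int

const (
	Traverse FilterType = iota
	Equal
	NotEqual
	RegexMatch
	RegexNotMatch
)

// matcherOps maps each filter condition to the PromQL matcher operator that
// is assumed to express it. Traverse has no entry because it does not
// restrict the resources.
var matcherOps = map[FilterType]string{
	Equal:         "=",
	NotEqual:      "!=",
	RegexMatch:    "=~",
	RegexNotMatch: "!~",
}

// labelMatcher returns a PromQL label matcher for one label, or an empty
// string for Traverse (all resources are selected).
func labelMatcher(label string, ft FilterType, value string) string {
	op, ok := matcherOps[ft]
	if !ok {
		return ""
	}
	// strconv.Quote adds double quotes and escapes backslashes.
	return label + op + strconv.Quote(value)
}

func main() {
	fmt.Println(labelMatcher("instance", Equal, "10.0.0.1"))
	fmt.Println(labelMatcher("instance", RegexNotMatch, "10\\.0\\..*"))
}
```

In PromQL, regular expression matchers are fully anchored, so a pattern must match the whole label value rather than a substring.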

Table 2. Use a custom PromQL statement
Parameter	Description	Example
Template Name	Enter a name for the alert rule template.	Pod disk usage exceeds 90%
Template Description	Optional. The description can contain the overview, scenarios, and notes of the template.	No
Check Type	Select Custom PromQL.	Custom PromQL
Custom PromQL Statements	Specify the PromQL statement based on which alert events are generated.	`max(container_fs_usage_bytes{pod!="", namespace!="arms-prom",namespace!="monitoring"}) by (pod_name, namespace, device)/max(container_fs_limit_bytes{pod!=""}) by (pod_name,namespace, device) * 100 > 90`
Duration	Select when an alert event is generated. The following options are supported. If the alert condition is met, an alert event is generated: An alert event is generated as soon as a data point reaches the threshold. If the alert condition is continuously met for N minutes, an alert event is generated: An alert event is generated only if the threshold is reached continuously for at least N minutes.	1
Alert Level	Specify a severity level for the alert rule template. Default value: Default. Valid values: Default, P4, P3, P2, and P1. The preceding values are listed in ascending order of severity.	Default
Alert Message	Specify the alert message that you want to send to the end users. You can specify custom variables in the alert message based on the Go template syntax.	`Namespace: {{$labels.namespace}} / Pod: {{$labels.pod_name}} / The usage of the {{$labels.device}} disk exceeds 90%. Current value: {{ printf "%.2f" $value }}%`
Advanced Settings
Tags	Specify tags for the alert rule template. The specified tags can be used to match notification policies.	No
Annotations	Specify annotations for the alert rule template.	No
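
To preview how an alert message with Go template variables is expanded, you can render it locally. The following Go sketch uses the standard text/template package and the example message from Table 2. It assumes that $labels and $value are bound before the message is evaluated, which is simulated here by a short prelude. The alertData type, the renderAlertMessage function, and the sample label values are hypothetical and do not reflect how ARMS renders messages internally.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// diskMessage is the example alert message from Table 2.
const diskMessage = `Namespace: {{$labels.namespace}} / Pod: {{$labels.pod_name}} / The usage of the {{$labels.device}} disk exceeds 90%. Current value: {{ printf "%.2f" $value }}%`

// prelude binds the $labels and $value variables that the message refers to.
const prelude = `{{$labels := .Labels}}{{$value := .Value}}`

// alertData is a hypothetical container for one alert event.
type alertData struct {
	Labels map[string]string
	Value  float64
}

// renderAlertMessage expands the template variables in message.
func renderAlertMessage(message string, labels map[string]string, value float64) (string, error) {
	tmpl, err := template.New("alert").Parse(prelude + message)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, alertData{Labels: labels, Value: value}); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	labels := map[string]string{
		"namespace": "default",
		"pod_name":  "web-0",
		"device":    "/dev/vda1",
	}
	msg, err := renderAlertMessage(diskMessage, labels, 93.5)
	if err != nil {
		fmt.Println("render failed:", err)
		return
	}
	fmt.Println(msg)
}
```

A label that is referenced in the message but missing from the alert event is not filled in, so it is worth checking that every label used in the message also appears in the result of the PromQL statement.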

Apply an alert rule template

After you create an alert rule template, you can apply the template to specific Prometheus instances to create alert rules or update existing alert rules.

You can apply an alert rule template in one of the following modes:

Prometheus instance selection mode

On the Prometheus Alert Rule Template page, find the alert rule template that you want to apply and click Apply Template in the Actions column.
On the Prometheus Instance Selection Mode tab in the Apply Template dialog box, select one or more Prometheus instances, and then click OK.
Note You can filter Prometheus instances by name, region, or type.
In the dialog box that appears, specify whether to update existing alert rules of the selected Prometheus instances, and then click OK.
Application Real-Time Monitoring Service (ARMS) uses the current template to create an alert rule for the selected Prometheus instances.
- If you do not select Update Created Alert Rules, and alert rules that are created by using the current template exist on the Prometheus instances, the following message appears: The alert rules are not updated because they are created from this template. In this case, the existing alert rules are not updated.
- If you select Update Created Alert Rules, the existing alert rules that are created on the selected Prometheus instances by using the current template are updated.
  Important If an alert rule is modified and the mapping between the alert rule and the template is retained, the modifications are overwritten by the new template.
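
The outcomes of applying a template to one Prometheus instance can be summarized as a small decision procedure. The following Go sketch models the behavior described in this topic, including the name conflict that is described in the Precautions section. The Rule type, its fields, and the applyTemplate function are hypothetical simplifications for illustration, not the ARMS implementation.

```go
package main

import "fmt"

// Rule is a hypothetical, simplified alert rule on one Prometheus instance.
type Rule struct {
	Name       string
	Expr       string
	TemplateID string // empty if the mapping to a template was removed
}

// applyTemplate returns the rules of an instance after the template is
// applied, together with a short description of the outcome.
func applyTemplate(rules []Rule, tmpl Rule, updateCreated bool) ([]Rule, string) {
	out := append([]Rule(nil), rules...) // work on a copy
	conflict := false
	for i, r := range out {
		if r.TemplateID == tmpl.TemplateID {
			if !updateCreated {
				return out, "not updated"
			}
			out[i] = tmpl // modifications to the mapped rule are overwritten
			return out, "updated"
		}
		if r.Name == tmpl.Name {
			conflict = true // an unmapped rule already uses this name
		}
	}
	if conflict {
		return out, "name conflict"
	}
	return append(out, tmpl), "created"
}

func main() {
	tmpl := Rule{Name: "cpu-high", Expr: "usage > 80", TemplateID: "t1"}
	rules, outcome := applyTemplate(nil, tmpl, false)
	fmt.Println(outcome, len(rules))

	rules[0].Expr = "usage > 95" // local modification, mapping retained
	rules, outcome = applyTemplate(rules, tmpl, true)
	fmt.Println(outcome, rules[0].Expr)
}
```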

Tag controller mode

You can add tags to a cluster in the Container Service for Kubernetes (ACK) console. Then, you can go to the ARMS console and filter clusters by tag on the Tag Controller Mode tab. You can configure tags for each template. After the configuration is complete, alert rules are created for the Prometheus instances that monitor the clusters that are matched.

The system dynamically updates alert rules based on tags:

When a template is modified, the alert rules that are used to monitor the clusters that match the tags are updated based on the new template.
If you modify an alert rule that is created in tag controller mode and retain the mapping between the alert rule and the alert rule template, the modifications are overwritten by the new template.
If the tags of an ACK cluster are changed, the tags of the Prometheus instance that monitors the cluster are also changed. The system deletes or creates alert rules based on the new tags.

Log on to the ACK console and add tags to a cluster.
1. In the left-side navigation pane, click Clusters.
2. On the Clusters page, move the pointer over the icon next to the cluster that you want to manage and click Edit Label.
3. In the Edit Label dialog box, add one or more tags and then click OK.
On the Prometheus Alert Rule Template page of the ARMS console, find the alert rule template that you want to apply and click Apply Template in the Actions column.
In the Apply Template dialog box, click the Tag Controller Mode tab. Then, specify tags and expressions.
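
The dynamic behavior of tag controller mode can be pictured as a reconciliation between the clusters that currently match the tags and the clusters that already have an alert rule from the template. The following Go sketch assumes exact key-value matching on tags. The expressions that the console supports are not described in this topic and may be more expressive. The matches and reconcile functions and the sample cluster names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// matches reports whether tags contain every key-value pair of selector.
func matches(selector, tags map[string]string) bool {
	for k, want := range selector {
		if got, ok := tags[k]; !ok || got != want {
			return false
		}
	}
	return true
}

// reconcile returns the clusters for which an alert rule must be created and
// the clusters from which the alert rule must be deleted.
func reconcile(selector map[string]string, clusters map[string]map[string]string, hasRule map[string]bool) (create, remove []string) {
	for name, tags := range clusters {
		matched := matches(selector, tags)
		switch {
		case matched && !hasRule[name]:
			create = append(create, name)
		case !matched && hasRule[name]:
			remove = append(remove, name)
		}
	}
	// Sort for a stable result because map iteration order is random.
	sort.Strings(create)
	sort.Strings(remove)
	return create, remove
}

func main() {
	clusters := map[string]map[string]string{
		"prod-a": {"env": "prod"},
		"test-b": {"env": "test"}, // tag changed from prod to test
		"prod-c": {"env": "prod"},
	}
	hasRule := map[string]bool{"test-b": true, "prod-c": true}
	create, remove := reconcile(map[string]string{"env": "prod"}, clusters, hasRule)
	fmt.Println("create:", create, "delete:", remove)
}
```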

Delete an alert rule template

If you no longer need to use an alert rule template, you can delete the template. When you delete a template, you can specify whether to retain the alert rules that are created from the template.

On the Prometheus Alert Rule Template page of the ARMS console, find the alert rule template that you want to delete and click Delete in the Actions column.
In the dialog box that appears, specify whether to delete the alert rules that are created from the template, and then click OK.
- If you select Delete Alert Rules Created from Template, the alert rules that are created from the template are deleted. However, an alert rule is not deleted if you previously modified it and selected Remove the mapping between this alert rule and the alert rule template when you saved the changes.
- If you do not select Delete Alert Rules Created from Template, the alert rules that are created from the template are retained.
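
The effect of the two options on the alert rules of an instance can be sketched as a simple filter. The following Go code is a hypothetical model: the Rule type and the rulesAfterTemplateDelete function are illustrative names, and a rule whose mapping was removed is represented by an empty TemplateID.

```go
package main

import "fmt"

// Rule is a hypothetical, simplified alert rule.
type Rule struct {
	Name       string
	TemplateID string // empty if the mapping to a template was removed
}

// rulesAfterTemplateDelete returns the rules that remain after the template
// identified by templateID (assumed to be non-empty) is deleted.
func rulesAfterTemplateDelete(rules []Rule, templateID string, deleteCreated bool) []Rule {
	var kept []Rule
	for _, r := range rules {
		// Only rules that are still mapped to the template are deleted.
		if deleteCreated && r.TemplateID == templateID {
			continue
		}
		kept = append(kept, r)
	}
	return kept
}

func main() {
	rules := []Rule{
		{Name: "cpu-high", TemplateID: "t1"},
		{Name: "cpu-high-custom", TemplateID: ""}, // mapping removed earlier
		{Name: "disk-full", TemplateID: "t2"},
	}
	for _, r := range rulesAfterTemplateDelete(rules, "t1", true) {
		fmt.Println(r.Name)
	}
}
```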

View the alert rules that are created from a template

You can view and manage the alert rules that are created from a template.

On the Prometheus Alert Rule Template page of the ARMS console, find the alert rule template that you want to manage and click View Alert Rules in the Actions column.
In the Alert Rules Created from Template dialog box, manage alert rules based on your needs.
- Enable multiple alert rules at a time: Select the alert rules that you want to enable and click Enable Alerts.
- Disable multiple alert rules at a time: Select the alert rules that you want to disable and click Disable Alerts.
- Delete multiple alert rules at a time: Select the alert rules that you want to delete and click Delete Alerts.

Precautions

After you modify an alert rule that is created from a template, a dialog box appears. In the dialog box, you must specify whether to retain the mapping between the alert rule and the template.

If you select Retain the mapping between this alert rule and the alert rule template, your modifications may be overwritten when you apply the template to the Prometheus instance corresponding to the alert rule and select Update Created Alert Rules.
If you select Remove the mapping between this alert rule and the alert rule template, the alert rule is treated as an independent rule. We recommend that you rename the alert rule. Otherwise, if you apply the alert rule template to the Prometheus instance again, the new alert rule cannot be created because of a name conflict.