Managed Service for Prometheus lets you create alert rules for a specific Prometheus instance. When the condition in an alert rule is met, an alert event is generated. To receive alert notifications, configure a notification policy. Alerts are generated from alert events based on notification policies, and notifications can be sent by text message, email, phone call, DingTalk chatbot, WeCom chatbot, or webhook.

Prerequisites

A Prometheus instance is created.

Procedure

  1. Log on to the Prometheus console.
  2. In the left-side navigation pane, click View Alert rules.
  3. In the upper-right corner of the Prometheus Alert Rules page, click Create Prometheus Alert Rule.

Use a preset metric to create an alert rule

If you set Check Type to Static Threshold, you can select a preset metric and create an alert rule by using the metric.

  1. On the Create Prometheus Alert Rule page, configure the parameters. The following table describes the parameters.
    Each entry lists the parameter, its description, and an example value.
    Alert Name: Enter a name for the alert rule. Example: Production cluster - container CPU utilization alert
    Check Type: Select Static Threshold. Example: Static Threshold
    Prometheus Instance: Select the Prometheus instance for which you want to create the alert rule. Example: Production cluster
    Alert Contact Group: Select an alert contact group. The groups that are available in the drop-down list vary based on the type of the Prometheus instance that you specify. Example: Kubernetes load
    Alert Metric: Select the metric that you want to monitor by using the alert rule. Different alert groups provide different metrics. Example: Container CPU utilization
    Alert Condition: Specify the condition based on which alert events are generated. Example: If the CPU utilization of the container is greater than 80%, an alert event is generated.
    Filter Condition: Specify the applicable scope of the alert rule. If a resource meets both the filter condition and the alert condition, an alert event is generated.
    The following types of filter conditions are supported:
    • Traverse: The alert rule applies to all resources of the current Prometheus instance. The default value of the Filter Condition parameter is Traverse.
    • Equal: If you select this filter condition, you must enter a resource name. The alert rule applies only to the specified resource. You cannot specify multiple resources at the same time.
    • Not equal: If you select this filter condition, you must enter a resource name. The alert rule applies to resources other than the specified resource. You cannot specify multiple resources at the same time.
    • Regular match: If you select this filter condition, you must enter a regular expression to match resource names. The alert rule applies to all resources that match the regular expression.
    • Regular not match: If you select this filter condition, you must enter a regular expression to match resource names. The alert rule applies to resources that do not match the regular expression.
    Note: After you specify the filter condition, the Data Preview section appears.
    Example: Traverse
    Data Preview: The Data Preview section displays the PromQL statement that corresponds to the alert condition. The section also displays the values of the specified metric in a time series graph.

    By default, only the real-time values of one resource are displayed. You can specify filter conditions to view the metric values of different resources in different time ranges.

    Note
    • The threshold in the time series graph is represented by a red line. The part of the curve that meets the alert condition is displayed in dark red, and the part of the curve that does not meet the alert condition is displayed in blue.
    • You can move the pointer over the curve to view resource details at a specific point in time.
    • You can also select a time period on the time series curve to view the time series curve of the selected time period.
    Example: N/A
    Duration: Select one of the following options:
    • If the alert condition is met, an alert event is generated: If a data point reaches the threshold, an alert event is generated.
    • If the alert condition is continuously met for N minutes, an alert event is generated: An alert event is generated only if the duration for which the threshold is reached is greater than or equal to N minutes.
    Example: 1
    Alert Level: Specify the severity level of the alert. Valid values: Default, P4, P3, P2, and P1. Default value: Default. The preceding values are listed in ascending order of severity. Example: Default
    Alert Message: Specify the alert message that you want to send to end users. You can specify custom variables in the alert message based on the Go template syntax. Example: Namespace: {{$labels.namespace}} / Pod: {{$labels.pod_name}} / Container: {{$labels.container}} CPU utilization {{$labels.metrics_params_opt_label_value}} {{$labels.metrics_params_value}}%, Current value {{ printf "%.2f" $value }}%
    Advanced Settings
    Alert Check Cycle: The alert rule is evaluated every N minutes to check whether the alert condition is met. Default value: 1 minute. Minimum value: 1 minute. Example: 1
    Specify Notification Policies: Select one of the following options:
    • Do Not Specify Notification Rules: If you select this option, you can create a notification policy on the Notification Policies page after you create the alert rule. You can specify match rules and match conditions. For example, you can specify an alert rule name as the match condition. When the alert rule is triggered, an alert event is generated and an alert notification is sent to the contacts or contact groups that are specified in the notification policy. For more information, see Create and manage a notification policy.
    • Select a notification policy from the drop-down list: Application Real-Time Monitoring Service (ARMS) automatically adds a match rule to the selected notification policy and specifies the ID of the alert rule as the match condition. The name of the alert rule is displayed on the Notification Policies page. This way, the alert events that are generated based on the alert rule can be matched by the selected notification policy.
    Important: After you select a notification policy, the alert events that are generated based on the alert rule can be matched by the notification policy, and alerts can be generated. The same alert events may also be matched by other notification policies that use fuzzy match, in which case additional alerts may be generated. One or more alert events can be matched by one or more notification policies.
    Example: Do Not Specify Notification Rules
    Tags: Specify tags for the alert rule. The specified tags can be used to match notification policies. Example: N/A
    Annotations: Specify annotations for the alert rule. Example: N/A
  2. Click Save. On the Prometheus Alert Rules page, you can check the status of the alert rule.
    If Automatic Interruption appears in the Status column, modify the alert rule based on the prompted cause, click Start in the Actions column, and then click OK in the Confirm message. If the issue persists after you apply the preceding solution, contact technical support (d9j_rg9e4062f).
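
The five Filter Condition types described above correspond closely to the label matchers that PromQL itself offers. The following Python sketch illustrates that correspondence. It is a minimal illustration under stated assumptions, not the statement that the console actually generates: the mapping table, the function names, the metric name `container_cpu_utilization`, and the label name `container` are all hypothetical. Python's `re` module is used as a stand-in for the RE2 syntax that Prometheus uses, so only simple patterns behave identically.

```python
import re

# Assumed mapping from the console's filter conditions to PromQL matcher
# operators. "Traverse" adds no matcher, so every resource is in scope.
MATCHER_OPERATORS = {
    "Equal": "=",
    "Not equal": "!=",
    "Regular match": "=~",
    "Regular not match": "!~",
}


def build_selector(metric, label, condition, value=""):
    """Return a PromQL selector that applies one filter condition."""
    if condition == "Traverse":
        return metric
    operator = MATCHER_OPERATORS[condition]  # KeyError for unknown conditions
    # Escape backslashes and double quotes for a PromQL string literal.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{metric}{{{label}{operator}"{escaped}"}}'


def resource_in_scope(condition, value, resource_name):
    """Check whether a single resource name satisfies the filter condition."""
    if condition == "Traverse":
        return True
    if condition == "Equal":
        return resource_name == value
    if condition == "Not equal":
        return resource_name != value
    # PromQL regex matchers are fully anchored, hence fullmatch().
    matched = re.fullmatch(value, resource_name) is not None
    if condition == "Regular match":
        return matched
    if condition == "Regular not match":
        return not matched
    raise ValueError(f"unknown filter condition: {condition}")


if __name__ == "__main__":
    selector = build_selector(
        "container_cpu_utilization", "container", "Regular match", "web-.*"
    )
    # Combine the selector with a static threshold of 80.
    print(f"{selector} > 80")
```

Because regular expression matchers are anchored at both ends, a pattern such as `web` selects only a resource named exactly `web`, whereas `web-.*` selects every resource whose name starts with `web-`.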

Use a custom PromQL statement to create an alert rule

To monitor a metric other than the preset metrics, you can use a custom PromQL statement to create an alert rule.

  1. On the Create Prometheus Alert Rule page, configure the parameters. The following table describes the parameters.
    Each entry lists the parameter, its description, and an example value.
    Alert Name: Enter a name for the alert rule. Example: Pod CPU utilization exceeds 8%
    Check Type: Select Custom PromQL. Example: Custom PromQL
    Prometheus Instance: Select the Prometheus instance for which you want to create the alert rule. Example: N/A
    Reference Metrics: Optional. The Reference Metrics drop-down list displays common metrics. After you select a metric, the PromQL statement of the metric is displayed in the Custom PromQL Statements field. You can modify the statement based on your business requirements. The values in the Reference Metrics drop-down list vary based on the type of the Prometheus instance. Example: Pod disk usage alert
    Custom PromQL Statements: Specify the PromQL statement based on which alert events are generated. Example: max(container_fs_usage_bytes{pod!="", namespace!="arms-prom",namespace!="monitoring"}) by (pod_name, namespace, device)/max(container_fs_limit_bytes{pod!=""}) by (pod_name,namespace, device) * 100 > 90
    Data Preview: The Data Preview section displays the time series graph of resources that meet the specified conditions in the PromQL statement.

    By default, the alert data of all resources that meet the specified conditions in the PromQL statement is displayed. You can configure filter conditions to display the data of a specific resource in a specific time range.

    Note
    • You can move the pointer over the curve to view resource details at a specific point in time.
    • You can also select a time period on the time series curve to view the time series curve of the selected time period.
    Example: N/A
    Duration: Select one of the following options:
    • If the alert condition is met, an alert event is generated: If a data point reaches the threshold, an alert event is generated.
    • If the alert condition is continuously met for N minutes, an alert event is generated: An alert event is generated only if the duration for which the threshold is reached is greater than or equal to N minutes.
    Example: 1
    Alert Level: Specify the severity level of the alert. Valid values: Default, P4, P3, P2, and P1. Default value: Default. The preceding values are listed in ascending order of severity. Example: Default
    Alert Message: Specify the alert message that you want to send to end users. You can specify custom variables in the alert message based on the Go template syntax. Example: Namespace: {{$labels.namespace}} / Pod: {{$labels.pod_name}} / The utilization of the {{$labels.device}} disk exceeds 90%. Current value: {{ printf "%.2f" $value }}%
    Advanced Settings
    Alert Check Cycle: The alert rule is evaluated every N minutes to check whether the alert condition is met. Default value: 1 minute. Minimum value: 1 minute. Example: 1
    Specify Notification Policies: Select one of the following options:
    • Do Not Specify Notification Rules: If you select this option, you can create a notification policy on the Notification Policies page after you create the alert rule. On the Notification Policies page, you can specify match rules and match conditions. For example, you can specify an alert rule name as the match condition. When the alert rule is triggered, an alert event is generated and an alert notification is sent to the contacts or contact groups that are specified in the notification policy. For more information, see Create and manage a notification policy.
    • Select a notification policy from the drop-down list: ARMS automatically adds a match rule to the selected notification policy and specifies the ID of the alert rule as the match condition. The name of the alert rule is displayed on the Notification Policies page. This way, the alert events that are generated based on the alert rule can be matched by the selected notification policy.
    Important: After you select a notification policy, the alert events that are generated based on the alert rule can be matched by the notification policy, and alerts can be generated. The same alert events may also be matched by other notification policies that use fuzzy match, in which case additional alerts may be generated. One or more alert events can be matched by one or more notification policies.
    Example: Do Not Specify Notification Rules
    Tags: Specify tags for the alert rule. The specified tags can be used to match notification policies. Example: N/A
    Annotations: Specify annotations for the alert rule. Example: N/A
  2. Click Save. On the Prometheus Alert Rules page, you can check the status of the alert rule.
    If Automatic Interruption appears in the Status column, modify the alert rule based on the prompted cause, click Start in the Actions column, and then click OK in the Confirm message. If the issue persists after you apply the preceding solution, contact technical support (d9j_rg9e4062f).
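
The interaction between the Duration and Alert Check Cycle parameters can be easier to follow with a small simulation. The following Python sketch is a simplified model, not the service's implementation. It assumes that the rule is checked once per cycle, that the condition is a strict "greater than" comparison as in the `> 90` example statement, and that the continuous duration is measured from the first check at which the condition held. The function name and the sample values are hypothetical. The sketch reports the checks at which the criterion is satisfied and makes no claim about whether the service emits one alert event or repeated events during that time.

```python
def firing_check_times(samples, threshold, duration_minutes=0, check_cycle_minutes=1):
    """Return the check times (in minutes) at which the alert criterion holds.

    samples: metric values, one per check, taken every check_cycle_minutes.
    duration_minutes=0 models the option "If the alert condition is met,
    an alert event is generated".
    """
    firing = []
    met_since = None  # time of the first check in the current unbroken run
    for index, value in enumerate(samples):
        now = index * check_cycle_minutes
        if value > threshold:
            if met_since is None:
                met_since = now
            # The condition must have held continuously for at least N minutes.
            if now - met_since >= duration_minutes:
                firing.append(now)
        else:
            # Any check below the threshold resets the continuous run.
            met_since = None
    return firing


if __name__ == "__main__":
    disk_usage = [85, 92, 95, 93, 88, 91, 96, 97]  # hypothetical percentages
    print(firing_check_times(disk_usage, threshold=90))
    print(firing_check_times(disk_usage, threshold=90, duration_minutes=2))
```

With the sample data, a duration of 0 satisfies the criterion at every check above 90, whereas a duration of 2 minutes satisfies it only at minutes 3 and 7, because the dip to 88 at minute 4 restarts the count.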

Manage an alert rule

On the Prometheus Alert Rules page, you can enable, disable, modify, delete, or view the details of an alert rule.

  1. Optional: On the Prometheus Alert Rules page, configure filter conditions or enter an alert name in the search box, and then click the Search icon.
  2. Perform the following operations on an alert rule by using the options in the Actions column based on your business requirements:
    • To modify an alert rule, click Edit. On the Edit Prometheus Alert Rules page, modify the alert rule, and click Save.
    • To delete an alert rule, click Delete. In the Confirm message, click OK.
    • To enable an alert rule, click Start. In the Confirm message, click OK.
    • To disable an alert rule, click Stop. In the Confirm message, click OK.
    • To view historical alert events, click Alert Event History. On the Events page, you can view historical alert events.