Prometheus Service (Prometheus) provides out-of-the-box alert rules. You can also customize alert rules for specific monitored objects. If a rule is triggered, Prometheus sends alert notifications to the specified contact group by using the specified notification method. This way, contacts can resolve issues at the earliest opportunity.

Prerequisites

Procedure

  1. Log on to the ARMS console.
  2. In the left-side navigation pane, click Prometheus Monitoring.
  3. In the top navigation bar of the Prometheus Monitoring page, select the region where the monitored Kubernetes cluster resides. Then, click the name of the Kubernetes cluster.
  4. In the left-side navigation pane, click Alarm configuration.
  5. On the Alarm configuration page, click Create Alert in the upper-right corner.
  6. In the Create Alert panel, set the parameters.
    1. Optional: Select a template from the Alarm template drop-down list.
    2. Enter a rule name in the Rule Name field. Example: Alert for inbound traffic.
    3. Enter a PromQL statement as the expression in the Alarm expression field. Example: (sum(rate(kube_state_metrics_list_total{job="kube-state-metrics",result="error"}[5m])) / sum(rate(kube_state_metrics_list_total{job="kube-state-metrics"}[5m]))) > 0.01.
      Notice: If a PromQL statement contains a dollar sign ($), an error is returned. You must delete the entire filter condition that contains the dollar sign ($) from the statement, that is, the operator (such as = or =~) and the parameters on both sides of the operator. For example, change sum (rate (container_network_receive_bytes_total{instance=~"^$HostIp.*"}[1m])) to sum (rate (container_network_receive_bytes_total[1m])).
    4. Enter a number N in the duration field. After the alert rule is created, an alert notification is sent only when the alert condition is met for N consecutive minutes. For example, if you enter 1, an alert notification is sent only when the alert condition is continuously met for 1 minute.
      Note: The alert condition refers to the condition specified by the PromQL statement. By default, Prometheus collects data at intervals of 15 seconds. If fewer than 4N consecutively collected data records meet the alert condition, no alert notification is sent. The threshold 4N is calculated by using the following formula: N × 60/15 = 4N. In this example, N is set to 1. Therefore, an alert notification is sent only when four consecutively collected data records meet the alert condition. To ensure that an alert notification is sent each time a collected data record meets the alert condition, set N to 0.
    5. Enter the notification content in the Alarm message field.
    6. Optional: In the Labels section of Advanced Configuration, click Create Tag to add one or more tags to the alert rule. The specified tags can be used as options for a notification rule.
    7. Optional: In the Annotations section of Advanced Configuration, click Create Annotation. Then, enter message in the Key field and {{variable name}} alert message in the Value field. The resulting annotation is in the format of message:{{variable name}} alert message. Example: message:{{$labels.pod_name}} restart.

      You can customize a variable name or select an existing tag as the variable name. The following content describes the existing tags:

      • The tags that are carried in the metrics of an alert rule expression.
      • The tags that are created when you create an alert rule. For more information, see Create an alert rule.
      • The default tags provided by ARMS. The following table describes the default tags.
        Tag                               Description
        alertname                         The name of the alert. The format is <Alert name>_<Cluster name>.
        _aliyun_arms_alert_level          The level of the alert.
        _aliyun_arms_alert_type           The type of the alert.
        _aliyun_arms_alert_rule_id        The ID of the alert rule.
        _aliyun_arms_region_id            The ID of the region.
        _aliyun_arms_userid               The ID of the user.
        _aliyun_arms_involvedObject_type  The subtype of the associated object, for example, ManagedKubernetes or ServerlessKubernetes.
        _aliyun_arms_involvedObject_kind  The type of the associated object, for example, app or cluster.
        _aliyun_arms_involvedObject_id    The ID of the associated object.
        _aliyun_arms_involvedObject_name  The name of the associated object.
    8. Click OK.
    After you create the alert rule, the Alarm configuration page displays the rule, as shown in the following figure.
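
The arithmetic behind the duration field and the dollar-sign workaround described in the procedure can be sketched in Python. This is a minimal illustration and not part of ARMS: the function names required_consecutive_samples and strip_dollar_matchers are hypothetical, the 15-second interval is the default collection interval stated in the procedure, and the regular expressions assume simple selectors with no braces inside quoted values.

```python
import re

# Assumption: the default 15-second collection interval stated in the procedure.
DEFAULT_SCRAPE_INTERVAL_SECONDS = 15

# A single label matcher such as job="x" or instance=~"^$HostIp.*".
_MATCHER = re.compile(r'[A-Za-z_][A-Za-z0-9_]*\s*(?:=~|!~|!=|=)\s*"(?:[^"\\]|\\.)*"')
# A label selector in braces. Simplification: no braces inside quoted values.
_SELECTOR = re.compile(r'\{([^{}]*)\}')


def required_consecutive_samples(duration_minutes: int,
                                 scrape_interval_seconds: int = DEFAULT_SCRAPE_INTERVAL_SECONDS) -> int:
    """Return N x 60 / interval, the threshold the procedure calls 4N."""
    if duration_minutes < 0 or scrape_interval_seconds <= 0:
        raise ValueError("duration must be >= 0 and interval must be > 0")
    return duration_minutes * 60 // scrape_interval_seconds


def strip_dollar_matchers(expr: str) -> str:
    """Remove every label matcher that contains a dollar sign.

    If no matcher is left in a selector, the braces are removed as well,
    which mirrors the example given in the procedure.
    """
    def rewrite(match: re.Match) -> str:
        if "$" not in match.group(0):
            return match.group(0)  # leave selectors without "$" untouched
        kept = [m for m in _MATCHER.findall(match.group(1)) if "$" not in m]
        return "{" + ",".join(kept) + "}" if kept else ""

    return _SELECTOR.sub(rewrite, expr)


if __name__ == "__main__":
    print(required_consecutive_samples(1))
    print(strip_dollar_matchers(
        'sum (rate (container_network_receive_bytes_total{instance=~"^$HostIp.*"}[1m]))'
    ))
```

With a duration of 1, the first function yields the four consecutive data records mentioned in the note for the duration field. The second function reproduces the rewrite shown for container_network_receive_bytes_total. Review the stripped expression before you paste it into the Alarm expression field, because removing a filter condition widens the set of time series that the rule evaluates.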