Cloud Monitor: Alert rules

Last Updated: Mar 27, 2026

An alert rule monitors a specific application. When the rule is triggered, it generates an alert event and notifies the specified recipients, such as alert contacts, chatbots, custom webhooks, or action integrations, so that you can respond promptly.

Prerequisites

  • The required observability services, such as Managed Service for Prometheus, Application Real-Time Monitoring Service (ARMS), or Log Service, are enabled.

  • You have created notification objects.

Create an alert rule

  1. Log in to the CloudMonitor 2.0 console. In the left-side navigation pane, choose All Features > Alert Center.

  2. On the Alert Center page, choose alert management > alert rules.

  3. On the alert rules page, click Create alert rule.

  4. In the Create alert rule panel, configure the parameters for the rule.

    1. Rule name: A descriptive name for the alert rule to facilitate management.

    2. Monitoring type: Select the type of service to monitor.

      • Managed Service for Prometheus / Cloud Synthetic Monitoring

        Parameter

        Description

        Data source type

        The data source of the selected monitoring service.

        Region

        The region of the data source.

        Prometheus instance

        The instance to monitor.

        Condition definition method

        Custom PromQL: Write a custom PromQL query. For more information, see PromQL function examples.

        Configure based on predefined metrics:

        • Metric group: Select a metric group.

        • Metric: Select a metric.

        • Detection condition: Set the detection condition by specifying a comparison operator and a threshold. p50, p75, p90, and p99 represent percentiles.

        • PromQL preview: Preview the PromQL statement for the predefined metric.

        Severity

        Set the severity level for the alert rule.

        • P1: Critical: For major issues that affect core service availability and have a widespread impact.

        • P2: Error: For issues that cause partial service failures and have a moderate impact on system availability.

        • P3: Warning: For issues that may lead to service errors or disruptions.

        • P4: Info: For low-priority events that require notification. This is the default level.

        Duration

        The period a condition must persist before an alert is triggered. This setting helps prevent false alarms from momentary fluctuations.

        Alert check interval

        The interval at which the alert rule is evaluated. The default is 60 seconds (once per minute).

        Content

        Customize the alert notification content by using Go template syntax. Example: Namespace: {{$labels.namespace}} / Pod: {{$labels.pod_name}} / Container: {{$labels.container}} CPU usage is {{$labels.metrics_params_value}}%, current value is {{ printf "%.2f" $value }}%

        Labels

        User-defined key-value pairs for categorizing and filtering alert rules. Examples: env: production, team: sre.

        Annotations

        Additional information for the alert, such as descriptions or links to runbooks. Examples: description: High CPU usage, runbook_url: https://wiki.xxx.com/runbook.
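As a rough illustration of how Duration and Alert check interval interact, the following Python sketch assumes that the rule is evaluated once per interval and that the alert fires only after the condition has held continuously for at least the configured duration, measured from the first breaching check. The function name, the sample values, and the exact boundary behavior are assumptions made for illustration; this is not CloudMonitor's documented implementation.

```python
from typing import Iterable, List


def firing_states(values: Iterable[float], threshold: float,
                  duration_s: int, interval_s: int = 60) -> List[bool]:
    """Return, for each evaluation, whether the alert would be firing.

    Assumptions: the condition is `value > threshold`, checks run every
    `interval_s` seconds, and the alert fires once the time elapsed since
    the first breaching check reaches `duration_s`.
    """
    states = []
    breach_started = None  # timestamp of the first check in the current breach
    for index, value in enumerate(values):
        now = index * interval_s
        if value > threshold:
            if breach_started is None:
                breach_started = now
            states.append(now - breach_started >= duration_s)
        else:
            breach_started = None  # any normal sample resets the timer
            states.append(False)
    return states


# A one-check spike is ignored; the sustained breach fires after 120 seconds.
cpu = [50, 95, 60, 91, 92, 93, 94]
print(firing_states(cpu, threshold=90, duration_s=120))
# [False, False, False, False, False, True, True]
```

With a duration of 0, the same function fires on the first breaching check, which shows why a non-zero duration helps filter out momentary fluctuations.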

      • Application Monitoring

        Parameter

        Description

        Data source type

        The data source type of the selected monitoring service.

        Region

        The region of the data source.

        Application

        The application instance to monitor.

        Metric group

        Select an application metric group.

        API name

        Select a matching method, such as Any, Equals, Not Equals, Regex Match, Regex Not Match, or No Dimension.

        API call type

        Condition method

        Single condition:

        • Set the time period (last N minutes), call type, aggregation method, and comparison operator.

        • Set thresholds for different severity levels: Critical, Error, Warning, and Info.

        Multiple conditions:

        • Multiple alert trigger rule: Choose whether to trigger the alert when Any condition is met or when All conditions are met.

        • Detection condition 1: Configure the parameters as described for a single condition.

        • Add detection condition: Add more detection conditions as needed.

        • Severity: Select a severity level: P1: Critical, P2: Error, P3: Warning, or P4: Info.

        Alert check interval

        The interval at which the alert rule is evaluated. The default is 60 seconds (once per minute).

        Content

        Customize the alert notification content.

        Labels

        User-defined key-value pairs for categorizing and filtering alert rules. Examples: env: production, team: sre.

        Annotations

        Additional information for the alert, such as descriptions or links to runbooks. Examples: description: High CPU usage, runbook_url: https://wiki.xxx.com/runbook.
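The two condition methods for Application Monitoring can be pictured with a small Python sketch. It assumes that a single condition reports the most severe level whose threshold is met, and that multiple conditions are combined with either the Any or the All trigger rule. The function names and the threshold values are hypothetical and serve only to illustrate the logic.

```python
import operator
from typing import Callable, Dict, List, Optional

# Severity levels from most to least urgent, as listed in this document.
SEVERITY_ORDER = ["P1", "P2", "P3", "P4"]


def single_condition_severity(
    value: float,
    thresholds: Dict[str, float],
    compare: Callable[[float, float], bool] = operator.ge,
) -> Optional[str]:
    """Return the most severe level whose threshold the value meets, or None."""
    for level in SEVERITY_ORDER:
        if level in thresholds and compare(value, thresholds[level]):
            return level
    return None


def multiple_conditions_met(results: List[bool], mode: str) -> bool:
    """Combine per-condition results with the 'Any' or 'All' trigger rule."""
    if mode == "Any":
        return any(results)
    if mode == "All":
        return all(results)
    raise ValueError(f"unknown trigger rule: {mode!r}")


# Hypothetical error-rate thresholds (percent) for one call type.
error_rate_thresholds = {"P1": 20.0, "P2": 10.0, "P3": 5.0, "P4": 1.0}
print(single_condition_severity(12.5, error_rate_thresholds))  # P2
print(multiple_conditions_met([True, False], "Any"))            # True
print(multiple_conditions_met([True, False], "All"))            # False
```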

      • Large Model Observability

        Parameter

        Description

        Data source type

        UModel

        Entity type

        Select the entity type to monitor.

        Metric set

        Select a metric set. Options include AI application operation metrics, GenAI model metrics, and AI application traffic metrics.

        Detection condition

        Set the threshold that triggers the alert.

        Severity

        Select a severity level: P1: Critical, P2: Error, P3: Warning, or P4: Info.

        Duration

        The period a condition must persist to trigger an alert.

        Alert check interval

        The interval at which the alert rule is evaluated. The default is 60 seconds (once per minute).

        Content

        Customize the alert notification content.

        Labels

        User-defined key-value pairs for categorizing and filtering alert rules. Examples: env: production, team: sre.

        Annotations

        Additional information for the alert, such as descriptions or links to runbooks. Examples: description: High CPU usage, runbook_url: https://wiki.xxx.com/runbook.

      • Container Insights / ECS Insights / Hologres Insights / AI Training Service Insights / Database Insights

        Parameter

        Description

        Data source type

        The data source of the selected monitoring service.

        Region

        The region of the data source.

        Prometheus instance

        The instance to monitor.

        Condition definition method

        Custom PromQL: Write a custom PromQL query. For more information, see PromQL function examples.

        Configure based on predefined metrics:

        • Metric group: Select a metric group.

        • Metric: Select a metric.

        • Detection condition: Set the detection condition by specifying a comparison operator and a threshold.

        • PromQL preview: Preview the PromQL statement for the predefined metric.

        Severity

        Set the severity level for the alert rule.

        • P1: Critical

        • P2: Error

        • P3: Warning

        • P4: Info

        Duration

        The period a condition must persist to trigger an alert.

        Alert check interval

        The interval at which the alert rule is evaluated. The default is 60 seconds (once per minute).

        Evaluate after data is complete

        Select an evaluation method.

        Content

        Customize the alert notification content by using Go template syntax. Example: Namespace: {{$labels.namespace}} / Pod: {{$labels.pod_name}} / Container: {{$labels.container}} CPU usage is {{$labels.metrics_params_value}}%, current value is {{ printf "%.2f" $value }}%

        Labels

        User-defined key-value pairs for categorizing and filtering alert rules. Examples: env: production, team: sre.

        Annotations

        Additional information for the alert, such as descriptions or links to runbooks. Examples: description: High CPU usage, runbook_url: https://wiki.xxx.com/runbook.

      • Log Audit

        Parameter

        Description

        Select template

        Action audit: Select an action audit template.

        Host audit: Select a host audit template.

        Container audit: Select a container audit template.

        Query and statistics

        Single query: Configure a single query based on log information.

        Set operation: Configure set operations and add multiple resource groups.

        Detection judgment

        Add multiple conditions as needed, and set the data matching method and severity level.

        Severity

        Select a severity level: Critical, Error, Warning, or Info.

        Consecutive occurrences

        The number of consecutive occurrences required to trigger an alert.

        Alert check interval

        The interval at which the alert rule is evaluated. The default is 60 seconds (once per minute).

        Labels

        User-defined key-value pairs for categorizing and filtering alert rules. Examples: env: production, team: sre.

        Annotations

        Additional information for the alert, such as descriptions or links to runbooks. Examples: description: High CPU usage, runbook_url: https://wiki.xxx.com/runbook.
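The Consecutive occurrences setting can be read as a streak counter over successive checks. The Python sketch below assumes that each check yields a match or no match, and that any non-matching check resets the count. The function name and sample data are hypothetical.

```python
from typing import Iterable, List


def alerts_after_consecutive(matches: Iterable[bool], required: int) -> List[bool]:
    """For each check, report whether the condition has matched on
    `required` checks in a row, including the current one."""
    if required < 1:
        raise ValueError("required must be at least 1")
    streak = 0
    fired = []
    for matched in matches:
        streak = streak + 1 if matched else 0  # a miss resets the streak
        fired.append(streak >= required)
    return fired


checks = [True, True, False, True, True, True]
print(alerts_after_consecutive(checks, required=3))
# [False, False, False, False, False, True]
```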

      • Log Service: The parameters are the same as those for Log Audit.

    3. Configure Alert notifications.

      • Notification object: Select one or more notification objects to receive alert information when the rule is triggered.

        • Alert contact: Specify individual users to receive alert notifications.

        • Contact group: Specify a group of contacts to receive alert notifications.

        • DingTalk: Send alerts via a DingTalk chatbot.

        • WeCom: Send alerts via a WeCom chatbot.

        • Lark: Send alerts via a Lark chatbot.

        • Slack: Send alerts through Slack.

        • Custom webhook: Send alerts via a custom HTTP callback.

      • Integrate with ARMS Alert Management: Select whether to integrate with the ARMS alert operations center.

        Note

        By default, alert events are sent to the ARMS alert operations center. To configure alert notifications, go to the ARMS alert operations center.

      • Action Integration: Select a specific cloud service or third-party service to handle post-alert tasks. Examples include Log Service, a lightweight message queue, Function Compute, and third-party services such as PagerDuty or a webhook.

      • Alert resend interval: The interval for resending notifications for an unresolved alert. You can select 1, 5, 10, 15, or 30 minutes, or 1, 3, 6, 12, or 24 hours.

        Note

        For example, if you set the alert resend interval to 12 hours and an alert is not resolved, CloudMonitor resends the alert notification after 12 hours.

      • Effective time: The time range during which the alert rule is active. Notifications are sent only during this period.

        Note

        • Outside its effective period, the rule does not send notifications but still records triggered alerts in the alert history.

        • The time range can span midnight, for example, 23:00 to 01:00 the next day.
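To make the interplay between Alert resend interval and Effective time concrete, here is a minimal Python sketch. It assumes that the effective window is a daily start and end time that may wrap past midnight, that the end time is exclusive, and that a notification for an unresolved alert is resent once the resend interval has elapsed since the last one. The function names and the boundary handling are assumptions, not documented CloudMonitor behavior.

```python
from datetime import datetime, time, timedelta
from typing import Optional


def in_effective_window(now: time, start: time, end: time) -> bool:
    """True if `now` falls inside [start, end); the window may span midnight."""
    if start <= end:
        return start <= now < end
    # The window wraps past midnight, for example 23:00 to 01:00 the next day.
    return now >= start or now < end


def should_notify(now: datetime, last_sent: Optional[datetime],
                  resend_interval: timedelta, start: time, end: time) -> bool:
    """Decide whether to send a notification for a still-unresolved alert."""
    if not in_effective_window(now.time(), start, end):
        return False  # the alert is still recorded; only the notification is skipped
    if last_sent is None:
        return True  # first notification for this alert
    return now - last_sent >= resend_interval


window = (time(23, 0), time(1, 0))
first = datetime(2026, 3, 27, 23, 30)
print(should_notify(first, None, timedelta(hours=12), *window))   # True
print(should_notify(first + timedelta(minutes=45), first,
                    timedelta(hours=12), *window))                 # False
print(should_notify(datetime(2026, 3, 27, 12, 0), None,
                    timedelta(hours=12), *window))                 # False
```

The second call returns False because 00:15 is inside the window but only 45 minutes have passed since the last notification. The third returns False because 12:00 is outside the window.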

Manage alert rules

  1. The alert rules page lists all created alert rules and shows the following information for each rule:

    Field

    Description

    Alert status

    The current status of the rule. Values include:

    - No Alert: The monitored data is normal and has not met the alert conditions.

    - Alerting: The monitored data has met the alert conditions, and an alert is in progress.

    - No Data: No monitoring data was retrieved for the rule.

    Rule Name/ID

    The display name and unique identifier (UUID) of the alert rule.

    Enabled status

    Indicates whether the rule is enabled. Enabled rules are evaluated at the configured interval. Disabled rules are not evaluated.

    Source service

    The service associated with the alert rule.

  2. You can search for alert rules by using the following criteria:

    • Monitoring type

    • Rule Name/ID

    • Alert status

    • Enabled status

    • More Filters: You can search by using Add label and Add notification object.

  • To edit a rule, find it in the list and click Edit in the Actions column. After making your changes, click OK.

  • To enable or disable a rule, use the toggle switch in the Enabled status column for that rule.

  • To delete a rule, find it and click the Delete icon in its Actions column.

    Warning

    This action cannot be undone. Proceed with caution.