Alert rules - Cloud Monitor - Alibaba Cloud Documentation Center

An alert rule monitors data from a specific service, such as an application, a Prometheus instance, or a log source. When the rule's conditions are met, it generates an alert event and notifies the specified recipients, such as alert contacts, chatbots, custom webhooks, or action integrations, so that you can take action promptly.

Prerequisites

The required observability services, such as Managed Service for Prometheus, Application Real-Time Monitoring Service (ARMS), or Log Service, are enabled.
You have created notification objects.

Create an alert rule

Log in to the CloudMonitor 2.0 console. In the left-side navigation pane, choose All Features > Alert Center.
On the Alert Center page, choose Alert Management > Alert Rules.
On the alert rules page, click Create alert rule.

In the Create alert rule panel, configure the parameters for the rule.

Rule name: A descriptive name that makes the alert rule easy to identify and manage.

Monitoring type: Select the type of service to monitor.

Managed Service for Prometheus / Cloud Synthetic Monitoring

Parameter	Description
Data source type	The data source of the selected monitoring service.
Region	The region of the data source.
Prometheus instance	The instance to monitor.
Condition definition method	Select one of the following methods: - Custom PromQL: Write a custom PromQL query. For more information, see PromQL function examples. - Configure based on predefined metrics: Select a metric group and a metric, and then set the detection condition by specifying a comparison operator and a threshold. p50, p75, p90, and p99 denote the 50th, 75th, 90th, and 99th percentiles. Use PromQL preview to view the PromQL statement generated for the predefined metric.
Severity	Set the severity level for the alert rule. P1: Critical: For major issues that affect core service availability and have a widespread impact. P2: Error: For issues that cause partial service failures and have a moderate impact on system availability. P3: Warning: For issues that may lead to service errors or disruptions. P4: Info: For low-priority events that require notification. This is the default level.
Duration	The period a condition must persist before an alert is triggered. This setting helps prevent false alarms from momentary fluctuations.
Alert check interval	The interval at which the alert rule is evaluated. The default is 60 seconds (once per minute).
Content	Customize the alert notification content by using Go template syntax. Example: `Namespace: {{$labels.namespace}} / Pod: {{$labels.pod_name}} / Container: {{$labels.container}} CPU usage is {{$labels.metrics_params_value}}%, current value is {{ printf "%.2f" $value }}%`
Labels	User-defined key-value pairs for categorizing and filtering alert rules. Examples: `env: production`, `team: sre`.
Annotations	Additional information for the alert, such as descriptions or links to runbooks. Examples: `description: High CPU usage`, `runbook_url: https://wiki.xxx.com/runbook`.

Application Monitoring

Parameter	Description
Data source type	The data source type of the selected monitoring service.
Region	The region of the data source.
Application	The application instance to monitor.
Metric group	Select an application metric group.
API name	Select a matching method, such as Any, Equals, Not Equals, Regex Match, Regex Not Match, or No Dimension.
API call type
Condition method	- Single condition: Set the time period (last `N` minutes), call type, aggregation method, and comparison operator, and then set a threshold for each severity level: Critical, Error, Warning, and Info. - Multiple conditions: For Multiple alert trigger rule, choose whether the alert is triggered when Any condition is met or only when All conditions are met. Configure Detection condition 1 with the same parameters as a single condition, use Add detection condition to add more conditions as needed, and then select a Severity: P1: Critical, P2: Error, P3: Warning, or P4: Info.
Alert check interval	The interval at which the alert rule is evaluated. The default is 60 seconds (once per minute).
Content	Customize the alert notification content.
Labels	User-defined key-value pairs for categorizing and filtering alert rules. Examples: `env: production`, `team: sre`.
Annotations	Additional information for the alert, such as descriptions or links to runbooks. Examples: `description: High CPU usage`, `runbook_url: https://wiki.xxx.com/runbook`.
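The two condition methods can be modeled roughly as follows. This Go sketch assumes that, in single-condition mode, the rule reports the most severe level whose threshold the aggregated value exceeds (higher values being worse), and that in multiple-condition mode the Any/All trigger rule combines the individual results. `Condition`, `severityFor`, `shouldTrigger`, and the sample thresholds are hypothetical and do not correspond to a CloudMonitor API.

```go
package main

import "fmt"

// Condition compares an aggregated value (for example, the average response
// time over the last N minutes) with a threshold. Hypothetical type.
type Condition struct {
	Value     float64
	Op        string // only ">" and "<" are supported in this sketch
	Threshold float64
}

// Met reports whether the condition holds.
func (c Condition) Met() bool {
	switch c.Op {
	case ">":
		return c.Value > c.Threshold
	case "<":
		return c.Value < c.Threshold
	}
	return false // unsupported operator: treat as not met
}

// severityFor returns the most severe level whose threshold is exceeded in
// single-condition mode, or "" if none is. Levels are checked from P1 to P4,
// assuming higher values are worse.
func severityFor(value float64, thresholds map[string]float64) string {
	for _, level := range []string{"P1", "P2", "P3", "P4"} {
		if t, ok := thresholds[level]; ok && value > t {
			return level
		}
	}
	return ""
}

// shouldTrigger combines several conditions with the Any or All trigger rule.
func shouldTrigger(rule string, conds []Condition) bool {
	if len(conds) == 0 {
		return false
	}
	for _, c := range conds {
		if rule == "Any" && c.Met() {
			return true
		}
		if rule == "All" && !c.Met() {
			return false
		}
	}
	return rule == "All"
}

func main() {
	// Hypothetical thresholds on average response time in milliseconds.
	thresholds := map[string]float64{"P1": 2000, "P2": 1000, "P3": 500, "P4": 200}
	fmt.Println(severityFor(1200, thresholds)) // P2

	conds := []Condition{
		{Value: 12, Op: ">", Threshold: 10},   // error count in the last 5 minutes
		{Value: 320, Op: ">", Threshold: 500}, // average response time in ms
	}
	fmt.Println(shouldTrigger("Any", conds), shouldTrigger("All", conds)) // true false
}
```

Here only the error-count condition is met, so the Any rule triggers and the All rule does not.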

Large Model Observability

Parameter	Description
Data source type	UModel
Entity type	Select the entity type to monitor.
Metric set	Select a metric set. Options include AI application operation metrics, GenAI model metrics, and AI application traffic metrics.
Detection condition	Set the threshold that triggers the alert.
Severity	Select a severity level: P1: Critical, P2: Error, P3: Warning, or P4: Info.
Duration	The period a condition must persist to trigger an alert.
Alert check interval	The interval at which the alert rule is evaluated. The default is 60 seconds (once per minute).
Content	Customize the alert notification content.
Labels	User-defined key-value pairs for categorizing and filtering alert rules. Examples: `env: production`, `team: sre`.
Annotations	Additional information for the alert, such as descriptions or links to runbooks. Examples: `description: High CPU usage`, `runbook_url: https://wiki.xxx.com/runbook`.

Container Insights / ECS Insights / Hologres Insights / AI Training Service Insights / Database Insights

Parameter	Description
Data source type	The data source of the selected monitoring service.
Region	The region of the data source.
Prometheus instance	The instance to monitor.
Condition definition method	Select one of the following methods: - Custom PromQL: Write a custom PromQL query. For more information, see PromQL function examples. - Configure based on predefined metrics: Select a metric group and a metric, and then set the detection condition by specifying a comparison operator and a threshold. Use PromQL preview to view the PromQL statement generated for the predefined metric.
Severity	Set the severity level for the alert rule: P1: Critical, P2: Error, P3: Warning, or P4: Info.
Duration	The period a condition must persist to trigger an alert.
Alert check interval	The interval at which the alert rule is evaluated. The default is 60 seconds (once per minute).
Evaluate after data is complete	Select an evaluation method.
Content	Customize the alert notification content by using Go template syntax. Example: `Namespace: {{$labels.namespace}} / Pod: {{$labels.pod_name}} / Container: {{$labels.container}} CPU usage is {{$labels.metrics_params_value}}%, current value is {{ printf "%.2f" $value }}%`
Labels	User-defined key-value pairs for categorizing and filtering alert rules. Examples: `env: production`, `team: sre`.
Annotations	Additional information for the alert, such as descriptions or links to runbooks. Examples: `description: High CPU usage`, `runbook_url: https://wiki.xxx.com/runbook`.
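The Content example in this table is a Go template. The sketch below expands it for one alert with Go's standard `text/template` package. It assumes, as in Prometheus alert templates, that `$labels` is bound to the labels of the alerting series and `$value` to its current value; the `alertData` type, the preamble that binds those variables, and the sample labels are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// alertData is a hypothetical stand-in for the data passed to the Content
// template: the labels of the alerting series and its current value.
type alertData struct {
	Labels map[string]string
	Value  float64
}

// preamble binds $labels and $value the way Prometheus alert templates do.
// Whether CloudMonitor binds them identically is an assumption.
const preamble = "{{$labels := .Labels}}{{$value := .Value}}"

// renderContent expands a Content template for a single alert.
func renderContent(content string, data alertData) (string, error) {
	tmpl, err := template.New("content").Parse(preamble + content)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	content := `Namespace: {{$labels.namespace}} / Pod: {{$labels.pod_name}} / Container: {{$labels.container}} CPU usage is {{$labels.metrics_params_value}}%, current value is {{ printf "%.2f" $value }}%`
	out, err := renderContent(content, alertData{
		Labels: map[string]string{
			"namespace":            "default",
			"pod_name":             "web-0",
			"container":            "app",
			"metrics_params_value": "80",
		},
		Value: 92.3456,
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Running it prints `Namespace: default / Pod: web-0 / Container: app CPU usage is 80%, current value is 92.35%`. Because `printf "%.2f"` formats the raw value, the notification shows two decimal places regardless of the sample's precision.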

Log Audit

Parameter	Description
Select template	Select a template type: - Action audit: Select an action audit template. - Host audit: Select a host audit template. - Container audit: Select a container audit template.
Query and statistics	Select a query mode: - Single query: Configure a single query based on log information. - Set operation: Configure set operations and add multiple resource groups.
Detection judgment	Add multiple conditions as needed, and set the data matching method and severity level.
Severity	Select a severity level: Critical, Error, Warning, or Info.
Consecutive occurrences	The number of consecutive occurrences required to trigger an alert.
Alert check interval	The interval at which the alert rule is evaluated. The default is 60 seconds (once per minute).
Labels	User-defined key-value pairs for categorizing and filtering alert rules. Examples: `env: production`, `team: sre`.
Annotations	Additional information for the alert, such as descriptions or links to runbooks. Examples: `description: High CPU usage`, `runbook_url: https://wiki.xxx.com/runbook`.

Log Service: The parameters are the same as those for Log Audit.
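Consecutive occurrences is similar to Duration but counts evaluations instead of measuring elapsed time. In this Go sketch, the counter resets whenever a check does not match, and the alert triggers once the required number of matching checks occur in a row. The reset behavior and the `Counter` type are assumptions for illustration.

```go
package main

import "fmt"

// Counter triggers after Required consecutive matching checks (hypothetical type).
type Counter struct {
	Required int
	streak   int
}

// Observe records one check result and reports whether the alert triggers.
func (c *Counter) Observe(matched bool) bool {
	if !matched {
		c.streak = 0 // a non-matching check breaks the run (assumed behavior)
		return false
	}
	c.streak++
	return c.streak >= c.Required
}

func main() {
	c := &Counter{Required: 3}
	for _, m := range []bool{true, true, false, true, true, true} {
		fmt.Print(c.Observe(m), " ")
	}
	fmt.Println()
}
```

With 3 consecutive occurrences required, the first two matches do not trigger because the third check breaks the run; the alert triggers only at the sixth check.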

Configure alert notifications.
- Notification object: Select one or more notification objects to receive alert information when the rule is triggered.
  - Alert contact: Specify individual users to receive alert notifications.
  - Contact group: Specify a group of contacts to receive alert notifications.
  - DingTalk: Send alerts via a DingTalk chatbot.
  - WeCom: Send alerts via a WeCom chatbot.
  - Lark: Send alerts via a Lark chatbot.
  - Slack: Send alerts through Slack.
  - Custom webhook: Send alerts via a custom HTTP callback.
- Integrate with ARMS Alert Management: Select whether to integrate with the ARMS alert operations center.
  Note
  By default, alert events are sent to the ARMS alert operations center. To configure alert notifications, go to the ARMS alert operations center.
- Action Integration: Select a cloud service or third-party service to handle post-alert tasks. Examples include Log Service, a lightweight message queue, Function Compute, third-party services such as PagerDuty, and webhooks.
- Alert resend interval: The interval for resending notifications for an unresolved alert. You can select 1, 5, 10, 15, or 30 minutes, or 1, 3, 6, 12, or 24 hours.
  Note
  For example, if you set the alert resend interval to 12 hours and an alert is not resolved, CloudMonitor resends the alert notification after 12 hours.
- Effective time: The time range during which the alert rule is active. Notifications are sent only during this period.
  Note
  - Outside its effective period, the rule does not send notifications, but still records triggered alerts in the alert history.
  - The time range can span midnight, for example, 23:00 to 01:00 the next day.

Manage alert rules

The alert rules page lists all created alert rules and the following information:

Field	Description
Alert status	The current status of the rule. Values include: - No Alert: The monitored data is normal and has not met the alert conditions. - Alerting: The monitored data has met the alert conditions, and an alert is in progress. - No Data: No monitoring data was retrieved for the rule.
Rule Name/ID	The display name and unique identifier (UUID) of the alert rule.
Enabled status	Indicates whether the rule is enabled. Enabled rules are evaluated at the configured interval. Disabled rules are not evaluated.
Source service	The service associated with the alert rule.
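The three Alert status values can be thought of as a function of the latest evaluation. The Go sketch below assumes that a rule with no returned series is shown as No Data and that the rule is Alerting if at least one series meets its condition. It ignores Duration and Consecutive occurrences settings; `Status`, `ruleStatus`, and the simple `value > threshold` check are hypothetical simplifications.

```go
package main

import "fmt"

// Status mirrors the values shown in the Alert status column.
type Status string

const (
	NoAlert  Status = "No Alert"
	Alerting Status = "Alerting"
	NoData   Status = "No Data"
)

// ruleStatus derives a status from the latest value of each monitored series
// (keyed by a series identifier) and a "value > threshold" condition.
func ruleStatus(latest map[string]float64, threshold float64) Status {
	if len(latest) == 0 {
		return NoData // no monitoring data was retrieved for the rule
	}
	for _, v := range latest {
		if v > threshold {
			return Alerting
		}
	}
	return NoAlert
}

func main() {
	fmt.Println(ruleStatus(nil, 80))                                          // No Data
	fmt.Println(ruleStatus(map[string]float64{"web-0": 42, "web-1": 91}, 80)) // Alerting
	fmt.Println(ruleStatus(map[string]float64{"web-0": 42}, 80))              // No Alert
}
```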

You can search for alert rules by using the following criteria:
- Monitoring type
- Rule Name/ID
- Alert status
- Enabled status
- More Filters: Filter by label (Add label) or by notification object (Add notification object).

To edit a rule, find it in the list and click Edit in the Actions column. After making your changes, click OK.
To enable or disable a rule, use the toggle switch in the Enabled status column for that rule.
To delete a rule, find it and click the Delete icon in its Actions column.
Warning
This action cannot be undone. Proceed with caution.