An alert rule monitors a specific application. When triggered, it generates an alert event and notifies specified recipients, such as alert contacts, chatbots, custom webhooks, or action integrations, ensuring you are promptly informed to take action.
Prerequisites
The required observability services, such as Managed Service for Prometheus, Application Real-Time Monitoring Service (ARMS), or Log Service, are enabled.
You have created notification objects.
Create an alert rule
Log in to the CloudMonitor 2.0 console. In the left-side navigation pane, choose .
On the Alert Center page, choose .
On the alert rules page, click Create alert rule.
In the Create alert rule panel, configure the parameters for the rule.
Rule name: A descriptive name for the alert rule to facilitate management.
Monitoring type: Select the type of service to monitor.
Managed Service for Prometheus / Cloud Synthetic Monitoring
Parameter
Description
Data source type
The data source of the selected monitoring service.
Region
The region of the data source.
Prometheus instance
The instance to monitor.
Condition definition method
Custom PromQL: Write a custom PromQL query. For more information, see PromQL function examples.
Configure based on predefined metrics:
Metric group: Select a metric group.
Metric: Select a metric.
Detection condition: Set the detection condition by specifying a comparison operator and a threshold. p50, p75, p90, and p99 represent percentiles.
PromQL preview: Preview the PromQL statement for the predefined metric.
Severity
Set the severity level for the alert rule.
P1: Critical: For major issues that affect core service availability and have a widespread impact.
P2: Error: For issues that cause partial service failures and have a moderate impact on system availability.
P3: Warning: For issues that may lead to service errors or disruptions.
P4: Info: For low-priority events that require notification. This is the default level.
Duration
The period a condition must persist before an alert is triggered. This setting helps prevent false alarms from momentary fluctuations.
Alert check interval
The interval at which the alert rule is evaluated. The default is 60 seconds (once per minute).
Content
Customize the alert notification content by using Go template syntax. Example: Namespace: {{$labels.namespace}} / Pod: {{$labels.pod_name}} / Container: {{$labels.container}} CPU usage is {{$labels.metrics_params_value}}%, current value is {{ printf "%.2f" $value }}%
Labels
User-defined key-value pairs for categorizing and filtering alert rules. Examples:
env: production, team: sre
Annotations
Additional information for the alert, such as descriptions or links to runbooks. Examples:
description: High CPU usage, runbook_url: https://wiki.xxx.com/runbook
Application Monitoring
Parameter
Description
Data source type
The data source type of the selected monitoring service.
Region
The region of the data source.
Application
The application instance to monitor.
Metric group
Select an application metric group.
API name
Select a matching method, such as Any, Equals, Not Equals, Regex Match, Regex Not Match, or No Dimension.
API call type
Condition method
Single condition:
Set the time period (last N minutes), call type, aggregation method, and comparison operator. Set thresholds for different severity levels: Critical, Error, Warning, and Info.
Multiple conditions:
Multiple alert trigger rule: Choose whether to trigger the alert when Any condition is met or when All conditions are met.
Detection condition 1: Configure the parameters as described for a single condition.
Add detection condition: Add more detection conditions as needed.
Severity: Select a severity level: P1: Critical, P2: Error, P3: Warning, or P4: Info.
Alert check interval
The interval at which the alert rule is evaluated. The default is 60 seconds (once per minute).
Content
Customize the alert notification content.
Labels
User-defined key-value pairs for categorizing and filtering alert rules. Examples:
env: production, team: sre
Annotations
Additional information for the alert, such as descriptions or links to runbooks. Examples:
description: High CPU usage, runbook_url: https://wiki.xxx.com/runbook
Large Model Observability
Parameter
Description
Data source type
UModel
Entity type
Select the entity type to monitor.
Metric set
Select a metric set. Options include AI application operation metrics, GenAI model metrics, and AI application traffic metrics.
Detection condition
Set the threshold that triggers the alert.
Severity
Select a severity level: P1: Critical, P2: Error, P3: Warning, or P4: Info.
Duration
The period a condition must persist to trigger an alert.
Alert check interval
The interval at which the alert rule is evaluated. The default is 60 seconds (once per minute).
Content
Customize the alert notification content.
Labels
User-defined key-value pairs for categorizing and filtering alert rules. Examples:
env: production, team: sre
Annotations
Additional information for the alert, such as descriptions or links to runbooks. Examples:
description: High CPU usage, runbook_url: https://wiki.xxx.com/runbook
Container Insights / ECS Insights / Hologres Insights / AI Training Service Insights / Database Insights
Parameter
Description
Data source type
The data source of the selected monitoring service.
Region
The region of the data source.
Prometheus instance
The instance to monitor.
Condition definition method
Custom PromQL: Write a custom PromQL query. For more information, see PromQL function examples.
Configure based on predefined metrics:
Metric group: Select a metric group.
Metric: Select a metric.
Detection condition: Set the detection condition by specifying a comparison operator and a threshold.
PromQL preview: Preview the PromQL statement for the predefined metric.
Severity
Set the severity level for the alert rule.
P1: Critical
P2: Error
P3: Warning
P4: Info
Duration
The period a condition must persist to trigger an alert.
Alert check interval
The interval at which the alert rule is evaluated. The default is 60 seconds (once per minute).
Evaluate after data is complete
Select an evaluation method.
Content
Customize the alert notification content by using Go template syntax. Example: Namespace: {{$labels.namespace}} / Pod: {{$labels.pod_name}} / Container: {{$labels.container}} CPU usage is {{$labels.metrics_params_value}}%, current value is {{ printf "%.2f" $value }}%
Labels
User-defined key-value pairs for categorizing and filtering alert rules. Examples:
env: production, team: sre
Annotations
Additional information for the alert, such as descriptions or links to runbooks. Examples:
description: High CPU usage, runbook_url: https://wiki.xxx.com/runbook
Log Audit
Parameter
Description
Select template
Action audit: Select an action audit template.
Host audit: Select a host audit template.
Container audit: Select a container audit template.
Query and statistics
Single query: Configure a single query based on log information.
Set operation: Configure set operations and add multiple resource groups.
Detection judgment
Add multiple conditions as needed, and set the data matching method and severity level.
Severity
Select a severity level: Critical, Error, Warning, or Info.
Consecutive occurrences
The number of consecutive occurrences required to trigger an alert.
Alert check interval
The interval at which the alert rule is evaluated. The default is 60 seconds (once per minute).
Labels
User-defined key-value pairs for categorizing and filtering alert rules. Examples:
env: production, team: sre
Annotations
Additional information for the alert, such as descriptions or links to runbooks. Examples:
description: High CPU usage, runbook_url: https://wiki.xxx.com/runbook
Log Service: The parameters are the same as those for Log Audit.
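Several monitoring types combine a Duration with an Alert check interval: the rule is evaluated at each interval, and the alert fires only after the condition has held for the whole duration. The Python sketch below illustrates that interaction. The names `RuleState` and `evaluate`, the greater-than comparison, and the exact timing semantics are assumptions made for illustration; they are not CloudMonitor's documented behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RuleState:
    """Tracks how long the detection condition has been continuously met."""
    pending_since: Optional[int] = None  # timestamp (s) of the first breaching check


def evaluate(state: RuleState, now: int, value: float,
             threshold: float, duration_s: int) -> bool:
    """Return True when the alert should fire at this check.

    Assumed semantics: the condition is `value > threshold`, and it must
    hold on every check for at least `duration_s` seconds before firing.
    """
    if value <= threshold:
        state.pending_since = None  # condition cleared, so reset the timer
        return False
    if state.pending_since is None:
        state.pending_since = now   # first check on which the condition is met
    return now - state.pending_since >= duration_s


if __name__ == "__main__":
    state = RuleState()
    cpu_usage = [85, 92, 95, 70, 91, 93, 94, 96]  # one sample per 60-second check
    for i, value in enumerate(cpu_usage):
        fired = evaluate(state, now=i * 60, value=value,
                         threshold=90, duration_s=120)
        print(i * 60, value, "FIRE" if fired else "ok")
```

In this run the brief spike at 60 to 120 seconds does not fire because the drop to 70 resets the timer, which is the kind of momentary fluctuation the Duration setting is meant to filter out.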
Configure Alert notifications.
Notification object: Select one or more notification objects to receive alert information when the rule is triggered.
Alert contact: Specify individual users to receive alert notifications.
Contact group: Specify a group of contacts to receive alert notifications.
DingTalk: Send alerts via a DingTalk chatbot.
WeCom: Send alerts via a WeCom chatbot.
Lark: Send alerts via a Lark chatbot.
Slack: Send alerts through Slack.
Custom webhook: Send alerts via a custom HTTP callback.
Integrate with ARMS Alert Management: Select whether to integrate with the ARMS alert operations center.
Note: By default, alert events are sent to the ARMS alert operations center. To configure alert notifications, go to the ARMS alert operations center.
Action Integration: Select a specific cloud service or third-party service to handle post-alert tasks. Examples include Log Service, a lightweight message queue, Function Compute, and third-party services such as PagerDuty or a webhook.
Alert resend interval: The interval for resending notifications for an unresolved alert. You can select 1, 5, 10, 15, or 30 minutes, or 1, 3, 6, 12, or 24 hours.
Note: For example, if you set the alert resend interval to 12 hours and an alert is not resolved, CloudMonitor resends the alert notification after 12 hours.
Effective time: The time range during which the alert rule is active. Notifications are sent only during this period.
Note: Outside its effective period, the rule does not send notifications, but still records triggered alerts in the alert history.
The time range can span midnight, for example, 23:00 to 01:00 the next day.
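The Alert resend interval and Effective time settings together determine when a notification goes out. A minimal Python sketch of that decision follows, including a window that spans midnight. The function names and the half-open window boundaries are assumptions for illustration, not the service's actual implementation.

```python
from datetime import datetime, time, timedelta
from typing import Optional


def in_effective_window(now: time, start: time, end: time) -> bool:
    """True if `now` falls in [start, end), including windows that cross midnight.

    In this sketch, start == end is treated as an empty window.
    """
    if start <= end:
        return start <= now < end
    # Example: 23:00 to 01:00 is active late at night OR early the next morning.
    return now >= start or now < end


def should_notify(now: datetime, last_sent: Optional[datetime],
                  resend_interval: timedelta, start: time, end: time) -> bool:
    """Decide whether to send or resend a notification for an unresolved alert."""
    if not in_effective_window(now.time(), start, end):
        return False  # outside the effective time: the alert is only recorded
    if last_sent is None:
        return True   # first notification for this alert
    return now - last_sent >= resend_interval


if __name__ == "__main__":
    window = (time(23, 0), time(1, 0))
    first = datetime(2024, 1, 1, 23, 30)
    print(should_notify(first, None, timedelta(minutes=30), *window))
    print(should_notify(first + timedelta(minutes=10), first,
                        timedelta(minutes=30), *window))
```

Comparing `start` and `end` first keeps the midnight case to a single extra branch, so the same check works for ordinary daytime windows.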
Manage alert rules
The alert rules page lists all created alert rules and the following information:
Field
Description
Alert status
The current status of the rule. Values include:
- No Alert: The monitored data is normal and has not met the alert conditions.
- Alerting: The monitored data has met the alert conditions, and an alert is in progress.
- No Data: No monitoring data was retrieved for the rule.
Rule Name/ID
The display name and unique identifier (UUID) of the alert rule.
Enabled status
Indicates whether the rule is enabled. Enabled rules are evaluated at the configured interval. Disabled rules are not evaluated.
Source service
The service associated with the alert rule.
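The three Alert status values can be read as a simple decision on the latest evaluation. The Python sketch below expresses that mapping; the function name `alert_status`, the use of `None` for missing data, and the greater-than comparison are assumptions for illustration only.

```python
from typing import Optional


def alert_status(latest_value: Optional[float], threshold: float) -> str:
    """Map the latest evaluation result to one of the three listed statuses."""
    if latest_value is None:
        return "No Data"    # no monitoring data was retrieved for the rule
    if latest_value > threshold:
        return "Alerting"   # the alert condition is met
    return "No Alert"       # data is present and within the threshold


if __name__ == "__main__":
    for sample in (None, 42.0, 97.5):
        print(sample, alert_status(sample, threshold=90.0))
```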
You can search for alert rules by using the following criteria:
Monitoring type
Rule Name/ID
Alert status
Enabled status
More Filters: You can search by using Add label and Add notification object.
To edit a rule, find it in the list and click Edit in the Actions column. After making your changes, click OK.
To enable or disable a rule, use the toggle switch in the Enabled status column for that rule.
To delete a rule, find it and click the Delete icon in its Actions column.
Warning: This action cannot be undone. Proceed with caution.