When you monitor services across multiple Alibaba Cloud products and third-party tools, alerts scatter across systems, making it difficult to track, prioritize, and resolve issues efficiently. Application Real-Time Monitoring Service (ARMS) Alert Management centralizes alert convergence, notification routing, and escalation into a single control plane. It deduplicates and compresses alerts to reduce alert storms, routes notifications to the right contacts, and enables teams to resolve incidents collaboratively.
How it works
Alert sources report events. ARMS sub-services (Application Monitoring, Browser Monitoring, Managed Service for Prometheus, Synthetic Monitoring) and third-party monitoring tools send alert events to Alert Management through integrations.
Events are processed. Alert Management deduplicates, compresses, denoises, and silences events to reduce alert storms. Event processing flows provide custom handling logic for specific alert sources.
Notifications reach the right people. Notification policies route processed events to contacts through email, SMS, phone calls, or messaging platforms (DingTalk, WeCom, Lark).
Teams resolve alerts together. Contacts claim, discuss, and resolve alerts in the ARMS console or group chats. Unresolved alerts escalate automatically.
Analytics track resolution performance. Real-time statistics show how alerts are handled, helping your team identify bottlenecks and improve response times.
Architecture
Alert Management consists of five modules:
| Module | Purpose |
|---|---|
| Integration management | Connect ARMS sub-services and third-party alert sources |
| Alert event management | Deduplicate, compress, denoise, and silence incoming events |
| Notification policy management | Route alerts to contacts based on matching conditions |
| Collaborative alert handling | Enable teams to claim, discuss, and resolve alerts across platforms |
| Alert handling analysis | Track alert resolution metrics and team performance |
Integration management
Alert Management supports two integration types: default alert integrations for ARMS sub-services and third-party service integrations for external alert sources.
Default alert integrations
Default alert integrations connect Alert Management with ARMS sub-services. These integrations periodically check whether monitoring data contains errors and report matching alert events to Event Management Center.
To generate these events, create alert rules for each sub-service.
Third-party service integrations
Third-party service integrations funnel alerts from self-managed data centers or virtual machines into ARMS. When a third-party source reports an alert, Alert Management generates an alert event.
Alert event data structure
The ARMS alert event data structure is similar to the notification template format of the open-source Prometheus Alertmanager and contains the following fields:
| Field | Description | Example |
|---|---|---|
| Labels | Metadata that uniquely identifies an alert event. Events with identical labels are compressed into one. | "alertname": "CPU utilization is too high" |
| Annotations | Supplementary information that does not affect event identity. | "message": "alert content" |
| StartsAt | Start time of the alert. | -- |
| EndsAt | End time of the alert. | -- |
| GeneratorUrl | URL linking to the alert event source. | -- |
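To make the field list concrete, the following is a minimal Python sketch of an event carrying these five fields and serialized to JSON. The class name `AlertEvent`, the lowerCamelCase JSON keys, the ISO 8601 timestamps, and the example URL are assumptions made for illustration; they are not confirmed ARMS wire-format details.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass
class AlertEvent:
    """Hypothetical container mirroring the fields in the table above."""
    labels: Dict[str, str]                       # identity of the event
    annotations: Dict[str, str] = field(default_factory=dict)  # extra context
    starts_at: Optional[datetime] = None
    ends_at: Optional[datetime] = None           # None while still firing (assumption)
    generator_url: str = ""

    def to_json(self) -> str:
        """Serialize with assumed lowerCamelCase keys and ISO 8601 timestamps."""
        def iso(ts: Optional[datetime]) -> Optional[str]:
            return ts.isoformat() if ts is not None else None

        payload = {
            "labels": self.labels,
            "annotations": self.annotations,
            "startsAt": iso(self.starts_at),
            "endsAt": iso(self.ends_at),
            "generatorUrl": self.generator_url,
        }
        return json.dumps(payload, indent=2)


event = AlertEvent(
    labels={"alertname": "CPU utilization is too high"},
    annotations={"message": "alert content"},
    starts_at=datetime(2024, 1, 1, tzinfo=timezone.utc),
    generator_url="https://monitoring.example.com/alerts/1",  # placeholder URL
)
print(event.to_json())
```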
How labels and annotations differ
Labels define identity. A set of labels uniquely identifies an alert event. Changing any label creates a new event.
For example, the following labels identify a CPU alert for a specific host:
    {
      "hostname": "Host",
      "alertname": "CPU utilization is too high",
      "ip": "192.168.0.3"
    }

If ip changes to 192.168.0.4, Alert Management treats this as a separate alert event for a different host.
Annotations carry context. Annotation changes do not create new events. If events share the same labels but have different annotations, Alert Management treats them as repeated reports of the same alert.
For example, if the annotation {"value": "85", "message": "CPU utilization of host 192.168.0.3 is 85%, exceeding the 80% threshold"} later reports {"value": "86", ...}, no new event is created. Alert Management records this as the same alert reported twice.
For an integration, you can designate specific labels as deduplication fields to control how Alert Management identifies unique events. If no deduplication fields are configured, Alert Management uses all labels to determine uniqueness.
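The identity rule can be sketched in a few lines of Python. The function `event_identity` is hypothetical, and treating a missing deduplication field as an empty string is an assumption; the sketch only illustrates the idea that annotations never enter the key and that deduplication fields narrow which labels do.

```python
from typing import Dict, Iterable, Optional, Tuple

Identity = Tuple[Tuple[str, str], ...]


def event_identity(labels: Dict[str, str],
                   dedup_fields: Optional[Iterable[str]] = None) -> Identity:
    """Return a hashable key standing in for an event's identity.

    With dedup_fields, only those labels count; otherwise every label counts.
    A missing deduplication field is read as an empty string (an assumption).
    """
    names = sorted(set(dedup_fields)) if dedup_fields else sorted(labels)
    return tuple((name, labels.get(name, "")) for name in names)


host_3 = {
    "hostname": "Host",
    "alertname": "CPU utilization is too high",
    "ip": "192.168.0.3",
}
host_4 = dict(host_3, ip="192.168.0.4")

# All labels count by default, so a changed ip yields a different identity.
print(event_identity(host_3) == event_identity(host_4))

# With alertname and hostname as deduplication fields, the ip is ignored.
fields = ["alertname", "hostname"]
print(event_identity(host_3, fields) == event_identity(host_4, fields))
```

Because annotations are not part of the key, two reports that differ only in `value` or `message` would map to the same identity under this sketch.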
Alert event management
The alert event management module processes incoming events in two ways:
Event processing flows orchestrate custom procedures to handle events from specific alert sources, providing fine-grained control over event routing and transformation.
Built-in noise reduction deduplicates, compresses, denoises, and silences events automatically, converging related alerts and reducing alert storms.
Event compression
Alert Management compresses events using two methods: label-based compression and time-based compression.
Label-based compression
When sending notifications, Alert Management groups events according to the event grouping settings in your notification policy. Events that share the same values for the grouping labels are compressed into a single event.
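As a rough illustration of label-based compression, the sketch below buckets events by the values of a chosen set of grouping labels. The function `group_events`, the `group_by` parameter, and the event dictionary shape are hypothetical names introduced here, not part of the ARMS API.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Event = Dict[str, Dict[str, str]]


def group_events(events: List[Event],
                 group_by: Sequence[str]) -> Dict[Tuple[str, ...], List[Event]]:
    """Bucket events by the values of the grouping labels.

    Events landing in the same bucket would be sent as one notification.
    A missing grouping label is read as an empty string (an assumption).
    """
    groups: Dict[Tuple[str, ...], List[Event]] = defaultdict(list)
    for event in events:
        key = tuple(event["labels"].get(name, "") for name in group_by)
        groups[key].append(event)
    return dict(groups)


events = [
    {"labels": {"alertname": "CPU utilization is too high", "ip": "192.168.0.3"}},
    {"labels": {"alertname": "CPU utilization is too high", "ip": "192.168.0.4"}},
    {"labels": {"alertname": "Disk is almost full", "ip": "192.168.0.3"}},
]

# Grouping by alertname folds the two CPU events into one bucket.
for key, bucket in group_events(events, ["alertname"]).items():
    print(key, len(bucket))
```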
Time-based compression
For events with identical labels, if their time ranges (StartsAt to EndsAt) overlap, Alert Management merges them into a single event. The merged event spans the union of all original time ranges.
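Time-based compression amounts to merging overlapping intervals. The sketch below applies the standard sort-and-sweep approach to the time ranges of events that already share identical labels. The function name `merge_time_ranges` is hypothetical, and treating ranges that merely touch at an endpoint as overlapping is an assumption.

```python
from datetime import datetime
from typing import List, Tuple

Interval = Tuple[datetime, datetime]  # (StartsAt, EndsAt)


def merge_time_ranges(ranges: List[Interval]) -> List[Interval]:
    """Merge overlapping (StartsAt, EndsAt) ranges into their union."""
    merged: List[Interval] = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous range: extend it to cover both.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def at(hour: int) -> datetime:
    """Helper for readable example timestamps on a single day."""
    return datetime(2024, 1, 1, hour)


# 01:00-03:00 and 02:00-05:00 overlap; 07:00-08:00 stands alone.
result = merge_time_ranges([(at(2), at(5)), (at(1), at(3)), (at(7), at(8))])
for start, end in result:
    print(start.isoformat(), "to", end.isoformat())
```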
Notification policy management
Notification policies define conditions -- similar to subscription rules -- that determine how alert notifications are delivered. When an alert event matches the conditions in a policy, ARMS sends notifications through the channels and to the contacts specified in that policy.
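The subscription-style matching can be sketched as follows. The `NotificationPolicy` class, its fields, and the exact-equality matching on labels are assumptions chosen for illustration; actual policies may support other operators and settings.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class NotificationPolicy:
    """Hypothetical policy: match conditions plus delivery targets."""
    name: str
    match_labels: Dict[str, str]   # every pair must equal the event's label
    channels: List[str]
    contacts: List[str]

    def matches(self, labels: Dict[str, str]) -> bool:
        return all(labels.get(k) == v for k, v in self.match_labels.items())


def route(labels: Dict[str, str],
          policies: List[NotificationPolicy]) -> List[Tuple[str, str, str]]:
    """Return (policy, channel, contact) for every policy the event matches."""
    return [
        (policy.name, channel, contact)
        for policy in policies
        if policy.matches(labels)
        for channel in policy.channels
        for contact in policy.contacts
    ]


policies = [
    NotificationPolicy("cpu-oncall", {"alertname": "CPU utilization is too high"},
                       ["sms", "email"], ["alice"]),
    NotificationPolicy("disk-team", {"alertname": "Disk is almost full"},
                       ["email"], ["bob"]),
]

deliveries = route(
    {"alertname": "CPU utilization is too high", "ip": "192.168.0.3"}, policies
)
for delivery in deliveries:
    print(delivery)
```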
The following diagram shows how event processing flows, events, and notification policies interact.
Collaborative alert handling
Alert Management supports collaborative workflows across the ARMS console, DingTalk, WeCom, and Lark. Collaboration policies enable:
Group message synchronization -- Alert updates appear in team chat groups automatically.
Scheduling management -- Assign on-call rotations so the right person responds at any time.
Escalation policies -- Automatically notify additional contacts when alerts remain unresolved.
For step-by-step instructions, see Handle alerts in the specified group chat.
Benefits
For services deployed on Alibaba Cloud, Alert Management improves O&M efficiency in the following areas:
| Area | Capabilities |
|---|---|
| Global alert configuration | Apply alert rule templates globally to configure alerting for global events. Share contacts and notification policies globally with minimal configuration. |
| Centralized event management | Integrate alerts from Alibaba Cloud monitoring services and third-party tools into a single management layer. Handle alert events around the clock with stable, low-latency event processing. |
| Flexible notification delivery | Compress alert events through notification policies to reduce notification volume. Select one or more notification methods based on alert urgency, such as email, SMS, phone calls, or messaging platforms. Escalate unresolved alerts automatically by sending repeated notifications to additional contacts. |
| Team-based alert resolution | Claim and resolve alerts in DingTalk, WeCom, or Lark group chats without switching to the console. Standardize alert formats so every team member can quickly parse and act on incoming alerts. Collaborate with multiple contacts in real time through shared chat groups. |
| Real-time analytics | Track alert handling metrics in real time to identify response bottlenecks and optimize team workflows. |
Phone-based alert notifications are not available on the International site.