ECS system events notify you of O&M task status, resource exceptions, and status changes, enabling prompt responses and automated O&M workflows.
This topic covers only ECS system events. For system events of other Alibaba Cloud products, see their respective documentation.
Use cases
-
Risk and exception notifications
Alibaba Cloud pushes system events to the ECS console to notify you of situations that affect resource availability and performance, such as system maintenance restarts and instance expirations. For critical events, Alibaba Cloud also sends email and internal message notifications. Respond promptly in the ECS console or with OpenAPI to avoid business interruptions. See Query and respond to ECS system events.
For example, when a subscription instance is about to expire, the ECS console prompts you to renew it before it stops.
-
Automated O&M
System events in the ECS console have defined states that track O&M task execution. State changes sync with CloudMonitor, enabling automated O&M workflows. See States and windows of system events.
Note
-
Each event state maps to a CloudMonitor event name. For example, InstanceFailure.Reboot supports the Executing and Executed states, corresponding to Instance:InstanceFailure.Reboot:Executing and Instance:InstanceFailure.Reboot:Executed.
-
Some status change events, such as instance running status changes and spot instance interruptions, are not displayed in the ECS console and have no defined event states. However, they are still reported to CloudMonitor, enabling event-driven automated O&M.
For example, manually starting or stopping an instance generates a status change event. This is not a risk or exception, but you can set up event notifications to record such operations in your log system through callbacks.
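The naming pattern in the note above can be sketched in a few lines of Python. This is a minimal illustration, not part of any Alibaba Cloud SDK: the helper names `build_event_name` and `parse_event_name` are hypothetical, and the three-part `Resource:EventType:State` layout is inferred only from the `InstanceFailure.Reboot` example. Status change events have no defined states, so their names may not follow this pattern.

```python
from typing import NamedTuple


class EventName(NamedTuple):
    resource: str    # e.g. "Instance"
    event_type: str  # e.g. "InstanceFailure.Reboot"
    state: str       # e.g. "Executing"


def build_event_name(event_type: str, state: str, resource: str = "Instance") -> str:
    """Join the parts into a colon-separated CloudMonitor event name (assumed layout)."""
    return f"{resource}:{event_type}:{state}"


def parse_event_name(name: str) -> EventName:
    """Split a three-part event name back into resource, event type, and state."""
    parts = name.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected 'Resource:EventType:State', got {name!r}")
    return EventName(*parts)


# Usage: the two names given in the note above.
for state in ("Executing", "Executed"):
    print(build_event_name("InstanceFailure.Reboot", state))
```

A callback that receives event names could use `parse_event_name` to branch on the state, for example to log only names that end in `Executed`.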
System event types
System events are classified by trigger, as listed in the following table. For supported event types and recommended actions, see Summary of ECS system events.
|
Category |
Description |
Displayed in the ECS console |
|
Scheduled O&M events |
Alibaba Cloud upgrades host software and proactively mitigates hardware and software failure risks. If an O&M task may affect your ECS resource availability or performance, Alibaba Cloud notifies you in advance with the execution time, affected resources, and impact. Respond during off-peak hours within the scheduled window to minimize business disruption.
Note: Scheduled O&M events (also called proactive O&M events) use Alibaba Cloud's large-scale server management experience and Alibaba DAMO Academy machine learning algorithms to predict and avoid host failure risks. When a risk cannot be avoided, Alibaba Cloud notifies you in advance to allow service switchover. If you do not respond, your instance may go down or restart when the failure occurs. |
Yes
Note: For big data instance families or instance families with local SSDs (excluding i4p), scheduled O&M events appear under Local Disk Instance Events. See O&M scenarios and system events for instances with local disks. |
|
Unexpected O&M events |
When an underlying host experiences a sudden hardware or software failure, or an instance encounters an issue such as an OOM error or kernel panic, the instance may restart or go down unexpectedly. Alibaba Cloud sends an unexpected O&M event, restores resource availability, and notifies you of the O&M task status.
Note: Unexpected O&M events typically involve sudden instance downtime or restarts caused by unpredictable host failures or OS kernel errors.
|
Yes
Note: For big data instance families or instance families with local SSDs (excluding i4p), unexpected O&M events appear under Local Disk Instance Events. See O&M scenarios and system events for instances with local disks. |
|
Local disk instance events |
Events for local disks (such as disk corruption) and for instances with local disks (such as instances affected by local disk damage or underlying host failures).
Note: Local Disk Instance Events is not a specific event type. It groups scheduled and unexpected O&M events for big data instance families or instance families with local SSDs (excluding i4p). See O&M scenarios and system events for instances with local disks. |
Yes |
|
Performance limited events for burstable instances |
Reminder events indicating a burstable instance has exhausted its CPU credits. The CPU runs at baseline performance, which may cause slow access and degraded application performance. |
Yes |
|
Instance security events |
Events that affect instance security, such as a DDoS attack or blackhole that threatens the instance. |
Yes |
|
Instance migration events due to underlying layer upgrades |
When Alibaba Cloud upgrades physical infrastructure, instances in the affected regions and zones may need migration. Follow the system event guidance to migrate your instances. |
Yes |
|
Status change events |
Events triggered by instance lifecycle operations (such as manual start/stop) or property changes detected by Alibaba Cloud:
|
|
System event severities
System events have the following severity levels:
-
Critical: Major impact requiring immediate action. Without action, the instance may become unusable. Examples: resource release due to overdue payment, instance redeployment due to an error.
-
Warning: Moderate impact requiring attention or action at a suitable time. Example: a burstable instance with limited performance can still run but cannot burst above baseline.
-
Information: An event you can optionally monitor. Example: a disk snapshot creation event.
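As a rough illustration of how the three severity levels might drive triage in an automated workflow, the sketch below orders them with an `IntEnum`. The numeric values, the function names, and the sample event labels are assumptions made for this example. They are not identifiers defined by ECS.

```python
from enum import IntEnum


class Severity(IntEnum):
    # Numeric values are arbitrary; only their relative order matters here.
    INFORMATION = 1
    WARNING = 2
    CRITICAL = 3


def needs_immediate_action(severity: Severity) -> bool:
    """Only Critical events are described as requiring immediate action."""
    return severity is Severity.CRITICAL


def sort_by_urgency(events):
    """Sort (label, Severity) pairs so the most severe events come first."""
    return sorted(events, key=lambda pair: pair[1], reverse=True)


# Usage with hypothetical labels based on the examples in the list above.
pending = [
    ("disk snapshot created", Severity.INFORMATION),
    ("burstable instance performance limited", Severity.WARNING),
    ("resource release due to overdue payment", Severity.CRITICAL),
]
for label, severity in sort_by_urgency(pending):
    print(severity.name, label, needs_immediate_action(severity))
```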
States and windows of system events
The following table lists system event states displayed in the ECS console.
For event states supported by each system event, see the CloudMonitor event name column in Summary of ECS system events.
| State | Property | Description |
| --- | --- | --- |
| Inquiring | Intermediate state | Inquiry in progress, awaiting authorization. After authorization, the state changes to Executing. |
| Scheduled | Intermediate state | The O&M task is scheduled but not yet started. Changes to Executing when the task starts. |
| Executing | Intermediate state | The O&M task is in progress. |
| Executed | Stable state | The O&M task is complete. |
| Avoided | Stable state | The instance was migrated within the user operation window, avoiding the system event impact. |
| Failed | Stable state | The O&M task failed. |
| Canceled | Stable state | The system canceled the O&M task. |
The following figure shows typical event state transitions.
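The state table above can also be expressed as a small lookup, which may help when an automation script has to decide whether an event is still in flight. This is a sketch under stated assumptions: the names `STATE_KIND`, `DOCUMENTED_TRANSITIONS`, `is_settled`, and `is_documented_transition` are hypothetical, and only the two transitions that the table states explicitly are encoded. Other transitions exist but are not spelled out in the table, so they are deliberately left out.

```python
from enum import Enum


class Kind(Enum):
    INTERMEDIATE = "Intermediate state"
    STABLE = "Stable state"


# State -> property, copied from the table above.
STATE_KIND = {
    "Inquiring": Kind.INTERMEDIATE,
    "Scheduled": Kind.INTERMEDIATE,
    "Executing": Kind.INTERMEDIATE,
    "Executed": Kind.STABLE,
    "Avoided": Kind.STABLE,
    "Failed": Kind.STABLE,
    "Canceled": Kind.STABLE,
}

# Only the transitions that the table describes in words.
DOCUMENTED_TRANSITIONS = {
    "Inquiring": {"Executing"},  # after authorization
    "Scheduled": {"Executing"},  # when the task starts
}


def is_settled(state: str) -> bool:
    """True if the state is a stable state, meaning no further change is expected."""
    return STATE_KIND[state] is Kind.STABLE


def is_documented_transition(src: str, dst: str) -> bool:
    """True only for transitions that the table states explicitly."""
    return dst in DOCUMENTED_TRANSITIONS.get(src, set())


# Usage: list which states still need watching.
print([s for s in STATE_KIND if not is_settled(s)])
```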
System events include the following windows:
-
User operation window
The period from when a system event is sent to its scheduled execution time. You can respond during this window or wait for automatic execution. Window duration varies by event:
-
Scheduled O&M events: typically 24 to 48 hours.
Note: Events in the Inquiring state have no time limit. The O&M task starts only after authorization.
-
Unexpected O&M events caused by sudden failures or non-compliant operations usually have no user operation window.
-
Subscription instance expiration stop: 3 days.
-
Pay-as-you-go instance stop due to overdue payment: less than 1 hour.
-
Event execution window
The period from task start to completion. Duration varies by event:
-
Failure repair notifications: usually completed within 10 minutes.
-
Unexpected O&M events caused by sudden failures have only a short execution window.
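To make the user operation window concrete, the following sketch computes how much time remains before an event's scheduled execution. It is a minimal example that assumes you already hold the scheduled execution time as a timezone-aware `datetime`. The function name `remaining_user_window` is hypothetical, and passing `None` models an event with no scheduled time, such as one in the Inquiring state.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def remaining_user_window(scheduled_at: Optional[datetime],
                          now: datetime) -> Optional[timedelta]:
    """Return the time left in the user operation window.

    Returns None when there is no scheduled execution time (for example,
    an event that waits for authorization), and a zero timedelta once the
    scheduled time has passed.
    """
    if scheduled_at is None:
        return None
    return max(scheduled_at - now, timedelta(0))


# Usage with illustrative timestamps (not real event data).
now = datetime(2024, 1, 1, 6, 0, tzinfo=timezone.utc)
scheduled = datetime(2024, 1, 2, 0, 0, tzinfo=timezone.utc)
left = remaining_user_window(scheduled, now)
if left is not None and left < timedelta(hours=24):
    print("Less than a day left to respond:", left)
```

Clamping to zero keeps callers from handling negative durations after the window has closed.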
Operations
|
Operation |
Description and references |
|
Understand system events |
Read this topic to understand system event names, severity levels, use cases, states, and naming formats. |
|
View system events |
View system events in the console or with Cloud Assistant CLI:
|
|
Respond to system events |
For critical system events affecting ECS resource availability and performance, respond promptly in the console or with OpenAPI.
|
|
Monitor system events |
Set up event notifications to monitor underlying environment changes and enable automated O&M for your ECS instances.
|
|
Modify system event settings |
Modify system event settings:
|