What are ECS system events - Elastic Compute Service

ECS system events notify you of O&M task status, resource exceptions, and status changes, enabling prompt responses and automated O&M workflows.

Note

This topic covers only ECS system events. For system events of other Alibaba Cloud products, see their respective documentation.

Use cases

Risk and exception notifications

Alibaba Cloud pushes system events to the ECS console to notify you of situations that affect resource availability and performance, such as system maintenance restarts and instance expirations. For critical events, Alibaba Cloud also sends email and internal message notifications. Respond promptly in the ECS console or with OpenAPI to avoid business interruptions. See Query and respond to ECS system events.

For example, when a subscription instance is about to expire, the ECS console prompts you to renew it before it stops.
Automated O&M
System events in the ECS console have defined states that track O&M task execution. State changes sync with CloudMonitor, enabling automated O&M workflows. See States and windows of system events.
Note
- Each event state maps to a CloudMonitor event name. For example, InstanceFailure.Reboot supports the Executing and Executed states, corresponding to Instance:InstanceFailure.Reboot:Executing and Instance:InstanceFailure.Reboot:Executed.
- Some status change events, such as instance running status changes and spot instance interruptions, are not displayed in the ECS console and have no defined event states. However, they are still reported to CloudMonitor, enabling event-driven automated O&M.

For example, manually starting or stopping an instance generates a status change event. This is not a risk or exception, but you can set up event notifications to record such operations in your log system through callbacks.
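
The mapping between event states and CloudMonitor event names described in the note above can be sketched in Python. This is a minimal illustration, not an official client: the three-part `Resource:EventType:State` shape is inferred from the two documented examples only, and the helper names `build_event_name` and `parse_event_name` are hypothetical. Events that have no defined states may not follow this shape.

```python
from typing import NamedTuple


class CloudMonitorEventName(NamedTuple):
    resource: str    # e.g. "Instance"
    event_type: str  # e.g. "InstanceFailure.Reboot"
    state: str       # e.g. "Executing"


def build_event_name(event_type: str, state: str, resource: str = "Instance") -> str:
    """Join the parts with colons, following the two documented examples.

    Assumption: the pattern generalizes to other events that have states.
    """
    return f"{resource}:{event_type}:{state}"


def parse_event_name(name: str) -> CloudMonitorEventName:
    """Split a three-part name; raise ValueError for any other shape."""
    parts = name.split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected 'Resource:EventType:State', got {name!r}")
    return CloudMonitorEventName(*parts)
```

A handler that receives CloudMonitor callbacks could use `parse_event_name` to branch on the state, for example to act only when an event reaches Executed.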

System event types

System events are classified by trigger:

Note

For supported event types and recommended actions, see Summary of ECS system events.

Category	Description	Displayed in the ECS console
Scheduled O&M events	Alibaba Cloud upgrades host software and proactively mitigates hardware and software failure risks. If an O&M task may affect your ECS resource availability or performance, Alibaba Cloud notifies you in advance with the execution time, affected resources, and impact. Respond during off-peak hours within the scheduled window to minimize business disruption. Note Scheduled O&M events (also called proactive O&M events) use Alibaba Cloud's large-scale server management experience and Alibaba DAMO Academy machine learning algorithms to predict and avoid host failure risks. When a risk cannot be avoided, Alibaba Cloud notifies you in advance to allow service switchover. If you do not respond, your instance may go down or restart when the failure occurs.	Yes Note For big data instance families or instance families with local SSDs (excluding i4p), scheduled O&M events appear under Local Disk Instance Events. See O&M scenarios and system events for instances with local disks.
Unexpected O&M events	When an underlying host experiences a sudden hardware or software failure, or an instance encounters an issue such as an OOM error or kernel panic, the instance may restart or go down unexpectedly. Alibaba Cloud sends an unexpected O&M event, restores resource availability, and notifies you of the O&M task status. Note Unexpected O&M events typically involve sudden instance downtime or restarts caused by unpredictable host failures or OS kernel errors. Host-failure-caused downtime or restart events (SystemFailure.Reboot) are occasional and inevitable. If the single-instance SLA is violated, Alibaba Cloud compensates per the SLA. Restart events (InstanceFailure.Reboot) caused by OS kernel errors are typically application-related. Capture a dump file to analyze the root cause. See How to enable the kdump service on a Linux instance.	Yes Note For big data instance families or instance families with local SSDs (excluding i4p), unexpected O&M events appear under Local Disk Instance Events. See O&M scenarios and system events for instances with local disks.
Local disk instance events	Events for local disks (such as disk corruption) and for instances with local disks (such as instances affected by local disk damage or underlying host failures). Note Local Disk Instance Events is not a specific event type. It groups scheduled and unexpected O&M events for big data instance families or instance families with local SSDs (excluding i4p). See O&M scenarios and system events for instances with local disks.	Yes
Performance limited events for burstable instances	Reminder events indicating a burstable instance has exhausted its CPU credits. The CPU runs at baseline performance, which may cause slow access and degraded application performance.	Yes
Instance security events	Events that affect instance security, such as a DDoS attack or blackhole that threatens the instance.	Yes
Instance migration events due to underlying layer upgrades	When Alibaba Cloud upgrades physical infrastructure, instances in the affected regions and zones may need migration. Follow the system event guidance to migrate your instances.	Yes
Status change events	Events triggered by instance lifecycle operations (such as manual start/stop) or property changes detected by Alibaba Cloud: Lifecycle change events: Changes in instance running status, spot instance interruption, or snapshot creation completion. Other property change events: Switching the performance mode of a burstable instance, or completing disk conversion to pay-as-you-go.	Lifecycle change events: Not displayed in the ECS console. Other property change events: Whether they are displayed in the ECS console depends on the specific event.

System event severities

System events have the following severity levels:

Critical: Major impact requiring immediate action. Without action, the instance may become unusable. Examples: resource release due to overdue payment, instance redeployment due to an error.
Warning: Moderate impact requiring attention or action at a suitable time. Example: a burstable instance with limited performance can still run but cannot burst above baseline.
Information: An event you can optionally monitor. Example: a disk snapshot creation event.
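
If you process events in your own tooling, the three severity levels can be modeled as an ordered enumeration so that the most urgent events are handled first. The following Python sketch is illustrative only: the numeric ranks and the helper names `parse_severity` and `most_urgent_first` are assumptions, and the event names in any input are whatever your own pipeline supplies.

```python
from enum import IntEnum


class Severity(IntEnum):
    # Numeric ranks are an assumption used only for ordering.
    INFORMATION = 1
    WARNING = 2
    CRITICAL = 3


def parse_severity(label: str) -> Severity:
    """Map a label such as "Critical" to the enum; raises KeyError if unknown."""
    return Severity[label.strip().upper()]


def most_urgent_first(events):
    """Sort (severity_label, event_name) pairs: Critical, then Warning, then Information.

    Events with the same severity keep their original relative order.
    """
    return sorted(events, key=lambda e: parse_severity(e[0]), reverse=True)
```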

States and windows of system events

The following table lists system event states displayed in the ECS console.

Note

For event states supported by each system event, see the CloudMonitor event name column in Summary of ECS system events.

State	Property	Description
Inquiring	Intermediate state	Inquiry in progress, awaiting authorization. After authorization, the state changes to Executing.
Scheduled	Intermediate state	The O&M task is scheduled but not yet started. Changes to Executing when the task starts.
Executing	Intermediate state	The O&M task is in progress.
Executed	Stable state	The O&M task is complete.
Avoided	Stable state	The instance was migrated within the user operation window, avoiding the system event impact.
Failed	Stable state	The O&M task failed.
Canceled	Stable state	The system canceled the O&M task.
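
For automation that tracks events until they settle, the states in the table above can be encoded as follows. This Python sketch is a hedged illustration: the state names and the intermediate/stable split come from the table, `DOCUMENTED_NEXT` records only the two transitions the table states explicitly, and the names `EventState`, `is_stable`, and `needs_tracking` are hypothetical.

```python
from enum import Enum


class EventState(Enum):
    INQUIRING = "Inquiring"
    SCHEDULED = "Scheduled"
    EXECUTING = "Executing"
    EXECUTED = "Executed"
    AVOIDED = "Avoided"
    FAILED = "Failed"
    CANCELED = "Canceled"


# States marked "Intermediate state" in the table; all others are stable.
INTERMEDIATE_STATES = frozenset(
    {EventState.INQUIRING, EventState.SCHEDULED, EventState.EXECUTING}
)

# Only the transitions the table states explicitly. Other transitions
# (for example, into the stable states) are deliberately not modeled here.
DOCUMENTED_NEXT = {
    EventState.INQUIRING: EventState.EXECUTING,  # after authorization
    EventState.SCHEDULED: EventState.EXECUTING,  # when the task starts
}


def is_stable(state: EventState) -> bool:
    """True for Executed, Avoided, Failed, and Canceled."""
    return state not in INTERMEDIATE_STATES


def needs_tracking(state_name: str) -> bool:
    """True while an event is still in an intermediate state."""
    return not is_stable(EventState(state_name))
```

A polling loop could call `needs_tracking` on each event and stop following an event once it returns False.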

The following figure shows typical event state transitions.

System events include the following windows:

User operation window
The period from when a system event is sent to its scheduled execution time. You can respond during this window or wait for automatic execution. Window duration varies by event:
- Scheduled O&M events: typically 24 to 48 hours.
  
  Note
  Events in the Inquiring state have no time limit. The O&M task starts only after authorization.
- Unexpected O&M events caused by sudden failures or non-compliant operations usually have no user operation window.
- Subscription instance expiration stop: 3 days.
- Pay-as-you-go instance stop due to overdue payment: less than 1 hour.
Event execution window
The period from task start to completion. Duration varies by event:
- Failure repair notifications: usually completed within 10 minutes.
- Unexpected O&M events caused by sudden failures have only a short execution window.
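
The user operation window described above is the time between when an event is sent and its scheduled execution time, so the time left to respond is a simple difference. The Python sketch below assumes you already have the scheduled execution time as a `datetime`; the function name `remaining_user_window` and its parameters are hypothetical and do not correspond to any specific API field.

```python
from datetime import datetime, timedelta
from typing import Optional


def remaining_user_window(
    scheduled_start: Optional[datetime], now: datetime
) -> Optional[timedelta]:
    """Time left to respond before automatic execution.

    Returns None when there is no scheduled start (for example, an event in
    the Inquiring state, which has no time limit), and a zero timedelta once
    the scheduled start has passed.
    """
    if scheduled_start is None:
        return None
    return max(scheduled_start - now, timedelta(0))
```

For a scheduled O&M event sent 30 hours ago with a 36-hour window, the function returns 6 hours. Pass timezone-aware values for both arguments, or naive values for both, because Python cannot subtract a naive `datetime` from an aware one.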

Operations

Operation	Description and references
Understand system events	Read this topic to understand system event names, severity levels, use cases, states, and naming formats.
View system events	View system events in either console or with Cloud Assistant CLI. In the ECS console or with Cloud Assistant CLI: Query and respond to ECS system events. In the CloudMonitor console: Query system events.
Respond to system events	For critical system events affecting ECS resource availability and performance, respond promptly in the console or with OpenAPI. Recommended actions for all events: Summary of ECS system events. Handle pending events: Query and respond to ECS system events. Handle local disk events: O&M scenarios and system events for instances with local disks.
Monitor system events	Set up event notifications to monitor underlying environment changes and enable automated O&M for your ECS instances. Configure CloudMonitor alert rules: Subscribe to ECS system event notifications. Use a DingTalk robot: Send event notifications using a DingTalk robot.
Modify system event settings	Configure restart or redeployment behavior after responding to a system event: Modify instance maintenance properties. Set the response time and restart time for events with a restart plan: Modify a scheduled restart time.