System events are defined by Alibaba Cloud to record and notify resource information, such as the execution states of O&M tasks, resource exceptions, and resource state changes.
System event categories
|Category||Description||Displayed in the ECS console|
|Unexpected O&M events||This category of system events is triggered when ECS instances restart or break down due to unexpected issues such as kernel panic, out-of-memory errors, or hardware or software failures in underlying hosts. Alibaba Cloud sends these events when they are detected and restores affected ECS resources as soon as possible. At the same time, Alibaba Cloud notifies you of the execution states of system O&M tasks related to the events.||Yes|
|Scheduled O&M events||Alibaba Cloud may need to upgrade host software for security reasons or to foresee and take actions against failure risks that lie in underlying host hardware and software. In these cases, if O&M tasks to be executed by Alibaba Cloud may affect the availability or performance of your ECS resources, Alibaba Cloud triggers and sends scheduled O&M events in advance to notify you of task details such as execution times, objects, and impacts. After you receive a scheduled O&M event, you can handle it during an off-peak period within the event execution window to minimize the impact on your business.||Yes|
|Instance billing events||This category of system events is triggered by upcoming billing changes of instances. For example, instance billing events are triggered when instances expire and are about to be released or when instances are about to be stopped due to overdue payments.||Yes|
|Instance security events||This category of system events is triggered when instances face security threats. For example, instance security events are triggered when instances suffer DDoS attacks or when blackhole filtering is triggered for instances.||Yes|
|State change events||This category of system events is triggered when operations (such as Start and Stop) on instances cause their lifecycle states to change, or when instance attribute changes cause instance lifecycle or other states to change. Examples include lifecycle state change events and preemptible instance interruption events.||Not all|
System event severities
- Critical: Critical system events may result in instance unavailability and must be handled as soon as possible. For example, a critical system event is triggered when resources are released due to an overdue payment or when an instance is redeployed due to an instance error.
- Warning: Warning system events may affect your business. For example, a warning system event is triggered when a burstable instance cannot burst above its performance baseline. Pay close attention to these events and handle them when appropriate.
- Notification: Notification system events do not affect your business. For example, a notification system event is triggered when a snapshot is created for a disk. Paying attention to notification system events is optional.
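If you collect events into your own tooling, the three severities give a natural handling order. The following Python sketch sorts events so that Critical events come first; the Severity enum, the GUIDANCE strings, and the sample event descriptions are illustrative assumptions paraphrased from the list above, not fields of any Alibaba Cloud API.

```python
from enum import IntEnum


class Severity(IntEnum):
    """System event severities; a lower value means the event should be handled sooner."""
    CRITICAL = 0
    WARNING = 1
    NOTIFICATION = 2


# Handling guidance paraphrased from the severity descriptions in this topic.
GUIDANCE = {
    Severity.CRITICAL: "handle as soon as possible",
    Severity.WARNING: "pay close attention and handle when appropriate",
    Severity.NOTIFICATION: "optional; no business impact",
}


def triage(events):
    """Return (description, severity) pairs ordered Critical, Warning, Notification.

    sorted() is stable, so events of equal severity keep their original order.
    """
    return sorted(events, key=lambda event: event[1])


# Hypothetical events modeled on the examples in the severity list.
events = [
    ("snapshot created for a disk", Severity.NOTIFICATION),
    ("burstable instance held at its baseline", Severity.WARNING),
    ("instance redeployed due to an instance error", Severity.CRITICAL),
]

for description, severity in triage(events):
    print(f"{severity.name:<12} {description}: {GUIDANCE[severity]}")
```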
Use scenarios of system events
- Notification of risks and exceptions
After system events that can be displayed in the ECS console are triggered, Alibaba Cloud pushes the events to the ECS console. These events include those that affect the availability and performance of ECS resources, such as SystemMaintenance.Reboot (a scheduled O&M event) and InstanceExpiration.Stop (an instance billing event). For some critical system events, Alibaba Cloud also sends emails or internal messages. You can handle these events by using the ECS console or by calling API operations. We recommend that you handle the system events as soon as possible to ensure resource availability and performance. For more information, see Query and handle ECS system events.
For example, when a subscription instance is about to expire, the ECS console prompts you to renew the instance within a specified period of time to ensure service continuity.
- Automated O&M
States are defined for system events displayed in the ECS console to help you track the execution states of system O&M tasks. In addition, new system events and changes in system event states are reported to CloudMonitor so that you can build an event-driven automated O&M system based on your business requirements. For more information about event states, see the States and windows of system events section in this topic.
Note Each event state corresponds to a CloudMonitor event. For example, the Executing and Executed states that the InstanceFailure.Reboot ECS event type supports correspond to the Instance:InstanceFailure.Reboot:Executing and Instance:InstanceFailure.Reboot:Executed CloudMonitor events.
Some state change events are not displayed in the ECS console and cannot be handled by using the ECS console or by calling API operations. Examples include events that indicate instance state changes or interruptions of preemptible instances. No states are defined for these system events. However, these events are still reported to CloudMonitor when they are triggered so that you can build an event-driven automated O&M system based on your business requirements.
For example, state change events are triggered when you manually start or stop instances. These events do not indicate risks or exceptions. If you want to log your operations to your system, you can configure event notifications for state change events and use the alert callback feature to write the startup and stop information of instances to operation logs.
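A minimal Python sketch of the logging step behind such an alert callback, assuming the callback delivers a JSON body. The keys name, instanceId, state, and time are hypothetical placeholders, not the documented CloudMonitor payload; only the event name Instance:StateChange comes from this topic. Receiving the HTTP request itself is left to your web framework.

```python
import json
from typing import Optional

# CloudMonitor event name for lifecycle state change events (see the naming table).
STATE_CHANGE_EVENT = "Instance:StateChange"


def format_operation_log(payload: dict) -> Optional[str]:
    """Turn a state change event into one operation-log line; ignore other events.

    The keys "name", "instanceId", "state", and "time" are HYPOTHETICAL;
    map them to the fields your callback actually receives.
    """
    if payload.get("name") != STATE_CHANGE_EVENT:
        return None
    return f'{payload["time"]} {payload["instanceId"]} -> {payload["state"]}'


def handle_callback(body: str, log_path: str) -> bool:
    """Parse a JSON callback body and append a log line; return True if a line was written."""
    line = format_operation_log(json.loads(body))
    if line is None:
        return False
    with open(log_path, "a", encoding="utf-8") as log_file:
        log_file.write(line + "\n")
    return True
```

Filtering on the event name keeps the operation log limited to start and stop activity, while other events reaching the same callback are ignored.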
Retired instance families do not support the system event feature. For more information, see Retired instance types.
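The Note under Automated O&M describes a one-to-one mapping between the states of an ECS event type and CloudMonitor event names. The following Python sketch builds those names, for example to list every event an event-matching rule needs to cover. It relies only on the naming format described in this topic; the function names are hypothetical helpers, not part of an Alibaba Cloud SDK.

```python
# Event states defined for system events displayed in the ECS console.
EVENT_STATES = ("Inquiring", "Scheduled", "Executing", "Executed", "Avoided", "Failed", "Canceled")


def cloudmonitor_event_name(resource_type: str, ecs_event_type: str, state: str) -> str:
    """Combine an ECS event type and a state into <Resource type>:<ECS event type>:<Event state>."""
    if state not in EVENT_STATES:
        raise ValueError(f"unknown event state: {state!r}")
    return f"{resource_type}:{ecs_event_type}:{state}"


def cloudmonitor_event_names(resource_type: str, ecs_event_type: str, supported_states) -> list:
    """Return one CloudMonitor event name per state that the ECS event type supports."""
    return [cloudmonitor_event_name(resource_type, ecs_event_type, s) for s in supported_states]


print(cloudmonitor_event_names("Instance", "InstanceFailure.Reboot", ["Executing", "Executed"]))
# ['Instance:InstanceFailure.Reboot:Executing', 'Instance:InstanceFailure.Reboot:Executed']
```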
Operations that can be performed on system events
|Operation||Description and references|
|Understand system events||To learn about system events and understand their categories, severities, use scenarios, limits, states, and name formats, see this topic.|
|View system events||You can view system events by using the ECS console, CloudMonitor console, or Alibaba Cloud CLI.|
|Handle system events||For high-risk system events, such as those that affect the availability and performance of ECS resources, we recommend that you handle the events as suggested as soon as possible by using the ECS console or by calling API operations to ensure resource availability and performance.|
|Monitor system events||To ensure the stability of services that run on ECS instances and automate O&M, we recommend that you configure event notifications to be notified of underlying environment changes. After event notifications are configured, the system uses your specified notification methods to send you notifications.|
|Modify system event-related settings||You can modify system event-related settings based on your business requirements.|
States and windows of system events
|State||Attribute||Description|
|Inquiring||Intermediate||The O&M task related to the system event is pending authorization. After you authorize the task to be executed, the event enters the Executing state.|
|Scheduled||Intermediate||The O&M task related to the system event is scheduled and pending execution. When the O&M task is executed, the event enters the Executing state.|
|Executing||Intermediate||The O&M task related to the system event is being executed.|
|Executed||Stable||The O&M task related to the system event is completed.|
|Avoided||Stable||The impacts of the system event are prevented because you have migrated the affected instance within the user operation window.|
|Failed||Stable||The O&M task related to the system event failed.|
|Canceled||Stable||The O&M task related to the system event is automatically canceled.|
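The state table can be modeled as a small state machine, for example so that an automated O&M script knows whether an event still needs watching. In this Python sketch, the Intermediate/Stable split and the transitions from Inquiring or Scheduled to Executing and from Executing to Executed or Failed follow the table. The states from which an event can become Avoided or Canceled are assumptions made for illustration.

```python
from enum import Enum


class EventState(Enum):
    INQUIRING = "Inquiring"
    SCHEDULED = "Scheduled"
    EXECUTING = "Executing"
    EXECUTED = "Executed"
    AVOIDED = "Avoided"
    FAILED = "Failed"
    CANCELED = "Canceled"


INTERMEDIATE = {EventState.INQUIRING, EventState.SCHEDULED, EventState.EXECUTING}
STABLE = set(EventState) - INTERMEDIATE

# Inquiring/Scheduled -> Executing and Executing -> Executed/Failed come from the table.
# ASSUMPTION: Avoided and Canceled are reached only from Inquiring or Scheduled.
TRANSITIONS = {
    EventState.INQUIRING: {EventState.EXECUTING, EventState.AVOIDED, EventState.CANCELED},
    EventState.SCHEDULED: {EventState.EXECUTING, EventState.AVOIDED, EventState.CANCELED},
    EventState.EXECUTING: {EventState.EXECUTED, EventState.FAILED},
}


def is_stable(state: EventState) -> bool:
    """Stable states are final: the event no longer changes."""
    return state in STABLE


def can_transition(src: EventState, dst: EventState) -> bool:
    """Stable states have no entry in TRANSITIONS, so nothing follows them."""
    return dst in TRANSITIONS.get(src, set())
```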
- User operation window
The user operation window of a system event starts when the event is sent and ends when the O&M task related to the event is executed as scheduled. Within the user operation window, you can handle the event manually or wait for the system to handle the O&M task automatically. Take note of the following items about the lengths of user operation windows:
- In most cases, the user operation window of a scheduled O&M event ranges from 24 to 48 hours.
Note The lengths of user operation windows are unlimited for system events in the Inquiring state. The O&M tasks related to the events can start only after you authorize the tasks to be executed.
- Typically, unexpected O&M system events caused by failures or invalid operations do not have a user operation window.
- For system events indicating that subscription instances are about to expire, the window is three days.
- For system events indicating that pay-as-you-go instances are to be stopped due to overdue payments, the window is less than 1 hour.
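Because the user operation window ends when the O&M task runs as scheduled, the time left to act can be derived from the scheduled execution time. This Python sketch assumes you already have that time as a timezone-aware datetime (how you obtain it is outside the sketch) and, following the Note above, treats Inquiring events as having no deadline. The function name is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def remaining_user_window(state: str, scheduled_execution: Optional[datetime], now: datetime) -> Optional[timedelta]:
    """Return how long you can still handle the event yourself.

    None means the window is unlimited: Inquiring events wait for your authorization.
    The result is clamped to zero once the scheduled execution time has passed.
    """
    if state == "Inquiring":
        return None
    if scheduled_execution is None:
        raise ValueError("a scheduled execution time is required for non-Inquiring events")
    return max(scheduled_execution - now, timedelta(0))


now = datetime(2024, 5, 1, 8, 0, tzinfo=timezone.utc)  # illustrative timestamp
print(remaining_user_window("Scheduled", now + timedelta(hours=36), now))  # 1 day, 12:00:00
print(remaining_user_window("Inquiring", None, now))  # None
```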
- Event execution window
The execution window of a system event starts when the O&M task related to the event is executed and ends when the task is completed. Take note of the following items about the lengths of event execution windows:
- For system events such as failure recovery events, the window is within 10 minutes.
- Unexpected O&M events caused by failures or invalid operations have a short event execution window.
Formats of ECS event type names and CloudMonitor event names
- ECS event types are named in the <Event cause>.<Event impact> format to indicate event causes and impacts on resources.
- CloudMonitor events are named in the <Resource type>:<Event cause>.<Event impact>:<Event state> format to indicate resource types, event causes, event impacts on resources, and event states.
Some names omit the <Event impact> part. For example, Disk:ErrorDetected:Executing indicates that a disk is damaged and carries no information about the impact on resources.
|Category||Example ECS event type||Example CloudMonitor event||Description|
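|Scheduled O&M events||SystemMaintenance.Reboot||Instance:SystemMaintenance.Reboot:Inquiring||An instance is to be restarted due to system maintenance, and the related O&M task is pending your authorization.|
|Unexpected O&M events||ErrorDetected||Disk:ErrorDetected:Executing||A disk error is detected, and the related O&M task is being executed. The name includes no event impact.|
|Lifecycle state change events||Undefined||Instance:StateChange||The lifecycle state of an instance changes. No ECS event type or event state is defined for this event.|
|Instance billing events||InstanceExpiration.Stop||Instance:InstanceExpiration.Stop:Scheduled||An instance is to be stopped because it has expired, and the stop task is scheduled.|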
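The naming format can be parsed mechanically, for example to route CloudMonitor events by resource type or state in an automated O&M pipeline. This Python sketch assumes that a name has either three colon-separated parts or, as with Instance:StateChange, two parts with no state, and that the impact, when present, follows the first dot in the middle part. The dataclass and function are hypothetical helpers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CloudMonitorEvent:
    resource_type: str
    cause: str              # for Instance:StateChange this holds "StateChange"
    impact: Optional[str]   # None when the name has no impact, e.g. Disk:ErrorDetected
    state: Optional[str]    # None when the name has no state suffix


def parse_event_name(name: str) -> CloudMonitorEvent:
    """Split <Resource type>:<Event cause>.<Event impact>:<Event state> into its parts."""
    parts = name.split(":")
    if len(parts) not in (2, 3):
        raise ValueError(f"unexpected event name: {name!r}")
    resource_type, event_type = parts[0], parts[1]
    state = parts[2] if len(parts) == 3 else None
    # partition() returns an empty impact when the middle part contains no dot.
    cause, _, impact = event_type.partition(".")
    return CloudMonitorEvent(resource_type, cause, impact or None, state)
```

Applied to the table rows, Instance:SystemMaintenance.Reboot:Inquiring yields resource type Instance, cause SystemMaintenance, impact Reboot, and state Inquiring.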