System events record and report information about cloud resources, such as O&M task executions, resource exceptions, and resource status changes. You can use system events to learn about risks and anomalies that affect Elastic Compute Service (ECS) resources. For example, a system event is generated when an instance must be migrated due to underlying upgrades or when an instance is restarted due to system maintenance. Responding to system events promptly helps prevent your business from being affected by ECS resource unavailability or performance degradation. This topic summarizes the system events supported by ECS, including scheduled O&M events, unexpected O&M events, instance billing events, and instance status change events, and provides suggestions on how to handle them.
Formats of ECS event codes and CloudMonitor event names
ECS system events are synchronized to CloudMonitor. This allows you to set up an automated O&M mechanism based on system events. ECS event codes and CloudMonitor events follow specific naming conventions.
ECS event codes indicate the event causes and impacts on resources and are in the <Event cause>.<Event impact> format.
CloudMonitor event names indicate the resource types, event causes, event impacts on resources, and event status and are in the <Resource type>:<Event cause>.<Event impact>:<Event status> format.
ECS event codes and CloudMonitor event names may include only some of the preceding information. For example, a CloudMonitor event name of Disk:ErrorDetected:Executing indicates that a disk is damaged and excludes information about the impacts on resources.
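The naming conventions above can be illustrated with a small parser. This is a minimal sketch, not an official utility: the EventName class and the parse_event_name function are hypothetical names, and the rule for two-part names (resource type plus cause, no status) is an assumption based on the sample names in this topic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EventName:
    """Hypothetical container for the parts of an event code or event name."""
    resource_type: Optional[str]
    cause: Optional[str]
    impact: Optional[str]
    status: Optional[str]


def parse_event_name(name: str) -> EventName:
    """Split '<Resource type>:<Event cause>.<Event impact>:<Event status>'.

    Parts that the name omits are returned as None.
    """
    parts = name.split(":")
    if len(parts) == 1:
        # ECS event code such as 'SystemMaintenance.Reboot'
        resource, middle, status = None, parts[0], None
    elif len(parts) == 2:
        # Assumption: two-part names carry a resource type but no status
        resource, middle, status = parts[0], parts[1], None
    elif len(parts) == 3:
        resource, middle, status = parts
    else:
        raise ValueError(f"unexpected event name: {name!r}")
    # The cause and the impact are separated by the first period, if any
    cause, _, impact = middle.partition(".")
    return EventName(resource, cause or None, impact or None, status)


print(parse_event_name("Instance:SystemMaintenance.Reboot:Inquiring"))
print(parse_event_name("Disk:ErrorDetected:Executing"))
```

For Disk:ErrorDetected:Executing, the impact field is None, which matches the description above of a name that omits the impact on resources.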
An ECS event code of Undefined indicates that ECS events are not displayed in the ECS console and cannot be handled in the ECS console or by calling API operations.
The following table describes some examples of ECS event codes and CloudMonitor event names.
Category | Sample ECS event code | Sample CloudMonitor event name | Description |
Scheduled O&M events | SystemMaintenance.Reboot | Instance:SystemMaintenance.Reboot:Inquiring |
|
Unexpected O&M events | ErrorDetected | Disk:ErrorDetected:Executing |
|
Lifecycle status change events | Snapshot:CreateSnapshotCompleted | Snapshot:CreateSnapshotCompleted |
|
Scheduled O&M events
If you restart an instance from within its operating system after a system event occurs on the instance, the maintenance action that corresponds to the event does not take effect. All instance restart operations described in this topic must be performed in the ECS console or by calling an API operation. For more information, see Restart an instance or RebootInstance.
Event code | Event name | Event severity level | CloudMonitor event name | Event description and impact | Handling suggestion |
SystemMaintenance.Reboot | Instance Restart Due to System Maintenance | Critical |
| This system event is triggered 24 to 48 hours before the scheduled time of system maintenance when Alibaba Cloud detects a potential risk of hardware or software failure on the underlying host of an instance and the risk can cause instance restarts. Note Take note of the following risks:
| We recommend that you perform one of the following actions to handle the event:
Note
|
SystemMaintenance.Stop | Instance Stopped Due to System Maintenance | Critical |
| This system event is triggered 24 to 48 hours before the scheduled time of system maintenance when Alibaba Cloud detects a potential risk of hardware or software failure on the underlying host of an instance and the risk can cause instance stops. | We recommend that you perform one of the following actions to handle the event:
Note You can modify the maintenance attributes of the instance to specify the default action to take when an O&M event occurs on the instance. For more information, see Modify instance maintenance attributes. |
SystemMaintenance.Redeploy | Instance Redeployment Due to System Maintenance | Critical |
| This system event is triggered 24 to 48 hours before the scheduled time of system maintenance when Alibaba Cloud detects a potential risk of hardware or software failure on the underlying host of an instance and the risk can cause instance redeployment. Important If the instance is equipped with local SSDs or local HDDs, the data disks on the instance are re-initialized and the data stored on the local disks is cleared. | We recommend that you make preparations, such as modifying the /etc/fstab configuration file and backing up data, and then perform one of the following actions to handle the event:
Note
|
SystemMaintenance.IsolateErrorDisk | Damaged Disk Isolation Due to System Maintenance | Critical |
| This system event is immediately triggered when Alibaba Cloud detects hardware or software damage on a local disk of an instance. Important The procedure for handling a damaged local disk of an instance varies based on the instance type. For specific instance types, the instance must be restarted and the damaged local disk must be isolated. For other instance types, the damaged local disk can be isolated online and then repaired. | We recommend that you make preparations, such as modifying the /etc/fstab configuration file and backing up data, and then select an appropriate point in time to authorize the damaged disk to be isolated. Then, the local disk is isolated online without the need to restart the associated instance. Note For more information, see the Scenario ③ section of the "O&M scenarios and system events for instances equipped with local disks" topic. |
SystemMaintenance.ReInitErrorDisk | Damaged Disk Re-initialization Due to System Maintenance | Critical |
| This system event is immediately triggered when Alibaba Cloud isolates and replaces a local disk on the host of an instance after Alibaba Cloud detects hardware or software damage on the local disk. In most cases, Alibaba Cloud isolates and replaces a damaged local disk within five business days after you authorize Alibaba Cloud to isolate the local disk. Important The procedure for handling a damaged local disk of an instance varies based on the instance type. For specific instance types, the instance must be restarted and the damaged local disk must be isolated. For other instance types, the damaged local disk can be isolated online and then repaired. | We recommend that you select an appropriate point in time to authorize the local disk to be repaired. Then, the local disk is repaired online without the need to restart the associated instance. Note For more information, see the Scenario ③ section of the "O&M scenarios and system events for instances equipped with local disks" topic. |
SystemMaintenance.RebootAndIsolateErrorDisk | Damaged Disk Isolation and Instance Restart Due to System Maintenance | Critical |
| This system event is immediately triggered when Alibaba Cloud detects hardware or software damage on a local disk of an instance and fails to isolate the local disk online. Important The procedure for handling a damaged local disk of an instance varies based on the instance type. For specific instance types, the instance must be restarted and the damaged local disk must be isolated. For other instance types, the damaged local disk can be isolated online and then repaired. | We recommend that you select an appropriate point in time to authorize the damaged disk to be isolated and restart the associated instance after the disk is isolated. In this case, the local disk is isolated offline. You must restart the associated instance for the isolation operation to take effect. Note For more information, see the Scenario ③ section of the "O&M scenarios and system events for instances equipped with local disks" topic. |
SystemMaintenance.RebootAndReInitErrorDisk | Damaged Disk Re-initialization and Instance Restart Due to System Maintenance | Critical |
| This system event is immediately triggered when Alibaba Cloud detects hardware or software damage on a local disk of an instance and fails to repair the local disk online. Important The procedure for handling a damaged local disk of an instance varies based on the instance type. For specific instance types, the instance must be restarted and the damaged local disk must be isolated. For other instance types, the damaged local disk can be isolated online and then repaired. | We recommend that you select an appropriate point in time to authorize the local disk to be repaired and restart the associated instance after the disk is repaired. In this case, the local disk is repaired offline. You must restart the associated instance for the repair operation to take effect. Note For more information, see the Scenario ③ section of the "O&M scenarios and system events for instances equipped with local disks" topic. |
SystemMaintenance.StopAndRepair | In-place Repair of Instance Equipped with Local Disks | Critical |
| This system event is triggered 48 to 168 hours before the scheduled time of system maintenance when Alibaba Cloud detects a risk of hardware failure on the underlying host of an instance. | We recommend that you select an appropriate point in time to authorize Alibaba Cloud to repair or redeploy the instance that is equipped with local disks. Note For more information, see O&M scenarios and system events for instances equipped with local disks. |
SystemMaintenance.CleanReleasedDisks | Disk Cleanup After EBS Disk Hot Swapping Failure | Warning |
| This system event is triggered when Alibaba Cloud detects that the operating system of an instance still contains the configurations of one or more cloud disks that were released due to overdue payments. | We recommend that you select an appropriate point in time to authorize Alibaba Cloud to clear the configurations of the released cloud disks. Important Alibaba Cloud stops the instance at the specified point in time and then clears the configurations of the cloud disks. After the cloud disk configurations are cleared, the instance is restarted. |
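The lead times stated in the preceding table can be turned into a quick calculation of when to expect an event for a known maintenance time. The following sketch is illustrative only: the LEAD_HOURS mapping and the trigger_window function are hypothetical names, and only the events for which the table states a lead time are included.

```python
from datetime import datetime, timedelta, timezone

# (minimum, maximum) lead time in hours, taken from the table above
LEAD_HOURS = {
    "SystemMaintenance.Reboot": (24, 48),
    "SystemMaintenance.Stop": (24, 48),
    "SystemMaintenance.Redeploy": (24, 48),
    "SystemMaintenance.StopAndRepair": (48, 168),
}


def trigger_window(event_code: str, scheduled_at: datetime) -> tuple[datetime, datetime]:
    """Return the (earliest, latest) time at which the event is expected
    to be triggered for maintenance scheduled at scheduled_at."""
    min_hours, max_hours = LEAD_HOURS[event_code]
    earliest = scheduled_at - timedelta(hours=max_hours)
    latest = scheduled_at - timedelta(hours=min_hours)
    return earliest, latest


scheduled = datetime(2024, 5, 10, 2, 0, tzinfo=timezone.utc)
earliest, latest = trigger_window("SystemMaintenance.Reboot", scheduled)
print(earliest.isoformat(), latest.isoformat())
```

Disk isolation and re-initialization events are triggered immediately and therefore have no lead time to compute.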
Unexpected O&M events
Event code | Event name | Event severity level | CloudMonitor event name | Event description and impact | Handling suggestion |
SystemFailure.Reboot | Instance Restart Due to System Error | Critical |
| This system event is immediately triggered when Alibaba Cloud detects that an instance is restarted due to hardware or software failure on the underlying host, such as CPU or memory hardware damage. | We recommend that you wait until the instance is automatically restarted and then check whether the instance and applications work as expected. When the instance is being restarted, Alibaba Cloud migrates the instance to a healthy host. Note You can modify the maintenance attributes of the instance to specify the default action to take when an O&M event occurs on the instance. For more information, see Modify instance maintenance attributes. |
InstanceFailure.Reboot | Instance Restart Due to OS Error | Critical |
| This system event is immediately triggered when Alibaba Cloud detects that the operating system of an instance is down due to an issue such as an out-of-memory (OOM) error, a blue screen error, a freeze, continuous printing of serial port logs, or a kernel panic. | We recommend that you wait until the instance is automatically restarted and then check whether the instance and applications work as expected. You can enable the kdump service on a Linux instance or the kernel memory dump feature on a Windows instance to troubleshoot the issue and prevent it from recurring. For more information, see How to enable the Kdump service for Linux instances and Enable the Kernel Memory Dump feature for a Windows instance. |
SystemFailure.Stop | Instance Stop Due to System Error | Critical |
| This system event is immediately triggered when Alibaba Cloud detects that an instance is stopped due to hardware or software failure on the underlying host, such as CPU or memory hardware damage. | We recommend that you wait until the instance is stopped and then start the instance. When the instance is being started, Alibaba Cloud migrates the instance to a healthy host. Note You can modify the maintenance attributes of the instance to specify the default action to take when an O&M event occurs on the instance. For more information, see Modify instance maintenance attributes. |
SystemFailure.Redeploy | Instance Redeployment Due to System Error | Critical |
| This system event is immediately triggered when Alibaba Cloud detects hardware or software failure on the underlying host of an instance equipped with local disks and the instance must be redeployed. Note Only instances that depend on host hardware support this event, such as instances that are equipped with local disks or support Software Guard Extensions (SGX) confidential computing. | We recommend that you make preparations, such as modifying the /etc/fstab configuration file and backing up data, and then perform one of the following actions to handle the event:
Note You can modify the maintenance attributes of the instance to specify the default action to take when an O&M event occurs on the instance. For more information, see Modify instance maintenance attributes. |
SystemFailure.Delete | Automatic Cancellation of Bills Due to Instance Creation Failures | Critical |
| This system event is immediately triggered when Alibaba Cloud detects that an instance creation order is placed but the instance fails to be created. | We recommend that you wait for the instance to be automatically released. In most cases, an instance is automatically released within 5 minutes after the instance fails to be created. Note If you already paid for the order, the payment is refunded after the instance is released. To ensure that instances can be created, we recommend that you perform the following actions:
|
ErrorDetected | Local Disk Damage | Critical |
| This system event is immediately triggered when Alibaba Cloud detects hardware or software failure on the local disk of an instance and data cannot be read from the disk or written to the disk. | We recommend that you make preparations, such as modifying the /etc/fstab configuration file and backing up data. Then, select a point in time to isolate and repair the damaged local disk. The supported operations vary based on the instance type.
Note For more information, see the Scenario ③ section of the "O&M scenarios and system events for instances equipped with local disks" topic. |
Stalled | Significant Block Storage Performance Impact | Critical |
| This system event is immediately triggered when Alibaba Cloud detects that an I/O hang occurs on a cloud disk of the instance. This significantly affects the disk performance and prevents the disk from processing read and write requests. | We recommend that you isolate reads and writes on the cloud disk at the application layer or disassociate the ECS instance from the associated Server Load Balancer (SLB) instance. |
Instance migration events due to upgrades at the underlying layer
Event code | Event name | Event severity level | CloudMonitor event name | Event description and impact | Handling suggestion |
SystemUpgrade.Migrate | Instance Migration Due to Upgrades at Underlying Layer | Critical | Undefined | This system event is triggered when instances are affected by the upgrades and improvements of physical infrastructure in regions and zones where the instances reside. | We recommend that you view event details in the ECS console and migrate affected instances as prompted. For more information, see Instance migration due to upgrades at the underlying layer. |
Burstable instance performance degradation events
Event code | Event name | Event severity level | CloudMonitor event name | Event description and impact | Handling suggestion |
Instance:BurstablePerformanceRestricted | Burstable Instance Performance Degradation | Warning | Instance:BurstablePerformanceRestricted | This system event is triggered when all accrued CPU credits of a burstable instance are consumed. | We recommend that you perform one of the following actions to handle the event:
To receive notifications at a custom threshold, for example when accrued CPU credits remain less than 10 for 10 consecutive minutes, configure event-triggered alert rules for the event in the CloudMonitor console. For more information, see Monitor burstable instances. |
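The example threshold mentioned above (fewer than 10 accrued CPU credits for 10 consecutive minutes) can be expressed as a simple check over a series of samples. This is only a sketch of the condition, not of how CloudMonitor evaluates alert rules: the sustained_below function is a hypothetical name, and it assumes one credit balance sample per minute.

```python
from typing import Iterable


def sustained_below(samples: Iterable[float], threshold: float = 10, window: int = 10) -> bool:
    """Return True if `window` consecutive samples are all below `threshold`.

    Assumption: `samples` holds one accrued CPU credit balance per minute,
    in chronological order.
    """
    streak = 0
    for value in samples:
        # Extend the streak while the balance stays below the threshold;
        # any sample at or above the threshold resets it.
        streak = streak + 1 if value < threshold else 0
        if streak >= window:
            return True
    return False


# Ten low samples in a row satisfy the condition.
print(sustained_below([12, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]))
# A single recovery to 10 credits resets the streak.
print(sustained_below([9] * 9 + [10] + [9] * 9))
```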
Status change events
Event code | Event name | Event severity level | CloudMonitor event name | Event description and impact | Handling suggestion |
Instance:PreemptibleInstanceInterruption | Preemptible Instance Interruption | Warning | Instance:PreemptibleInstanceInterruption | This system event is triggered 5 minutes before a preemptible instance is reclaimed. | We recommend that you take one of the following actions:
|
Instance:ModifyInstanceSpec.Reboot | Instance Restart Due to Instance Type Change | Critical |
| After the instance type of an instance is changed, restart the instance for the new instance type to take effect. If you do not restart the instance within seven days after the new order takes effect, the system forcefully restarts the instance for the new instance type to take effect. | We recommend that you take one of the following actions:
|
Instance:PerformanceModeChange | Performance Mode Switchover of Burstable Instance | Warning | Instance:PerformanceModeChange | This system event is triggered when a burstable instance switches between the unlimited mode and the standard mode. | We recommend that you determine whether to monitor the event. If you want to monitor the event, you can configure notifications for the event in the CloudMonitor console. For more information, see Subscribe to ECS system event notifications. |
Instance:StateChange | Instance Status Change | Notification | Instance:StateChange | This system event is triggered when the status of an instance changes, such as from Running to Stopping or from Stopping to Stopped. | We recommend that you determine whether to monitor the event. If you want to monitor the event, you can configure notifications for the event in the CloudMonitor console. For more information, see Subscribe to ECS system event notifications. |
Instance:AutoReactivateCompleted | Automatic Reactivation Completed | Notification | Instance:AutoReactivateCompleted | This system event is triggered when you settle the overdue payments in your account and an instance is automatically reactivated. | We recommend that you determine whether to monitor the event. If you want to monitor the event, you can configure notifications for the event in the CloudMonitor console. For more information, see Subscribe to ECS system event notifications. |
Instance:LiveMigrationAcrossDDH | Instance Hot Migration Between Dedicated Hosts | Notification | Instance:LiveMigrationAcrossDDH | This system event is triggered when an instance is hot migrated between dedicated hosts. | We recommend that you determine whether to monitor the event. If you want to monitor the event, you can configure notifications for the event in the CloudMonitor console. For more information, see Subscribe to ECS system event notifications. |
Disk:DiskOperationCompleted | Disk Operations Completed | Notification | Disk:DiskOperationCompleted | This system event is triggered when a pay-as-you-go disk is manually attached or detached. | We recommend that you determine whether to monitor the event. If you want to monitor the event, you can configure notifications for the event in the CloudMonitor console. For more information, see Subscribe to ECS system event notifications. |
Disk:ConvertToPostpaidCompleted | Billing Method of Disks Switched to Pay-as-you-go | Notification | Disk:ConvertToPostpaidCompleted | This system event is triggered when a subscription disk is changed to a pay-as-you-go disk. | We recommend that you determine whether to monitor the event. If you want to monitor the event, you can configure notifications for the event in the CloudMonitor console. For more information, see Subscribe to ECS system event notifications. |
Snapshot:CreateSnapshotCompleted | Disk Snapshot Created | Notification | Snapshot:CreateSnapshotCompleted | This system event is triggered when a snapshot is created for a disk. | We recommend that you determine whether to monitor the event. If you want to monitor the event, you can configure notifications for the event in the CloudMonitor console. For more information, see Subscribe to ECS system event notifications. |
Snapshot:SnapshotDeleted | Snapshot Deletion Completed | Notification | Snapshot:SnapshotDeleted | This system event is triggered when a manual or automatic snapshot is deleted. | None. |
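If you decide to monitor the status change events in the preceding table, one option is to route them by severity level. The following sketch copies the severity levels from the table. The channel names (page, ticket, log), the route function, and the choice to treat unknown events as Warning are all hypothetical and only illustrate the idea.

```python
# Severity levels copied from the status change events table above
SEVERITY = {
    "Instance:PreemptibleInstanceInterruption": "Warning",
    "Instance:ModifyInstanceSpec.Reboot": "Critical",
    "Instance:PerformanceModeChange": "Warning",
    "Instance:StateChange": "Notification",
    "Instance:AutoReactivateCompleted": "Notification",
    "Instance:LiveMigrationAcrossDDH": "Notification",
    "Disk:DiskOperationCompleted": "Notification",
    "Disk:ConvertToPostpaidCompleted": "Notification",
    "Snapshot:CreateSnapshotCompleted": "Notification",
    "Snapshot:SnapshotDeleted": "Notification",
}

# Hypothetical destinations; replace them with your own notification targets
CHANNEL = {"Critical": "page", "Warning": "ticket", "Notification": "log"}


def route(event_name: str) -> str:
    """Pick a destination for an event based on its severity level.

    Events that are not in SEVERITY are treated as Warning so that
    they are reviewed instead of being silently logged.
    """
    severity = SEVERITY.get(event_name, "Warning")
    return CHANNEL[severity]


print(route("Instance:ModifyInstanceSpec.Reboot"))
print(route("Snapshot:SnapshotDeleted"))
```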