Elastic Compute Service: Summary

Last Updated: Sep 13, 2023

This topic summarizes the system events that are supported by Elastic Compute Service (ECS) and provides suggestions on how to handle the events.

Note

If Undefined is displayed in the Event code column of a system event, the system event is not displayed in the ECS console and cannot be queried by calling API operations. Example: Undefined is displayed in the Event code column of the Instance:StateChange event.

Scheduled O&M events

Event code | Event name | Event level | CloudMonitor event | Event description and impact | Handling suggestion

SystemMaintenance.Reboot

Instance Restart Due to System Maintenance

Critical

  • Instance:SystemMaintenance.Reboot:Inquiring: Querying Instance Restart Due to System Maintenance

  • Instance:SystemMaintenance.Reboot:Scheduled: Instance Restart Scheduled Due to System Maintenance

  • Instance:SystemMaintenance.Reboot:Executing: Restarting Instance Due to System Maintenance

  • Instance:SystemMaintenance.Reboot:Executed: Instance Restarted Due to System Maintenance

  • Instance:SystemMaintenance.Reboot:Avoided: Instance Restart Avoided Due to System Maintenance

  • Instance:SystemMaintenance.Reboot:Failed: Instance Restart Failed Due to System Maintenance

  • Instance:SystemMaintenance.Reboot:Canceled: Instance Restart Canceled Due to System Maintenance
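The CloudMonitor event names listed in this topic appear to follow a colon-separated pattern: a resource type, an event code, and, for O&M events, a status such as Scheduled or Executed. The following Python sketch splits a name into those parts. The function and field names are hypothetical and are not part of any Alibaba Cloud SDK, and the pattern is inferred from the names in this topic rather than from a formal specification.

```python
from typing import NamedTuple, Optional


class CloudMonitorEvent(NamedTuple):
    resource_type: str      # for example "Instance", "Disk", or "Snapshot"
    event_code: str         # for example "SystemMaintenance.Reboot"
    status: Optional[str]   # for example "Scheduled"; None if the name has no status


def parse_event_name(name: str) -> CloudMonitorEvent:
    """Split a CloudMonitor event name into its colon-separated parts.

    Assumption: names have either two parts (resource:code) or three
    parts (resource:code:status), as in the names listed in this topic.
    """
    parts = name.strip().split(":")
    if len(parts) not in (2, 3) or not all(parts):
        raise ValueError(f"unexpected event name format: {name!r}")
    status = parts[2] if len(parts) == 3 else None
    return CloudMonitorEvent(parts[0], parts[1], status)


if __name__ == "__main__":
    print(parse_event_name("Instance:SystemMaintenance.Reboot:Scheduled"))
    print(parse_event_name("Instance:StateChange"))
```

Splitting the name this way makes it possible to group notifications by event code and to react only to specific statuses.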

This system event is triggered 24 to 48 hours before the scheduled maintenance time when Alibaba Cloud detects a risk of hardware or software failure in the underlying host of an instance and the instance must be restarted as a result.

We recommend that you take one of the following actions in response to the event:

Note
  • We recommend that you monitor the status of the event. If the event status remains unchanged after the instance is restarted, the event is not handled and the risk is not mitigated. To mitigate the risk, we recommend that you restart the instance again at least 12 hours after the current operation.

  • You can modify the maintenance attributes of the instance to specify the default action that takes effect when the instance encounters a maintenance event. For more information, see Modify instance maintenance attributes.
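The note above implies a simple check: compare the event status before and after the restart, and if it has not changed, schedule another attempt no earlier than 12 hours later. The following Python sketch illustrates that reasoning. The function names are hypothetical, and the status values are assumed to be the suffixes of the CloudMonitor event names in this topic (for example, Scheduled).

```python
from datetime import datetime, timedelta, timezone

# From the note: retry at least 12 hours after the current operation.
RETRY_DELAY = timedelta(hours=12)


def event_unhandled(status_before: str, status_after: str) -> bool:
    """Return True if the event status did not change after the operation.

    Per the note, an unchanged status means the event is not handled
    and the risk is not mitigated.
    """
    return status_before == status_after


def earliest_retry(last_operation: datetime) -> datetime:
    """Return the earliest time at which the operation should be retried."""
    return last_operation + RETRY_DELAY


if __name__ == "__main__":
    restarted_at = datetime(2023, 9, 13, 8, 0, tzinfo=timezone.utc)
    if event_unhandled("Scheduled", "Scheduled"):
        print("Retry no earlier than", earliest_retry(restarted_at).isoformat())
```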

SystemMaintenance.Stop

Instance Stopped Due to System Maintenance

Critical

  • Instance:SystemMaintenance.Stop:Scheduled: Instance Stop Scheduled Due to System Maintenance

  • Instance:SystemMaintenance.Stop:Executing: Stopping Instance Due to System Maintenance

  • Instance:SystemMaintenance.Stop:Executed: Instance Stopped Due to System Maintenance

  • Instance:SystemMaintenance.Stop:Avoided: Instance Stop Avoided Due to System Maintenance

  • Instance:SystemMaintenance.Stop:Failed: Instance Stop Failed Due to System Maintenance

  • Instance:SystemMaintenance.Stop:Canceled: Instance Stop Canceled Due to System Maintenance

This system event is triggered 24 to 48 hours before the scheduled maintenance time when Alibaba Cloud detects a risk of hardware or software failure in the underlying host of an instance and the instance must be stopped as a result.

We recommend that you take one of the following actions in response to the event:

  • Redeploy the instance. For more information, see Redeploy an instance equipped with local disks.

  • Wait for the instance to be automatically stopped and then perform instance operations such as redeployment based on your business requirements.

Note

You can modify the maintenance attributes of the instance to specify the default action that takes effect when the instance encounters a maintenance event. For more information, see Modify instance maintenance attributes.

SystemMaintenance.Redeploy

Instance Redeployment Due to System Maintenance

Critical

  • Instance:SystemMaintenance.Redeploy:Inquiring: Querying Instance Redeployment Due to System Maintenance

  • Instance:SystemMaintenance.Redeploy:Scheduled: Instance Redeployment Scheduled Due to System Maintenance

  • Instance:SystemMaintenance.Redeploy:Executing: Redeploying Instance Due to System Maintenance

  • Instance:SystemMaintenance.Redeploy:Executed: Instance Redeployed Due to System Maintenance

  • Instance:SystemMaintenance.Redeploy:Avoided: Instance Redeployment Avoided Due to System Maintenance

  • Instance:SystemMaintenance.Redeploy:Canceled: Instance Redeployment Canceled Due to System Maintenance

This system event is triggered 24 to 48 hours before the scheduled maintenance time when Alibaba Cloud detects a risk of hardware or software failure in the underlying host of an instance and the instance must be redeployed as a result.

Important

If the instance is equipped with local SSDs or local HDDs, the data disks are re-initialized and the data is cleared.

We recommend that you make preparations such as modifying the /etc/fstab configuration file and backing up data, and then take one of the following actions in response to the event:

Note
  • We recommend that you monitor the status of the event. If the event status remains unchanged after the instance is redeployed, the event handling failed and the risk is not mitigated. To mitigate the risk, we recommend that you redeploy the instance again at least 12 hours after the current operation.

  • You can modify the maintenance attributes of the instance to specify the default action that takes effect when the instance encounters a maintenance event. For more information, see Modify instance maintenance attributes.

SystemMaintenance.IsolateErrorDisk

Isolation of Damaged Local Disks Due to System Maintenance

Critical

  • Instance:SystemMaintenance.IsolateErrorDisk:Inquiring: Querying Damaged Disk Isolation Due to System Maintenance

  • Instance:SystemMaintenance.IsolateErrorDisk:Executing: Isolating Damaged Disk Due to System Maintenance

  • Instance:SystemMaintenance.IsolateErrorDisk:Executed: Damaged Disk Isolated Due to System Maintenance

  • Instance:SystemMaintenance.IsolateErrorDisk:Avoided: Damaged Disk Isolation Avoided Due to System Maintenance

  • Instance:SystemMaintenance.IsolateErrorDisk:Failed: Damaged Disk Isolation Failed Due to System Maintenance

  • Instance:SystemMaintenance.IsolateErrorDisk:Canceled: Damaged Disk Isolation Canceled Due to System Maintenance

This system event is immediately triggered when Alibaba Cloud detects hardware damage or software damage in a local disk of an instance.

Important

The procedure for handling a damaged local disk of an instance varies based on the instance type. For specific instance types, the instance must be restarted and the damaged local disk must be isolated. For other instance types, the damaged local disk can be isolated online, and then repaired.

We recommend that you make preparations such as modifying the /etc/fstab configuration file and backing up data, and then select an appropriate point in time to authorize the damaged disk to be isolated. Then, the local disk is isolated online without the need to restart its associated instance.

Note

For more information, see the "Scenario ③" section in O&M scenarios and system events for instances equipped with local disks.

SystemMaintenance.ReInitErrorDisk

Re-initialization of Damaged Local Disks Due to System Maintenance

Critical

  • Instance:SystemMaintenance.ReInitErrorDisk:Inquiring: Querying Damaged Disk Re-initialization Due to System Maintenance

  • Instance:SystemMaintenance.ReInitErrorDisk:Executing: Re-initializing Damaged Disk Due to System Maintenance

  • Instance:SystemMaintenance.ReInitErrorDisk:Executed: Damaged Disk Re-initialized Due to System Maintenance

  • Instance:SystemMaintenance.ReInitErrorDisk:Avoided: Damaged Disk Re-initialization Avoided Due to System Maintenance

  • Instance:SystemMaintenance.ReInitErrorDisk:Failed: Damaged Disk Re-initialization Failed Due to System Maintenance

  • Instance:SystemMaintenance.ReInitErrorDisk:Canceled: Damaged Disk Re-initialization Canceled Due to System Maintenance

This system event is immediately triggered when Alibaba Cloud isolates and replaces a local disk on the host of an instance after Alibaba Cloud detects hardware damage or software damage in the local disk. In most cases, Alibaba Cloud isolates and replaces a damaged local disk within five business days after you authorize Alibaba Cloud to isolate the local disk.
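To estimate when a replacement is likely to be complete, you can count five business days forward from the authorization date. The following Python sketch does this under a simplifying assumption: business days are Monday through Friday, and public holidays are ignored. The function name is hypothetical, and the result is only an estimate because the five-day figure applies "in most cases".

```python
from datetime import date, timedelta


def add_business_days(start: date, days: int) -> date:
    """Return the date that is `days` business days after `start`.

    Assumption: business days are Monday to Friday; public holidays
    are not taken into account.
    """
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0 = Monday ... 4 = Friday
            remaining -= 1
    return current


if __name__ == "__main__":
    authorized_on = date(2023, 9, 13)
    print("Expected by:", add_business_days(authorized_on, 5))
```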

Important

The procedure for handling a damaged local disk of an instance varies based on the instance type. For specific instance types, the instance must be restarted and the damaged local disk must be isolated. For other instance types, the damaged local disk can be isolated online, and then repaired.

We recommend that you select an appropriate point in time to authorize the local disk to be restored. Then, the local disk is restored online without the need to restart its associated instance.

Note

For more information, see the "Scenario ③" section in O&M scenarios and system events for instances equipped with local disks.

SystemMaintenance.RebootAndIsolateErrorDisk

Isolation of Damaged Local Disks and Instance Restart Due to System Maintenance

Critical

  • Instance:SystemMaintenance.RebootAndIsolateErrorDisk:Inquiring: Querying Instance Restart and Damaged Disk Isolation Due to System Maintenance

  • Instance:SystemMaintenance.RebootAndIsolateErrorDisk:Executing: Restarting Instance and Isolating Damaged Disk Due to System Maintenance

  • Instance:SystemMaintenance.RebootAndIsolateErrorDisk:Executed: Instance Restarted and Damaged Disk Isolated Due to System Maintenance

  • Instance:SystemMaintenance.RebootAndIsolateErrorDisk:Avoided: Instance Restart and Damaged Disk Isolation Avoided Due to System Maintenance

  • Instance:SystemMaintenance.RebootAndIsolateErrorDisk:Canceled: Instance Restart and Damaged Disk Isolation Canceled Due to System Maintenance

This system event is immediately triggered when Alibaba Cloud detects hardware damage or software damage in a local disk of an instance and fails to isolate the local disk online.

Important

The procedure for handling a damaged local disk of an instance varies based on the instance type. For specific instance types, the instance must be restarted and the damaged local disk must be isolated. For other instance types, the damaged local disk can be isolated online, and then repaired.

We recommend that you select an appropriate point in time to authorize the damaged disk to be isolated and restart the associated instance after the disk is isolated. In this case, the local disk is isolated offline, so you must restart its associated instance for the isolation operation to take effect.

Note

For more information, see the "Scenario ③" section in O&M scenarios and system events for instances equipped with local disks.

SystemMaintenance.RebootAndReInitErrorDisk

Re-initialization of Damaged Local Disks and Instance Restart Due to System Maintenance

Critical

  • Instance:SystemMaintenance.RebootAndReInitErrorDisk:Inquiring: Querying Instance Restart and Damaged Disk Re-initialization Due to System Maintenance

  • Instance:SystemMaintenance.RebootAndReInitErrorDisk:Executing: Restarting Instance and Re-initializing Damaged Disk Due to System Maintenance

  • Instance:SystemMaintenance.RebootAndReInitErrorDisk:Executed: Instance Restarted and Damaged Disk Re-initialized Due to System Maintenance

  • Instance:SystemMaintenance.RebootAndReInitErrorDisk:Avoided: Instance Restart and Damaged Disk Re-initialization Avoided Due to System Maintenance

  • Instance:SystemMaintenance.RebootAndReInitErrorDisk:Canceled: Instance Restart and Damaged Disk Re-initialization Canceled Due to System Maintenance

This system event is immediately triggered when Alibaba Cloud detects hardware damage or software damage in a local disk of an instance and fails to restore the local disk online.

Important

The procedure for handling a damaged local disk of an instance varies based on the instance type. For specific instance types, the instance must be restarted and the damaged local disk must be isolated. For other instance types, the damaged local disk can be isolated online, and then repaired.

We recommend that you select an appropriate point in time to authorize the local disk to be restored and restart the associated instance after the disk is restored. In this case, the local disk is restored offline, so you must restart its associated instance for the restoration operation to take effect.

Note

For more information, see the "Scenario ③" section in O&M scenarios and system events for instances equipped with local disks.

SystemMaintenance.StopAndRepair

In-place Repair of Instance Equipped With Local Disks

Critical

  • Instance:SystemMaintenance.StopAndRepair:Inquiring: Querying Instance Stop and Repair

  • Instance:SystemMaintenance.StopAndRepair:Scheduled: Instance Stop and Repair Scheduled

  • Instance:SystemMaintenance.StopAndRepair:Executing: Stopping and Repairing Instance

  • Instance:SystemMaintenance.StopAndRepair:Executed: Instance Stopped and Repaired

  • Instance:SystemMaintenance.StopAndRepair:Avoided: Instance Stop and Repair Avoided

This system event is triggered 48 to 168 hours before the scheduled time of system maintenance when Alibaba Cloud detects a risk of hardware failure in the underlying host of an instance.

We recommend that you select an appropriate period of time to authorize Alibaba Cloud to repair or redeploy the instance that is equipped with local disks.
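The scheduled O&M events in this topic state how far ahead of the scheduled maintenance time they are triggered: 24 to 48 hours for restart, stop, and redeployment events, and 48 to 168 hours for StopAndRepair. The following Python sketch derives the interval in which a notification can be expected for a given scheduled time. The dictionary and function names are hypothetical; only the hour ranges come from this topic.

```python
from datetime import datetime, timedelta, timezone

# (minimum, maximum) lead time in hours, as stated in this topic.
LEAD_WINDOW_HOURS = {
    "SystemMaintenance.Reboot": (24, 48),
    "SystemMaintenance.Stop": (24, 48),
    "SystemMaintenance.Redeploy": (24, 48),
    "SystemMaintenance.StopAndRepair": (48, 168),
}


def trigger_window(event_code: str, scheduled_time: datetime):
    """Return (earliest, latest) expected trigger times for an event."""
    min_hours, max_hours = LEAD_WINDOW_HOURS[event_code]
    earliest = scheduled_time - timedelta(hours=max_hours)
    latest = scheduled_time - timedelta(hours=min_hours)
    return earliest, latest


if __name__ == "__main__":
    scheduled = datetime(2023, 9, 13, 12, 0, tzinfo=timezone.utc)
    earliest, latest = trigger_window("SystemMaintenance.StopAndRepair", scheduled)
    print("Notification expected between", earliest.isoformat(), "and", latest.isoformat())
```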

SystemMaintenance.CleanInactiveDisks

Disk Cleanup Performed for System Maintenance

Critical

  • Instance:SystemMaintenance.CleanInactiveDisks:Inquiring: Inquiring about Configuration Cleanup of Released Disks

  • Instance:SystemMaintenance.CleanInactiveDisks:executing: Cleaning Up Configurations of Released Disks

  • Instance:SystemMaintenance.CleanInactiveDisks:executed: Configurations Cleaned Up for Released Disks

  • Instance:SystemMaintenance.CleanInactiveDisks:Failed: Configuration Cleanup Failed for Released Disks

This system event is triggered when Alibaba Cloud detects that the operating system of an instance still contains the configurations of one or more disks that were released due to overdue payments.

We recommend that you select an appropriate period of time to stop the instance and authorize Alibaba Cloud to clear the configurations of the disks.

Unexpected O&M events

Event code | Event name | Event level | CloudMonitor event | Event description and impact | Handling suggestion

SystemFailure.Reboot

Instance Restart Due to System Error

Critical

  • Instance:SystemFailure.Reboot:Executing: Restarting Instance Due to System Error

  • Instance:SystemFailure.Reboot:Executed: Instance Restarted Due to System Error

  • Instance:SystemFailure.Reboot:Failed: Instance Restart Failed Due to System Error

This system event is immediately triggered when Alibaba Cloud detects that an instance is restarted due to hardware or software failure in the underlying host, such as CPU or memory hardware damage.

We recommend that you wait until the instance is automatically restarted and then check whether the instance and applications work as expected.

When the instance is being restarted, Alibaba Cloud migrates the instance to a healthy host.

Note

You can modify the maintenance attributes of the instance to specify the default action that takes effect when the instance encounters a maintenance event. For more information, see Modify instance maintenance attributes.

InstanceFailure.Reboot

Instance Restart Due to OS Error

Critical

  • Instance:InstanceFailure.Reboot:Scheduled: Instance Restart Scheduled Due to OS Error

  • Instance:InstanceFailure.Reboot:Executing: Restarting Instance Due to OS Error

  • Instance:InstanceFailure.Reboot:Executed: Instance Restarted Due to OS Error

This system event is immediately triggered when Alibaba Cloud detects that an instance operating system is down due to issues such as out-of-memory (OOM), blue screen, freeze, continuous printing of serial port logs, and kernel panic.

We recommend that you wait until the instance is automatically restarted and then check whether the instance and applications work as expected.

You can enable the kdump service of the operating system to troubleshoot the issue and prevent the issue from recurring. For more information, see Enable the Kernel Memory Dump feature for a Linux instance and Enable the Kernel Memory Dump feature for a Windows instance.

SystemFailure.Stop

Instance Stop Due to System Error

Critical

  • Instance:SystemFailure.Stop:Executing: Stopping Instance Due to System Error

  • Instance:SystemFailure.Stop:Executed: Instance Stopped Due to System Error

This system event is immediately triggered when Alibaba Cloud detects that an instance is stopped due to hardware or software failure in the underlying host, such as CPU or memory hardware damage.

We recommend that you wait until the instance is automatically stopped and then start the instance.

When the instance is being started, Alibaba Cloud migrates the instance to a healthy host.

Note

You can modify the maintenance attributes of the instance to specify the default action that takes effect when the instance encounters a maintenance event. For more information, see Modify instance maintenance attributes.

SystemFailure.Redeploy

Instance Redeployment Due to System Error

Critical

  • Instance:SystemFailure.Redeploy:Inquiring: Querying Instance Redeployment Due to System Error

  • Instance:SystemFailure.Redeploy:Scheduled: Instance Redeployment Scheduled Due to System Error

  • Instance:SystemFailure.Redeploy:Executing: Redeploying Instance Due to System Error

  • Instance:SystemFailure.Redeploy:Executed: Instance Redeployed Due to System Error

  • Instance:SystemFailure.Redeploy:Avoided: Instance Redeployment Avoided Due to System Error

  • Instance:SystemFailure.Redeploy:Canceled: Instance Redeployment Canceled Due to System Error

This system event is immediately triggered when Alibaba Cloud detects hardware or software failure in the underlying host of an instance equipped with local disks and the instance must be redeployed.

Note

Only instances that depend on host hardware support this type of event, such as instances that are equipped with local disks or support Software Guard Extensions (SGX) encrypted computing.

We recommend that you make preparations such as modifying the /etc/fstab configuration file and backing up data, and then take one of the following actions in response to the event:

Note

You can modify the maintenance attributes of the instance to specify the default action that takes effect when the instance encounters a maintenance event. For more information, see Modify instance maintenance attributes.

SystemFailure.Delete

Automatic Cancellation of Bills Due to Instance Creation Failures

Critical

  • Instance:SystemFailure.Delete:Executing: Canceling Order Due to Instance Creation Failure

  • Instance:SystemFailure.Delete:Executed: Order Canceled Due to Instance Creation Failure

  • Instance:SystemFailure.Delete:Avoided: Order Cancellation Avoided Due to Instance Creation Failure

This system event is immediately triggered when Alibaba Cloud detects that an instance creation order is placed but the instance fails to be created.

We recommend that you wait for the instance to be automatically released. In most cases, an instance is automatically released within 5 minutes after the instance fails to be created.

Note

If you already paid for the order, the payment is refunded after the instance is released.

To ensure that instances can be created, we recommend that you take the following actions:

  • Before you create ECS instances in a region and zone, query ECS resource availability and the number of idle private IP addresses in the CIDR block associated with a specified vSwitch in the region and zone. For example, you can call the DescribeAvailableResource operation to query resources in a zone.

  • Use Auto Provisioning or Auto Scaling to flexibly create instances from larger resource pools.

ErrorDetected

Local Disk Fault Alarm

Critical

  • Disk:ErrorDetected:Executing: Local Disk Fault Alarm Started

  • Disk:ErrorDetected:Executed: Local Disk Fault Alarm Ended

This system event is immediately triggered when Alibaba Cloud detects hardware or software failure in the local disk of an instance and data cannot be read from the disk or written to the disk.

We recommend that you make preparations such as modifying the /etc/fstab configuration file and backing up data. Then, select an appropriate point in time to isolate and restore the damaged local disk.

Supported operations vary based on instance types:

  • d1, d1ne, d2s, and d2c: support online isolation, offline isolation, online repair, and redeployment.

  • d3c, d2c, i2, i2g, i2ne, i2gne, i3, and i3g: support online isolation, offline isolation, and redeployment.

  • i1: supports redeployment.

  • ebmi2g: supports authorized repair and redeployment.
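The list above can be encoded as a lookup from instance family to supported operations, which is convenient when an automation script has to decide how to respond to a local disk fault. The following Python sketch is one way to do this. All names are hypothetical. Because d2c appears in two rows of the list, the sketch merges the operations from both rows, which is an assumption about how the list should be read.

```python
# Rows copied from the list above: (instance families, supported operations).
ROWS = [
    (("d1", "d1ne", "d2s", "d2c"),
     ("online isolation", "offline isolation", "online repair", "redeployment")),
    (("d3c", "d2c", "i2", "i2g", "i2ne", "i2gne", "i3", "i3g"),
     ("online isolation", "offline isolation", "redeployment")),
    (("i1",), ("redeployment",)),
    (("ebmi2g",), ("authorized repair", "redeployment")),
]


def build_lookup(rows):
    """Map each family to the union of operations from every row that lists it."""
    lookup = {}
    for families, operations in rows:
        for family in families:
            lookup.setdefault(family, set()).update(operations)
    return lookup


SUPPORTED = build_lookup(ROWS)


def supported_operations(family: str) -> frozenset:
    """Return the operations for a family, or an empty set if it is not listed."""
    return frozenset(SUPPORTED.get(family, ()))


if __name__ == "__main__":
    for family in ("d1ne", "i3", "i1", "ebmi2g"):
        print(family, sorted(supported_operations(family)))
```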

Note

For more information, see the "Scenario ③" section in O&M scenarios and system events for instances equipped with local disks.

Stalled

Severe Impacts on Disk Performance

Critical

  • Disk:Stalled:Executing: Severe Impacts on Disk Performance Started

  • Disk:Stalled:Executed: Severe Impacts on Disk Performance Ended

This system event is immediately triggered when Alibaba Cloud detects that an I/O hang occurs on a disk of the instance, which significantly affects the disk performance and prevents the disk from handling read and write requests.

We recommend that you isolate reads and writes on the disk at the application layer or disassociate the ECS instance from the associated Server Load Balancer (SLB) instance.

Instance migration events due to upgrades at the underlying layer

Event code | Event name | Event level | CloudMonitor event | Event description and impact | Handling suggestion

SystemUpgrade.Migrate

Instance Migration Events Due to Upgrades at Underlying Layer

Critical

Undefined

This system event is triggered when instances are affected by the upgrades and improvements of physical infrastructure in regions and zones where these instances reside.

We recommend that you view event details in the ECS console and migrate affected instances as prompted. For more information, see Instance migration due to upgrades at the underlying layer.

Performance limited events of burstable instances

Event code | Event name | Event level | CloudMonitor event | Event description and impact | Handling suggestion

Instance:BurstablePerformanceRestricted

Limited Performance of Burstable Instance

Critical

Instance:BurstablePerformanceRestricted: Burstable Instance Performance Limited

This system event is triggered when all accrued CPU credits of a burstable instance are consumed.

We recommend that you take one of the following actions in response to the event:

  • If you want the burstable instance to run at a CPU utilization higher than the baseline for a short period of time, enable the unlimited mode for the instance for that period. For more information, see Switch the performance mode of a burstable instance.

  • If you want the burstable instance to run at a CPU utilization higher than the baseline for a long time, upgrade the instance to a higher-specification instance type or change the instance into a non-burstable instance. For more information, see Overview of instance configuration changes.

To specify thresholds for triggering notifications about this event, configure event-triggered alert rules for the event in the CloudMonitor console. For example, you can have a notification sent when accrued CPU credits remain below 10 for 10 consecutive minutes. For more information, see Monitor burstable instances.
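The threshold example (accrued CPU credits below 10 for 10 consecutive minutes) amounts to detecting a run of consecutive low samples. The following Python sketch shows that logic on a list of per-minute credit readings. It illustrates the condition only and does not reflect how CloudMonitor evaluates alert rules; the function name and the one-sample-per-minute input are assumptions.

```python
from typing import Iterable


def should_alert(samples: Iterable[float], threshold: float = 10.0, minutes: int = 10) -> bool:
    """Return True if credits stay below `threshold` for `minutes` samples in a row.

    Assumption: `samples` holds one accrued-credit reading per minute,
    in chronological order.
    """
    run = 0
    for credits in samples:
        run = run + 1 if credits < threshold else 0
        if run >= minutes:
            return True
    return False


if __name__ == "__main__":
    readings = [20.0] * 3 + [9.5] * 10
    print(should_alert(readings))
```

A single reading at or above the threshold resets the count, so brief recoveries prevent the alert from firing.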

State change events

Event code | Event name | Event level | CloudMonitor event | Event description and impact | Handling suggestion

Instance:PreemptibleInstanceInterruption

Preemptible Instance Interruption

Critical

Instance:PreemptibleInstanceInterruption: Notification on Preemptible Instance Interruption

This system event is triggered 5 minutes before a preemptible instance is reclaimed.

We recommend that you take one of the following actions:

  • Use preemptible instances for stateless applications, such as scalable web services and big data analytics applications.

  • Use Auto Provisioning to deliver instances and mitigate the impacts of reclaimed preemptible instances on your business. You can also implement automated O&M based on this event. For example, you can configure notifications about this event in the CloudMonitor console and have preemptible instances automatically purchased when a notification is sent.
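Because the interruption event is triggered 5 minutes before reclamation, automated handlers have a fixed time budget for cleanup tasks such as draining connections or saving state. The following Python sketch computes the deadline and checks whether a task of a given duration still fits. The function names are hypothetical, and the sketch assumes that the notification timestamp equals the time at which the event was triggered, ignoring any delivery delay.

```python
from datetime import datetime, timedelta, timezone

# From this topic: the event is triggered 5 minutes before reclamation.
INTERRUPTION_NOTICE = timedelta(minutes=5)


def reclaim_deadline(notified_at: datetime) -> datetime:
    """Return the time at which the instance is expected to be reclaimed."""
    return notified_at + INTERRUPTION_NOTICE


def can_finish(notified_at: datetime, now: datetime, task_duration: timedelta) -> bool:
    """Return True if a task started at `now` ends before the deadline."""
    return now + task_duration <= reclaim_deadline(notified_at)


if __name__ == "__main__":
    notified = datetime(2023, 9, 13, 12, 0, tzinfo=timezone.utc)
    now = notified + timedelta(seconds=30)
    print("Deadline:", reclaim_deadline(notified).isoformat())
    print("3-minute drain fits:", can_finish(notified, now, timedelta(minutes=3)))
```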

Instance:PerformanceModeChange

Performance Mode Switchover of Burstable Instance

Critical

Instance:PerformanceModeChange: Performance Mode Switchover of Burstable Instance

This system event is triggered when a burstable instance switches between the unlimited mode and standard mode.

We recommend that you determine whether to monitor the event. If you want to monitor the event, you can configure notifications for the event in the CloudMonitor console. For more information, see Configure event notifications.

Instance:StateChange

Instance Status Change

Notification

Instance:StateChange: Notification on Instance Status Change

This system event is triggered when the state of an instance changes, for example, from Running to Stopping or from Stopping to Stopped.

We recommend that you determine whether to monitor the event. If you want to monitor the event, you can configure notifications for the event in the CloudMonitor console. For more information, see Configure event notifications.

Instance:AutoReactivateCompleted

Automatic Restart Completed

Notification

Instance:AutoReactivateCompleted: Automatic Restart Completed

This system event is triggered when you settle overdue payments in your account and an instance is automatically reactivated.

We recommend that you determine whether to monitor the event. If you want to monitor the event, you can configure notifications for the event in the CloudMonitor console. For more information, see Configure event notifications.

Instance:LiveMigrationAcrossDDH

Instance Hot Migration Between Dedicated Hosts

Notification

Instance:LiveMigrationAcrossDDH: Instance Hot Migration Between Dedicated Hosts

This system event is triggered when an instance is hot migrated between dedicated hosts.

We recommend that you determine whether to monitor the event. If you want to monitor the event, you can configure notifications for the event in the CloudMonitor console. For more information, see Configure event notifications.

Disk:DiskOperationCompleted

Disk Operations Completed

Notification

Disk:DiskOperationCompleted: Disk Operations Completed

This system event is triggered when a pay-as-you-go disk is manually attached or detached.

We recommend that you determine whether to monitor the event. If you want to monitor the event, you can configure notifications for the event in the CloudMonitor console. For more information, see Configure event notifications.

Disk:ConvertToPostpaidCompleted

Billing Method of Disks Switched to Pay-as-you-go

Notification

Disk:ConvertToPostpaidCompleted: Billing Method of Disks Switched to Pay-as-you-go

This system event is triggered when a subscription disk is changed to a pay-as-you-go disk.

We recommend that you determine whether to monitor the event. If you want to monitor the event, you can configure notifications for the event in the CloudMonitor console. For more information, see Configure event notifications.

Snapshot:CreateSnapshotCompleted

Disk Snapshot Created

Notification

Snapshot:CreateSnapshotCompleted: Disk Snapshot Created

This system event is triggered when a snapshot is created for a disk.

We recommend that you determine whether to monitor the event. If you want to monitor the event, you can configure notifications for the event in the CloudMonitor console. For more information, see Configure event notifications.