System events are scheduled O&M events on the underlying infrastructure or unexpected maintenance events that affect the running status of ECS instances. System events occur when ECS instances are restarted, stopped, or released for any reason.

Differences between routine maintenance and system events

ECS performs routine maintenance on the underlying physical servers and forestalls potential failures to improve the reliability, performance, and security of ECS instances. When a failure or threat is detected in a physical server, ECS uses hot migration to move ECS instances from the at-risk server to a healthy server so that the instances continue to run properly. These operations are called routine maintenance. During routine maintenance, the system does not send notifications and the instances are not affected.

For system events, ECS sends you notifications that contain information such as recommended solutions and event cycles. For scheduled system O&M events, the notification also states how the event affects your instances and when it is scheduled to run. This lets you back up data and prepare at the application layer before you handle the event. You can also query system events that were handled within the last week to obtain data for troubleshooting and analysis.
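
As a rough illustration of the one-week query range mentioned above, the following sketch filters locally held event records down to those that finished within the last seven days. The record layout (`event_id`, `event_type`, `finished_at`) and the function name `handled_within_last_week` are hypothetical and do not reflect the ECS API response format.

```python
from datetime import datetime, timedelta, timezone


# Hypothetical record layout, for illustration only; not the ECS API response format.
def handled_within_last_week(events, now=None):
    """Return the events whose finish time falls within the 7 days before `now`."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=7)
    return [e for e in events if cutoff <= e["finished_at"] <= now]


now = datetime(2024, 1, 15, tzinfo=timezone.utc)
events = [
    {"event_id": "e-1", "event_type": "SystemMaintenance.Reboot",
     "finished_at": now - timedelta(days=2)},
    {"event_id": "e-2", "event_type": "SystemFailure.Reboot",
     "finished_at": now - timedelta(days=10)},
]
recent = handled_within_last_week(events, now=now)
print([e["event_id"] for e in recent])  # ['e-1']
```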

Limits

Phased-out instance families do not support the system event feature.

System event types

The following table describes the types, impacts, and recommended solutions of ECS system events.

Impact Event type Parameter Recommended solution
Instance restart Instance restart due to scheduled system maintenance SystemMaintenance.Reboot Select an appropriate point in time during the user operation window to perform the following operations:
  1. Restart the ECS instance.
    Note: You must restart the instance by using the ECS console or by calling the RebootInstance operation. You cannot restart the instance from within the instance. For more information, see Reboot the instance.
  2. Divert traffic away from the instance that is to be restarted or remove the instance from the backend server group of the Server Load Balancer (SLB) instance to avoid impacts on your business.
  3. Optional. Create snapshots for disks that are attached to the instance to back up data.
Unexpected instance restart Instance restart due to an unexpected system error SystemFailure.Reboot For more information, see Automatic recovery events of instances.
Instance restart due to an unexpected instance error InstanceFailure.Reboot If you receive an event notification during or after the instance restart, we recommend that you perform the following operations:
  • View system logs and screenshots to troubleshoot the failure and find the cause of the instance failure to prevent further failures. For more information, see System logs and screenshots.
  • Check whether the instance and applications have recovered.
Instance redeployment Instance redeployment due to scheduled system maintenance SystemMaintenance.Redeploy For more information, see Redeploy an instance equipped with local disks.
Instance redeployment due to a system error SystemFailure.Redeploy For more information, see Redeploy an instance equipped with local disks.
Instance stop Subscription instance: instance stop upon expiration InstanceExpiration.Stop For more information about how to renew a subscription instance, see Overview.
Pay-as-you-go instance: instance stop due to overdue payments AccountUnbalanced.Stop

We recommend that you maintain a sufficient balance for your payment account to prevent instances from being stopped due to insufficient account balance.

Instance release Subscription instance: instance release upon expiration InstanceExpiration.Delete For more information about how to renew a subscription instance, see Overview.
Pay-as-you-go instance: instance release due to overdue payments AccountUnbalanced.Delete

We recommend that you maintain a sufficient balance for your payment account to prevent instances from being released due to insufficient account balance.

Instance release Instance release due to a creation failure SystemFailure.Delete For more information, see System event of instance creation failure.
Impacts on disk performance Severe impacts on disk performance Stalled On the application layer, isolate read and write operations on the disk or temporarily remove the ECS instance from the backend server group of the SLB instance.
Damages to local disks Damages to local disks ErrorDetected For more information, see Overview of system events on ECS instances equipped with local disks.
Instance restart and isolation of the damaged local disk Instance restart due to scheduled system maintenance and replacement of the damaged local disk SystemMaintenance.RebootAndIsolateErrorDisk For more information, see Isolate damaged local disks by using Alibaba Cloud CLI.
Instance restart and restoration of the damaged local disk Instance restart due to scheduled system maintenance and re-initialization of the damaged local disk SystemMaintenance.RebootAndReInitErrorDisk For more information, see Isolate damaged local disks by using Alibaba Cloud CLI.
Disk detachment error Removal of residual disks due to system maintenance SystemMaintenance.CleanInactiveDisks Log on to the ECS console, view pending events, and handle the event as instructed. For more information, see View system events.
Limited performance of burstable instances The performance of burstable instances is below the baseline performance due to insufficient available CPU credits. N/A You can use one of the following methods to handle the event:

System event status

The following table describes the status of a system event during its lifecycle.

Status Status attribute Description
Inquiring Intermediate status The system event is waiting for your confirmation. After you confirm the event, it enters the Executing state.
Scheduled Intermediate status The system event has been scheduled but not performed.
Avoided Stable status The system event was handled in advance within the user operation window.
Executing Intermediate status The system event is being executed.
Executed Stable status The system event has been executed.
Canceled Stable status The scheduled system event has been canceled.
Failed Stable status The system event failed.
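
The status table above can be captured in a small lookup, for example to decide whether a stored event record may still change. This is a minimal sketch: the class name `EventStatus` and the helper `is_stable` are hypothetical names chosen for illustration, and only the status names and their intermediate or stable attribute come from the table.

```python
from enum import Enum


class EventStatus(Enum):
    INQUIRING = "Inquiring"
    SCHEDULED = "Scheduled"
    AVOIDED = "Avoided"
    EXECUTING = "Executing"
    EXECUTED = "Executed"
    CANCELED = "Canceled"
    FAILED = "Failed"


# Statuses that the table marks as "Intermediate status"; all others are stable.
INTERMEDIATE = {EventStatus.INQUIRING, EventStatus.SCHEDULED, EventStatus.EXECUTING}


def is_stable(status):
    """Return True if the status is a stable (final) status according to the table."""
    return status not in INTERMEDIATE


# Look up a status by the name used in the table and check its attribute.
print(is_stable(EventStatus("Avoided")))    # True
print(is_stable(EventStatus("Inquiring")))  # False
```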

System event windows

System events have the following windows:

  • User operation window: the period between the time when a system event is initiated and the time when the system event is executed. Based on how the system event affects your business, you can use the recommended solution to handle the event in advance, or wait until the default action is triggered. If ECS fixes a system event triggered by a system failure, ECS will send you an event notification in advance based on the system maintenance schedule.

    The following list describes the duration of the user operation window:

    • For events in the Inquiring state, no time limit exists.
    • For maintenance-related system events, the window is 24 hours to 48 hours.
    • For subscription instances that are to be stopped due to expiration, the window is three days.
    • For pay-as-you-go instances that are to be stopped due to overdue payments, the window is less than one hour.
    • Instances in which a system event occurs due to billing issues are immediately stopped and will be released in 15 days if the issue is not resolved.
    • Unexpected system events that are caused by failures or invalid operations do not have a user operation window.
  • Event execution window: the period between the time when the system event is responded to and the time when the event execution is completed. The system will send you the execution result.

    The following list describes the duration of the event execution window:

    • For system events such as failure recovery, the window is 10 minutes or less.
    • Unexpected system events that are caused by failures or invalid operations have a short event execution window.
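
The user operation window durations listed above can be expressed as a lookup keyed by event type. This is a sketch under stated assumptions: the function name `user_operation_window` is hypothetical, treating every `SystemMaintenance.` code as a maintenance-related event is an assumption, and the two unexpected codes are the ones that the event type table lists under "Unexpected instance restart".

```python
from datetime import timedelta

# Codes that the event type table lists under "Unexpected instance restart".
UNEXPECTED = {"SystemFailure.Reboot", "InstanceFailure.Reboot"}


def user_operation_window(event_type, status=None):
    """Return the (shortest, longest) user operation window as timedeltas.

    A longest value of None means no time limit. A return value of None
    means the event has no user operation window at all.
    """
    if status == "Inquiring":
        return (timedelta(0), None)  # no time limit while awaiting confirmation
    if event_type in UNEXPECTED:
        return None  # unexpected events have no user operation window
    if event_type.startswith("SystemMaintenance."):
        # Assumption: the prefix identifies maintenance-related events.
        return (timedelta(hours=24), timedelta(hours=48))
    if event_type == "InstanceExpiration.Stop":
        return (timedelta(days=3), timedelta(days=3))
    if event_type == "AccountUnbalanced.Stop":
        return (timedelta(0), timedelta(hours=1))  # "less than one hour"
    raise ValueError(f"no window documented for {event_type!r}")


shortest, longest = user_operation_window("SystemMaintenance.Reboot")
print(shortest, longest)
```

Returning a pair rather than a single duration keeps the "24 hours to 48 hours" range intact and lets callers plan against the shortest value.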