System events are defined by Alibaba Cloud to record resource information and notify you of it, such as the execution states of O&M tasks, resource exceptions, and resource state changes.

Note Many Alibaba Cloud services such as Elastic Compute Service (ECS), ApsaraDB RDS, and Server Load Balancer (SLB) support system events. This topic describes ECS system events. For information about system events of other Alibaba Cloud services, see the corresponding documentation.

Limits

Retired instance families do not support the system event feature. For more information, see Retired instance types.

Usage scenarios of system events

  • Notification of risks and exceptions

    When a system event that is displayed in the ECS console is triggered, Alibaba Cloud pushes the event to the ECS console. These events include those that affect the availability or performance of ECS resources, such as SystemMaintenance.Reboot events among scheduled O&M events and InstanceExpiration.Stop events among instance billing events. For specific critical system events, Alibaba Cloud also sends emails or internal messages. You can handle these events by using the ECS console or by calling API operations. We recommend that you handle them as soon as possible to ensure resource availability and performance. For more information, see Query and handle ECS system events.

    For example, when a subscription instance is about to expire, the ECS console prompts you to renew the instance within a specified period of time to ensure service continuity.

  • Automated O&M
    States are defined for system events displayed in the ECS console to help you understand the execution states of system O&M tasks. Meanwhile, new system events and changes in system event states are reported to CloudMonitor so that you can build an event-driven automated O&M system based on your business requirements. For more information about event states, see the States and windows of system events section in this topic.
    Note Each event state corresponds to a CloudMonitor event. For example, the Executing and Executed states that the InstanceFailure.Reboot ECS event code supports correspond to the Instance:InstanceFailure.Reboot:Executing and Instance:InstanceFailure.Reboot:Executed CloudMonitor events.

    Some state change events are not displayed in the ECS console and cannot be handled by using the ECS console or by calling API operations. Examples: events that indicate instance state changes or interruptions of preemptible instances. No states are defined for these system events. However, these events are still reported to CloudMonitor when they are triggered so that you can build an event-triggered automated O&M system based on your business requirements.

    For example, state change events are triggered when you manually start or stop instances. These events do not indicate risks or exceptions. If you want to log your operations to your system, you can configure event notifications for state change events and use the alert callback feature to write the startup and stop information of instances to operation logs.
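The operation-logging scenario above can be sketched as a small callback handler. This is a minimal sketch, not the documented callback contract: the JSON field names (name, instance_id, state) and the logger name are hypothetical placeholders chosen for illustration, so check the actual payload that your alert callback receives before you rely on any of them.

```python
import json
import logging

# Hypothetical logger name; route it to wherever your operation logs live.
logger = logging.getLogger("ecs.operations")


def format_operation(payload: str) -> str:
    """Turn one callback body into a single operation-log line.

    The field names read here (name, instance_id, state) are assumptions
    made for this sketch, not the documented callback schema.
    """
    event = json.loads(payload)
    return "{name} instance={instance_id} state={state}".format(
        name=event.get("name", "unknown"),
        instance_id=event.get("instance_id", "unknown"),
        state=event.get("state", "unknown"),
    )


def handle_callback(payload: str) -> None:
    """Entry point that a web framework would call with the request body."""
    logger.info(format_operation(payload))
```

Keeping the formatting step separate from the logging call makes the handler easy to test without a running web server, and missing fields degrade to "unknown" instead of raising an error.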

System event categories

System events are classified into the following categories based on their causes:
Note For information about system event categories that ECS supports and how to handle ECS system events, see Summary.
Category: Scheduled O&M events
Description: Alibaba Cloud may need to upgrade host software for security reasons, or to act ahead of predicted failure risks in the underlying host hardware or software. If the resulting O&M tasks may affect the availability or performance of your ECS resources, Alibaba Cloud sends scheduled O&M events in advance to notify you of task details such as execution times, objects, and impacts. After you receive a scheduled O&M event, you can handle it during an off-peak period within the event execution window to minimize the impact on your business.
Note Scheduled O&M events, also known as proactive O&M events, draw on the O&M experience of Alibaba Cloud across millions of servers, its experience in serving tens of thousands of large enterprise customers, and machine learning algorithms from Alibaba DAMO Academy to predict failure risks in the underlying host hardware or software and act on them in advance. When a failure risk on a host cannot be prevented, Alibaba Cloud uses scheduled O&M events to notify the affected ECS users in advance so that they have time to switch their business. If you do not respond to scheduled O&M events in advance, your ECS instances may break down or restart when failures occur.
Displayed in the ECS console: Yes
Note When scheduled O&M events are triggered for instances of big data instance families (excluding the d3c instance family) or instance families that are equipped with local SSDs (excluding the i4p instance family), the events are displayed on the Local Disk-based Instance Events page. For more information about local disk-based instance events, see O&M scenarios and system events for instances equipped with local disks.

Category: Unexpected O&M events
Description: This category of system events is triggered when ECS instances restart or break down due to unexpected issues such as kernel panics, out-of-memory errors, or hardware or software failures in underlying hosts. Alibaba Cloud sends these events as soon as they are triggered and restores the affected ECS resources as soon as possible. At the same time, Alibaba Cloud notifies you of the execution states of the system O&M tasks related to the events.
Note In most cases, unexpected O&M events refer to sudden downtime or restarts of ECS instances due to unpredictable failures of the underlying hosts, or kernel errors in the operating systems of the ECS instances.
  • ECS instance downtime or restart events caused by host failures (SystemFailure.Reboot) occur occasionally and cannot be fully avoided. If the Service Level Agreement (SLA) for a single instance is violated, Alibaba Cloud pays compensation according to the SLA of the related service.
  • ECS instance restart events caused by operating system kernel errors (InstanceFailure.Reboot) are caused by applications in most cases. You can capture dump files to analyze the causes. For more information, see Enable the Kdump service for a Linux instance.
Displayed in the ECS console: Yes
Note When unexpected O&M events are triggered for instances of big data instance families (excluding the d3c instance family) or instance families that are equipped with local SSDs (excluding the i4p instance family), the events are displayed on the Local Disk-based Instance Events page. For more information about local disk-based instance events, see O&M scenarios and system events for instances equipped with local disks.

Category: Local disk-based instance events
Description: This category includes system events that are triggered for local disks (for example, when local disks are damaged) and system events that are triggered for instances equipped with local disks (for example, when such instances fail due to local disk damage, or when the hardware or software of their underlying hosts fails).
Note Local Disk-based Instance Events are technically not a system event category. They are used only to display scheduled or unexpected O&M events for instances of big data instance families (excluding the d3c instance family) or instance families that are equipped with local SSDs (excluding the i4p instance family) and to make these events easy to handle. For more information about local disk-based instance events, see O&M scenarios and system events for instances equipped with local disks.
Displayed in the ECS console: Yes

Category: Burstable instance performance limited events
Description: This category of system events is triggered when burstable instances exhaust their CPU credits and start to run at or near the baseline CPU utilization. This may affect instance management, instance O&M, and the operation of applications, and result in issues such as slow access and delays.
Displayed in the ECS console: Yes

Category: Instance security events
Description: This category of system events is triggered when instances face security threats. For example, instance security events are triggered when instances suffer DDoS attacks or when blackhole filtering is triggered for instances.
Displayed in the ECS console: Yes

Category: Instance migration events due to upgrades at the underlying layer
Description: This category of system events is triggered when your instances need to be migrated from specified regions and zones due to an infrastructure upgrade plan of Alibaba Cloud. You can migrate your instances based on these system events.
Displayed in the ECS console: Yes

Category: State change events
Description: This category of system events is triggered when operations (such as Start and Stop) on instances cause their lifecycle states to change, or when instance attribute changes cause instance lifecycle states or other states to change. State change events are classified into the following categories:
  • Lifecycle state change events: For example, lifecycle state change events are triggered when instances enter a different state, when preemptible instances are interrupted, and when snapshots are created.
  • Other attribute change events: For example, other attribute change events are triggered when the performance mode of burstable instances is changed or when subscription disks are changed into pay-as-you-go disks.
Displayed in the ECS console:
  • Lifecycle state change events are not displayed in the ECS console.
  • Specific other attribute change events are displayed in the ECS console.

System event severities

System events are assigned the following severities based on their impacts on the normal operation of instances:
  • Critical: Critical system events may result in instance unavailability and must be handled as soon as possible. For example, a critical system event is triggered when resources are released due to an overdue payment or when an instance is redeployed due to an instance error.
  • Warning: Warning system events have impacts on your business. For example, a warning system event is triggered when a burstable instance cannot burst above its performance baseline. You must pay close attention to these events or handle them when appropriate.
  • Notification: Notification system events do not affect your business. For example, a notification system event is triggered when a snapshot is created for a disk. You can optionally pay attention to notification system events.

States and windows of system events

The following table describes the states defined for system events that are displayed in the ECS console.
Note For information about the states that different system events support, see the "CloudMonitor event" columns of tables in Summary.
Event state | Attribute | Description
Inquiring | Intermediate | The O&M task related to the system event is pending authorization. After you authorize the task to be executed, the event enters the Executing state.
Scheduled | Intermediate | The O&M task related to the system event is scheduled and pending execution. When the O&M task is executed, the event enters the Executing state.
Executing | Intermediate | The O&M task related to the system event is being executed.
Executed | Stable | The O&M task related to the system event is completed.
Avoided | Stable | The impacts of the system event are prevented because you have migrated the affected instance within the user operation window.
Failed | Stable | The O&M task related to the system event failed.
Canceled | Stable | The O&M task related to the system event is automatically canceled.
System events have the following windows:
  • User operation window
    The user operation window of a system event starts when the event is sent and ends at the time when the related O&M task is executed as scheduled. You can manually execute the O&M task within the user operation window or wait for the system to automatically execute the task. Take note of the following items about the lengths of user operation windows:
    • In most cases, the user operation window of a scheduled O&M event ranges from 24 to 48 hours.
      Note The lengths of user operation windows are unlimited for system events in the Inquiring state. The O&M tasks related to the events can start only after you authorize the tasks to be executed.
    • Typically, unexpected O&M system events caused by failures or unauthorized operations do not have a user operation window.
    • For system events indicating that subscription instances are about to expire, the window is three days.
    • For system events indicating that pay-as-you-go instances are to be stopped due to overdue payments, the window is less than 1 hour.
  • Event execution window
    The execution window of a system event starts when the related O&M task is executed and ends when the task is completed. Take note of the following items about the lengths of event execution windows:
    • For system events such as failure recovery events, the window is within 10 minutes.
    • Unexpected O&M events caused by failures or unauthorized operations have a short event execution window.
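The event states in this section can be modeled in code when you build automation around them. The following is a minimal sketch that encodes only what the state table says (which states are intermediate and which are stable). The names EventState, INTERMEDIATE, is_stable, and needs_authorization are illustrative and are not part of any Alibaba Cloud SDK.

```python
from enum import Enum


class EventState(Enum):
    """Event states listed in the state table of this topic."""
    INQUIRING = "Inquiring"
    SCHEDULED = "Scheduled"
    EXECUTING = "Executing"
    EXECUTED = "Executed"
    AVOIDED = "Avoided"
    FAILED = "Failed"
    CANCELED = "Canceled"


# States whose attribute is Intermediate; every other state is Stable.
INTERMEDIATE = frozenset(
    {EventState.INQUIRING, EventState.SCHEDULED, EventState.EXECUTING}
)


def is_stable(state: EventState) -> bool:
    """Return True if the event has reached a stable state."""
    return state not in INTERMEDIATE


def needs_authorization(state: EventState) -> bool:
    """Return True if the O&M task is waiting for you to authorize it."""
    return state is EventState.INQUIRING
```

An automation loop could, for example, keep polling an event while is_stable returns False, and raise an alert as soon as needs_authorization returns True, because events in the Inquiring state do not proceed until you authorize them.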

Formats of ECS event codes and CloudMonitor event names

ECS event codes and CloudMonitor events follow specific naming conventions for easy understanding.
  • ECS event codes are in the <Event cause>.<Event impact> format to indicate event causes and impacts on resources.
  • CloudMonitor event names are in the <Resource type>:<Event cause>.<Event impact>:<Event state> format to indicate resource types, event causes, event impacts on resources, and event states.
Note ECS event codes and CloudMonitor event names may include only some of the preceding information. For example, a CloudMonitor event name of Disk:ErrorDetected:Executing indicates that a disk is damaged, and excludes information about the impacts on resources.
The following table describes some examples of ECS event codes and CloudMonitor event names.
Note An ECS event code of Undefined indicates that ECS events are not displayed in the ECS console and cannot be handled by using the ECS console or by calling API operations. For more information about ECS system events, see Summary.
Category: Scheduled O&M events
Example ECS event code: SystemMaintenance.Reboot
Example CloudMonitor event name: Instance:SystemMaintenance.Reboot:Inquiring
Description:
  • Resource type: Instance indicates an ECS instance.
  • Event cause: SystemMaintenance indicates that Alibaba Cloud proactively initiates a system O&M task.
  • Event impact: Reboot indicates that the instance is restarted while the O&M task is being executed.
  • Event state: Inquiring indicates that the O&M task related to the event is pending authorization and the instance can be restarted only after you authorize the task to be executed.

Category: Unexpected O&M events
Example ECS event code: ErrorDetected
Example CloudMonitor event name: Disk:ErrorDetected:Executing
Description:
  • Resource type: Disk indicates a disk (a local disk in this example).
  • Event cause: ErrorDetected indicates that the local disk is damaged.
  • Event state: Executing indicates that the damaged local disk has not been repaired.

Category: Lifecycle state change events
Example ECS event code: Undefined
Example CloudMonitor event name: Snapshot:CreateSnapshotCompleted
Description:
  • Resource type: Snapshot indicates a snapshot.
  • Event cause: CreateSnapshotCompleted indicates that the snapshot is created.
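The naming conventions in this section can be sketched as a small parser. This is an illustrative sketch rather than an official utility: the class and function names are hypothetical, and the parser assumes that a CloudMonitor event name has either two or three colon-separated parts, which holds for the examples in this topic but is not guaranteed for every event.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class CloudMonitorEventName:
    resource_type: str
    cause: str
    impact: Optional[str]  # absent in names such as Disk:ErrorDetected:Executing
    state: Optional[str]   # absent for events that have no defined states


def parse_event_code(code: str) -> Tuple[str, Optional[str]]:
    """Split an ECS event code into (cause, impact); impact may be None."""
    cause, _, impact = code.partition(".")
    return cause, (impact or None)


def parse_cloudmonitor_name(name: str) -> CloudMonitorEventName:
    """Split <Resource type>:<Event cause>.<Event impact>:<Event state>.

    Assumes two or three colon-separated parts (an assumption of this sketch).
    """
    parts = name.split(":")
    if len(parts) not in (2, 3):
        raise ValueError(f"unexpected event name: {name!r}")
    resource_type, code = parts[0], parts[1]
    state = parts[2] if len(parts) == 3 else None
    cause, impact = parse_event_code(code)
    return CloudMonitorEventName(resource_type, cause, impact, state)
```

For Instance:SystemMaintenance.Reboot:Inquiring, the parser yields the resource type Instance, the cause SystemMaintenance, the impact Reboot, and the state Inquiring. For Disk:ErrorDetected:Executing, the impact is None, and for Snapshot:CreateSnapshotCompleted, both the impact and the state are None, which matches the note that names may include only some of the information.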

Operations that can be performed on system events

Operation | Description and references
Understand system events | To learn about system events and understand the event codes, names, severities, usage scenarios, limits, states, and name formats, see this topic.
View system events | You can view system events by using the ECS console, CloudMonitor console, or Alibaba Cloud CLI.
Handle system events | For some critical system events (such as system events that affect the availability and performance of ECS resources), we recommend that you handle the events as suggested by using the ECS console or by calling API operations as soon as possible to ensure service availability.
Monitor system events | To ensure the stability of services that run on ECS instances and automate O&M, we recommend that you configure event notifications to get notified of underlying environment changes. After event notifications are configured, the system uses your specified notification methods to send you notifications.
Modify system event-related settings | You can modify system event-related settings based on your business requirements.
  • You can modify the maintenance attributes of an instance to configure whether to restart or redeploy the instance after a system event is handled. For more information, see Modify instance maintenance attributes.
  • For scheduled system events that require instances to be restarted, you can set the time at which the instances are restarted after the scheduled system events are handled. For more information, see Modify the scheduled restart time.