This topic describes how Alibaba Cloud performs O&M, and the best practices for you to follow, when a system event occurs on an ECS instance equipped with local disks.

Common O&M scenarios

For more information about the types of local disks used by ECS, see Local disks. Instances equipped with local disks commonly encounter the following three scenarios, each caused by an underlying failure:

  • Scenario 1: An instance exception occurs due to a software problem on the physical machine that hosts the instance.
    • Impact: Typically, the physical machine recovers after it is rebooted. The instance experiences an unexpected reboot.
    • What to do next: No action is required.
  • Scenario 2: An instance exception occurs due to damage to a local disk.
    • Impact: Typically, the physical machine that hosts the instance recovers after it is rebooted, but the damaged local disk does not recover.
    • What to do next: Select a method for replacing the damaged local disk. For more information, see System events on ECS instances equipped with local disks.
  • Scenario 3: An instance exception occurs due to damage to other hardware on the physical machine.
    • Impact: Typically, the physical machine must be taken offline for repair.
    • What to do next: Redeploy the instance equipped with local disks to migrate it to a different physical machine. Synchronize data as needed to restore the instance and the local disks.

The following figure shows the common O&M scenarios of instances equipped with local disks.

Common O&M scenarios of instances equipped with local disks
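
The three scenarios above can be summarized as a small lookup table. The following Python snippet is only an illustrative sketch: the names Scenario, SCENARIOS, and requires_user_action are hypothetical and are not part of any Alibaba Cloud SDK or API. The field values paraphrase the scenario descriptions in this topic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """One common failure scenario (hypothetical helper type)."""
    cause: str
    impact: str
    user_action: str  # "none" means no action is required


# Hypothetical lookup table keyed by the scenario number used in this topic.
SCENARIOS = {
    1: Scenario(
        cause="software problem on the host physical machine",
        impact="host recovers after reboot; instance reboots unexpectedly",
        user_action="none",
    ),
    2: Scenario(
        cause="damage to a local disk",
        impact="host recovers after reboot; damaged local disk does not",
        user_action="select a method for replacing the damaged local disk",
    ),
    3: Scenario(
        cause="damage to other hardware on the host physical machine",
        impact="host is taken offline for repair",
        user_action="redeploy the instance and synchronize data as needed",
    ),
}


def requires_user_action(scenario_number: int) -> bool:
    """Return True if the scenario needs a response from the user."""
    return SCENARIOS[scenario_number].user_action != "none"
```

A table like this could be used in a runbook script to decide whether an alert needs a human response. Only scenarios 2 and 3 do.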

System events on ECS instances equipped with local disks

When Alibaba Cloud detects a physical exception on a local disk of a running instance, it sends you a Block Storage system event indicating that the local disk is abnormal. The code of this Block Storage event is ErrorDetected. Within the event window, you can select one of the following solutions:
  • Redeploy an instance equipped with local disks

    If you need to restore a local disk urgently and can accept the loss of data on that disk, you can migrate the instance to a different physical machine to restore the capacity of all data disks, and then remount and reformat the data disks. The codes of the system events that require an instance equipped with local disks to be redeployed are SystemMaintenance.Redeploy and SystemFailure.Redeploy. For more information, see Redeploy an instance equipped with local disks.

  • Isolate damaged local disks

    Alibaba Cloud replaces isolated local disks as soon as possible. After the local disks are replaced, Alibaba Cloud sends you a system event that requires the instance to be rebooted and the damaged local disk to be replaced. You can respond to the event within the event window. Depending on the stage of the process, you isolate damaged local disks to resolve system events that have the SystemMaintenance.RebootAndIsolateErrorDisk or SystemMaintenance.RebootAndReInitErrorDisk code. For more information, see Isolate damaged local disks.

    The following figure shows the workflow of isolating a damaged disk and the corresponding event states.

    Workflow of isolating a damaged disk and corresponding event states
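
The event codes named in this section can be grouped by the solution they belong to. The following Python snippet is a minimal sketch of that grouping, assuming you already have the event code as a string. It does not call any Alibaba Cloud API. The function name classify_event and the returned labels are hypothetical; only the event codes themselves come from this topic.

```python
# Event codes taken from this topic.
DISK_ERROR_CODE = "ErrorDetected"
REDEPLOY_CODES = frozenset({
    "SystemMaintenance.Redeploy",
    "SystemFailure.Redeploy",
})
ISOLATE_CODES = frozenset({
    "SystemMaintenance.RebootAndIsolateErrorDisk",
    "SystemMaintenance.RebootAndReInitErrorDisk",
})


def classify_event(code: str) -> str:
    """Map a system event code to a hypothetical handling label.

    Returns one of:
      "choose-solution": a local disk is abnormal; pick redeploy or isolate
      "redeploy":        the instance must be redeployed
      "isolate":         part of the isolate-and-replace workflow
      "unknown":         a code that this topic does not cover
    """
    if code == DISK_ERROR_CODE:
        return "choose-solution"
    if code in REDEPLOY_CODES:
        return "redeploy"
    if code in ISOLATE_CODES:
        return "isolate"
    return "unknown"
```

For example, classify_event("SystemFailure.Redeploy") returns "redeploy", and any code not listed in this topic returns "unknown", so that unexpected events are surfaced instead of being silently handled.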

Related operations