Local disks do not provide high availability of data. To improve the experience of using local disks, Alibaba Cloud provides various O&M capabilities that help you stay informed of and handle exceptions that occur on your local disks. This topic describes common O&M scenarios and system events for Elastic Compute Service (ECS) instances equipped with local disks.

Common O&M scenarios

For ECS bare metal instances, you can install the xdragon_hardware_detect_plugin plug-in to check the health status of local disks on the instances on a regular basis. For information about how to install the monitoring plug-in, see Install the monitoring plug-in.

For more information about the system events triggered in the scenarios shown in the preceding figure, see the following sections in this topic.
Note To ensure your business continuity, we recommend that you back up data for affected ECS instances and switch to other instances before you execute O&M tasks on the instances. For example, you can divert traffic away from the affected ECS instances, disassociate the ECS instances from Server Load Balancer (SLB) instances, and back up disk data of the ECS instances.

Scenario ①

Procedure to handle a SystemMaintenance.Reboot system event:
  1. You are notified when an instance is scheduled to be restarted.
  2. Use one of the following methods to handle the event:
    • If you do not want the instance to be restarted within the scheduled time period, specify a different time at which to automatically restart the instance. For more information, see Modify the scheduled restart time.
    • Restart the instance within the user operation window. For more information, see Reboot the instance.
      Note You must restart the instance by using the ECS console or by calling the RebootInstance operation. You cannot restart the instance from within the instance.
    • Wait for the instance to be automatically restarted.
  3. Check whether the instance and applications continue to work as expected.

For information about the event states supported by SystemMaintenance.Reboot, see Summary. For the figure that shows the typical transitions between event states, see States and windows of system events.
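The choice among the three handling methods above can be sketched as a small decision function. This is a minimal illustration, not part of any Alibaba Cloud SDK: the function name, the `busy_hours` parameter, and the `traffic_drained` flag are hypothetical, and the policy itself (restart early once traffic is diverted, reschedule if the planned time falls in a busy period, otherwise wait) is an assumed example of how a team might decide.

```python
from datetime import datetime


def choose_reboot_handling(scheduled: datetime,
                           busy_hours: range,
                           traffic_drained: bool) -> str:
    """Pick one of the three handling methods for a scheduled restart.

    All parameters are hypothetical inputs for illustration:
    - scheduled: the time at which the instance is scheduled to restart
    - busy_hours: hours of the day during which a restart is unacceptable
    - traffic_drained: True if traffic has already been diverted and data backed up
    """
    if traffic_drained:
        # The instance is ready, so restart it within the user operation window.
        return "restart now by using the ECS console or RebootInstance"
    if scheduled.hour in busy_hours:
        # The scheduled time collides with a busy period, so move it.
        return "modify the scheduled restart time"
    # The scheduled time is acceptable; let the automatic restart happen.
    return "wait for the automatic restart"


if __name__ == "__main__":
    planned = datetime(2024, 1, 15, 10, 0)
    print(choose_reboot_handling(planned, range(9, 18), traffic_drained=False))
```

Whichever branch is taken, the restart itself must be triggered from outside the instance, as the note in step 2 states.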

Scenario ②

Procedure to handle a SystemMaintenance.Redeploy system event:
  1. You are notified when an instance equipped with local disks is scheduled to be redeployed.
  2. Make preparations such as modifying the /etc/fstab configuration file and backing up data.

    For more information about preparations that you must make, see the "Prerequisites" section in Redeploy an instance equipped with local disks.

  3. Use one of the following methods to handle the event:
    Note When an instance equipped with local disks is redeployed, the instance is migrated to a different physical server, and the local disks of the instance are re-initialized and lose all their data.
  4. Check whether the instance and applications continue to work as expected. If so, synchronize data based on your business requirements.

For information about the event states supported by SystemMaintenance.Redeploy, see Summary. For the figure that shows the typical transitions between event states, see States and windows of system events.
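The /etc/fstab preparation in step 2 can be sketched as follows. This topic does not say which change is required, so the sketch assumes one common precaution: adding the `nofail` mount option to the entries for local disks, so that the instance can still boot when a local disk is missing or has been re-initialized. The function name and the device paths are hypothetical, and the function only transforms text. It does not write to /etc/fstab. See the "Prerequisites" section referenced above for the authoritative steps.

```python
def add_nofail(fstab_text: str, devices: set[str]) -> str:
    """Return fstab_text with 'nofail' appended to the options of matching entries.

    devices holds the first-column values (device paths or UUID=... specs)
    of the local disk entries to change. Hypothetical helper for illustration.
    """
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        # Leave blank lines, comments, and malformed entries untouched.
        if not fields or fields[0].startswith("#") or len(fields) < 4:
            out.append(line)
            continue
        opts = fields[3].split(",")
        if fields[0] in devices and "nofail" not in opts:
            opts.append("nofail")
            fields[3] = ",".join(opts)
            line = " ".join(fields)
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = (
        "/dev/vda1 / ext4 defaults 1 1\n"
        "/dev/vdb /mnt/disk1 ext4 defaults 0 0\n"
    )
    # /dev/vdb is an assumed local disk device name.
    print(add_nofail(sample, {"/dev/vdb"}), end="")
```

Review the output and back up the existing file before you apply any change to /etc/fstab on a real instance.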

Scenario ③

Procedure to handle a SystemFailure.Reboot system event:
  1. The system restarts an instance due to a system error.
  2. You are notified when the instance is being restarted.

    Wait until the instance is restarted without manual intervention.

  3. Check whether the instance and applications continue to work as expected.

For information about the event states supported by SystemFailure.Reboot, see Summary. For the figure that shows the typical transitions between event states, see States and windows of system events.

Scenario ④

Procedure to handle a SystemFailure.Redeploy system event:
  1. You are notified when an instance equipped with local disks is scheduled to be redeployed.
  2. Make preparations such as modifying the /etc/fstab configuration file and backing up data.

    For more information about preparations that you must make, see the "Prerequisites" section in Redeploy an instance equipped with local disks.

  3. Use one of the following methods to handle the event:
    Note When an instance equipped with local disks is redeployed, the instance is migrated to a different physical server, and the local disks of the instance are re-initialized and lose all their data.
  4. Check whether the instance and applications continue to work as expected. If so, synchronize data based on your business requirements.

For information about the event states supported by SystemFailure.Redeploy, see Summary. For the figure that shows the typical transitions between event states, see States and windows of system events.

Scenario ⑤

For Scenario ⑤ where a local disk is damaged on the host of an instance, you can redeploy the instance to another host or replace the disk. Take note of the following items when you replace a damaged disk:
  • Not all disks of local disk instances can be isolated. You can isolate damaged disks only when disk isolation is included in the operations of system events.
  • Disk isolation and disk maintenance are independent of each other. Disk isolation is a prerequisite for disk maintenance, but it does not guarantee that disk maintenance can be performed, because not all instances support local disk maintenance. You can initiate disk maintenance only after you receive a disk restoration notification from Alibaba Cloud.
  • Redeploying the instance can restore its local disks, but the data stored in the local disks will be lost. For more information, see Redeploy an instance equipped with local disks.
  • When the damaged local disk is replaced, only data of the replaced local disk is lost. The data stored in other local disks on the instance is retained.

Procedure to replace a damaged local disk on an instance:
    1. You are notified when a local disk on an instance is damaged and scheduled to be isolated.
    2. Make preparations such as modifying the /etc/fstab configuration file and backing up data.
    3. If the name of the system event contains IsolateErrorDisk, authorize the disk isolation of damaged disks.
    4. If the name of the system event contains Reboot, you must restart the instance.
    5. Alibaba Cloud removes the damaged local disk from the host on which your instance resides, inserts a new disk, and then sends you a disk restoration notification.
    6. If the system event contains disk restoration or related operations, authorize disk restoration.
    7. If the name of the system event contains Reboot, you must restart the instance.
    Note To replace a damaged local disk, you must work together with Alibaba Cloud. For more information, see Isolate damaged local disks in the ECS console and Isolate damaged local disks by using Alibaba Cloud CLI.
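The name-based checks in steps 3, 4, 6, and 7 can be sketched as a helper that derives the required user actions from a system event name. This is an illustration under stated assumptions: the function is hypothetical, the event names in the usage example are assumed examples, and the `ReInitErrorDisk` marker for disk restoration is an assumption, because this topic only says "disk restoration or related operations". Confirm the actual event names in the ECS console or the system events documentation.

```python
def replacement_actions(event_name: str) -> list[str]:
    """Map a system event name to the user actions described in the procedure.

    Hypothetical helper: matches substrings of the event name, in the order
    in which the procedure lists the corresponding steps.
    """
    actions = []
    if "IsolateErrorDisk" in event_name:
        # Step 3: the event asks for isolation of the damaged disk.
        actions.append("authorize disk isolation")
    if "ReInitErrorDisk" in event_name:
        # Step 6: assumed marker for disk restoration events.
        actions.append("authorize disk restoration")
    if "Reboot" in event_name:
        # Steps 4 and 7: the event requires an instance restart.
        actions.append("restart the instance")
    return actions


if __name__ == "__main__":
    # Assumed example event names.
    for name in ("SystemMaintenance.IsolateErrorDisk",
                 "SystemMaintenance.RebootAndIsolateErrorDisk"):
        print(name, "->", replacement_actions(name))
```

Substring matching mirrors the wording of the procedure ("if the name of the system event contains ..."), so a single event name can yield both an authorization and a restart.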