If a local disk on a physical machine that hosts an Elastic Compute Service (ECS) instance is damaged, the instance remains on the physical machine after the local disk is isolated. This topic describes how to isolate damaged local disks in the ECS console. The procedure applies only to the local disk-related system events of ECS instances.

Background information

System events for isolation of damaged local disks include the Disk:ErrorDetected, SystemMaintenance.IsolateErrorDisk, SystemMaintenance.RebootAndIsolateErrorDisk, SystemMaintenance.ReInitErrorDisk, and SystemMaintenance.RebootAndReInitErrorDisk events. The Disk:ErrorDetected event is triggered when a damage alert is generated for a local disk. The SystemMaintenance.IsolateErrorDisk event is triggered when a damaged local disk needs to be isolated due to system maintenance. The SystemMaintenance.RebootAndIsolateErrorDisk event is triggered when an instance needs to be restarted and a damaged local disk used by the instance needs to be isolated due to system maintenance. The SystemMaintenance.ReInitErrorDisk event is triggered when a damaged local disk needs to be re-initialized due to system maintenance. The SystemMaintenance.RebootAndReInitErrorDisk event is triggered when an instance needs to be restarted and a damaged local disk used by the instance needs to be re-initialized due to system maintenance. Only damaged local disks used by instances of big data instance types can be isolated. For more information, see O&M scenarios and system events for instances equipped with local disks.
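
The event types above can be summarized as a lookup. The following is a minimal sketch, not part of any Alibaba Cloud tool: the function name disk_event_action and the summary strings are hypothetical and simply restate the descriptions in this section.

```shell
#!/bin/sh
# Hypothetical helper: print the action implied by a local disk system event.
# The event names come from this topic; the summary strings are illustrative.
disk_event_action() {
  case "$1" in
    Disk:ErrorDetected)
      echo "alert only: damage detected on a local disk" ;;
    SystemMaintenance.IsolateErrorDisk)
      echo "isolate disk" ;;
    SystemMaintenance.RebootAndIsolateErrorDisk)
      echo "restart instance, then isolate disk" ;;
    SystemMaintenance.ReInitErrorDisk)
      echo "re-initialize disk" ;;
    SystemMaintenance.RebootAndReInitErrorDisk)
      echo "restart instance, then re-initialize disk" ;;
    *)
      # Any other name is not one of the local disk events listed here.
      echo "not a local disk isolation event"
      return 1 ;;
  esac
}
```

A script that processes event records could call disk_event_action with the event type string to decide whether an instance restart should be expected.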

Procedure

  1. Log on to the ECS console.
  2. In the left-side navigation pane, click Events.
  3. In the left-side navigation pane of the Events page, click Local Disk-based Instance Events.
  4. On the Local Disk-based Instance Events page, click the Local Disk Damaged Events tab.
  5. Find the instance whose damaged local disk event you want to handle, and click Repair in the Actions column.
  6. In the Configurations Modification step, modify the configuration file of the instance. Then, click Next.
    For some Linux instances, if the Configurations Modification step is displayed, follow the on-screen instructions to perform the following operations. In this topic, a damaged local disk named /dev/vdd is used.
    1. Connect to the ECS instance. For more information, see Connection methods.
    2. Optional: Isolate the read and write operations of the local disk at the application layer.
    3. If the instance is a Linux instance, add the nofail parameter to the /etc/fstab configuration file of the instance for the local disk.
      /dev/vdd /mnt/vdd ext4 defaults,barrier=0,nofail 0 0
      Parameter: Description
      /dev/vdd: The device name of the local disk, which is the Device value returned by the DescribeInstanceHistoryEvents operation.
      /mnt/vdd: The mount point of the local disk, which can be queried by using the mount | grep "/dev/vdd" command.
      ext4: The file system type of the local disk, which can be queried by using the blkid /dev/vdd command.
      barrier=0: The mount option used to disable barriers in the file system.
      nofail: Indicates that the booting sequence of the ECS instance is not interrupted even if the local disk specified in the file system does not exist.
    4. Unmount the local disk.
      umount /dev/vdd
      Important: If you do not unmount the local disk, the device name of the local disk will change after the local disk is isolated and repaired. In this case, applications may read from or write to another disk.
  7. In the Damaged Disk Isolation step, click OK.
    Refresh the page if the next step is not displayed.
  8. Optional: In the Instance Restart step, click Restart.
    If the Instance Restart step is displayed, click Restart to restart the instance.
    Note: After the instance is restarted, the isolated damaged local disk is temporarily converted to a 1 MiB dummy hard disk to facilitate subsequent operations. At the application layer, you must continue to isolate read and write operations on the damaged local disk and keep the nofail parameter configured in the /etc/fstab file.
  9. After the instance is restarted, click OK in the New Disk Inserting step.
    Wait for Alibaba Cloud to replace the damaged local disk on the physical machine that hosts the instance. Typically, five business days are required to replace a damaged local disk. After the local disk is replaced, you receive an event that requires you to restore the disk.
  10. After you receive the event, click Restore in the Disk Restoration step.
    Refresh the page if the next step is not displayed.
  11. Optional: In the Instance Restart step, click Restart.
    If the Instance Restart step is displayed, click Restart to restart the instance.
  12. After the instance is restarted, click Complete in the Complete step.
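
The /etc/fstab change in step 6 can also be scripted. The following is a minimal sketch under stated assumptions: the function name add_nofail is hypothetical, the entry is assumed to identify the disk by device name in the first field (not by UUID or label), and rewriting a line collapses its whitespace to single spaces. Back up /etc/fstab before you try it on a real instance.

```shell
#!/bin/sh
# Hypothetical helper: append "nofail" to the mount options (4th field)
# of the fstab entry whose first field equals DEVICE.
# Entries that already have nofail, and all other lines, are left unchanged.
# Usage: add_nofail FSTAB_PATH DEVICE
add_nofail() {
  fstab=$1
  device=$2
  tmp=$(mktemp) || return 1
  if awk -v dev="$device" '
       NF >= 4 && $1 == dev && $4 !~ /(^|,)nofail(,|$)/ { $4 = $4 ",nofail" }
       { print }
     ' "$fstab" > "$tmp"
  then
    # Write back in place so the file keeps its owner and permissions.
    cat "$tmp" > "$fstab"
    status=$?
  else
    status=1
  fi
  rm -f "$tmp"
  return "$status"
}

# Example, run as root on the instance (commented out on purpose):
#   cp /etc/fstab /etc/fstab.bak
#   add_nofail /etc/fstab /dev/vdd && umount /dev/vdd
```

The function takes the file path as an argument so that it can be tried on a copy of /etc/fstab first. Running it twice does not add the option twice.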

Result

A few minutes after the damaged local disk is replaced, the local disk damaged event disappears.

What to do next

After the damaged disk is isolated, check the status of the instance and local disk. The replaced local disk is restored to its original capacity, and you can reformat data disks. For more information, see Initialize a data disk up to 2 TiB in size on a Windows instance or Initialize a data disk whose size does not exceed 2 TiB on a Linux instance.
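
One way to check the disk from inside a Linux instance is to compare its reported size with the 1 MiB dummy size mentioned in the procedure. The following is a minimal sketch: the function names is_dummy_size and disk_state are hypothetical, and treating any size of 1 MiB or less as the dummy disk is an assumption, not documented behavior.

```shell
#!/bin/sh
# Hypothetical helper: succeed if SIZE_BYTES is no larger than 1 MiB,
# the dummy disk size described in this topic (threshold is an assumption).
is_dummy_size() {
  [ "$1" -le 1048576 ]
}

# Hypothetical helper: print "dummy" or "restored" for a block device.
# Uses lsblk: -b sizes in bytes, -d no partitions, -n no header, -o SIZE.
disk_state() {
  size=$(lsblk -bdno SIZE "$1" | tr -d '[:space:]')
  [ -n "$size" ] || return 2
  if is_dummy_size "$size"; then
    echo "dummy"
  else
    echo "restored"
  fi
}

# Example (commented out on purpose):
#   disk_state /dev/vdd
```

If disk_state reports restored for /dev/vdd, the disk can then be reformatted as described in the topics referenced above.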