This topic describes how to call ECS API operations to isolate damaged local disks. When a damaged local disk is isolated, the corresponding ECS instance still resides on the same physical machine.

Prerequisites

This topic applies only to system events on ECS instances that are equipped with local disks.

Background information

Isolating a damaged local disk involves two system events, one for each stage of the process: SystemMaintenance.RebootAndIsolateErrorDisk, which isolates the damaged disk, and SystemMaintenance.RebootAndReInitErrorDisk, which reinitializes the disk after it is replaced. You resolve each event by accepting it and rebooting the instance. For more information, see Overview of system events on ECS instances equipped with local disks.
Notice After the damaged local disk is isolated, data disks are reinitialized and the data on the isolated local disk is cleared.

Procedure

  1. Call the DescribeInstanceHistoryEvents operation to query system events that are in the Inquiring state in the specified region, and record the returned values for EventId, DiskId, and Device. For more information about the operation, see DescribeInstanceHistoryEvents.
    aliyun ecs DescribeInstanceHistoryEvents --RegionId <TheRegionId> --InstanceEventCycleStatus.1 Inquiring
    Sample response in JSON format:
    {
      "InstanceSystemEventSet": {
        "InstanceSystemEventType": [
          {
            "InstanceId": "i-2ze3tphuqvc93ci****3",
            "EventId": "e-2ze9y****wtqcvai68rl",
            "EventType": {
              "Code": 3,
              "Name": "SystemMaintenance.RebootAndIsolateErrorDisk"
            },
            "EventCycleStatus": {
              "Code": 28,
              "Name": "Inquiring"
            },
            "EventPublishTime": "2017-11-30T06:32:31Z",
            "ExtendedAttribute" : {
              "DiskId": "d-disk1",
              "Device": "/dev/xvda"
            }
          }
        ]
      },
      "PageSize": 10,
      "PageNumber": 1,
      "TotalCount": 1,
      "RequestId": "02EA76D3-5A2A-44EB-****-8901881D8707"
    }
  2. Connect to the ECS instance remotely. For detailed steps, see Overview.
  3. (Optional) At the application layer, isolate the damaged local disk so that your applications no longer read from or write to it.
  4. For Linux instances: In the /etc/fstab file of the ECS instance, add the nofail parameter to the entry for the damaged local disk.
    Example: Add the nofail parameter to the damaged local disk /dev/vdd:
    /dev/vdd /mnt/vdd ext4 defaults,barrier=0,nofail 0 0
    Note
    • /dev/vdd: the device name of the local disk. Enter the actual device name of your local disk.
    • /mnt/vdd: the mount point of the local disk. Enter the actual mount point of your local disk.
    • ext4: the file system type of the local disk. Enter the actual file system type of your local disk.
    • barrier=0: a mount option. Setting barrier to 0 disables the barrier setting in the file system.
    • nofail: specifies that the boot sequence of the ECS instance is not interrupted even if the local disk specified in the /etc/fstab entry does not exist.
  5. Call the AcceptInquiredSystemEvent operation to respond to the SystemMaintenance.RebootAndIsolateErrorDisk system event. For more information about the operation, see AcceptInquiredSystemEvent.
    aliyun ecs AcceptInquiredSystemEvent --RegionId <TheRegionId> --EventId <TheEventId>
  6. Call the RebootInstance operation to reboot the ECS instance. For more information about the operation, see RebootInstance.
    aliyun ecs RebootInstance --InstanceId <TheInstanceId>
    Note After the instance is rebooted, check the status of the instance and the local disk. The isolated local disk is temporarily converted to a 1 MiB dummy disk so that subsequent operations can proceed. At the application layer, you must continue to isolate read/write operations on the damaged local disk, and you must keep the nofail parameter configured in the /etc/fstab file.
  7. Wait until Alibaba Cloud replaces the damaged local disks on the physical machine and publishes the SystemMaintenance.RebootAndReInitErrorDisk event. This process takes one to five days.
  8. Call the AcceptInquiredSystemEvent operation again to respond to the SystemMaintenance.RebootAndReInitErrorDisk system event. Then the local disk enters the reinitializing state.
    aliyun ecs AcceptInquiredSystemEvent --RegionId <TheRegionId> --EventId <TheEventId>
  9. Call the RebootInstance operation again to reboot the ECS instance.
    aliyun ecs RebootInstance --InstanceId <TheInstanceId>
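
Steps 1 and 4 of the procedure can be scripted. The following Python sketch is illustrative only and is not part of the ECS tooling: the function names pending_disk_events and add_nofail are hypothetical. It assumes that the response has the same shape as the sample response in step 1 and that the /etc/fstab entry identifies the disk by its device path (entries that use UUID= or LABEL= are left unchanged).

```python
import json

# Event names taken from the procedure above.
DISK_EVENT_NAMES = {
    "SystemMaintenance.RebootAndIsolateErrorDisk",
    "SystemMaintenance.RebootAndReInitErrorDisk",
}


def pending_disk_events(response_text):
    """Return one dict per local-disk event that is in the Inquiring state.

    response_text is the JSON output of DescribeInstanceHistoryEvents,
    assumed to match the sample response shown in step 1.
    """
    response = json.loads(response_text)
    event_set = response.get("InstanceSystemEventSet", {})
    pending = []
    for event in event_set.get("InstanceSystemEventType", []):
        name = event.get("EventType", {}).get("Name")
        status = event.get("EventCycleStatus", {}).get("Name")
        if name not in DISK_EVENT_NAMES or status != "Inquiring":
            continue
        attrs = event.get("ExtendedAttribute", {})
        pending.append({
            "InstanceId": event.get("InstanceId"),
            "EventId": event.get("EventId"),
            "EventName": name,
            "DiskId": attrs.get("DiskId"),
            "Device": attrs.get("Device"),
        })
    return pending


def add_nofail(fstab_line, device):
    """Return fstab_line with nofail added to its options if it mounts device.

    Comment lines, malformed lines, and entries for other devices are
    returned unchanged. Fields of a modified line are rejoined with single
    spaces, so the original column alignment is not preserved.
    """
    fields = fstab_line.split()
    if fstab_line.lstrip().startswith("#") or len(fields) < 4 or fields[0] != device:
        return fstab_line
    options = fields[3].split(",")
    if "nofail" not in options:
        options.append("nofail")
    fields[3] = ",".join(options)
    return " ".join(fields)
```

For example, add_nofail("/dev/vdd /mnt/vdd ext4 defaults,barrier=0 0 0", "/dev/vdd") produces the entry shown in step 4. Back up /etc/fstab before you write any modified lines to it.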

What to do next

After the instance is rebooted, check the status of the instance and the local disk. The replaced local disk is restored to its original capacity, and you can reformat data disks. For more information, see Format a data disk for a Windows-based ECS instance or Format a data disk for a Linux-based ECS instance.
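
On a Linux instance, one way to check whether the local disk still has the 1 MiB placeholder capacity or has been restored is to read its size from sysfs. The following Python sketch is an illustration under assumptions, not an official check: the function names disk_size_bytes and is_placeholder are hypothetical, the device is assumed to be a whole disk such as /dev/vdd, and any size of 1 MiB or less is treated as the placeholder.

```python
from pathlib import Path

SECTOR_BYTES = 512               # sysfs reports block device sizes in 512-byte sectors
PLACEHOLDER_BYTES = 1024 * 1024  # 1 MiB dummy disk described in the procedure


def disk_size_bytes(device, sysfs_root="/sys/block"):
    """Return the size in bytes of a whole-disk device such as /dev/vdd."""
    name = Path(device).name
    sectors = int((Path(sysfs_root) / name / "size").read_text().strip())
    return sectors * SECTOR_BYTES


def is_placeholder(size_bytes):
    """Return True while the disk still has the placeholder capacity.

    Assumption: sizes of 1 MiB or less count as the placeholder.
    """
    return size_bytes <= PLACEHOLDER_BYTES
```

If is_placeholder(disk_size_bytes("/dev/vdd")) returns False after the final reboot, the disk reports more than the placeholder capacity. Compare the value with the expected capacity of your local disk before you format it.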