When a damaged local disk is isolated, the corresponding ECS instance still resides on the same physical machine. This topic describes how to use Alibaba Cloud CLI to call ECS API operations that isolate damaged local disks. The procedure applies only to system events on ECS instances that are equipped with local disks. You can also complete the same operations by using the SDKs or by calling the relevant API operations in Alibaba Cloud OpenAPI Explorer.

Prerequisites

Alibaba Cloud Command-Line Interface (CLI) is installed for the ECS instance. For information about how to install Alibaba Cloud CLI on different operating systems, see the Alibaba Cloud CLI installation topics.

Background information

The system event code for a damaged local disk depends on the stage of the isolation process. For more information, see Overview of system events on ECS instances equipped with local disks.
  • Before the damaged disk is isolated: SystemMaintenance.IsolateErrorDisk. If the instance must be restarted, the code is SystemMaintenance.RebootAndIsolateErrorDisk.
  • After the damaged disk is isolated and before a new disk is re-initialized: SystemMaintenance.ReInitErrorDisk. If the instance must be restarted, the code is SystemMaintenance.RebootAndReInitErrorDisk.
    Notice: Re-initialization clears all data on the isolated local disk.
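The four event codes above can be captured in a small lookup table. The following Python sketch is illustrative only: the `DiskEvent` class, its field names, and the `classify` helper are hypothetical and are not part of any Alibaba Cloud SDK. Only the event code strings come from this topic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiskEvent:
    stage: str              # "isolate" or "reinit" (labels invented for this sketch)
    restart_required: bool  # True when the code itself says the instance must be restarted
    clears_data: bool       # True when accepting the event leads to re-initialization


# Event codes copied from the list above.
EVENT_CODES = {
    "SystemMaintenance.IsolateErrorDisk": DiskEvent("isolate", False, False),
    "SystemMaintenance.RebootAndIsolateErrorDisk": DiskEvent("isolate", True, False),
    "SystemMaintenance.ReInitErrorDisk": DiskEvent("reinit", False, True),
    "SystemMaintenance.RebootAndReInitErrorDisk": DiskEvent("reinit", True, True),
}


def classify(code: str) -> DiskEvent:
    """Look up a local-disk event code; reject anything this topic does not cover."""
    try:
        return EVENT_CODES[code]
    except KeyError:
        raise ValueError(f"not a local-disk isolation event code: {code}") from None
```

A `restart_required` value of `False` means only that the event code does not demand a restart up front. A restart can still become necessary if the online operation fails, as described in the procedure.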

Procedure

  1. Call the DescribeInstanceHistoryEvents operation to query system events that are in the Inquiring state in the specified region, and record the return values of EventId, DiskId, and Device.
    Run the following command in Alibaba Cloud CLI:
    aliyun ecs DescribeInstanceHistoryEvents \
    --RegionId <TheRegionId> \
    --InstanceEventCycleStatus.1 Inquiring
    Sample response in the JSON format:
    {
      "InstanceSystemEventSet": {
        "InstanceSystemEventType": [
          {
            "InstanceId": "i-2ze3tphuqvc93ci****3",
            "EventId": "e-2ze9y****wtqcvai68rl",
            "EventType": {
              "Code": 3,
              "Name": "SystemMaintenance.IsolateErrorDisk"
            },
            "EventCycleStatus": {
              "Code": 28,
              "Name": "Inquiring"
            },
            "EventPublishTime": "2017-11-30T06:32:31Z",
            "ExtendedAttribute" : {
              "DiskId": "d-disk1",
              "Device": "/dev/xvda"
            }
          }
        ]
      },
      "PageSize": 10,
      "PageNumber": 1,
      "TotalCount": 1,
      "RequestId": "02EA76D3-5A2A-44EB-****-8901881D8707"
    }
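To avoid copying the values by hand, the response can be parsed with a short script. This is a minimal sketch that assumes the response has the same shape as the sample above. The function name `inquiring_disk_events` is hypothetical, and `SAMPLE` is a trimmed copy of the sample response.

```python
import json

# Trimmed copy of the sample response shown in this step.
SAMPLE = """
{
  "InstanceSystemEventSet": {
    "InstanceSystemEventType": [
      {
        "InstanceId": "i-2ze3tphuqvc93ci****3",
        "EventId": "e-2ze9y****wtqcvai68rl",
        "EventType": {"Code": 3, "Name": "SystemMaintenance.IsolateErrorDisk"},
        "EventCycleStatus": {"Code": 28, "Name": "Inquiring"},
        "ExtendedAttribute": {"DiskId": "d-disk1", "Device": "/dev/xvda"}
      }
    ]
  },
  "TotalCount": 1
}
"""


def inquiring_disk_events(response_text):
    """Return one dict per event with the values that this step asks you to record."""
    response = json.loads(response_text)
    events = response.get("InstanceSystemEventSet", {}).get("InstanceSystemEventType", [])
    results = []
    for event in events:
        extended = event.get("ExtendedAttribute", {})
        results.append({
            "InstanceId": event.get("InstanceId"),
            "EventId": event.get("EventId"),
            "EventCode": event.get("EventType", {}).get("Name"),
            "DiskId": extended.get("DiskId"),
            "Device": extended.get("Device"),
        })
    return results


if __name__ == "__main__":
    for item in inquiring_disk_events(SAMPLE):
        print(item)
```

Missing fields come back as `None` instead of raising an error, so check the values before you pass them to later steps.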
  2. Log on to the ECS instance to make preparations before you isolate the damaged local disk.
    1. Connect to the ECS instance. For more information, see Overview.
    2. Optional: Stop read and write operations on the local disk at the application layer.
    3. For Linux instances, add the nofail parameter to the /etc/fstab configuration file of the ECS instance for the local disk.
      /dev/vdd /mnt/vdd ext4 defaults,barrier=0,nofail 0 0
      Parameter   Description
      /dev/vdd    The device name of the local disk, which is the value of Device returned by DescribeInstanceHistoryEvents.
      /mnt/vdd    The mount point of the local disk. To query it, run the mount | grep "/dev/vdd" command.
      ext4        The file system type of the local disk. To query it, run the blkid /dev/vdd command.
      barrier=0   The mount option that disables write barriers in the file system.
      nofail      The mount option that lets the ECS instance boot without interruption even if the local disk listed in this entry does not exist.
    4. Unmount the local disk.
      umount /dev/vdd
      Notice: If you do not unmount the local disk, its device name changes after the disk is isolated and repaired. In this case, applications may read from or write to another disk.
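If you prefer to script the /etc/fstab change instead of editing the file by hand, the following Python sketch shows one way to add nofail to the entry for a device. It is an illustration under stated assumptions: the `add_nofail` function is hypothetical, it matches entries only by device path (not by UUID or LABEL), and it works on text that you pass in, so it does not modify any file itself.

```python
def add_nofail(fstab_text, device):
    """Return fstab_text with 'nofail' appended to the options of the entry for device."""
    updated = []
    for line in fstab_text.splitlines():
        fields = line.split()
        # A usable entry has at least: device, mount point, type, options.
        is_entry = len(fields) >= 4 and not line.lstrip().startswith("#")
        if is_entry and fields[0] == device:
            options = fields[3].split(",")
            if "nofail" not in options:
                options.append("nofail")
                fields[3] = ",".join(options)
                # Note: the rewritten line uses single spaces between fields.
                line = " ".join(fields)
        updated.append(line)
    return "\n".join(updated) + ("\n" if updated else "")


if __name__ == "__main__":
    before = "/dev/vdd /mnt/vdd ext4 defaults,barrier=0 0 0\n"
    print(add_nofail(before, "/dev/vdd"), end="")
```

Back up /etc/fstab before you write any modified content to it, and review the result before the next restart.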
  3. Call the AcceptInquiredSystemEvent operation to respond to system events.
    Run the following command in Alibaba Cloud CLI:
    aliyun ecs AcceptInquiredSystemEvent --RegionId <TheRegionId> --EventId <TheEventId>
  4. Determine whether to restart the instance.
    • When the event code is SystemMaintenance.IsolateErrorDisk:
      • If only the value of RequestId is returned, you do not need to restart the instance.
      • If the error code SwitchToOffline.OnlineIsolateFail is returned, you must restart the instance.
    • When the event code is SystemMaintenance.RebootAndIsolateErrorDisk: After you call the AcceptInquiredSystemEvent operation, you must restart the instance.
    To restart the instance, run the following command in Alibaba Cloud CLI:
    aliyun ecs RebootInstance --InstanceId <TheInstanceId>
    Note: After the instance is restarted, the isolated damaged local disk is temporarily presented as a 1 MiB dummy disk to facilitate subsequent operations. At the application layer, continue to block read/write operations on the damaged local disk, and keep the nofail option in the /etc/fstab file.
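The restart rule in this step, and the matching rule for re-initialization events later in the procedure, can be expressed as one small function. This sketch is hypothetical: `needs_restart` is not an SDK function, and it assumes that you have already extracted the error code (if any) from the AcceptInquiredSystemEvent response. How the error code appears in the CLI output is not specified in this topic.

```python
# Event codes that always require a restart after the event is accepted.
RESTART_ALWAYS = {
    "SystemMaintenance.RebootAndIsolateErrorDisk",
    "SystemMaintenance.RebootAndReInitErrorDisk",
}

# Event codes that are handled online, mapped to the error code that
# signals that the online operation failed and a restart is needed.
ONLINE_FAILURE = {
    "SystemMaintenance.IsolateErrorDisk": "SwitchToOffline.OnlineIsolateFail",
    "SystemMaintenance.ReInitErrorDisk": "SwitchToOffline.OnlineReInitFail",
}


def needs_restart(event_code, error_code=None):
    """Return True if RebootInstance should be called after accepting the event.

    error_code is None when the response contained only RequestId.
    """
    if event_code in RESTART_ALWAYS:
        return True
    if event_code not in ONLINE_FAILURE:
        raise ValueError(f"unexpected event code: {event_code}")
    if error_code is None:
        return False
    if error_code == ONLINE_FAILURE[event_code]:
        return True
    # Any other error is outside the cases described in this topic.
    raise RuntimeError(f"unhandled error code: {error_code}")
```

For example, `needs_restart("SystemMaintenance.IsolateErrorDisk", "SwitchToOffline.OnlineIsolateFail")` evaluates to `True`, while the same event code with no error code evaluates to `False`.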
  5. Wait until Alibaba Cloud replaces the damaged local disk on the physical machine and publishes the SystemMaintenance.ReInitErrorDisk or SystemMaintenance.RebootAndReInitErrorDisk event. This process takes one to five days.
  6. Call the AcceptInquiredSystemEvent operation again to respond to the system event. The local disk enters the re-initializing state.
    Run the following command in Alibaba Cloud CLI:
    aliyun ecs AcceptInquiredSystemEvent --RegionId <TheRegionId> --EventId <TheEventId>
  7. Determine whether to restart the instance.
    • When the event code is SystemMaintenance.ReInitErrorDisk:
      • If only the value of RequestId is returned, you do not need to restart the instance.
      • If the error code SwitchToOffline.OnlineReInitFail is returned, you must restart the instance.
    • When the event code is SystemMaintenance.RebootAndReInitErrorDisk: After you call the AcceptInquiredSystemEvent operation, you must restart the instance.
    To restart the instance, run the following command in Alibaba Cloud CLI:
    aliyun ecs RebootInstance --InstanceId <TheInstanceId>
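If you drive the CLI from a script, building each command as an argument list avoids shell quoting problems. The sketch below only constructs the two commands used in this procedure, with the same options shown above. The helper names are hypothetical, and the region, event, and instance IDs in the example are placeholders.

```python
import shlex


def accept_event_command(region_id, event_id):
    """Argument list for the AcceptInquiredSystemEvent call used in steps 3 and 6."""
    return [
        "aliyun", "ecs", "AcceptInquiredSystemEvent",
        "--RegionId", region_id,
        "--EventId", event_id,
    ]


def reboot_command(instance_id):
    """Argument list for the RebootInstance call used in steps 4 and 7."""
    return ["aliyun", "ecs", "RebootInstance", "--InstanceId", instance_id]


if __name__ == "__main__":
    # Placeholder values; replace them with the values recorded in step 1.
    print(shlex.join(accept_event_command("cn-hangzhou", "e-example")))
    print(shlex.join(reboot_command("i-example")))
```

To run a command, you can pass the list to `subprocess.run(command, check=True)`, assuming that Alibaba Cloud CLI is installed and configured with credentials.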

What to do next

After the damaged disk is isolated and replaced, check the status of the instance and the local disk. The replacement local disk has the original capacity, and you can format it again as a data disk. For more information, see Format a data disk for a Windows ECS instance or Format a data disk for a Linux instance.