When a damaged local disk is isolated, the corresponding ECS instance still resides on the same physical machine. This topic describes how to use Alibaba Cloud CLI to call ECS API operations to isolate damaged local disks. The procedure described in this topic applies only to ECS instances on which system events about local disks occur. You can also complete these operations by using an updated SDK or by calling the relevant API operations in Alibaba Cloud OpenAPI Explorer.
Prerequisites
Background information
The system event codes that correspond to the isolation options of a damaged disk vary with the stage of events. For more information, see O&M scenarios and system events for instances equipped with local disks.
- Before the damaged disk is isolated, the system event code is SystemMaintenance.IsolateErrorDisk. If the instance must be restarted, the code is SystemMaintenance.RebootAndIsolateErrorDisk.
- After the damaged disk is isolated but before a new disk is re-initialized, the system event code is SystemMaintenance.ReInitErrorDisk. If the instance must be restarted, the code is SystemMaintenance.RebootAndReInitErrorDisk.
Important: After data disks are re-initialized, data on the isolated local disk is cleared.
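The mapping between event codes and stages can be written down as a small lookup table. The following is a minimal Python sketch for illustration only. The names EVENT_STAGES and describe_event are hypothetical and are not part of any Alibaba Cloud SDK; only the four event codes come from this topic.

```python
# Illustrative lookup of the four local-disk event codes listed above.
# EVENT_STAGES and describe_event are hypothetical names, not an Alibaba Cloud API.
# Each value is (stage, restart_always_required). A False flag only means the
# event itself does not mandate a restart; one may still be needed if the
# online operation fails.
EVENT_STAGES = {
    "SystemMaintenance.IsolateErrorDisk": ("isolate", False),
    "SystemMaintenance.RebootAndIsolateErrorDisk": ("isolate", True),
    "SystemMaintenance.ReInitErrorDisk": ("reinit", False),
    "SystemMaintenance.RebootAndReInitErrorDisk": ("reinit", True),
}


def describe_event(code):
    """Return (stage, restart_always_required) for a local-disk event code."""
    try:
        return EVENT_STAGES[code]
    except KeyError:
        # Any other code is outside the scope of this topic.
        raise ValueError(f"not a local-disk isolation event: {code}") from None
```

A script could call describe_event on the EventType name of each queried event to decide which stage of the procedure applies.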
Procedure
- Call the DescribeInstanceHistoryEvents operation to query system events within the specified region that are in the Inquiring state, and record the return values of EventId, DiskId, and Device. Run the following command in Alibaba Cloud CLI:
aliyun ecs DescribeInstanceHistoryEvents \
  --RegionId <TheRegionId> \
  --InstanceEventCycleStatus.1 Inquiring
Sample response in the JSON format:
{
  "InstanceSystemEventSet": {
    "InstanceSystemEventType": [
      {
        "InstanceId": "i-2ze3tphuqvc93ci****3",
        "EventId": "e-2ze9y****wtqcvai68rl",
        "EventType": {
          "Code": 3,
          "Name": "SystemMaintenance.IsolateErrorDisk"
        },
        "EventCycleStatus": {
          "Code": 28,
          "Name": "Inquiring"
        },
        "EventPublishTime": "2017-11-30T06:32:31Z",
        "ExtendedAttribute": {
          "DiskId": "d-disk1",
          "Device": "/dev/xvda"
        }
      }
    ]
  },
  "PageSize": 10,
  "PageNumber": 1,
  "TotalCount": 1,
  "RequestId": "02EA76D3-5A2A-44EB-****-8901881D8707"
}
- Log on to the ECS instance to make preparations before you isolate the damaged local disk.
- Call the AcceptInquiredSystemEvent operation to respond to the specified system event. Run the following command in Alibaba Cloud CLI:
aliyun ecs AcceptInquiredSystemEvent --RegionId <TheRegionId> --EventId <TheEventId>
- Determine whether to restart the instance.
  - When the event code is SystemMaintenance.IsolateErrorDisk:
    - If only the RequestId value is returned, you do not need to restart the instance.
    - If the return value of code is SwitchToOffline.OnlineIsolateFail, you must restart the instance.
  - When the event code is SystemMaintenance.RebootAndIsolateErrorDisk, you must restart the instance after you call the AcceptInquiredSystemEvent operation.
To restart the instance, run the following command in Alibaba Cloud CLI:
aliyun ecs RebootInstance --InstanceId <TheInstanceId>
Note: After the instance is restarted, the isolated damaged local disk is temporarily converted to a 1 MiB dummy hard disk to facilitate subsequent operations. At the application layer, you must continuously isolate read and write operations on the damaged local disk and configure the nofail parameter in the /etc/fstab file.
- Wait until Alibaba Cloud replaces the damaged local disk on the physical machine and publishes the SystemMaintenance.ReInitErrorDisk or SystemMaintenance.RebootAndReInitErrorDisk event. This process takes one to five days.
- Call the AcceptInquiredSystemEvent operation again to respond to the system event. The local disk enters the re-initializing state. Run the following command in Alibaba Cloud CLI:
aliyun ecs AcceptInquiredSystemEvent --RegionId <TheRegionId> --EventId <TheEventId>
- Determine whether to restart the instance.
  - When the event code is SystemMaintenance.ReInitErrorDisk:
    - If only the RequestId value is returned, you do not need to restart the instance.
    - If the return value of code is SwitchToOffline.OnlineReInitFail, you must restart the instance.
  - When the event code is SystemMaintenance.RebootAndReInitErrorDisk, you must restart the instance after you call the AcceptInquiredSystemEvent operation.
To restart the instance, run the following command in Alibaba Cloud CLI:
aliyun ecs RebootInstance --InstanceId <TheInstanceId>
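The bookkeeping in the procedure above (recording EventId, DiskId, and Device, then deciding whether a restart is needed) can be scripted once the CLI output is captured as JSON. The following is a minimal Python sketch under stated assumptions, not an official tool. The helper names pending_disk_events and must_restart are hypothetical. The sketch also assumes that a failed online operation is reported in a top-level Code (or code) field of the AcceptInquiredSystemEvent response. This topic only refers to "the return value of code", so verify the field name against real output.

```python
import json

# Failure codes named in this topic. Either one means the online operation
# failed and the instance must be restarted.
OFFLINE_FALLBACK_CODES = {
    "SwitchToOffline.OnlineIsolateFail",
    "SwitchToOffline.OnlineReInitFail",
}


def pending_disk_events(describe_output):
    """Extract the values to record from DescribeInstanceHistoryEvents JSON text."""
    data = json.loads(describe_output)
    events = data.get("InstanceSystemEventSet", {}).get("InstanceSystemEventType", [])
    result = []
    for event in events:
        attrs = event.get("ExtendedAttribute", {})
        result.append({
            "InstanceId": event.get("InstanceId"),
            "EventId": event.get("EventId"),
            "EventName": event.get("EventType", {}).get("Name"),
            "DiskId": attrs.get("DiskId"),
            "Device": attrs.get("Device"),
        })
    return result


def must_restart(event_name, accept_response):
    """Decide whether RebootInstance is needed after AcceptInquiredSystemEvent.

    accept_response is the parsed JSON response as a dict.
    """
    # RebootAnd... events always require a restart after the event is accepted.
    if ".RebootAnd" in event_name:
        return True
    # Assumption: the failure code sits in a top-level "Code" or "code" field.
    code = accept_response.get("Code") or accept_response.get("code")
    return code in OFFLINE_FALLBACK_CODES
```

For example, passing the sample response from the first step to pending_disk_events yields one record whose DiskId is d-disk1 and whose Device is /dev/xvda. must_restart returns False when the response contains only RequestId.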
What to do next
After the damaged disk is isolated, check the status of the instance and local disk. The replaced local disk is restored to its original capacity, and you can reformat data disks. For more information, see Initialize a data disk up to 2 TiB in size on a Windows instance or Initialize a data disk whose size does not exceed 2 TiB on a Linux instance.