Automatic recovery of instances

Last Updated: Feb 11, 2019

This topic describes how the automatic recovery of instances can improve your efficiency and overall experience in ECS.

What is automatic recovery

If the underlying hardware that hosts your ECS instances crashes unexpectedly, and the failure is confirmed as irreversible so that your instances cannot be fixed in place, the instances restart automatically after an unplanned maintenance window. In this case, all instance metadata of the recovered instances, including the instance ID and the private and public IP addresses, remains unchanged.

Automatic recovery is a type of system event with the event code SystemFailure.Reboot. System events differ in how much manual work they involve: some, such as live migration, require no manual intervention, whereas others may require manual configuration during the scheduled maintenance window.

Limits

  • You cannot manually restart an instance while an automatic recovery is scheduled for it.

  • Instances with local disks or ephemeral disks can be recovered automatically only if the underlying hardware can be restarted after the unexpected crash. If an instance with local disks cannot be recovered, open a ticket immediately to check whether the data on the disks is retained or whether the instance has been redeployed to another physical server.

View an automatic recovery event of an instance

In this example, the DescribeInstancesFullStatus method is called by using the Alibaba Cloud CLI.

  aliyun ecs DescribeInstancesFullStatus --RegionId <TheRegionId> --InstanceId.1 <YourInstanceId> --output cols=EventId,EventTypeName

Note: If EventTypeName returns the event code SystemFailure.Reboot, automatic recovery is scheduled for your instance.
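The check described in the note can be scripted. The following is a minimal sketch, not an official tool: has_auto_recovery_event is a hypothetical helper name, and it only searches text for the event code, so it makes no assumption about the exact layout of the CLI output.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Alibaba Cloud CLI).
# Reads command output on stdin and exits with status 0 if the text
# contains the SystemFailure.Reboot event code, or a non-zero status otherwise.
has_auto_recovery_event() {
    grep -q 'SystemFailure\.Reboot'
}

# Example usage (replace the placeholders with your own values):
#   aliyun ecs DescribeInstancesFullStatus --RegionId "<TheRegionId>" \
#       --InstanceId.1 "<YourInstanceId>" --output cols=EventId,EventTypeName \
#     | has_auto_recovery_event && echo "Automatic recovery is scheduled"
```

Because the helper reads from standard input, it can also be run against a saved copy of the command output.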

For information about viewing system events in the ECS console, see System event. For instructions on using other developer tools, see Quick starts for ECS APIs.

Suggestions for greater fault tolerance

To fully leverage the capability of automatic recovery of your instances and failover operations, make sure that you complete the following actions:

  • Add your applications, for example, SAP HANA, to the startup item list so that they start automatically after the instance restarts and interruptions to your business are kept short.

  • Enable the automatic reconnection feature of your applications. For example, configure your applications to reconnect automatically to MySQL, SQL Server, or Apache Tomcat.

  • If you use Server Load Balancer, deploy multiple ECS instances as a cluster so that, while one instance is being automatically recovered, the other instances continue to provide your services.

  • Periodically back up the data on the local disks for data redundancy and in case the instance has to be redeployed.
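
The automatic reconnection suggestion in the list above can be approximated even for tools that have no built-in reconnect option. The following is a minimal sketch under stated assumptions: retry is a hypothetical shell function, and the attempt count and delay are illustrative values, not recommendations from ECS.

```shell
#!/bin/sh
# Hypothetical helper: retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached, sleeping
# DELAY_SECONDS between attempts. Returns 0 on success and 1 if every
# attempt failed.
retry() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        # Stop as soon as the command succeeds.
        "$@" && return 0
        if [ "$attempt" -ge "$max_attempts" ]; then
            return 1
        fi
        sleep "$delay"
        attempt=$((attempt + 1))
    done
}

# Example usage (check_database is a placeholder for your own connection test):
#   retry 10 5 check_database || echo "Database is still unreachable" >&2
```

A fixed delay keeps the sketch short. A production client would more likely rely on the reconnection settings of its database driver or connection pool.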