Disaster recovery solutions combine data backup with disaster recovery to help keep your services and IT systems stable and secure. Alibaba Cloud ECS allows you to use snapshots and images to back up data.

Disaster recovery methods

  • Snapshot backup

    Alibaba Cloud ECS allows you to back up system disks and data disks with snapshots. Alibaba Cloud provides the Snapshot 2.0 service, which offers a higher snapshot quota and a more flexible automatic task strategy than previous snapshot services and helps reduce the impact on business I/O. When snapshots are used for data backup, the first snapshot of a disk is a full backup, and subsequent snapshots are incremental backups. Incremental snapshots are small and can be created quickly. The time required for a backup depends on the amount of incremental data to be backed up.

    Note: Snapshots are created on an incremental basis. To improve backup speed, we recommend that you create a new snapshot before deleting the most recent one.

    The preceding figure shows how incremental snapshots work. In the figure, Snapshots 1, 2, and 3 represent the first, second, and third snapshots of a disk. The file system checks the disk data block by block. When a snapshot is created, only the blocks whose data has changed since the previous snapshot are copied to the new snapshot.

    Alibaba Cloud ECS allows you to create snapshots of disks manually or automatically. To create automatic snapshots of a disk, configure an automatic snapshot policy and apply it to the disk. In the policy, you specify the hour of the day (on the hour) and the day of the week (Monday through Sunday) at which snapshots are created, and the retention period of the snapshots. The retention period can be set to a value from 1 to 65,536 days, or you can choose to retain snapshots permanently.

  • Snapshot rollback
    When exceptions occur in your system and you want to return a disk to a previous state, you can roll the disk back to an existing snapshot. For more information, see Roll back a disk. Note the following points:
    • Rollback operations are irreversible. After a rollback is complete, the data that was on the disk before the rollback cannot be restored. Exercise caution when you perform this operation.
    • When a disk is rolled back, all data created or modified after the snapshot was created is lost.
  • Image backup

    An image works as a copy that stores data from one or more disks. An ECS image may store data from a system disk or from both system and data disks. All image backups are full backups and can only be triggered manually.

  • Image recovery

    You can create a custom image from a snapshot to include the operating system and data environment of the snapshot in the image. Then, you can use the custom image to create multiple instances with the same operating system and data environment. For more information about the configuration of snapshots and images, see Create a normal snapshot and Create a custom image from a snapshot.

    Note: Custom images cannot be used across regions.
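
The incremental behavior described under Snapshot backup can be illustrated with a toy model. The sketch below is not how Alibaba Cloud implements snapshots; the names `Disk`, `SnapshotChain`, `create`, and `restore` are hypothetical, and a disk is assumed to be a simple list of blocks. It only shows why the first snapshot is a full copy while later snapshots contain just the changed blocks.

```python
from dataclasses import dataclass, field


@dataclass
class Disk:
    """A toy disk: a fixed-length list of data blocks (assumption)."""
    blocks: list


@dataclass
class SnapshotChain:
    """One full snapshot followed by incremental ones (hypothetical model)."""
    snapshots: list = field(default_factory=list)  # each entry: {block_index: data}

    def _view(self, upto=None):
        """Merge snapshots in order to rebuild the disk contents."""
        view = {}
        for snap in self.snapshots[:upto]:
            view.update(snap)
        return view

    def create(self, disk):
        """Copy only the blocks that differ from the previous snapshot."""
        previous = self._view()
        changed = {
            i: data
            for i, data in enumerate(disk.blocks)
            if i not in previous or previous[i] != data
        }
        self.snapshots.append(changed)
        return changed

    def restore(self, upto):
        """Return the block list as of snapshot number `upto` (1-based)."""
        view = self._view(upto)
        return [view[i] for i in sorted(view)]


disk = Disk(["A", "B", "C", "D"])
chain = SnapshotChain()

s1 = chain.create(disk)      # first snapshot: every block is copied
disk.blocks[1] = "B1"
s2 = chain.create(disk)      # second snapshot: only block 1 changed
disk.blocks[2] = "C1"
disk.blocks[3] = "D1"
s3 = chain.create(disk)      # third snapshot: blocks 2 and 3 changed

print(len(s1), s2, s3)
print(chain.restore(2))      # disk state as of the second snapshot
```

In this model, restoring to a snapshot replays the chain up to that point, which is also why everything written after the chosen snapshot is absent from the restored state.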

Technical metrics

The recovery time objective (RTO) and recovery point objective (RPO) depend on the amount of data and are typically measured in hours.
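
As a rough way to reason about RPO for snapshot-based backup, the sketch below computes the longest interval between two consecutive scheduled snapshots in a weekly policy. It rests on several assumptions that are not stated in the text: the worst-case data loss window is taken to be the gap between scheduled snapshots (snapshot creation time is ignored), hours are numbered 0 to 23, weekdays are numbered 1 (Monday) to 7 (Sunday), and a policy may list more than one hour. The function names are hypothetical and are not part of any Alibaba Cloud API; only the retention range of 1 to 65,536 days comes from the text.

```python
from itertools import product

HOURS_PER_WEEK = 7 * 24
MAX_RETENTION_DAYS = 65536  # upper bound stated in the text


def validate_policy(hours, weekdays, retention_days):
    """Check a policy against the limits described in the text.

    hours: hours of the day, 0-23 (assumed numbering; snapshots start on the hour)
    weekdays: 1 (Monday) through 7 (Sunday) (assumed numbering)
    retention_days: 1..65536, or None to keep snapshots permanently
    """
    if not hours or not all(0 <= h <= 23 for h in hours):
        raise ValueError("hours must be non-empty values in 0..23")
    if not weekdays or not all(1 <= d <= 7 for d in weekdays):
        raise ValueError("weekdays must be non-empty values in 1..7")
    if retention_days is not None and not 1 <= retention_days <= MAX_RETENTION_DAYS:
        raise ValueError("retention_days must be 1..65536 or None")


def worst_case_gap_hours(hours, weekdays):
    """Longest interval, in hours, between consecutive scheduled snapshots."""
    validate_policy(hours, weekdays, None)
    # Offset in hours from Monday 00:00 for every scheduled snapshot.
    slots = sorted({(d - 1) * 24 + h for d, h in product(weekdays, hours)})
    # Gaps between neighbours, plus the wrap-around from the last slot to the first.
    gaps = [b - a for a, b in zip(slots, slots[1:])]
    gaps.append(slots[0] + HOURS_PER_WEEK - slots[-1])
    return max(gaps)


daily = worst_case_gap_hours([2], [1, 2, 3, 4, 5, 6, 7])
workdays = worst_case_gap_hours([0, 12], [1, 2, 3, 4, 5])
print(daily, workdays)
```

Under these assumptions, a daily snapshot gives a worst-case gap of 24 hours, while a twice-daily weekday-only policy is dominated by the weekend gap between Friday 12:00 and Monday 00:00.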

Scenarios

  • Backup and recovery

    Alibaba Cloud ECS allows you to back up system disks and data disks with snapshots and images. If incorrect data is written to a disk because of application errors, or because attackers exploit application vulnerabilities, you can use the snapshot service to restore the disk to a desired state. In addition, Alibaba Cloud ECS allows you to reinitialize disks with images or create ECS instances from custom images.

  • Disaster recovery

    Alibaba Cloud ECS supports disaster recovery architectures. For example, you can place a Server Load Balancer (SLB) instance at the frontend of an application and deploy at least two ECS instances at the backend of the same application. Alternatively, you can use Auto Scaling provided by Alibaba Cloud to scale ECS resources automatically based on rules that you define. This way, even if one of the ECS instances fails or is overloaded, the remaining instances keep the application available and business continuity is ensured.

    The following figure provides an example in which ECS instances are deployed in data centers in two zones within the same region. All communications take place over the Alibaba Cloud Gigabit internal network to ensure fast response and reduce Internet traffic costs.

    • SLB: SLB instances balance load between the two zones. Traffic is distributed to two or more data centers where ECS instance clusters are deployed.
    • ECS cluster: ECS instances deployed in the two data centers are equivalent. The failure of a single instance does not affect data layer applications or the ECS control function. If a failure occurs, the system automatically performs hot migration so that other ECS instances can continue to provide services. This helps prevent service interruptions caused by a single point of failure. If hot migration fails, you receive a notification about the failure through system events so that you can deploy new nodes in a timely manner.
    • Data layer: Object Storage Service (OSS) is deployed at the region level. ECS nodes in data centers in different zones can access objects in OSS. For database applications, the multi-zone ApsaraDB for RDS service is used. Primary nodes can perform read and write operations across zones without conflicting with application-layer traffic. In addition, secondary nodes can perform read operations across zones so that ECS instances can still read data if the primary nodes fail.
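
The traffic distribution described for the SLB and ECS cluster layers can be sketched as a round-robin router that skips failed backends. This is an illustration only, not SLB's actual scheduling algorithm or API; the `Backend` and `LoadBalancer` classes, the instance names, and the zone names are all hypothetical.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Backend:
    """A hypothetical ECS instance behind the load balancer."""
    name: str
    zone: str
    healthy: bool = True


class LoadBalancer:
    """Round-robin over backends, skipping any that are marked unhealthy."""

    def __init__(self, backends):
        self.backends = list(backends)
        self._ring = cycle(self.backends)

    def route(self):
        # Try each backend at most once per request.
        for _ in range(len(self.backends)):
            backend = next(self._ring)
            if backend.healthy:
                return backend
        raise RuntimeError("no healthy backend in any zone")


backends = [Backend("ecs-a1", "zone-a"), Backend("ecs-b1", "zone-b")]
lb = LoadBalancer(backends)

before = [lb.route().name for _ in range(4)]   # alternates between the two zones
backends[0].healthy = False                    # simulate a failure in zone-a
after = [lb.route().name for _ in range(3)]    # all traffic goes to zone-b

print(before)
print(after)
```

Because the two backends are equivalent, losing one zone reduces capacity but does not interrupt service, which is the property the two-zone deployment relies on.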