Cloud Backup: Overview

Last Updated: Sep 22, 2023

This topic describes the basic capabilities and benefits of the async replication feature for disaster recovery.

Overview

Cloud Backup implements cross-region and cross-zone disaster recovery based on the async replication feature to meet different business requirements.

Async replication works at the disk level, so you do not need to install an agent on the protected instance.

If a fault occurs on the primary system, the business system is switched to the disaster recovery system. This protects your business against outages caused by regional disasters, keeps it available, and helps you meet your recovery point objective (RPO) and recovery time objective (RTO) goals.

Async replication is a feature that protects data across regions or across zones within the same region based on the data replication capability of Elastic Block Storage (EBS). For more information, see Overview.

The following list compares continuous data replication (CDR) with async replication.

Application scenarios

  • CDR: Disaster recovery for a single virtual machine (VM). Intended for customers who have strict RPO requirements and can accept an agent being installed in the system.

  • Async replication: Disaster recovery that keeps groups of VMs consistent. Intended for customers who can accept an RPO of a few minutes and do not want anything installed in the system.

Intrusive to the system

  • CDR: Yes

  • Async replication: No

Replication implementation

  • CDR: An agent is installed in the operating system of the protected instance. The agent captures data written to the disks and sends it to a gateway in real time. The gateway then transmits the data to an Object Storage Service (OSS) bucket at the disaster recovery site for storage.

  • Async replication: Data is replicated by using the async replication and snapshot features.

Recovery implementation

  • CDR: Supports multiple recovery points. A shadow Elastic Compute Service (ECS) instance and a gateway server are created at the disaster recovery site for the protected ECS instance. Cloud Backup reads data from the OSS bucket to the shadow ECS instance, writes the data to the ECS instance at the disaster recovery site, and then creates recovery points by using snapshots.

  • Async replication: Supports only a single recovery point. Cloud Backup creates the recovery point by replicating the snapshot to the disaster recovery site.

Consistency group

  • CDR: Not supported

  • Async replication: Supported

Benefits of disaster recovery

Agentless replication

Async replication does not require an agent and does not intrude into the system. It works regardless of the operating system and does not consume computing resources at the disaster recovery site.

Multi-VM consistency

Consistency groups keep the data of multiple VMs consistent with each other, which meets the requirements of enterprise applications that span several VMs.
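The following Python sketch illustrates why multi-VM consistency matters. It is a conceptual model, not how Cloud Backup implements consistency groups; the VM names, timestamps, and helper functions are hypothetical. An order is written on an application VM and its payment on a database VM. If each VM recovers to a different point in time, the recovered data can contain a payment without its order; if all VMs recover to one shared point in time, it cannot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryPoint:
    vm: str
    timestamp: int  # illustrative time in seconds (hypothetical)


def independent_points(last_sync):
    # Without a consistency group, each VM is replicated on its own schedule,
    # so each VM's recovery point can land at a different moment.
    return [RecoveryPoint(vm, t) for vm, t in last_sync.items()]


def group_points(vms, checkpoint):
    # With a consistency group, one checkpoint covers all member VMs,
    # so every VM recovers to the same point in time.
    return [RecoveryPoint(vm, checkpoint) for vm in vms]


def is_consistent(points):
    return len({p.timestamp for p in points}) <= 1


def recovered(events, points):
    # An event survives recovery if it happened at or before its VM's recovery point.
    cut = {p.vm: p.timestamp for p in points}
    return {name for vm, t, name in events if t <= cut[vm]}


events = [("app-vm", 120, "order"), ("db-vm", 125, "payment")]
last_sync = {"app-vm": 100, "db-vm": 160}
print(sorted(recovered(events, independent_points(last_sync))))  # ['payment']
print(sorted(recovered(events, group_points(["app-vm", "db-vm"], 150))))  # ['order', 'payment']
```

With independent recovery points, the payment survives but the order does not, which is an inconsistent state across the two VMs. With a shared checkpoint, either both events survive or neither does.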

Ease of use

After you create a protection group for an application, you can add all the ECS instances of the application to the protection group and enable replication. You do not need to track which disks belong to which ECS instance: Cloud Backup maps disks to ECS instances for you.

Terms

Term

Description

site pair

Cross-region and cross-zone disaster recovery is implemented based on async replication, which replicates data from one site to another site in a different region or in a different zone of the same region. Therefore, you must pair two sites based on your business requirements. The two sites form a site pair, and protection groups are created within a site pair. Disaster recovery for the protection groups in a site pair runs only in the forward direction of that pair. For example, if disaster recovery is performed from Protection Group A to Protection Group B with forward protection from Region 1 to Region 2, and from Protection Group C to Protection Group D with forward protection from Region 2 to Region 1, you must create two site pairs, one for each direction. A protection group can belong to only one site pair.

Each site pair can use only one replication technology.

protection group

  • A protection group can contain multiple ECS instances, so you can use one plan to perform operations on all of them at the same time. A protection group is either of the common type, in which the VMs have no association with each other, or of the consistency group type.

  • Disaster recovery for the ECS instances in a protection group uses exactly one underlying technology: CDR or async replication. You must choose the technology when you create the protection group.

  • The normal states of a protection group include Starting Replication, Replicating Full Data, Replicating Incremental Data, Failover in Progress, Failover Completed, Reverse Replicating, Failback in Progress, and Failback Completed. The abnormal states include Replication Error, Failover Failed, and Failback Failed.

  • A failover is performed for all the protected ECS instances in a protection group at once. Therefore, all the protected ECS instances in a protection group must have the same role.

protected instance

An ECS instance protected by Cloud Backup. Database protection is not yet supported and is planned for the future. Each protected instance has either the primary role or the secondary role: an instance with the primary role is the one on which services are currently running, and an instance with the secondary role is the one currently used for disaster recovery.

production site

The zone or region where your production business operates initially.

disaster recovery site

The zone or region for disaster recovery of your production business.

failover

The process of switching services from the production site to the disaster recovery site, for example when a fault occurs at the production site. Failovers are either planned or unplanned, depending on whether the ECS instance at the production site has failed at the time of the switchover.

failback

The process of switching services from the disaster recovery site to the production site when the fault at the production site is rectified.

forward protection

The replication direction of the protection group and its ECS instances. In forward protection, data is replicated from the production site to the disaster recovery site.

reverse protection

The replication direction of the protection group and its ECS instances. After a failover, the disaster recovery site (Site B) becomes the primary site and the production site (Site A) becomes the secondary site. If protection is then enabled, data is replicated from Site B to Site A. This is called reverse protection. After the fault is rectified, Site A becomes the production site and Site B becomes the disaster recovery site again. If protection is then enabled, data is replicated from Site A to Site B, which is forward protection.
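The following Python sketch models the site pair and protection group rules described in the terms above. It is a hypothetical illustration, not a Cloud Backup API: the class names, the "CDR" and "async" labels, and the choice to derive a protection group's technology from its site pair are assumptions. It shows that site pairs are directional (Region 1 to Region 2 and Region 2 to Region 1 are two different pairs), that consistency groups are not available with CDR, that all instances in a group must share one role, and how group states split into normal and abnormal states.

```python
from dataclasses import dataclass, field
from enum import Enum

# Hypothetical labels; not identifiers from any Cloud Backup API.
TECHNOLOGIES = {"CDR", "async"}
NORMAL_STATES = {
    "Starting Replication", "Replicating Full Data", "Replicating Incremental Data",
    "Failover in Progress", "Failover Completed", "Reverse Replicating",
    "Failback in Progress", "Failback Completed",
}
ABNORMAL_STATES = {"Replication Error", "Failover Failed", "Failback Failed"}


class GroupType(Enum):
    COMMON = "common"            # no association between the VMs
    CONSISTENCY = "consistency"  # VMs are kept consistent with each other


@dataclass(frozen=True)
class SitePair:
    source: str      # production site (region or zone)
    target: str      # disaster recovery site; the pair is directional
    technology: str  # exactly one replication technology per site pair


@dataclass
class ProtectionGroup:
    name: str
    pair: SitePair  # a protection group belongs to exactly one site pair
    group_type: GroupType
    roles: dict = field(default_factory=dict)  # instance ID -> "primary" or "secondary"
    state: str = "Starting Replication"

    def __post_init__(self):
        if self.pair.technology not in TECHNOLOGIES:
            raise ValueError(f"unknown technology: {self.pair.technology}")
        # CDR does not support consistency groups.
        if self.pair.technology == "CDR" and self.group_type is GroupType.CONSISTENCY:
            raise ValueError("consistency groups require async replication")

    @property
    def technology(self):
        # Assumption: taken from the site pair, so it is fixed at creation time.
        return self.pair.technology

    def add_instance(self, instance_id, role):
        # A failover moves every instance in the group at once,
        # so all members must share the same role.
        if self.roles and role not in self.roles.values():
            raise ValueError("all instances in a protection group must have the same role")
        self.roles[instance_id] = role

    def is_healthy(self):
        if self.state in NORMAL_STATES:
            return True
        if self.state in ABNORMAL_STATES:
            return False
        raise ValueError(f"unknown state: {self.state}")
```

For the example in the site pair entry, Protection Group A would use `SitePair("region-1", "region-2", "async")` and Protection Group C would use `SitePair("region-2", "region-1", "async")`. These are two distinct pairs because the direction differs.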

Supported disaster recovery scenarios

Disaster recovery scenario

Type

Failover

  • Switch After Data Synchronization

    During the failover, Cloud Backup stops the protected instances in the protection group, and performs the final data synchronization after all the protected instances are stopped. The failover starts after the data is synchronized. This ensures that the data at the disaster recovery site is the same as that at the production site. This type of failover applies to scenarios such as planned disaster recovery drills and business migration.

  • Switch Now

    During the failover, Cloud Backup attempts to stop the protected instances in the protection group, but it does not wait for all of them to stop and does not perform a final data synchronization. As a result, data written within the RPO range may be lost. This type of failover applies to scenarios where a fault at the production site cannot be rectified quickly and business must be switched to the disaster recovery site immediately.

Failback

  • Switch After Data Synchronization

    During the failback, Cloud Backup stops the protected instances in the protection group and performs a final data synchronization after all of them are stopped. The failback starts after the data is synchronized. Services are unavailable for longer than with Switch Now. This type of failback applies when the production site is working properly.

  • Switch Now

    During the failback, Cloud Backup attempts to stop the protected instances in the protection group, but it does not wait for all of them to stop and does not perform a final data synchronization. Some data may be lost. This type of failback applies to scenarios where a fault at the disaster recovery site cannot be rectified quickly and business must be switched back to the production site immediately.
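The following Python sketch models the two switch types together with the change in replication direction that follows a failover or failback. It is a simplified, hypothetical model, not Cloud Backup's implementation: instances are plain objects, stopping is treated as instantaneous, and unreplicated data is counted as a number of writes. This topic does not describe what Switch After Data Synchronization does when an instance cannot be stopped; the sketch simply raises an error in that case.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    name: str
    unreplicated_writes: int = 0  # writes not yet copied to the other site
    stoppable: bool = True        # False models an instance that cannot be stopped
    running: bool = True

    def stop(self):
        if self.stoppable:
            self.running = False
        return not self.running


@dataclass
class ProtectedGroup:
    production: str  # Site A
    dr: str          # Site B
    instances: list
    primary: str = field(init=False)

    def __post_init__(self):
        self.primary = self.production  # services start at the production site

    def direction(self):
        # With protection enabled, data flows from the primary site to the other site.
        secondary = self.dr if self.primary == self.production else self.production
        return self.primary, secondary

    def protection_kind(self):
        return "forward" if self.primary == self.production else "reverse"

    def _switch(self, target, sync_first):
        if sync_first:
            # Switch After Data Synchronization: ask every instance to stop (a list,
            # not a generator, so no instance is skipped) and require that all stopped.
            stopped = [i.stop() for i in self.instances]
            if not all(stopped):
                raise RuntimeError("not all instances stopped; final sync not possible")
            lost = 0
        else:
            # Switch Now: best-effort stop, no waiting, no final synchronization.
            for i in self.instances:
                i.stop()
            lost = sum(i.unreplicated_writes for i in self.instances)
        for i in self.instances:
            i.unreplicated_writes = 0  # synchronized (sync_first) or lost (Switch Now)
            i.running = True           # services now run at the new primary site
        self.primary = target
        return lost

    def failover(self, sync_first=True):
        if self.primary != self.production:
            raise RuntimeError("services are not at the production site")
        return self._switch(self.dr, sync_first)

    def failback(self, sync_first=True):
        if self.primary != self.dr:
            raise RuntimeError("services are not at the disaster recovery site")
        return self._switch(self.production, sync_first)


group = ProtectedGroup("site-a", "site-b", [Instance("i-1", 3), Instance("i-2", 2)])
print(group.failover(sync_first=True), group.direction(), group.protection_kind())
# 0 ('site-b', 'site-a') reverse
```

The sketch returns the number of lost writes so that the trade-off is visible: Switch After Data Synchronization loses nothing but depends on every instance stopping, while Switch Now always proceeds and loses whatever had not been replicated yet.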

Disaster recovery process

To implement disaster recovery protection for critical applications in the Hybrid Backup Recovery (HBR) console, perform the following steps:

  • Step 1: Plan resources.

    Before you perform disaster recovery, you must plan the required compute, network, and storage resources. You must determine the number of servers, storage capacity, and virtual private clouds (VPCs).

  • Step 2: Create a disaster recovery site pair.

    Create VPCs and vSwitches for the disaster recovery site and configure their CIDR blocks. For testing, you can create the VPCs and vSwitches with the default configurations, or use the same VPC and vSwitch CIDR blocks as the production site. For actual disaster recovery, configure the CIDR blocks as required.

  • Step 3: Configure network and security settings.

    Create resource mappings, including the zone mapping, vSwitch mapping, and security group mapping.

  • Step 4: Create a protection group.

  • Step 5: Add protected instances.

    Add the ECS instances that you want to protect to the protection group.

  • Step 6: Start replication.

    Start disaster recovery protection, which replicates data from the production site to the disaster recovery site.

  • Step 7: Perform a failover.

    • Switch After Data Synchronization

      HBR stops the protected instances in the protection group, performs a final data synchronization after all of them are stopped, and then fails over. The data at the disaster recovery site is the same as that at the production site. Use this type for planned disaster recovery drills and business migration.

    • Switch Now

      HBR attempts to stop the protected instances but fails over without waiting for them to stop and without a final data synchronization. Data written within the recovery point objective (RPO) range may be lost. Use this type when a fault at the production site cannot be rectified quickly and business must be switched to the disaster recovery site immediately.
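The planning in Steps 1 to 3 can be checked before you create the protection group. The following Python sketch is a hypothetical pre-flight check, not an HBR API; the `Server` fields, the mapping format, and the function names are assumptions. It tallies the number of servers, total disk capacity, and VPCs to plan for; checks that vSwitch CIDR blocks fall inside the VPC CIDR block and do not overlap each other (assumed here as general VPC rules); and lists production zones, vSwitches, and security groups that do not yet have a mapping to the disaster recovery site.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    # Hypothetical inventory record for one production ECS instance.
    name: str
    vpc_id: str
    vswitch_id: str
    security_group_id: str
    zone_id: str
    disk_sizes_gib: tuple


def plan_resources(servers):
    # Step 1: count servers, total disk capacity, and distinct VPCs to plan for.
    return {
        "servers": len(servers),
        "storage_gib": sum(sum(s.disk_sizes_gib) for s in servers),
        "vpcs": sorted({s.vpc_id for s in servers}),
    }


def check_cidrs(vpc_cidr, vswitch_cidrs):
    # Step 2: each vSwitch CIDR block should sit inside the VPC CIDR block
    # and should not overlap another vSwitch (assumed VPC rules).
    vpc = ipaddress.ip_network(vpc_cidr)
    nets = [ipaddress.ip_network(c) for c in vswitch_cidrs]
    problems = [f"{n} is outside {vpc}" for n in nets if not n.subnet_of(vpc)]
    for i, a in enumerate(nets):
        for b in nets[i + 1:]:
            if a.overlaps(b):
                problems.append(f"{a} overlaps {b}")
    return problems


def missing_mappings(servers, mappings):
    # Step 3: every zone, vSwitch, and security group used at the production site
    # needs a counterpart at the disaster recovery site.
    needed = {
        "zone": {s.zone_id for s in servers},
        "vswitch": {s.vswitch_id for s in servers},
        "security_group": {s.security_group_id for s in servers},
    }
    missing = {}
    for kind, ids in needed.items():
        gaps = ids - set(mappings.get(kind, {}))
        if gaps:
            missing[kind] = sorted(gaps)
    return missing
```

To reuse the production CIDR blocks at the disaster recovery site, as Step 2 allows for testing, pass the production VPC and vSwitch CIDR blocks to `check_cidrs` unchanged.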

Billing

If you use the async replication feature for disaster recovery, the following fees are incurred:

  • The usage fees of disaster recovery software are included in Cloud Backup bills.

    Async replication is in public preview. You can use the disaster recovery software for free during the public preview.

  • The usage fees of the pay-as-you-go ECS instances and disks created at the disaster recovery site are included in ECS bills. For more information, see Pay-as-you-go.

  • The fees incurred by the async replication feature are included in ECS bills. For more information, see Overview.