EBS async replication and CDR for ECS disaster recovery FAQ - Cloud Backup

Which instance specifications does ECS disaster recovery (EBS async replication) support? Are there limits on disks or IP addresses?

ECS disaster recovery (EBS async replication) has limits on regions, zones, disk types, elastic network interfaces (ENIs) of ECS instances, and configuration quotas.

Most instance specifications are supported.
Only enhanced SSDs (ESSDs) are supported, excluding ESSD Entry disks and ESSD AutoPL disks. For more information, see Limits.
ECS networks have the following limits:
- Single ENI:
  For some operating systems, the ENI cannot be automatically configured at the disaster recovery site after a failover. After the failover, check and configure the ENI at the disaster recovery site to make sure that the network works properly. For more information, see Configure a secondary ENI.
- Multiple ENIs:
  - After an ECS instance is bound to a secondary ENI, some images cannot automatically identify the IP address of the secondary ENI or add a route. As a result, the secondary ENI cannot work properly.
  - If an ECS instance is configured with a secondary ENI, check the IP address of the secondary ENI after a failover. This ensures that the secondary ENI works as expected. For more information, see Configure a secondary ENI.
- Only ENIs and ECS instances that reside in the same virtual private clouds (VPCs) as the disaster recovery site pair are supported.
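The disk and network limits above can be checked before you add instances to a protection group. The following Python sketch is hypothetical: the `Eni` and `EcsInstance` types and the `check_instance` function are illustrative and not part of any Alibaba Cloud SDK. It assumes that the ECS API identifies ESSD, ESSD Entry, and ESSD AutoPL disks by the categories `cloud_essd`, `cloud_essd_entry`, and `cloud_auto`. It covers only the disk and VPC rules, not region, zone, or quota limits.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

# Assumed ECS disk category identifiers: only standard ESSDs qualify.
# ESSD Entry ("cloud_essd_entry") and ESSD AutoPL ("cloud_auto") are excluded.
SUPPORTED_DISK_CATEGORIES = {"cloud_essd"}


@dataclass
class Eni:
    eni_id: str
    vpc_id: str


@dataclass
class EcsInstance:
    instance_id: str
    vpc_id: str
    disk_categories: List[str]
    # Hypothetical convention: enis[0] is the primary ENI; the rest are secondary.
    enis: List[Eni] = field(default_factory=list)


def check_instance(
    instance: EcsInstance, site_pair_vpc_ids: Set[str]
) -> Tuple[List[str], List[str]]:
    """Return (blocking problems, post-failover reminders) for one instance."""
    blocking: List[str] = []
    reminders: List[str] = []
    iid = instance.instance_id

    # Disk limit: every disk must be a standard ESSD.
    for category in instance.disk_categories:
        if category not in SUPPORTED_DISK_CATEGORIES:
            blocking.append(f"{iid}: unsupported disk category {category}")

    # Network limit: the instance and its ENIs must be in the site pair's VPCs.
    if instance.vpc_id not in site_pair_vpc_ids:
        blocking.append(f"{iid}: VPC {instance.vpc_id} is not used by the site pair")

    for index, eni in enumerate(instance.enis):
        if eni.vpc_id not in site_pair_vpc_ids:
            blocking.append(f"{iid}: ENI {eni.eni_id} is in VPC {eni.vpc_id}, not a site pair VPC")
        if index == 0:
            reminders.append(f"{iid}: after failover, check and configure ENI {eni.eni_id} at the DR site")
        else:
            reminders.append(f"{iid}: after failover, verify the IP address and routes of secondary ENI {eni.eni_id}")

    return blocking, reminders
```

For example, an instance with a `cloud_auto` data disk, or with a secondary ENI in a VPC outside the site pair, returns a non-empty blocking list. The reminders mirror the post-failover ENI checks described above and are not errors.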

Where can I change the IP address of the disaster recovery site for ECS disaster recovery (EBS async replication)?

You can manually specify an IP address for the disaster recovery site in either of the following locations:
- The Network Information tab of the instance details page
- The Preview Basic Information panel

Does EBS async replication for ECS disaster recovery support configuration modifications at the disaster recovery site?

If a protection group is being initialized and the number and capacity of disks in the protection group do not exceed the limits, configuration modifications such as adding disks are automatically synchronized to the disaster recovery site.
If the protection group is in another state, such as replicating or failover, configuration modifications at the production site or the disaster recovery site may affect failover and failback. In this case, configuration modifications are not supported. However, the resources of the protection group are checked at both sites, and an alert is generated if an exception is detected. Evaluate your business requirements and proceed with caution.

What do I do if I cannot select an instance when I add instances for ECS disaster recovery (EBS async replication)?

ECS disaster recovery (EBS async replication) has limits on regions, zones, disk types, networks, and configuration quotas. Troubleshoot the issue based on the cause that is displayed in the console. For more information, see Limits.

What do I do if the instance type of the disaster recovery site is abnormal when I enable ECS disaster recovery (EBS async replication)?

This issue occurs because the instance type of the protected instance is unavailable or out of stock at the disaster recovery site. We recommend that you perform the Change Instance Type operation in the console to select an available instance type. If the operating system or IP address is abnormal, perform the Modify Operating System or Modify Disaster Recovery IP operation to correct it.

What do I do if a failover for ECS disaster recovery (EBS async replication) fails?

If a failover fails, the protection group enters the Failover Failed state.

In the console, click the ID of the failed task as prompted. Then, view the detailed error cause on the Tasks tab.

For example, the error message Not have any stock of instance type family ... indicates that the instance type family is out of stock at the disaster recovery site. In this case, perform the Change Instance Type at DR Site operation on the Protected Instances page, and then retry the task.

What are the differences between CDR and EBS async replication for ECS disaster recovery?

EBS async replication is a feature that protects data across regions or across zones within the same region based on the data replication capability of EBS. For more information, see Overview.

The following table describes the differences between CDR and EBS async replication.

Item	CDR	EBS async replication
Application scenarios	Disaster recovery for a single virtual machine (VM). Suitable when installing an agent on the system (intrusive replication) is acceptable.	Disaster recovery that ensures consistency across a group of VMs. Suitable when the system must not be intruded on (no agent is installed).
Intrusive to the system	Yes	No
Replication implementation	Cloud Backup uses an agent installed on the operating system of the protected instance to replicate data written to the disks and send it to a gateway in real time. The gateway stores the data in an Object Storage Service (OSS) bucket and then writes the data to the disk at the disaster recovery site.	Data is replicated by using the EBS async replication and snapshot features.
Recovery implementation	Supports multiple recovery points. A shadow ECS instance and a gateway server are created for the protected ECS instance at the disaster recovery site. Cloud Backup reads data from the OSS bucket to the shadow ECS instance, writes the data to the ECS instance at the disaster recovery site, and then creates a recovery point based on the snapshot mechanism.	Supports only a single recovery point. Cloud Backup creates a recovery point by replicating the snapshot to the disaster recovery site.
Consistency group	Not supported	Supported

What are the RPO and RTO of ECS disaster recovery (CDR)?

Core business data is replicated from the self-managed data centers of enterprises to the cloud in real time, achieving recovery point objectives (RPOs) of seconds to minutes. If a major failure occurs in a self-managed data center, workloads can be recovered in the cloud within minutes, achieving recovery time objectives (RTOs) of minutes.

Which operating systems are supported by ECS disaster recovery (CDR)?

CDR supports mainstream Windows and Linux operating systems. For more information, see Operating systems.

The following table describes the operating systems that support ECS disaster recovery.

Operating system	Version
Windows Server	2008 R2, 2012, 2012 R2, and 2016
Red Hat Enterprise Linux	7.0 to 7.9; 6.0 to 6.10
CentOS (64-bit only)	7.0 to 7.9; 6.0 to 6.10
SUSE Linux Enterprise Server (64-bit only)	12.0 to 12.3
Alibaba Cloud Linux 2.1903 LTS 64-bit	Kernel versions 4.19.91-25.1.al7.x86_64, 4.19.91-24.1.al7.x86_64, 4.19.91-23.al7.x86_64, and 4.19.91-22.2.al7.x86_64

Important

For all Linux operating systems, the /boot partition and the / partition must reside on the same disk. If they reside on different disks, move them to the same disk before you register the ECS instance for which you want to enable ECS disaster recovery.

If SUSE Linux Enterprise Server 12.1 runs on a VMware virtual machine (VM), a black screen appears after you restart the VM. The black screen is caused by operating system errors, not by ECS disaster recovery.
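The Linux requirements above (/boot and / on the same disk, and a listed kernel for Alibaba Cloud Linux 2.1903) can be pre-checked on the host before registration. The following Python sketch is a heuristic under stated assumptions: it reads mount sources from `/proc/mounts` and derives the parent disk by stripping the partition suffix from names such as `/dev/vda1` or `/dev/nvme0n1p2`. All function names are hypothetical. LVM, RAID, and multipath devices are not resolved to a parent disk, so confirm such layouts manually, for example with `lsblk`.

```python
import platform
import re
from typing import Dict

# Kernel releases listed above for Alibaba Cloud Linux 2.1903 LTS 64-bit.
SUPPORTED_ALINUX2_KERNELS = {
    "4.19.91-25.1.al7.x86_64",
    "4.19.91-24.1.al7.x86_64",
    "4.19.91-23.al7.x86_64",
    "4.19.91-22.2.al7.x86_64",
}

# Heuristic partition-name patterns (assumption). Devices such as
# /dev/mapper/... (LVM) are returned unchanged, not resolved to a disk.
_NVME_STYLE = re.compile(r"^(?P<disk>.*\d)p\d+$")              # /dev/nvme0n1p2
_SCSI_STYLE = re.compile(r"^(?P<disk>/dev/[a-z]*d[a-z]+)\d+$")  # /dev/vda1, /dev/xvda1


def parent_disk(device: str) -> str:
    """Strip a trailing partition number; return the device unchanged if none is found."""
    for pattern in (_NVME_STYLE, _SCSI_STYLE):
        match = pattern.match(device)
        if match:
            return match.group("disk")
    return device


def mount_sources(mounts_text: str) -> Dict[str, str]:
    """Map each mount point to its source device, given /proc/mounts content."""
    sources: Dict[str, str] = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            # Later entries win, matching the top-most mount on a mount point.
            sources[fields[1]] = fields[0]
    return sources


def boot_and_root_same_disk(mounts_text: str) -> bool:
    sources = mount_sources(mounts_text)
    root = sources["/"]
    # If /boot is not a separate mount, it lives on the / file system.
    boot = sources.get("/boot", root)
    return parent_disk(boot) == parent_disk(root)


def kernel_supported(release: str) -> bool:
    return release in SUPPORTED_ALINUX2_KERNELS


def check_this_host() -> Dict[str, bool]:
    """Run both checks on the local Linux host (reads /proc/mounts)."""
    with open("/proc/mounts", encoding="utf-8") as f:
        same_disk = boot_and_root_same_disk(f.read())
    return {
        "boot_and_root_same_disk": same_disk,
        "alinux2_kernel_supported": kernel_supported(platform.release()),
    }
```

Call `check_this_host()` on the instance. The kernel result applies only to Alibaba Cloud Linux 2.1903 LTS 64-bit and can be ignored on other distributions.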

What are the snapshot retention policies for ECS disaster recovery (CDR)?

Recovery points for ECS disaster recovery are created from snapshots of shadow disks, so that servers protected by disaster recovery can be restored to a specified historical version.

The following items show the snapshot retention policies:

Note

If a recovery point has been used for disaster recovery drills or failover, it is not restricted by these snapshot retention policies.

- All recovery points from the last day are retained. The last day starts at 00:00:00 UTC of the previous day and ends at the current time. For example, if the current UTC time is 2020-10-12T17:00:00Z, the last day spans from 2020-10-11T00:00:00Z to 2020-10-12T17:00:00Z, a total of 41 hours.
- The last recovery point of each day in the last week is retained.
- The last recovery point of each week in the last month is retained.
- Recovery points are retained for a maximum of one month. Expired recovery points are deleted.
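The retention rules can be read as a selection over recovery points. The following Python sketch is one interpretation, not the Cloud Backup implementation: it takes the "last day" window from the example above (00:00 UTC of the previous day to now), and it assumes that "last week" means the 7 days and "last month" the 30 days before 00:00 UTC today, and that weeks are ISO weeks. `RecoveryPoint`, `last_day_start`, and `points_to_keep` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date, datetime, timedelta
from typing import Dict, Iterable, Set, Tuple


@dataclass(frozen=True)
class RecoveryPoint:
    point_id: str
    created_at: datetime  # timezone-aware, UTC
    used_for_drill_or_failover: bool = False


def last_day_start(now: datetime) -> datetime:
    """Start of the "last day" window: 00:00 UTC of the previous day."""
    today = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return today - timedelta(days=1)


def points_to_keep(points: Iterable[RecoveryPoint], now: datetime) -> Set[str]:
    today = now.replace(hour=0, minute=0, second=0, microsecond=0)
    day_window = last_day_start(now)
    week_window = today - timedelta(days=7)    # assumption: "last week" = 7 days
    month_window = today - timedelta(days=30)  # assumption: "last month" = 30 days

    keep: Set[str] = set()
    latest_per_day: Dict[date, RecoveryPoint] = {}
    latest_per_week: Dict[Tuple[int, int], RecoveryPoint] = {}

    for point in points:
        created = point.created_at
        if point.used_for_drill_or_failover:
            keep.add(point.point_id)  # exempt from the retention rules
            continue
        if created < month_window:
            continue  # expired: older than one month
        if created >= day_window:
            keep.add(point.point_id)  # keep every point from the last day
        if created >= week_window:
            day = created.date()
            if day not in latest_per_day or created > latest_per_day[day].created_at:
                latest_per_day[day] = point
        iso = created.isocalendar()
        week = (iso[0], iso[1])  # ISO year and week number
        if week not in latest_per_week or created > latest_per_week[week].created_at:
            latest_per_week[week] = point

    keep.update(p.point_id for p in latest_per_day.values())   # last point per day
    keep.update(p.point_id for p in latest_per_week.values())  # last point per week
    return keep
```

With the current time 2020-10-12T17:00:00Z, `last_day_start` returns 2020-10-11T00:00:00Z, which reproduces the 41-hour window from the example.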

Does ECS disaster recovery (CDR) support scaling up or adding disks to a source ECS instance?

Scale-up and disk adding are supported only for source Linux ECS instances in site pairs for cross-region or cross-zone cloud disaster recovery.

After the source ECS instance is scaled up or disks are added, ECS disaster recovery can detect the disk changes within 5 minutes. It then stops the ongoing server replication, adjusts the capacity of the destination shadow disks, repairs the replication, and resumes real-time replication. This process runs automatically. Its duration depends on the disk size and can be long. You can watch the status change from Repair Replication to Replicating in the console.

Important

ECS disaster recovery does not support scale-in or disk reduction on the source ECS instance because the operation may lead to replication errors or data loss.
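The rules in this answer can be summarized as a decision function, for example in a change-management script that checks a planned disk change before you apply it. This is a hypothetical sketch: `classify_disk_change`, the disk-ID-to-size mapping, and the site pair labels `cross-region`, `cross-zone`, and `idc-to-cloud` are illustrative names, not Cloud Backup API values. It encodes only the support rules stated here; detection and repair are still performed by ECS disaster recovery.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical site pair labels; only these two allow scale-up or disk adding.
SCALE_UP_SITE_PAIRS = {"cross-region", "cross-zone"}


@dataclass
class ChangeVerdict:
    supported: bool
    reasons: List[str] = field(default_factory=list)


def classify_disk_change(
    old_sizes_gib: Dict[str, int],
    new_sizes_gib: Dict[str, int],
    os_type: str,
    site_pair_type: str,
) -> ChangeVerdict:
    """Classify a change to the source instance's disks (disk ID -> size in GiB)."""
    reasons: List[str] = []
    removed = sorted(set(old_sizes_gib) - set(new_sizes_gib))
    added = sorted(set(new_sizes_gib) - set(old_sizes_gib))
    common = sorted(set(old_sizes_gib) & set(new_sizes_gib))
    shrunk = [d for d in common if new_sizes_gib[d] < old_sizes_gib[d]]
    grown = [d for d in common if new_sizes_gib[d] > old_sizes_gib[d]]

    # Scale-in and disk removal are never supported (risk of errors or data loss).
    if removed:
        reasons.append(f"disk removal is not supported: {', '.join(removed)}")
    if shrunk:
        reasons.append(f"scale-in is not supported: {', '.join(shrunk)}")

    # Scale-up and disk adding: Linux sources in cross-region/cross-zone pairs only.
    if added or grown:
        if os_type.lower() != "linux":
            reasons.append("scale-up and disk adding require a Linux source instance")
        if site_pair_type not in SCALE_UP_SITE_PAIRS:
            reasons.append(f"scale-up and disk adding are not supported for site pair type {site_pair_type}")

    return ChangeVerdict(supported=not reasons, reasons=reasons)
```

For example, growing a disk from 100 GiB to 200 GiB on a Linux source in a `cross-zone` site pair is reported as supported, while the same change on a Windows source, or any shrink or removal, is reported as unsupported.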