Cloud Backup: Cross-region disaster recovery

Last Updated: Jan 12, 2026

Cross-region disaster recovery (DR) ensures business continuity by failing over applications to a secondary region during a major outage (such as an earthquake or tsunami). This approach protects against regional disasters and delivers a highly reliable service, with a target recovery point objective (RPO) as low as 1 minute and a typical recovery time objective (RTO) of 15 minutes.

Before you begin

Before implementation, you must prepare a DR environment in a separate region from your production site. This environment requires a dedicated Virtual Private Cloud (VPC), and within this VPC, two distinct vSwitches must be created: one for the replication network and another for the recovery network.

Step 1: Create a DR site pair

To create a site pair for cross-region disaster recovery of the production ECS instances, follow these steps:

  1. Log on to the Cloud Backup console.

  2. In the left-side navigation pane, choose Disaster Recovery > ECS Disaster Recovery.

  3. Click Switch to CDR.

  4. Click Add.

  5. In the Create Site Pair panel, configure the parameters and then click Create.

    1. Set Type to Cross-region Disaster Recovery.

    2. Configure the production site information.

      The production site defines the location of the servers to be protected.

      | Parameter | Description |
      | --- | --- |
      | Name | Enter a name for the production site. The name can be up to 60 characters in length, cannot start with a special character or digit, and can contain only the following special characters: periods (.), underscores (_), and hyphens (-). |
      | Region | From the Region drop-down list, select the region where the production site resides. |
      | VPC | From the VPC drop-down list, select the VPC that was created for the production site. |

    3. Configure the DR site information.

      Resources for the DR site are provisioned within the specified VPC.

      | Parameter | Description |
      | --- | --- |
      | Name | Enter a name for the DR site. The name can be up to 60 characters in length, cannot start with a special character or digit, and can contain only the following special characters: periods (.), underscores (_), and hyphens (-). |
      | Region | From the Region drop-down list, select the region where the DR site resides. |
      | VPC | From the VPC drop-down list, select the VPC where the DR site resides. |
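The naming rules for the production site and DR site can be checked before you type a name into the console. The following is a minimal sketch, not part of Cloud Backup: the function name `is_valid_site_name` is hypothetical, and the pattern assumes that only ASCII letters and digits are allowed besides the listed special characters. The console may accept other characters, so treat the result as a conservative pre-check.

```python
import re

# Assumption: apart from periods, underscores, and hyphens, only ASCII
# letters and digits are permitted. The first character must be a letter
# because the name cannot start with a special character or digit.
# Total length: 1 leading letter + up to 59 more characters = 60 maximum.
_SITE_NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9._-]{0,59}")


def is_valid_site_name(name: str) -> bool:
    """Return True if `name` satisfies the documented site-name rules."""
    return _SITE_NAME_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["prod-site_1.hz", "1prod", "-prod", "prod site"]:
        print(candidate, is_valid_site_name(candidate))
```

Under these assumptions, a name such as `prod-site_1.hz` passes, while names that start with a digit or hyphen, contain a space, or exceed 60 characters are rejected.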

Step 2: Add the servers to be protected

After the DR site pair is created, perform the following steps to add the servers (ECS instances) to be protected:

  1. Click the Protected Server tab. In the upper-right corner of this tab, select the DR site pair that you created in Step 1 from the drop-down list.

  2. On the Protected Server tab, click + Add. Select the ECS instances and click OK.

    Up to 10 ECS instances can be selected.

    In the Server Status column, the status of the added ECS instances is Agent Installing and then changes to Initialized. If the status of an ECS instance is not Initialized, choose More > Server Operation > Restart Server in the Operation column to initialize the instance.
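If you need to protect more than 10 ECS instances, you can plan the additions in groups. The sketch below assumes that the limit of 10 applies to each Add operation rather than to the site pair as a whole. The helper name `batch_instances` and the instance IDs are hypothetical placeholders.

```python
from typing import Iterator, List

# Documented limit: up to 10 ECS instances can be selected at a time.
MAX_INSTANCES_PER_ADD = 10


def batch_instances(
    instance_ids: List[str], size: int = MAX_INSTANCES_PER_ADD
) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` instance IDs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(instance_ids), size):
        yield instance_ids[start:start + size]


if __name__ == "__main__":
    # Placeholder IDs for illustration only.
    ids = [f"i-example{n:02d}" for n in range(23)]
    for number, group in enumerate(batch_instances(ids), start=1):
        print(f"Add operation {number}: {len(group)} instances")
```

Each yielded group corresponds to one pass through the + Add dialog.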

Step 3: Start replication

To enable real-time replication of ECS instances to Alibaba Cloud, perform the following steps:

  1. On the Protected Server tab, find the ECS instance that you want to replicate and choose More > Failover > Start Replication in the Operation column.

  2. In the Enable Replication panel, configure the parameters and click Start.

    | Parameter | Description |
    | --- | --- |
    | Recovery Point Policy | From the drop-down list, select the interval at which recovery points are created. Unit: hours. For example, if you select 1 hour, Cloud Backup creates a recovery point every hour. |
    | Hard Disk Type | Select Ultra Disk, ESSD, or SSD. |
    | Replication Network | Select the vSwitch used to replicate data from the production site to the DR site. The drop-down list is automatically populated with available vSwitches from the DR site's VPC. For optimal performance and a lower RTO, we highly recommend that you use the same vSwitch for both the replication and recovery networks, which ensures that they reside in the same zone. |
    | Recovery Network | Select the vSwitch used to create and run ECS instances at the DR site during a failover or drill. The drop-down list is automatically populated with available vSwitches from the DR site's VPC. For optimal performance and a lower RTO, we highly recommend that you use the same vSwitch for both the replication and recovery networks, which ensures that they reside in the same zone. |
    | Automatic restart after replication interruption | Specify whether to automatically resume replication if an interruption occurs. If you select this check box, the replication task is restarted after the replication is interrupted. |

    The ECS instance then enters the Enabling Replication, Replicating Full Data, and Replicating states in sequence.

    1. Enabling Replication: ECS disaster recovery is scanning data on the ECS instance and evaluating the overall data volume. In most cases, this process takes a few minutes.

    2. Replicating Full Data: ECS disaster recovery is replicating valid data from the ECS instance to Alibaba Cloud. The replication duration depends on factors such as the data volume and the network bandwidth of the ECS instance. The progress bar in the Server Status column shows the replication progress.

    3. Replicating: After all valid data on the ECS instance is replicated to Alibaba Cloud, Aliyun Replication Service (AReS) monitors all disk write operations on the ECS instance and replicates the incremental data to Alibaba Cloud in real time.
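The server statuses named in Steps 2 and 3 form a simple forward progression. The sketch below models that progression for use in a runbook or monitoring script. It is an illustration only: the enum, the mapping, and the function names are hypothetical, and the real service may report additional statuses (for example, error states) that are not listed on this page.

```python
from enum import Enum
from typing import Optional


class ServerStatus(Enum):
    """Statuses shown in the Server Status column, as named in Steps 2 and 3."""
    AGENT_INSTALLING = "Agent Installing"
    INITIALIZED = "Initialized"
    ENABLING_REPLICATION = "Enabling Replication"
    REPLICATING_FULL_DATA = "Replicating Full Data"
    REPLICATING = "Replicating"


# Forward progression. The move from INITIALIZED to ENABLING_REPLICATION
# happens only after you choose Start Replication in the console.
_NEXT_STATUS = {
    ServerStatus.AGENT_INSTALLING: ServerStatus.INITIALIZED,
    ServerStatus.INITIALIZED: ServerStatus.ENABLING_REPLICATION,
    ServerStatus.ENABLING_REPLICATION: ServerStatus.REPLICATING_FULL_DATA,
    ServerStatus.REPLICATING_FULL_DATA: ServerStatus.REPLICATING,
}


def next_status(status: ServerStatus) -> Optional[ServerStatus]:
    """Return the status that follows `status`, or None at the end."""
    return _NEXT_STATUS.get(status)


def is_ready_for_drill(status: ServerStatus) -> bool:
    """A DR drill is possible once the instance is in the Replicating state."""
    return status is ServerStatus.REPLICATING


if __name__ == "__main__":
    current: Optional[ServerStatus] = ServerStatus.AGENT_INSTALLING
    while current is not None:
        print(current.value, "| ready for drill:", is_ready_for_drill(current))
        current = next_status(current)
```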

(Optional) Perform a DR drill

Once an ECS instance reaches the Replicating state, it is ready for a DR drill.

A DR drill is an important part of disaster recovery. A drill runs a protected ECS instance on the cloud, allowing verification that applications can run as expected. A DR drill has the following benefits:

  • Verify application viability: Confirms that applications can run correctly on a restored instance.

  • Improve team readiness: Familiarizes your team with the failover process, ensuring a smooth and rapid response during a real disaster.

To perform a DR drill, follow these steps:

  1. On the Protected Server tab, find the ECS instance on which you want to perform a DR drill and click Test Failover in the Operation column.

  2. In the Test Failover panel, configure the following parameters: Recovery Network, IP Address, Use ECS Specification, Hard Disk Type, Recovery Point, Elastic Public Network IP, and Post Script. Then, click Start.

    Note

    • Cloud Backup automatically retains 24 recovery points that are created in the most recent 24 hours for each ECS instance.

    • If you do not select Use ECS Specification, you must set the CPU and Memory parameters.

    Alibaba Cloud then runs the application on a restored ECS instance at the specified time. The DR drill does not affect real-time data replication.

    The DR drill is typically completed within a few minutes. After it is completed, click the link in the Test Failover Information column to verify the restored data and applications.

  3. Clear the drill environment.

    After the verification is completed, click Cleanup Test Environment in the Operation column. The restored ECS instance is then deleted.

    Note

    After the restored ECS instance is verified, we recommend that you delete it as soon as possible to reduce costs.
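The retention note in the drill procedure determines which recovery points you can choose from in the Test Failover panel. The sketch below shows one reading of that rule: a recovery point is kept if it was created within the last 24 hours, and at most 24 points are kept. This interpretation is an assumption, and the function name `retained_recovery_points` is hypothetical. The actual retention logic of Cloud Backup may differ.

```python
from datetime import datetime, timedelta
from typing import List

# Assumed reading of the retention note: keep points created in the most
# recent 24 hours, capped at 24 points per ECS instance.
RETENTION_WINDOW = timedelta(hours=24)
MAX_RECOVERY_POINTS = 24


def retained_recovery_points(
    points: List[datetime], now: datetime
) -> List[datetime]:
    """Return the retained recovery points, newest first."""
    cutoff = now - RETENTION_WINDOW
    recent = sorted((p for p in points if cutoff <= p <= now), reverse=True)
    return recent[:MAX_RECOVERY_POINTS]


if __name__ == "__main__":
    now = datetime(2026, 1, 12, 12, 0)
    # Recovery Point Policy of 2 hours, observed over the past two days.
    created = [now - timedelta(hours=h) for h in range(0, 48, 2)]
    kept = retained_recovery_points(created, now)
    print(f"{len(kept)} recovery points available, oldest: {kept[-1]}")
```

With a longer Recovery Point Policy interval, fewer points fall inside the 24-hour window, so fewer recovery points are available to choose from.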

Step 4: Perform a failover

Regular DR drills verify that applications can be successfully run on restored ECS instances, ensuring that workloads can be reliably failed over to the DR site in the event of a critical error.

Warning

A failover should only be initiated for servers that have experienced a critical error. Be aware that this action stops all real-time data replication. To re-establish protection, replication must be manually restarted, which will trigger a new full data replication before continuous protection resumes.

To initiate a failover, perform the following steps:

  1. On the Protected Server tab, find the ECS instance and choose More > Failover > Failover in the Operation column.

  2. In the Failover panel, configure the following parameters: Recovery Network, IP Address, Use ECS Specification, Hard Disk Type, Recovery Point, Elastic Public Network IP, and Post Script. Then, click Start.

    Important

    Recovery to the latest available state is a one-time operation.

  3. After the failover is completed, click the link in the Recovered Instance ID/Name column to verify the restored data and applications.

    • If the applications run as expected after failover, choose More > Failover > Commit Failover in the Operation column.

      Note

      After verifying that the restored applications are running correctly, you can commit the failover. This action finalizes the recovery process and releases the temporary cloud resources that were used, which helps to save costs.

    • If the applications do not meet the requirements after the failover (for example, if the selected recovery point contains inconsistent data or the database is corrupted), choose More > Failover > Change Recovery Point in the Operation column to change the recovery point before you commit the failover.

    Note

    The procedure for changing the recovery point is similar to that for failover, except that you must select a recovery point earlier than the current point in time.

Step 5: Perform a failback

Failback is the process of returning operations from the DR site back to the original production site after a disaster is resolved. This process involves a reverse replication to synchronize any data changes that occurred at the DR site before the final cutover.

To perform a failback, follow these steps:

  1. On the Protected Server tab, find the ECS instance and choose More > Failback > Reversed Register in the Operation column. This action prepares the original production server to receive data from the DR site. Confirm the operation in the message that appears.

  2. In the Operation column, choose More > Failback > Initiate Reverse Replication.

  3. In the Initiate Reverse Replication panel, configure the following parameters: Original machine recovery, Replication Network, and Recovery Network. Then, click Start.

    Warning

    Failing back to the original ECS instance is an irreversible action that permanently overwrites all of its data. This option is available for both cross-region and cross-zone DR. Proceed with extreme caution.

  4. When the server status changes to Reverse Replicating Data in Real Time, data synchronization is complete and you can proceed with the final cutover. In the Operation column, choose More > Failback > Failback.

  5. In the Failback panel, configure the following parameters: CPU, Memory, Recovery Network, IP, and Execute script after recovery. Then, click Start.

  6. After the failback is completed, the business runs on the original production site again. To re-establish disaster recovery protection, choose More > Failover > Registration in the Operation column. This final step reactivates the original replication plan and returns the system to its normal, protected state.
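Because failback overwrites the original ECS instance, it helps to track the operations in order. The sketch below encodes the sequence from the steps above as a small runbook helper. It does not call any Cloud Backup API. The function name `next_failback_operation` is hypothetical, and the operation names and the status string are copied from the console labels on this page.

```python
from typing import List, Optional

# Console operations from Step 5, in the documented order.
FAILBACK_SEQUENCE = [
    "Reversed Register",
    "Initiate Reverse Replication",
    "Failback",
    "Registration",
]

# Status that must be reached before the final cutover.
CUTOVER_READY_STATUS = "Reverse Replicating Data in Real Time"


def next_failback_operation(
    completed: List[str], server_status: str
) -> Optional[str]:
    """Return the next operation to run.

    Returns None if the sequence is finished or if the server has not yet
    reached the status required for the final cutover.
    """
    if completed != FAILBACK_SEQUENCE[:len(completed)]:
        raise ValueError("operations do not follow the documented order")
    if len(completed) == len(FAILBACK_SEQUENCE):
        return None  # Failback and re-registration are finished.
    candidate = FAILBACK_SEQUENCE[len(completed)]
    if candidate == "Failback" and server_status != CUTOVER_READY_STATUS:
        return None  # Reverse replication is still synchronizing; wait.
    return candidate


if __name__ == "__main__":
    done = ["Reversed Register", "Initiate Reverse Replication"]
    print(next_failback_operation(done, CUTOVER_READY_STATUS))
```

The helper refuses an out-of-order history, which mirrors the point that the final cutover should start only after reverse replication has caught up.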