Use replication pair-consistent groups to implement disaster recovery - Elastic Compute Service

After you create and activate a replication pair-consistent group, if the disks at the primary site (primary disks) fail, you can use the replication pair-consistent group to batch restore data stored on the primary disks. This topic describes how to use replication pair-consistent groups to implement disaster recovery.

Background information

Replication pair-consistent groups support the failover and reverse replication features. When the primary disks fail, you can use the failover feature to enable read and write permissions on data stored on the disks at the secondary site (secondary disks). Then, you can attach the secondary disks to temporary Elastic Compute Service (ECS) instances that are created for the failover to ensure service availability. After the issues of the primary disks are resolved, you can use the reverse replication feature to replicate the latest data stored on the secondary disks back to the primary disks.
If you use a replication pair-consistent group to implement failover and reverse replication, the operation applies to all replication pairs in the group. This means that all the replication pairs implement failover and reverse replication at the same time.
After disaster recovery is implemented in a replication pair-consistent group, the data stored on all disks in the group is restored to the same point in time.
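The group-level behavior described above can be pictured with a small model. This is only an illustrative sketch: the `ReplicationPair` and `ConsistentGroup` classes, the numeric `recovery_point` field, and the rule of choosing the oldest recovery point shared by all pairs are assumptions made for the example, not the service's documented implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ReplicationPair:
    """Hypothetical stand-in for one primary/secondary disk pair."""
    primary_disk: str
    secondary_disk: str
    status: str = "Normal"      # assumed status of an activated pair
    recovery_point: int = 0     # assumed: timestamp of the last replicated state


@dataclass
class ConsistentGroup:
    """Hypothetical stand-in for a replication pair-consistent group."""
    pairs: List[ReplicationPair] = field(default_factory=list)

    def failover(self) -> int:
        """Fail over every pair at once and return the shared recovery point."""
        if not self.pairs:
            raise ValueError("the group contains no replication pairs")
        # Assumption: the group settles on the oldest recovery point that all
        # pairs have reached, so every disk ends up at the same point in time.
        shared_point = min(pair.recovery_point for pair in self.pairs)
        for pair in self.pairs:
            pair.status = "Failed Over"
            pair.recovery_point = shared_point
        return shared_point


group = ConsistentGroup([
    ReplicationPair("d-primary-1", "d-secondary-1", recovery_point=105),
    ReplicationPair("d-primary-2", "d-secondary-2", recovery_point=100),
])
shared = group.failover()
```

In this model, a single call changes the status of every pair and leaves all pairs at one recovery point, which is the property that distinguishes a group operation from failing over pairs one by one.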

(Optional) Step 1: Perform a failover drill

After the async replication feature is enabled, the data on the primary disks is continuously replicated to the secondary disks through the replication pair-consistent group. You can use the failover drill feature to restore the data at the latest recovery point from the secondary disks to new disks while replication continues. This helps you test the completeness and correctness of applications at the secondary site. During the drill, real-time data replication is not affected. If the primary site fails, the drill continues. If the secondary site fails, the drill fails.

Log on to the Elastic Block Storage (EBS) console.
In the left-side navigation pane, choose Enterprise-level Features > Replication Pair-consistent Group.
In the upper-left corner of the top navigation bar, select a region.
On the Replication Pair-Consistent Group page, find the replication pair-consistent group on which you want to perform the drill and click the group ID.
In the Drills section, click Create Drill.
In the Create Drill dialog box, confirm the region and zone of the group and click OK.
After you create the drill, disks are created in the secondary zone. The new disks match the disks in the primary zone in quantity and specifications, and contain the data at the latest recovery point. You can use them to test the completeness and correctness of applications.
Note
After you create the drill, you can perform the following operations based on your business requirements:
- Create multiple disaster recovery drills to back up data at different recovery points.
- Delete multiple drill pairs and drill disks with one click in the Drills section to centrally manage disks.
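As a rough illustration of what a drill produces, the following sketch models drill disks as copies that keep the quantity and specifications of the primary disks but live in the secondary zone. The `Disk` class, the `create_drill` function, the ID format, and the zone names are all hypothetical and exist only for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Disk:
    """Hypothetical description of a disk: ID, size, category, and zone."""
    disk_id: str
    size_gib: int
    category: str
    zone: str


def create_drill(primary_disks: List[Disk], secondary_zone: str,
                 recovery_point: int) -> List[Disk]:
    """Return one new drill disk per primary disk, placed in the secondary zone.

    The drill disks keep the size and category of the primary disks. The
    recovery point is only encoded in the made-up ID so that drills taken at
    different recovery points can be told apart.
    """
    return [
        Disk(
            disk_id=f"drill-{recovery_point}-{index}",
            size_gib=disk.size_gib,
            category=disk.category,
            zone=secondary_zone,
        )
        for index, disk in enumerate(primary_disks)
    ]


primary = [
    Disk("d-primary-1", 100, "cloud_essd", "zone-a"),
    Disk("d-primary-2", 500, "cloud_essd", "zone-a"),
]
first_drill = create_drill(primary, "zone-b", recovery_point=100)
second_drill = create_drill(primary, "zone-b", recovery_point=200)
```

Because the function only reads the primary disks and returns new objects, the model also reflects that a drill leaves the replicated disks untouched and that several drills can coexist.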

Step 2: Perform a failover

You can use the failover feature to enable read and write permissions on data stored on the secondary disks. We recommend that you create temporary ECS instances in the zone where the secondary disks reside based on your business requirements. If the primary disks fail, you can attach the secondary disks, which are then readable and writable, to the temporary instances to continue to provide services until the issues of the primary disks are resolved.

Warning

The failover feature suspends the async replication feature. To prevent data loss, use the failover feature only when your primary disks fail.

Log on to the Elastic Block Storage (EBS) console.
In the left-side navigation pane, choose Enterprise-level Features > Replication Pair-consistent Group.
In the upper-left corner of the top navigation bar, select a region.
On the Replication Pair-Consistent Group page, find the replication pair-consistent group to which the failed primary disks belong. In the Operation column, choose > Failover.
Note
Alternatively, click the ID of the replication pair-consistent group. On the group details page, click Failover in the upper-right corner of the page.
In the message that appears, read the notes and click OK.
- After the failover is performed, Failed Over is displayed in the Status column corresponding to the group.
- The failover is performed for all replication pairs in the group. At this point, you can attach the secondary disks to temporary ECS instances to continue to provide services.
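If you script around this step, you typically wait until the group reports the Failed Over status before you attach the secondary disks. The following is a minimal polling sketch under stated assumptions: `wait_for_status` is a hypothetical helper, and `get_status` is a callable that you supply yourself, for example one that queries the group status through whatever tooling you use. No real SDK call is shown.

```python
import time
from typing import Callable


def wait_for_status(get_status: Callable[[], str], expected: str,
                    timeout_s: float = 300.0, interval_s: float = 5.0,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll get_status() until it returns `expected` or the timeout elapses.

    Returns True if the expected status was observed, False on timeout.
    The status is always checked at least once, even with timeout_s=0.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if get_status() == expected:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


# Usage with a fake status source that reaches "Failed Over" on the third poll.
fake_statuses = iter(["Normal", "Normal", "Failed Over"])
reached = wait_for_status(lambda: next(fake_statuses), "Failed Over",
                          timeout_s=10.0, interval_s=0.0,
                          sleep=lambda seconds: None)
```

Injecting `get_status` and `sleep` keeps the helper independent of any particular client library and makes it easy to test without waiting.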

Step 3: Perform a reverse replication

After the issues of the primary disks are resolved, you can use the reverse replication feature to replicate the latest data stored on the secondary disks back to the primary disks to implement disaster recovery.

Warning

After a reverse replication is performed, the original data stored on the primary disks is overwritten by the data that is replicated from the secondary disks. To prevent data loss, we recommend that you create snapshots for the primary disks before you perform the reverse replication. For more information, see Create a snapshot for a disk.

Log on to the Elastic Block Storage (EBS) console.
In the left-side navigation pane, choose Enterprise-level Features > Replication Pair-consistent Group.
In the upper-left corner of the top navigation bar, select a region.
On the Replication Pair-Consistent Group page, find the replication pair-consistent group on which you have performed a failover. In the Operation column, choose > Reverse replication.
Note
Alternatively, click the ID of the replication pair-consistent group. On the group details page, click Reverse replication in the upper-right corner of the page.
In the Reverse replication message, read the notes and click Confirm.
After the reverse replication is performed, Stopped is displayed in the Status column corresponding to the group.
Important
After the reverse replication is performed, the relationship of the original primary and secondary sites in the replication pair-consistent group is reversed. The original primary site becomes the new secondary site, and the original secondary site becomes the new primary site. For example, assume that before a reverse replication is performed, China (Heyuan) is the primary site and China (Chengdu) is the secondary site. After the reverse replication is performed, China (Chengdu) becomes the primary site and China (Heyuan) becomes the secondary site.
Find the replication pair-consistent group on which the reverse replication is performed and click Activate in the Operation column.
After this step is performed, data stored on the original secondary disks is asynchronously replicated back to the original primary disks.
After the data is replicated back to the original primary disks, Normal is displayed in the Status column corresponding to the replication pair-consistent group and disaster recovery is completed.
(Optional) Restore the relationship of the primary and secondary sites in the replication pair-consistent group to the original status.
After the reverse replication is performed, the relationship between the original primary and secondary sites is reversed. To restore the relationship, perform the following steps:
1. Find the replication pair-consistent group on which the reverse replication is performed. In the Operation column, choose > Failover to perform a failover.
2. Choose > Reverse replication in the Operation column to perform a reverse replication.
3. After the relationship of the primary and secondary sites in the replication pair-consistent group is restored to its original status, for example, the primary site is restored to China (Heyuan) and the secondary site is restored to China (Chengdu), click Activate in the Operation column to enable the async replication feature for the replication pair-consistent group.
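The sequence of failover, reverse replication, and activation, and the way the site roles swap, can be traced with a small state model. This is a sketch only: the `GroupState` class is hypothetical, the starting status of Normal is an assumption, and the model jumps straight to Normal on activation, whereas the console shows Normal only after the data has been replicated.

```python
from dataclasses import dataclass


@dataclass
class GroupState:
    """Hypothetical model of a group's site roles and console status."""
    primary_site: str
    secondary_site: str
    status: str = "Normal"  # assumed status of an activated, healthy group

    def failover(self) -> None:
        # Failover makes the secondary disks writable and suspends replication.
        self.status = "Failed Over"

    def reverse_replication(self) -> None:
        # Reverse replication is performed on a group that has been failed over.
        if self.status != "Failed Over":
            raise ValueError("perform a failover before a reverse replication")
        # The primary and secondary roles are swapped.
        self.primary_site, self.secondary_site = (
            self.secondary_site, self.primary_site)
        self.status = "Stopped"

    def activate(self) -> None:
        # Simplification: replication is treated as finished immediately.
        self.status = "Normal"


group = GroupState("China (Heyuan)", "China (Chengdu)")

# Disaster recovery: the roles are reversed afterwards.
group.failover()
group.reverse_replication()
group.activate()
roles_after_recovery = (group.primary_site, group.secondary_site)

# Optional restore: repeating the same sequence swaps the roles back.
group.failover()
group.reverse_replication()
group.activate()
roles_after_restore = (group.primary_site, group.secondary_site)
```

In this model, one pass through the sequence leaves China (Chengdu) as the primary site, and a second pass returns the primary role to China (Heyuan), which mirrors the optional restore procedure.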