Elastic Block Storage (EBS) provides disk disaster recovery through asynchronous replication. Data is replicated from disks at a primary site (production site) to disks at a secondary site (disaster recovery site) across zones or regions. If the primary site fails, you can fail over to the secondary site to ensure business availability and continuity.
Scenarios
Disk disaster recovery is used in two main scenarios: cross-zone disaster recovery and cross-region disaster recovery.
Cross-zone disaster recovery: Recover from zone-level failures caused by force majeure factors such as fires or blackouts that affect data centers, or device faults such as software issues or hardware damage.
Cross-region disaster recovery: Protect against regional disasters such as earthquakes or tsunamis. If the primary site fails, you can fail over to the secondary site in a different region to continue business operations.
Features
EBS offers two disaster recovery features depending on the number of disks involved.
Async replication
Async replication is designed for disaster recovery of a single disk set. This feature uses the data replication capability of EBS to asynchronously replicate data from one disk (primary disk) to another disk (secondary disk) in a different region or zone. If the primary disk fails, you can fail over to the secondary disk and perform a reverse replication to achieve disaster recovery.
Diagrams (not shown): cross-zone disaster recovery and cross-region disaster recovery.
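The failover and reverse replication flow described above can be pictured as a small state model. The sketch below is illustrative only: the `ReplicationPair` class, its field names, and the disk IDs are hypothetical and do not correspond to any EBS API. It also assumes that reverse replication is performed only after a failover, which this topic implies but does not state as a rule.

```python
from dataclasses import dataclass


@dataclass
class ReplicationPair:
    """Hypothetical in-memory model of a replication pair (not an EBS API)."""
    primary: str                      # ID of the disk that data is replicated from
    secondary: str                    # ID of the disk that data is replicated to
    secondary_writable: bool = False  # the secondary disk is read-only while replicating

    def failover(self) -> None:
        # Failover enables read and write access on the secondary disk.
        self.secondary_writable = True

    def reverse_replicate(self) -> None:
        # Assumption: reverse replication is only valid after a failover.
        if not self.secondary_writable:
            raise RuntimeError("fail over before reversing replication")
        # The original secondary disk becomes the primary disk, and vice versa.
        self.primary, self.secondary = self.secondary, self.primary
        # The new secondary disk (the original primary) is read-only again.
        self.secondary_writable = False


pair = ReplicationPair(primary="disk-a", secondary="disk-b")
pair.failover()
pair.reverse_replicate()
print(pair.primary, pair.secondary)  # disk-b disk-a
```

After the reverse replication, data flows from the original secondary disk back to the original primary disk, which matches the role conversion described in the Terms section.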
Replication pair-consistent group
A replication pair-consistent group is designed for disaster recovery of multiple disk sets. This feature lets you batch manage and operate disks in disaster recovery scenarios where a business system involves multiple disks. You can restore the data of all the disks in a replication pair-consistent group to the same point in time to implement disaster recovery.
Data is asynchronously replicated from primary disks at the primary site to secondary disks at the secondary site across regions or zones. When the primary site fails, you can fail over to the corresponding secondary site and perform a reverse replication to achieve disaster recovery.
Diagrams (not shown): cross-zone disaster recovery and cross-region disaster recovery.
Limits
Specification limits
The following table describes the specification limits for async replication and replication pair-consistent groups.
Item | Limit |
Replication pairs per disk | 1 |
Replication pairs per replication pair-consistent group | 17 |
Replication cycle | 15 minutes (data is asynchronously replicated from a primary disk to a secondary disk every 15 minutes) |
Replication rate | Up to 100 MB/s. May vary based on system load. |
Primary disk category | ESSDs or ESSD AutoPL disks |
Secondary disk category | Must be the same disk category, performance level, and capacity as the corresponding primary disk |
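The specification limits above lend themselves to a pre-flight check before you create a replication pair. The following is a minimal sketch, assuming a hypothetical `Disk` record and helper names that are not part of any EBS SDK. The `"PL1"` and `"PL2"` performance level labels are illustrative values only.

```python
from dataclasses import dataclass

MAX_PAIRS_PER_GROUP = 17                            # from the table above
ALLOWED_PRIMARY_CATEGORIES = {"ESSD", "ESSD AutoPL"}  # from the table above


@dataclass(frozen=True)
class Disk:
    """Hypothetical disk description used only for this sketch."""
    category: str           # for example "ESSD" or "ESSD AutoPL"
    performance_level: str  # illustrative label such as "PL1"
    size_gib: int


def pair_violations(primary: Disk, secondary: Disk) -> list[str]:
    """Return the specification limits that a candidate pair would violate."""
    problems = []
    if primary.category not in ALLOWED_PRIMARY_CATEGORIES:
        problems.append("primary disk must be an ESSD or ESSD AutoPL disk")
    if secondary.category != primary.category:
        problems.append("disk categories differ")
    if secondary.performance_level != primary.performance_level:
        problems.append("performance levels differ")
    if secondary.size_gib != primary.size_gib:
        problems.append("capacities differ")
    return problems


def group_has_room(current_pairs: int) -> bool:
    """True if one more replication pair fits in a replication pair-consistent group."""
    return current_pairs < MAX_PAIRS_PER_GROUP


primary = Disk("ESSD", "PL1", 100)
print(pair_violations(primary, Disk("ESSD", "PL1", 100)))  # []
print(pair_violations(primary, Disk("ESSD", "PL2", 200)))
# ['performance levels differ', 'capacities differ']
```

The check does not cover the limit of one replication pair per disk, which would require knowledge of the pairs that already exist.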
Disk operation limits
The following table describes the disk operation limits for async replication and replication pair-consistent groups.
The note numbers in the following table refer to these limits:
1. After a replication pair is activated, the secondary disk enters the read-only state, and no user has write permissions on the disk.
2. Because of the recovery point objective (RPO), the data in a snapshot of a primary disk may be inconsistent with the data in a snapshot created at the same time for the associated secondary disk.
3. Replication is restricted to encrypted disks. Cross-replication between encrypted disks and unencrypted disks is not supported.
Operation | Primary disk | Secondary disk |
Read and write | Supported | Not supported (see note 1) |
Disk deletion | Not supported | Not supported |
Disk initialization | Not supported | Not supported |
Disk resizing | Not supported | Not supported |
Disk attaching | Supported | Not supported |
Snapshot creation | Supported | Supported (see note 2) |
Rollback based on snapshots | Supported | Not supported |
Disk category change | Not supported | Not supported |
Performance level change | Not supported | Not supported |
Disk encryption | Supported | Supported (see note 3) |
Multi-attach | Not supported | Not supported |
Disk migration with instances | Not supported | Not supported |
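If you script disk operations, the table above can be encoded as a lookup so that unsupported operations are rejected before they reach a disk in a replication pair. This is a sketch only: the `SUPPORT` mapping and the `is_supported` helper are hypothetical names, and the values are transcribed from the table rather than read from any API.

```python
# (primary disk, secondary disk) support flags, transcribed from the table above.
SUPPORT = {
    "read and write": (True, False),
    "disk deletion": (False, False),
    "disk initialization": (False, False),
    "disk resizing": (False, False),
    "disk attaching": (True, False),
    "snapshot creation": (True, True),
    "rollback based on snapshots": (True, False),
    "disk category change": (False, False),
    "performance level change": (False, False),
    "disk encryption": (True, True),
    "multi-attach": (False, False),
    "disk migration with instances": (False, False),
}


def is_supported(operation: str, role: str) -> bool:
    """Return whether an operation is supported for a disk in the given role."""
    primary_ok, secondary_ok = SUPPORT[operation.lower()]
    if role == "primary":
        return primary_ok
    if role == "secondary":
        return secondary_ok
    raise ValueError("role must be 'primary' or 'secondary'")


print(is_supported("Disk attaching", "primary"))    # True
print(is_supported("Disk attaching", "secondary"))  # False
```

The lookup does not capture the caveats in notes 1 to 3, so a `True` result for snapshot creation or disk encryption on a secondary disk still comes with the conditions listed there.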
Billing
Async replication uses a pay-as-you-go billing method.
You are charged based on the total amount of data replicated.
Pay-as-you-go cost for async replication = Data volume unit price × Total size of replicated data.
Pay-as-you-go charges apply only to cross-region replication. Replication across zones within the same region is currently free of charge.
If you run disaster recovery drills, disks created at the disaster recovery site are billed on a pay-as-you-go basis. For more information, see Pay-as-you-go.
Replication Time Control (RTC) activation fees:
If RTC is enabled for a cross-region async replication pair, you incur an additional charge based on the amount of replicated data. The unit price is USD 0.0143 per GB.
RTC fees do not apply to replication across zones within the same region.
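The billing rules above can be combined into a rough estimate. The sketch below assumes that the RTC fee is added on top of the regular replication fee, as the word "additional" suggests. The function name is hypothetical, and the data volume unit price of USD 0.02 per GB in the usage example is a made-up placeholder, because this topic does not state the unit price. Disks created for disaster recovery drills are billed separately and are not included.

```python
RTC_UNIT_PRICE_USD_PER_GB = 0.0143  # RTC unit price stated in this topic


def replication_cost_usd(replicated_gb: float,
                         unit_price_usd_per_gb: float,
                         cross_region: bool,
                         rtc_enabled: bool = False) -> float:
    """Estimate the async replication fee for a given amount of replicated data."""
    if not cross_region:
        # Cross-zone replication within a region is currently free, with or without RTC.
        return 0.0
    # Base fee: data volume unit price x total size of replicated data.
    cost = unit_price_usd_per_gb * replicated_gb
    if rtc_enabled:
        # RTC adds a per-GB charge for cross-region pairs.
        cost += RTC_UNIT_PRICE_USD_PER_GB * replicated_gb
    return cost


# Placeholder unit price of USD 0.02 per GB (an assumption, not a published price).
estimate = replication_cost_usd(500, 0.02, cross_region=True, rtc_enabled=True)
print(round(estimate, 2))  # 17.15
```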
Terms
Before you implement disaster recovery for disks, familiarize yourself with the following terms.
Term | Description |
Asynchronous replication | Async replication replicates data from one disk to another disk across regions or zones on a periodic basis. Because data is not synchronized in real time, the data on the source disk is not always identical to the data on the destination disk. |
Primary site | The data center where a primary disk is located. A primary site can independently support normal business operations. After a reverse replication is performed, the primary site is converted to the secondary site. |
Secondary site | The data center where a secondary disk is located. The secondary site serves as a backup for the primary site. If the primary site fails, the secondary site takes over business operations to ensure continuity. After a reverse replication is performed, the secondary site is converted to the primary site. |
Primary disk | The disk from which data is replicated to implement disaster recovery. The primary disk is also called the source disk. After a reverse replication is performed, the primary disk is converted to the secondary disk. |
Secondary disk | The disk to which data is replicated. The secondary disk is also called the destination disk. After a reverse replication is performed, the secondary disk is converted to the primary disk. |
RPO | Recovery point objective. The amount of data that may be lost due to a disk exception, measured in time. In async replication, the default RPO is 15 minutes. This means that data written to a primary disk within the previous 15 minutes may be lost if an exception occurs. |
RTO | Recovery time objective. The time it takes a primary disk to recover after an exception occurs. For example, if the RTO is 1 hour, the data of a primary disk can be restored and the disk can resume normal operations within 1 hour of the exception. |
Async replication relationship | The replication relationship established between a primary disk, a secondary disk, and the configurations for asynchronous replication. |
Replication pair | A pair of disks that have an async replication relationship. A replication pair-consistent group can contain multiple replication pairs. |
Failover | A sub-feature of async replication that enables read and write permissions on the secondary disk and fails over to that disk. |
Reverse replication | A sub-feature of async replication that reverses the async replication relationship of a replication pair to replicate data from the original secondary disk to the original primary disk. |
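To make the RPO definition concrete, the following sketch computes the window of writes that may be lost for a given failure time, using the default 15-minute RPO. It also works out how much data fits into one replication cycle at the peak rate of 100 MB/s listed under Limits. The function names are hypothetical, and the per-cycle figure is a back-of-envelope upper bound, because the actual rate varies with system load.

```python
from datetime import datetime, timedelta

REPLICATION_CYCLE = timedelta(minutes=15)  # default RPO from this topic
MAX_RATE_MB_PER_S = 100                    # peak replication rate from this topic


def at_risk_window(failure_time: datetime) -> tuple[datetime, datetime]:
    """Return the (start, end) interval of writes that may be lost."""
    return failure_time - REPLICATION_CYCLE, failure_time


def max_mb_per_cycle() -> int:
    """Upper bound on the data that one cycle can transfer at the peak rate."""
    return MAX_RATE_MB_PER_S * int(REPLICATION_CYCLE.total_seconds())


start, end = at_risk_window(datetime(2024, 1, 1, 12, 0))
print(start)               # 2024-01-01 11:45:00
print(max_mb_per_cycle())  # 90000
```

In this example, writes made to the primary disk between 11:45 and 12:00 may be missing from the secondary disk after a failover.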
Procedures
To implement disk disaster recovery, follow the steps for your scenario:
Implement disaster recovery for a single disk set
Implement disaster recovery for multiple disk sets