ESSD cloud disks use triplicate storage and end-to-end data validation to ensure data durability and integrity. ESSD cloud disks that use local redundancy provide 99.9999999% (nine nines) data reliability, and ESSD cloud disks that use zone redundancy provide 99.9999999999% (twelve nines) data reliability.
Technical advantages
Data durability: The system automatically saves each piece of data as three replicas distributed across different physical nodes and racks. If one or two replicas become unavailable, data can still be read from and written to the remaining replicas.
Data integrity: At each stage of the data write and storage process, the system uses a validation algorithm to generate a checksum, which acts as a fingerprint of the data, and verifies the checksum at each subsequent step. If a mismatch is detected between data and its checksum, error correction is immediately triggered to prevent data corruption during transfer and storage. This validation is hardware-accelerated and has a negligible impact on read and write performance.
Automatic fault recovery: When the system detects a storage node failure or an insufficient number of replicas, it automatically restores data from a healthy replica to re-establish the full three-replica state. The entire recovery process is transparent to your applications.
Protection scenarios
Data unavailability due to hardware failure
Challenge: Unpredictable failures such as disk damage, server downtime, or rack power outages can make data on the affected physical devices inaccessible.
Technical protection: The triplicate storage mechanism distributes data across different physical nodes. In the event of a failure, the system automatically fails over to healthy replicas to maintain service continuity while it rebuilds a new replica in the background, with no impact on your business.
Silent data corruption
Challenge: Unnoticed errors can occur during data transfer or storage due to factors like memory bit flips, network transmission errors, or disk firmware and media degradation. Because these errors are difficult to detect with traditional methods, they can cause data inconsistencies and pose a serious threat to data integrity.
Technical protection: End-to-end data validation generates checksums at each step of the data write process. When you read data, the system verifies these checksums at each stage. If it finds a mismatch, it immediately triggers error correction to ensure that the data you read is identical to the data that was written.
These reliability technologies protect against hardware failures and data corruption at the infrastructure layer. Application-level risks, such as accidental deletion or virus attacks, require protection through snapshots.
Triplicate storage mechanism
The triplicate storage mechanism is designed to address data unavailability caused by hardware failures. The system automatically replicates each piece of data written to a cloud disk into three copies at the underlying layer and stores them on different physical nodes.
Data write process

The system uses a multi-replica synchronous write mechanism. A write operation succeeds only when the data is written to all replicas. Otherwise, the operation fails. This mechanism ensures strong consistency, which means that any subsequent read request can access the most recently written data.
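The synchronous write mechanism can be illustrated with a minimal sketch. The `Replica` class and the in-memory write path below are hypothetical stand-ins for storage nodes, not the actual ESSD implementation; the point is only that the write acknowledges success if and only if every replica acknowledges it.

```python
# Hypothetical sketch of a synchronous triplicate write. The Replica
# class is an illustrative stand-in for one replica on a storage node.

class Replica:
    def __init__(self, healthy=True):
        self.healthy = healthy
        self.blocks = {}

    def write(self, block_id, data):
        if not self.healthy:
            return False  # node failure: the acknowledgment never arrives
        self.blocks[block_id] = data
        return True

def synchronous_write(replicas, block_id, data):
    """Succeed only when the data is written to all replicas."""
    acks = [r.write(block_id, data) for r in replicas]
    return all(acks)

replicas = [Replica(), Replica(), Replica()]
assert synchronous_write(replicas, 7, b"payload")       # all three replicas ack
replicas[1].healthy = False
assert not synchronous_write(replicas, 8, b"payload")   # one failure fails the write
```

Because a write returns success only after all replicas hold the data, any later read from any replica observes the most recently written data, which is the strong-consistency guarantee described above.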
Replica placement strategy
To prevent correlated failures, such as multiple replicas becoming unavailable due to a rack power outage, the triplicate storage mechanism automatically follows this placement strategy:
Rack isolation: The system distributes the three replicas across storage nodes on different racks. A failure of a single machine or a single rack does not affect data availability.
Fault domain isolation: The system distributes the three replicas of an ESSD cloud disk that uses local redundancy across different racks within the same availability zone. The system distributes the replicas of an ESSD cloud disk that uses zone redundancy across different availability zones. This upgrades the disaster recovery capability from rack-level to availability zone-level.
Load balancing: While satisfying the isolation requirements, the system also considers storage capacity, I/O load, and network topology to achieve balanced resource utilization and optimal performance.
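The placement strategy above can be sketched as a small selection function. The node model (`rack` and `load` fields) and the greedy least-loaded heuristic are illustrative assumptions, not the production algorithm, which also weighs network topology and capacity.

```python
# Illustrative replica placement: distinct racks first, lower load preferred.
# The node dictionaries and field names here are hypothetical.

def place_replicas(nodes, count=3):
    """Pick `count` nodes on distinct racks, preferring the least loaded."""
    chosen, used_racks = [], set()
    for node in sorted(nodes, key=lambda n: n["load"]):
        if node["rack"] not in used_racks:
            chosen.append(node)
            used_racks.add(node["rack"])
        if len(chosen) == count:
            return chosen
    raise RuntimeError(f"fewer than {count} racks available")

nodes = [
    {"id": "n1", "rack": "r1", "load": 0.7},
    {"id": "n2", "rack": "r1", "load": 0.2},
    {"id": "n3", "rack": "r2", "load": 0.4},
    {"id": "n4", "rack": "r3", "load": 0.9},
]
picked = place_replicas(nodes)
assert {n["rack"] for n in picked} == {"r1", "r2", "r3"}  # rack isolation holds
assert picked[0]["id"] == "n2"  # the less loaded node on rack r1 is chosen
```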
Fault recovery process

The system features an automated data self-healing capability. When the system detects an insufficient number of replicas, it automatically triggers a recovery process. The system selects a new, healthy storage node that meets the isolation policy and copies data from an existing replica to quickly restore the full three-replica state. This entire process is transparent to your applications and requires no manual intervention.
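The self-healing flow can be sketched as follows, using an assumed representation of a replica set as a mapping from node ID to rack. The `heal` function and its spare-node pool are hypothetical; the real system copies data from a healthy replica as part of this step.

```python
# Minimal self-healing sketch under assumed data structures: the surviving
# replica set and the spare-node pool are dicts of node id -> rack.

def heal(replica_racks, spare_nodes, target=3):
    """Rebuild missing replicas on spares that preserve rack isolation."""
    replica_racks = dict(replica_racks)  # the healthy replicas that remain
    for node_id, rack in spare_nodes.items():
        if len(replica_racks) >= target:
            break
        if rack not in replica_racks.values():  # honor the isolation policy
            replica_racks[node_id] = rack       # copy data from a healthy replica
    return replica_racks

# Two replicas survive after a rack failure; one must be rebuilt.
healthy = {"n1": "rackA", "n2": "rackB"}
spares = {"s1": "rackA", "s2": "rackC"}
rebuilt = heal(healthy, spares)
assert len(rebuilt) == 3
assert rebuilt["s2"] == "rackC"  # the spare on a fresh rack was selected
```

Note that the spare on `rackA` is skipped even though it is available, because placing two replicas on the same rack would reintroduce the correlated-failure risk the placement strategy exists to avoid.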
End-to-end data validation
End-to-end data validation is designed to resolve the issue of silent data corruption during data transfer and storage.
Validation process
End-to-end means that at each stage of the write and storage process, the system uses a Cyclic Redundancy Check (CRC) code to verify data integrity.
After an I/O request is initiated: Data enters the block storage path, and an initial checksum is generated.
After memory copy: When the data is copied to the memory of the compute node, the system compares the checksum to detect data errors.
After network transmission: When the data reaches the network layer of the storage node, the system compares the checksum to detect bit errors during transfer.
Upon receipt by the storage node: After the data is written to the memory of the storage node, the system compares the checksum.
When data is persisted to disk: After the data is written to the disk, the system compares the checksum.
If the system finds a checksum mismatch at any stage, it immediately triggers error handling. This validation is hardware-accelerated and has a negligible impact on read and write performance.
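The stages above can be modeled with a toy CRC-32 pipeline built on the Python standard library. The stage names mirror the steps listed here, but the `verify_stage` helper and the loop are illustrative assumptions, not the hardware-accelerated implementation.

```python
# Toy end-to-end validation using CRC-32 from the standard library.
import zlib

def checksum(data: bytes) -> int:
    return zlib.crc32(data)

def verify_stage(stage: str, data: bytes, expected: int) -> None:
    if checksum(data) != expected:
        raise ValueError(f"checksum mismatch at stage: {stage}")

payload = b"block contents"
crc = checksum(payload)  # generated when the I/O request is initiated

# The checksum travels with the data and is re-verified at every hop.
for stage in ("memory copy", "network transmission",
              "storage node memory", "persisted to disk"):
    verify_stage(stage, payload, crc)

# A single flipped bit (silent corruption) is caught immediately.
corrupted = bytes([payload[0] ^ 0x01]) + payload[1:]
try:
    verify_stage("network transmission", corrupted, crc)
except ValueError:
    pass  # mismatch detected; error handling is triggered
```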
Error handling
The system handles errors differently based on where the error occurs:
Network transport layer: The system automatically retransmits the data until the validation passes.
Storage media: The system marks the bad block and reads correct data from other replicas for recovery.
Memory: The Error-Correcting Code (ECC) mechanism automatically corrects errors, and the system retries the I/O operation.
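The layer-specific handling above amounts to a dispatch on where the mismatch occurred. The table below is a hypothetical sketch of that dispatch; the action strings paraphrase the recovery behaviors listed here and are not real API calls.

```python
# Hypothetical dispatch from the layer where a checksum mismatch
# occurred to the recovery action described in the text above.

def handle_checksum_error(layer: str) -> str:
    actions = {
        "network": "retransmit until validation passes",
        "storage_media": "mark bad block; read correct data from another replica",
        "memory": "ECC-correct the error; retry the I/O operation",
    }
    try:
        return actions[layer]
    except KeyError:
        raise ValueError(f"unknown layer: {layer}") from None

assert handle_checksum_error("network").startswith("retransmit")
assert "replica" in handle_checksum_error("storage_media")
assert handle_checksum_error("memory").startswith("ECC")
```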
FAQ
Does the triplicate storage mechanism mean I have to pay for three times the storage capacity?
No. Triplicate storage is a built-in data reliability feature provided by Alibaba Cloud. Alibaba Cloud bears the cost of the underlying 3x storage redundancy. You pay only for the cloud disk capacity that you purchase. For example, if you purchase a 40 GiB cloud disk, both the usable space and the billable capacity are 40 GiB.
How can I further protect my data?
Create an automatic snapshot policy to perform regular backups. You can use snapshots to roll back a cloud disk if an issue occurs.
Copy snapshots across regions. If a failure occurs, you can use a snapshot to create a data disk and attach it to a standby instance.
Can the triplicate storage mechanism prevent all types of data loss?
The triplicate storage mechanism protects against hardware failures at the infrastructure layer. Application-level risks, such as accidental deletion or virus attacks, require protection through snapshots.
How does the triplicate storage mechanism ensure data consistency?
The system uses a multi-replica synchronous write mechanism. A write operation succeeds only when the data is written to all replicas. Otherwise, the operation fails. This mechanism ensures strong consistency, which means that any subsequent read request can access the most recently written data.