Elasticsearch data disaster recovery: OSS snapshot vs. CCR

Solution selection

ES offers two solutions for off-site disaster recovery:

OSS snapshot backup and restore: Backs up index data to Alibaba Cloud Object Storage Service (OSS). The first snapshot is a full backup; subsequent snapshots are incremental. To restore, reference the same OSS repository from another ES instance and restore the snapshot there. For the procedure, see "Back up and restore data by using a cross-cluster OSS repository".
Cross-cluster replication (CCR): Replicates indexes from a leader cluster to one or more follower clusters asynchronously and incrementally, with near real-time sync. Suited to disaster recovery with strict RPO and RTO requirements. For the procedure, see "Replicate data across clusters by using CCR".
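As a rough illustration of what the two solutions look like at the REST level, the sketch below only builds the requests (method, path, JSON body) and does not send them. The repository, index, and cluster names are hypothetical, and the OSS repository setting names (`endpoint`, `bucket`, `base_path`, `access_key_id`, `secret_access_key`) are assumptions that should be checked against the linked procedures. The CCR request also assumes the leader cluster has already been registered as a remote cluster on the follower.

```python
import json


def register_oss_repo(repo, bucket, endpoint, base_path="snapshot/"):
    """Build the request that registers an OSS bucket as a snapshot repository."""
    body = {
        "type": "oss",  # assumed repository type for the OSS plugin
        "settings": {
            "endpoint": endpoint,
            "bucket": bucket,
            "base_path": base_path,
            "access_key_id": "<access-key-id>",          # placeholder, not a real key
            "secret_access_key": "<access-key-secret>",  # placeholder, not a real key
        },
    }
    return "PUT", f"/_snapshot/{repo}", body


def create_snapshot(repo, snapshot, indices):
    """Build the request that snapshots the given indexes into the repository."""
    return "PUT", f"/_snapshot/{repo}/{snapshot}", {"indices": ",".join(indices)}


def restore_snapshot(repo, snapshot, indices):
    """Build the request that restores indexes from a snapshot (run on the target instance)."""
    return "POST", f"/_snapshot/{repo}/{snapshot}/_restore", {"indices": ",".join(indices)}


def follow_index(follower_index, remote_cluster, leader_index):
    """Build the CCR request that creates a follower index (run on the follower cluster)."""
    body = {"remote_cluster": remote_cluster, "leader_index": leader_index}
    return "PUT", f"/{follower_index}/_ccr/follow", body


if __name__ == "__main__":
    # Hypothetical names used only for illustration.
    requests_to_send = [
        register_oss_repo("oss_repo", "my-backup-bucket", "<oss-endpoint>"),
        create_snapshot("oss_repo", "snapshot_1", ["orders", "users"]),
        restore_snapshot("oss_repo", "snapshot_1", ["orders"]),
        follow_index("orders-follower", "leader_cluster", "orders"),
    ]
    for method, path, body in requests_to_send:
        print(method, path)
        print(json.dumps(body, indent=2))
```

The snapshot path needs three separate calls spread over two instances, while the CCR path is a single call per index after the remote cluster is set up, which mirrors the periodic versus continuous nature of the two solutions.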

Solution	Scenarios	RPO	RTO	Major limitations
OSS snapshot	Periodic backup and restore of large-scale data (GB to PB).	Hours to days (depends on snapshot interval).	Several hours (depends on data volume and shard recovery).	No continuous sync. Downtime may be required during restoration.
CCR	Off-site disaster recovery, read/write splitting, and proximity-based access.	Near-zero (seconds).	Seconds to minutes.	Follower indexes are read-only. Requires identical mappings and shard counts.
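The snapshot RPO in the table depends on the snapshot interval. The sketch below works through that arithmetic under a simplified model, which is an assumption rather than a documented guarantee: a snapshot captures the data as of the moment it starts, and a snapshot that has not finished cannot be used for recovery. The function names are hypothetical.

```python
from datetime import timedelta


def worst_case_snapshot_rpo(interval: timedelta, snapshot_duration: timedelta) -> timedelta:
    """Worst-case data-loss window for periodic snapshots (simplified model).

    If the cluster fails just before the next snapshot completes, the newest
    usable snapshot is the previous one, which reflects data as of its start.
    """
    return interval + snapshot_duration


def snapshots_meet_rpo(interval: timedelta, snapshot_duration: timedelta,
                       rpo_target: timedelta) -> bool:
    """Return True if periodic snapshots alone can satisfy the RPO target."""
    return worst_case_snapshot_rpo(interval, snapshot_duration) <= rpo_target


if __name__ == "__main__":
    interval = timedelta(hours=6)      # snapshot every 6 hours
    duration = timedelta(minutes=30)   # each snapshot takes about 30 minutes
    print(worst_case_snapshot_rpo(interval, duration))                   # 6:30:00
    print(snapshots_meet_rpo(interval, duration, timedelta(hours=24)))   # True
    print(snapshots_meet_rpo(interval, duration, timedelta(minutes=1)))  # False
```

Under this model, an RPO target measured in seconds or minutes cannot be met by shortening the snapshot interval alone, which is the case where the table points to CCR.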

For off-site disaster recovery that requires a low RPO and near real-time availability, CCR is the better fit:

Synchronizes data in seconds, minimizing data loss.
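If the primary cluster fails, read traffic can be switched to the follower cluster immediately, without waiting for a snapshot restore. Because follower indexes are read-only, they must be converted to regular indexes before they can accept writes.
Costs more up front, but can be more cost-effective in the long term because it reduces business losses from data unavailability.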
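Because follower indexes are read-only, a failover that has to accept writes includes a promotion step on the follower cluster. A minimal sketch of that step, assuming the standard Elasticsearch CCR sequence (pause following, close the index, unfollow, reopen) and a hypothetical index name, is shown below. It only lists the requests in order and does not send them.

```python
def promote_follower_requests(follower_index):
    """Return the ordered REST calls that turn a follower index into a regular, writable index."""
    return [
        ("POST", f"/{follower_index}/_ccr/pause_follow"),  # stop pulling changes from the leader
        ("POST", f"/{follower_index}/_close"),             # unfollow requires a closed index
        ("POST", f"/{follower_index}/_ccr/unfollow"),      # convert to a regular index
        ("POST", f"/{follower_index}/_open"),              # reopen; the index now accepts writes
    ]


if __name__ == "__main__":
    # "orders-follower" is a hypothetical index name used only for illustration.
    for method, path in promote_follower_requests("orders-follower"):
        print(method, path)
```

The order matters: the unfollow call is expected to be rejected unless following is paused and the index is closed. The conversion is one-way, so re-establishing replication after the original leader recovers means creating a new follower index.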