Cross-cluster replication (CCR) replicates index data from a leader cluster to one or more follower clusters in near real time. This enables remote disaster recovery, read/write splitting, and geographically distributed access. This topic compares common disaster recovery solutions for Elasticsearch and explains how CCR works to help you choose the right solution.
Disaster recovery solutions
Elasticsearch (ES) supports the following remote disaster recovery solutions:
OSS snapshot: Back up index data to Alibaba Cloud Object Storage Service (OSS) for persistent storage. The first snapshot is a full backup, and subsequent snapshots are incremental. You can restore snapshot data to a destination ES instance by using a cross-cluster OSS repository. For more information, see Set up a cross-cluster OSS repository.
Logstash: Configure a pipeline to read data from a source ES cluster, process it, and write it to a destination cluster. This solution is suitable for data migration across major versions and for scenarios that require data filtering and transformation. For more information, see Quick start.
Reindex: Use the built-in Reindex API to copy all data or data that meets specific conditions from one index to another. This solution supports cross-cluster operations and is suitable for one-time migrations of small datasets. For more information, see Migrate data by using the Reindex API.
Cross-cluster replication (CCR): Asynchronously and incrementally replicates writable indices from a leader cluster to one or more follower clusters. This solution supports near real-time synchronization and is ideal for disaster recovery scenarios that require a low Recovery Point Objective (RPO) and Recovery Time Objective (RTO).
Solution comparison
| Solution | Use cases | RPO | RTO | Key limitations |
| --- | --- | --- | --- | --- |
| OSS snapshot | Periodic backup and restore for large-scale data (GB to PB). | Hours to days (depends on the snapshot interval). | Several hours (depends on data volume and shard recovery time). | Does not support continuous synchronization. Service downtime may be required during recovery. |
| Logstash | Data migration with low real-time requirements; scenarios that require data filtering or transformation; migration across major versions. | Seconds to minutes (depends on synchronization frequency). | Several hours (depends on data volume and instance performance). | Batch synchronization only, not real time. Does not support synchronizing delete operations. |
| Reindex | One-time index migration for small datasets. | Not applicable (one-time operation). | Minutes to several hours (depends on data volume). | Does not support continuous synchronization. Inefficient for large-scale data migration. |
| CCR | Remote disaster recovery, read/write splitting, and geographically distributed access. | Near zero (seconds). | Seconds to minutes. | Follower indices are read-only. Requires identical mappings and shard counts. |
For remote disaster recovery scenarios that require a low RPO and low latency, CCR is the best choice:
CCR synchronizes data in seconds, minimizing data loss.
If the leader cluster fails, you can redirect traffic to the follower cluster to restore service without waiting for a snapshot to be restored.
Although the initial deployment cost is higher, CCR is more cost-effective in the long run by preventing business losses caused by data loss.
How it works
Architecture
CCR uses an active-passive architecture. The leader cluster receives all write operations, while the follower cluster is read-only and replicates data from the leader cluster.
Leader cluster: The source cluster that handles all write operations.
Follower cluster: The destination cluster that is read-only and synchronizes data from the leader cluster by using CCR.
Data replication process
The data replication process in CCR consists of two phases:
Initialization
The follower cluster sends an initialization request to the leader cluster. The leader cluster then transfers all Lucene segment files from the leader index to the follower index, similar to a snapshot restore mechanism.
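Conceptually, the initialization phase is a full copy of the leader index's files, much like restoring a snapshot. The following is a minimal sketch of that idea (the function and parameter names are illustrative; in practice Elasticsearch streams the actual Lucene segment files over its transport layer with chunking and checksums):

```python
def initialize_follower(leader_segments: dict, follower_store: dict) -> int:
    """Copy every segment file from the leader index to the follower index,
    analogous to restoring a snapshot. Returns the number of files copied."""
    copied = 0
    for file_name, content in leader_segments.items():
        follower_store[file_name] = content  # full copy, not incremental
        copied += 1
    return copied
```

After this bootstrap completes, the follower index holds the same data as the leader index at a point in time, and incremental synchronization takes over from there.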
Incremental synchronization
Each shard in the follower index periodically sends a pull request to the leader cluster to fetch the latest operations since the last synchronization point. By default, this occurs every second. The process is as follows:
1. Locate the pull starting point: The follower cluster maintains a local `remote_checkpoint`, which indicates the latest operation that has been successfully applied locally. This checkpoint corresponds to the `global_checkpoint` in the transaction log (translog) of the leader cluster.
2. Read operations from the leader translog: The leader cluster uses the `from_seq_no` value provided by the follower cluster to find the starting position in its translog. It then reads all subsequent operations (index, update, and delete) and returns them as a list.
3. Replay operations on the follower cluster: The follower cluster replays these operations locally in order and updates its `remote_checkpoint`. If a replay fails, for example due to a version conflict, synchronization is paused and an error is logged.
4. Continuous polling: The follower cluster continuously polls for new operations at a fixed interval, typically achieving sub-second latency.
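The pull-and-replay loop above can be sketched as a toy in-memory simulation. The class and method names here are illustrative only, not Elasticsearch internals; the real protocol involves shard-level sequence numbers, retries, and error handling:

```python
# Toy model of CCR incremental synchronization.

class LeaderShard:
    """Holds an append-only translog of (seq_no, operation) entries."""

    def __init__(self):
        self.translog = []           # list of (seq_no, op) tuples
        self.global_checkpoint = -1  # highest seq_no durably applied

    def apply(self, op):
        seq_no = len(self.translog)
        self.translog.append((seq_no, op))
        self.global_checkpoint = seq_no

    def read_ops(self, from_seq_no):
        """Return every operation at or after from_seq_no, in order."""
        return [entry for entry in self.translog if entry[0] >= from_seq_no]


class FollowerShard:
    """Polls the leader and replays operations past its remote_checkpoint."""

    def __init__(self, leader):
        self.leader = leader
        self.remote_checkpoint = -1  # last seq_no replayed locally
        self.docs = {}

    def poll(self):
        ops = self.leader.read_ops(self.remote_checkpoint + 1)
        for seq_no, (action, doc_id, source) in ops:
            if action in ("index", "update"):
                self.docs[doc_id] = source
            elif action == "delete":
                self.docs.pop(doc_id, None)
            self.remote_checkpoint = seq_no  # advance only after replay
```

Calling `poll()` after writes on the leader brings the follower's documents and `remote_checkpoint` up to date; in the real protocol the follower issues this request roughly once per second.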
Role of the translog
The transaction log (translog) is the data source for incremental synchronization in CCR. It serves the following purposes in Elasticsearch:
Prevent data loss: By default, Elasticsearch performs a refresh every second, which creates a new searchable Lucene segment from the in-memory buffer. However, this segment is not yet flushed to disk. The translog records all write operations. If a node crashes, you can recover data by replaying the log.
Ensure replica consistency: Elasticsearch first writes operations to the translog and then forwards them to replica shards. It returns a success response only after both the primary and replica shards confirm the write.
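This translog-first write path can be illustrated with a minimal sketch (hypothetical names; the real path also involves sequence numbers, in-sync replica sets, and failure handling):

```python
class Shard:
    def __init__(self):
        self.translog = []

    def append(self, op):
        self.translog.append(op)
        return True  # a real shard could fail here (disk full, node down)


def indexing_request(primary, replicas, op):
    """Ack a write only after the primary and every replica persist it."""
    if not primary.append(op):  # 1. write to the primary's translog first
        return False
    # 2. forward the operation to each replica and collect confirmations
    acks = [replica.append(op) for replica in replicas]
    return all(acks)  # 3. success only when all copies have confirmed
```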
Support incremental synchronization in CCR: CCR uses an internal Elasticsearch API to read the translog and fetch all changes after a specified sequence number, which enables near real-time data replication.
The translog is stored separately for each shard. Each shard has its own translog directory located at `indices/{index_uuid}/{shard_id}/translog/`. Translog files (`.tlog`) are stored in a binary format and managed using a generation mechanism. Elasticsearch creates a new generation file each time a flush occurs or when the file reaches its size limit, which is 512 MB by default.
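The size-based part of the generation mechanism can be sketched as follows. This is a simplified model with an artificially small limit for illustration; the real default limit is 512 MB, and rollover also happens on every flush:

```python
class TranslogGenerations:
    """Appends entries to the current generation and rolls over to a new
    generation (a new .tlog file in Elasticsearch) at the size limit."""

    def __init__(self, size_limit):
        self.size_limit = size_limit
        self.generations = [[]]   # each generation models one .tlog file
        self.current_size = 0

    def append(self, entry: bytes):
        if self.current_size + len(entry) > self.size_limit:
            self.generations.append([])  # start a new generation file
            self.current_size = 0
        self.generations[-1].append(entry)
        self.current_size += len(entry)
```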
CCR networking for Alibaba Cloud ES
Alibaba Cloud Elasticsearch instances are deployed in a dedicated management VPC, not in your customer VPC. Even if two clusters are in the same region, or their VPCs are connected across regions by using Cloud Enterprise Network (CEN), the clusters cannot communicate directly over a private network. You must use Network Load Balancer (NLB) and PrivateLink to connect the management VPCs.
Choose the appropriate documentation based on whether the two clusters are in the same region:
| Scenario | Description | Documentation |
| --- | --- | --- |
| Same region | The leader and follower clusters are in the same region. You connect their management VPCs by using NLB and PrivateLink. | Replicate data between ES clusters in the same region |
| Across regions | The leader and follower clusters are in different regions. You must first connect their customer VPCs by using CEN, and then connect their management VPCs by using NLB and PrivateLink. | Replicate data between ES clusters across regions |
Limitations
The management deployment mode of both clusters must be cloud-native new management (v3). If a cluster has a v1 or v2 architecture, you must upgrade its architecture first. For more information, see Upgrade instance architecture.
To check the architecture version of an Elasticsearch cluster, log on to the Elasticsearch console. On the Basic Information page of the instance, view the Control Architecture Type, which can be Cloud-native Control Architecture (v3) or Basic Control Architecture (v2).
Both clusters must run Elasticsearch 7.10.0 or later. The version of the follower cluster must be the same as or later than the version of the leader cluster.
Follower indices are read-only and do not support write operations. To convert a follower index into a regular writable index, perform the following operations:
1. Pause the follow task: `POST /<index>/_ccr/pause_follow`
2. Close the index: `POST /<index>/_close`
3. Unfollow the index: `POST /<index>/_ccr/unfollow`
4. Reopen the index to make it read/write: `POST /<index>/_open`
The leader and follower indices must have identical mappings and shard counts. You cannot change the number of shards in the follower index.