This topic describes how to use the cross-cluster replication (CCR) feature to migrate data from a remote Alibaba Cloud Elasticsearch cluster to a local Alibaba Cloud Elasticsearch cluster.

Precautions

The adjustment made to the Alibaba Cloud Elasticsearch network architecture has the following impacts on clusters:

  • Clusters created after October 2020 do not support the X-Pack Watcher and Lightweight Directory Access Protocol (LDAP) authentication features.
  • You cannot reindex, search, or replicate data between a cluster created before October 2020 and a cluster created after October 2020. To perform these operations, make sure that both clusters use the same network architecture.
Note The network architecture in the China (Zhangjiakou) region and the regions outside China was adjusted before October 2020. If you want to perform the preceding operations between a cluster created before October 2020 and one created after October 2020 in such a region, submit a ticket so that technical support can check whether the network architecture supports the operations.

Background information

CCR is a commercial feature that Elasticsearch provides under its Platinum subscription. After you purchase an Alibaba Cloud Elasticsearch cluster, you can use this feature free of charge after a few simple configurations. Only single-zone Elasticsearch clusters of V6.7.0 or later support this feature. CCR is used in the following scenarios:
  • Disaster recovery and high availability

    You can use CCR to back up data across Elasticsearch clusters that reside in different regions. If a cluster fails, you can retrieve its index data from the other clusters. This prevents data loss.

  • Data access from a nearby cluster

    For example, Company A has multiple subsidiaries in different regions. To speed up business processing, Company A can divide the business of the subsidiaries by geographical location and use CCR to distribute business data to Elasticsearch clusters in the corresponding regions. Each subsidiary then processes its business on the cluster in its own region.

  • Centralized reporting

    You can use CCR to replicate data from multiple small clusters to one cluster. Then, you can perform visualized analytics and reporting for the data in a centralized manner.

To use CCR, you must prepare two types of clusters: local clusters and remote clusters. Remote clusters provide source data, which is stored in leader indexes. Local clusters replicate the data and store it in follower indexes. You can also use CCR to migrate large volumes of data in real time. For more information, see Cross-cluster replication.

Procedure

  1. Preparations
    Prepare a local cluster, a remote cluster, and a leader index.
  2. Step 1: Connect clusters
    Connect the remote cluster to the local cluster.
  3. Step 2: Add the remote cluster
    In the Kibana console of the local cluster, add the remote cluster.
  4. Step 3: Configure CCR
    In the Kibana console of the local cluster, configure the leader index and a follower index.
  5. Step 4: View migration results
    Insert data into the remote cluster. Then, verify the data migration on the local cluster.

Preparations

  1. Create a local cluster and a remote cluster.
    For more information, see Create an Alibaba Cloud Elasticsearch cluster. The two clusters must be single-zone clusters, reside in the same virtual private cloud (VPC) and vSwitch, and be of the same version (V6.7.0 or later).
  2. Log on to the Kibana console of the remote cluster and create a leader index.
    Notice
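    • If you create an index in an Elasticsearch cluster of V7.0 or earlier, you must enable soft deletes (the index.soft_deletes.enabled setting) when you create the index. Otherwise, an error is reported.
    • The soft deletes setting can be specified only when an index is created. If you want to migrate data in an existing index, create a new index that has soft deletes enabled and call the reindex API to copy the existing data into it.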
    PUT myindex
    {
      "settings": {
        "index.soft_deletes.retention.operations": 1024,
        "index.soft_deletes.enabled": true
      }
    }
  3. Disable the physical replication feature for the leader index.
    The physical replication feature is automatically enabled for indexes in Elasticsearch V6.7.0 clusters. Before you use CCR, you must disable the physical replication feature.
    1. Close the index.
      POST myindex/_close
    2. Update the settings of the index to disable the physical replication feature.
      PUT myindex/_settings
      {
        "index.replication.type": null
      }
    3. Open the index.
      POST myindex/_open
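
The existing-index case described in step 2 could look like the following sketch. Because index.soft_deletes.enabled can be set only at index creation, the sketch creates a new leader index and copies the documents into it. The source index name myindex_old is a hypothetical placeholder, and the retention value is simply reused from the example in step 2.

```console
# Run on the REMOTE cluster.

# Create the new leader index with soft deletes enabled.
PUT myindex
{
  "settings": {
    "index.soft_deletes.enabled": true,
    "index.soft_deletes.retention.operations": 1024
  }
}

# Copy the documents from the existing index (hypothetical name: myindex_old).
POST _reindex
{
  "source": { "index": "myindex_old" },
  "dest": { "index": "myindex" }
}

# Check the settings of the new leader index.
GET myindex/_settings
```

The final request is only a check: the response should show soft_deletes.enabled set to true under the index settings. Step 3 (disabling physical replication) still applies to the newly created index.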

Step 1: Connect clusters

Configure the remote cluster to connect it to the local cluster. For more information, see Connect Elasticsearch clusters. If the two clusters are connected, the information shown in the following figure appears.
[Figure: Connect clusters]

Step 2: Add the remote cluster

  1. Log on to the Kibana console of the local cluster.
    For more information, see Log on to the Kibana console.
  2. In the left-side navigation pane, click Management.
  3. In the Elasticsearch section, click Remote Clusters.
  4. Click Add a remote cluster.
  5. In the Add remote cluster section, configure the following parameters.
    [Figure: Add remote cluster]
    • Name: the name of the remote cluster. The name must be unique.
    • Seed nodes: the nodes in the remote cluster. Specify each node in the format Node IP address:9300. To obtain the IP addresses of nodes, log on to the Kibana console of the remote cluster and run the GET /_cat/nodes?v command on the Console tab of the Dev Tools page. The nodes that you specify must include a dedicated master node of the remote cluster. We recommend that you specify multiple nodes so that you can still use CCR if the specified dedicated master node fails.
      [Figure: Obtain the IP address of a node]
      Notice During CCR, Kibana uses the IP addresses of data nodes to access clusters over TCP port 9300. HTTP port 9200 is not supported.
  6. Click Save.
    The system then automatically connects to the remote cluster. If the connection is established, Connected appears.
    [Figure: Connected]
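
The registration performed in the Kibana form above can also be expressed as a cluster settings request. The following is a minimal sketch under stated assumptions, not the only way to do it: the alias remote_es and the two IP addresses are hypothetical placeholders, and you would substitute the addresses returned by the _cat/nodes request on the remote cluster.

```console
# Run on the REMOTE cluster: list node IP addresses and roles.
GET /_cat/nodes?v&h=ip,node.role,master,name

# Run on the LOCAL cluster: register the remote cluster under the alias remote_es.
# The alias and the IP addresses are placeholders. Use transport port 9300, not 9200.
PUT /_cluster/settings
{
  "persistent": {
    "cluster": {
      "remote": {
        "remote_es": {
          "seeds": [
            "192.168.0.10:9300",
            "192.168.0.11:9300"
          ]
        }
      }
    }
  }
}

# Run on the LOCAL cluster: check the connection status.
GET /_remote/info
```

In the _remote/info response, the connected field indicates whether the local cluster reached the remote cluster.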

Step 3: Configure CCR

  1. Log on to the Kibana console of the local cluster. In the left-side navigation pane, click Management. In the Elasticsearch section of the page that appears, click Cross Cluster Replication.
  2. On the Follower indices tab, click Create a follower index.
  3. In the Add follower index section, configure the following parameters.
    [Figure: Configure CCR]
    • Remote cluster: the cluster that you added in Step 2: Add the remote cluster.
    • Leader index: the index whose data you want to migrate. In this example, the myindex index that is created in Preparations is used.
    • Follower index: the index to which data is migrated. You must specify a unique index name. In this example, myindex_follow is used.
  4. Click Create.
    After the follower index is created, the index is in the Active state.
    [Figure: Index status]
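
The follower index can also be created with the CCR follow API instead of the Kibana form. The sketch below assumes that the remote cluster was registered under the hypothetical name remote_es; replace it with the name that you entered in Step 2. The index names myindex and myindex_follow are the ones used in this topic.

```console
# Run on the LOCAL cluster.

# Create the follower index myindex_follow that replicates the leader index myindex.
PUT /myindex_follow/_ccr/follow
{
  "remote_cluster": "remote_es",
  "leader_index": "myindex"
}

# Check the replication statistics of the follower index.
GET /myindex_follow/_ccr/stats
```

The follow request fails if soft deletes are not enabled on the leader index, which is why the Preparations section enables them first.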

Step 4: View migration results

  1. Log on to the Kibana console of the remote cluster and insert data into the remote cluster.
    POST myindex/_doc/
    {
      "name":"Jack",
      "age":40
    }
  2. Run the following command in the Kibana console of the local cluster to check whether the inserted data is migrated to the local cluster:
    GET myindex_follow/_search
    If the command is successfully run, the result shown in the following figure is returned.
    [Figure: Data migration results]
    The preceding figure shows that data in the leader index myindex of the remote cluster is migrated to the follower index myindex_follow of the local cluster.
    Notice The follower index myindex_follow is read-only. If you want to write data to the follower index, convert the follower index into a regular index first. For more information, see Use Elasticsearch CCR to migrate data across data centers.
  3. Insert a data record into the remote cluster and check whether the data record is migrated to the local cluster in real time.
    POST myindex/_doc/
    {
      "name":"Pony",
      "age":50
    }
    Query the inserted data record in the local cluster. The following figure shows the data record.
    [Figure: Verify real-time data migration]
    The preceding figure shows that the CCR feature can implement real-time migration of incremental data.
    Note You can also call the APIs for the CCR feature to perform cross-cluster replication operations. For more information, see Cross-cluster replication APIs.
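
As an illustration of the CCR APIs, the following sketch shows one way to convert the read-only follower index myindex_follow so that it accepts writes. It is an outline based on the open source Elasticsearch CCR APIs, not a replacement for the referenced migration topic.

```console
# Run all requests on the LOCAL cluster.

# 1. Stop replication from the leader index.
POST /myindex_follow/_ccr/pause_follow

# 2. Close the follower index. Unfollow works only on a paused, closed index.
POST /myindex_follow/_close

# 3. Remove the follower metadata so that the index becomes a regular index.
POST /myindex_follow/_ccr/unfollow

# 4. Reopen the index. It now accepts writes.
POST /myindex_follow/_open
```

The unfollow operation cannot be undone. To resume replication afterwards, create a new follower index.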

FAQ

Q: I can use port 9300 to add a remote cluster. Why is only port 9200 accessible when I use a domain name to access an Elasticsearch cluster?

A: Port 9300 is an open port. However, when you access a cluster over the Internet, Server Load Balancer (SLB) enables only port 9200 during port verification for security purposes. This will be adjusted in the future.