This topic describes how to migrate data from a self-managed Elasticsearch cluster that runs on Elastic Compute Service (ECS) instances to an Alibaba Cloud Elasticsearch cluster that is deployed in the new network architecture. You can use PrivateLink to establish a private connection to the Alibaba Cloud Elasticsearch cluster and use the reindex API to migrate data. The reindex API includes two operations: index creation and data migration.

Prerequisites

  • The self-managed Elasticsearch cluster meets the following requirements:
    • The ECS instances that host the self-managed Elasticsearch cluster are deployed in a virtual private cloud (VPC). You cannot use an ECS instance that is connected to a VPC over a ClassicLink. The self-managed Elasticsearch cluster and Alibaba Cloud Elasticsearch cluster are deployed in the same VPC.
    • The IP addresses of nodes in the Alibaba Cloud Elasticsearch cluster are added to the security groups of the ECS instances that host the self-managed Elasticsearch cluster. You can query the IP addresses of the nodes in the Kibana console of the Alibaba Cloud Elasticsearch cluster. In addition, port 9200 is enabled.
    • The self-managed Elasticsearch cluster is connected to the Alibaba Cloud Elasticsearch cluster. You can test the connectivity by running the curl -XGET http://<host>:9200 command on the server where you run scripts.
      Note You can run all scripts provided in this topic on a server that is connected to both clusters over port 9200.
    • The source index is prepared. In this example, the source index shown in the following figure is used.
      Figure: Source index
  • The Alibaba Cloud Elasticsearch cluster meets the following requirements:
    • The Auto Indexing feature is enabled for the cluster, or the destination index is created in the cluster.
    • Default whitelists are used.
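
The connectivity check in the prerequisites can be wrapped in a small helper so that it can be run against both clusters from the same server. The following is a minimal sketch, not an official script. The function names es_url and check_es and the host names in the example are hypothetical; only the curl command and port 9200 come from this topic.

```shell
#!/usr/bin/env bash

# Build the base URL of an Elasticsearch node. The port defaults to 9200,
# which is the port used throughout this topic.
es_url() {
  printf 'http://%s:%s' "$1" "${2:-9200}"
}

# Return 0 if the node answers with a non-error HTTP status within 5 seconds.
# Arguments: host, optional port, optional "user:password" credentials.
check_es() {
  local url
  url="$(es_url "$1" "${2:-9200}")"
  if [ -n "${3:-}" ]; then
    curl -sS -f -m 5 -o /dev/null -u "$3" -XGET "$url"
  else
    curl -sS -f -m 5 -o /dev/null -XGET "$url"
  fi
}

# Example with hypothetical hosts (run on a server that can reach both clusters):
#   check_es 192.168.0.10 && echo "self-managed cluster reachable"
#   check_es es-cn-example.elasticsearch.aliyuncs.com 9200 'elastic:<password>'
```

Because of the -f option, a 401 response from a cluster that requires authentication is reported as a failure, so pass credentials for such clusters.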

Precautions

The network architecture of Alibaba Cloud Elasticsearch was adjusted in October 2020. Due to this adjustment, you cannot use the reindex API to migrate data between clusters in some scenarios. The following table describes such scenarios and the data migration solutions in these scenarios. Alibaba Cloud Elasticsearch clusters created before October 2020 are deployed in the original network architecture, and those created in October 2020 or later are deployed in the new network architecture.
| Scenario | Network architecture | Support for the reindex API | Solution |
| --- | --- | --- | --- |
| Use the reindex API to migrate data between Alibaba Cloud Elasticsearch clusters | Both clusters are deployed in the original network architecture. | Yes | Use the reindex API to migrate data. For more information, see Use the reindex API to migrate data. |
| Use the reindex API to migrate data between Alibaba Cloud Elasticsearch clusters | Both clusters are deployed in the new network architecture. | No | None. |
| Use the reindex API to migrate data between Alibaba Cloud Elasticsearch clusters | One is deployed in the original network architecture, and the other is deployed in the new network architecture. | No | None. |
| Migrate data from a self-managed Elasticsearch cluster that runs on ECS instances to an Alibaba Cloud Elasticsearch cluster | The Alibaba Cloud Elasticsearch cluster is deployed in the original network architecture. | Yes | Use the reindex API to migrate data. For more information, see Use the reindex API to migrate data from a self-managed Elasticsearch cluster to an Alibaba Cloud Elasticsearch cluster. |
| Migrate data from a self-managed Elasticsearch cluster that runs on ECS instances to an Alibaba Cloud Elasticsearch cluster | The Alibaba Cloud Elasticsearch cluster is deployed in the new network architecture. | Yes | Use the PrivateLink service to establish a network connection between the Alibaba Cloud Elasticsearch cluster and the self-managed Elasticsearch cluster that runs on ECS instances. This way, the service account of Alibaba Cloud Elasticsearch can be used to access the self-managed Elasticsearch cluster. Then, use the domain name of the endpoint that you obtained and the reindex API to migrate data between the clusters. For more information, see Migrate data from a self-managed Elasticsearch cluster to an Alibaba Cloud Elasticsearch cluster deployed in the new network architecture. |
Note Only some regions support PrivateLink. For more information, see Regions and zones that support PrivateLink. If the zone where your Alibaba Cloud Elasticsearch cluster resides does not support PrivateLink, you cannot use the reindex API to migrate data between the two clusters.
Note
  • Alibaba Cloud Elasticsearch clusters deployed in the new network architecture reside in an exclusive VPC for Alibaba Cloud Elasticsearch. These clusters cannot access resources in other network environments. Alibaba Cloud Elasticsearch clusters deployed in the original network architecture reside in VPCs that are created by users. These clusters can still access resources in other network environments.
  • The network architecture in the China (Zhangjiakou) region and the regions outside China was adjusted before October 2020. If you want to perform operations between a cluster that is created before October 2020 and a cluster that is created in October 2020 or later in such a region, submit a ticket to contact Alibaba Cloud technical support to check whether the network architecture supports the operations.
  • Clusters created in other regions before October 2020 are deployed in the original network architecture, and those created in other regions in October 2020 or later are deployed in the new network architecture.
  • To ensure data consistency, we recommend that you stop writing data to the self-managed Elasticsearch cluster before the migration. This way, you can continue to read data from the cluster during the migration. After the migration, you can read data from and write data to the Alibaba Cloud Elasticsearch cluster. If you do not stop writing data to the self-managed Elasticsearch cluster, we recommend that you configure loop execution for reindex operations in the code to shorten the time during which write operations are suspended. For more information, see the method used to migrate a large volume of data (without deletions and with update time) in the "Migrate data" section.
  • If you connect to the self-managed Elasticsearch cluster or the Alibaba Cloud Elasticsearch cluster by using its domain name, do not include a path in the URL. For example, do not use a URL in the format of http://host:port/path.
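
The loop execution mentioned in the preceding note can be sketched as repeated reindex passes, each of which copies only the documents updated since the previous pass started. The following is an illustrative sketch under several assumptions, not the method from the "Migrate data" section: the documents carry a date field (named update_time here, which is hypothetical), no documents are deleted during the migration, and the variables ES_DEST, DEST_AUTH, SRC_HOST, SRC_USER, and SRC_PASS are placeholders that you define yourself. The source.query clause is a standard part of the reindex request body.

```shell
#!/usr/bin/env bash

# Build a remote reindex body that copies only documents whose timestamp
# field is at or after a lower bound.
# Arguments: remote host URL, user, password, source index, destination index,
#            timestamp field (hypothetical), lower bound (ISO 8601).
# Note: values are inserted as-is, so they must not contain " or \ characters.
incremental_reindex_body() {
  printf '{"source":{"remote":{"host":"%s","username":"%s","password":"%s"},"index":"%s","query":{"range":{"%s":{"gte":"%s"}}}},"dest":{"index":"%s"}}' \
    "$1" "$2" "$3" "$4" "$6" "$7" "$5"
}

# Run a fixed number of passes against the destination cluster.
# Arguments: initial lower bound, number of passes.
# Each pass records its start time first, so documents updated while the
# pass is running are picked up by the next pass.
run_incremental_passes() {
  local since="$1" passes="$2" started i=0
  while [ "$i" -lt "$passes" ]; do
    started="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
    curl -sS -f -u "$DEST_AUTH" -H 'Content-Type: application/json' \
      -XPOST "$ES_DEST/_reindex" \
      -d "$(incremental_reindex_body "$SRC_HOST" "$SRC_USER" "$SRC_PASS" source dest update_time "$since")" \
      || return 1
    since="$started"
    i=$((i + 1))
  done
}
```

Documents that already exist in the destination index are overwritten by the newer copy with the same ID, so later passes are short. Suspend writes only for the final pass.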

Procedure

  1. Step 1: Configure a CLB instance that supports PrivateLink
    Only Classic Load Balancer (CLB) instances that support PrivateLink can serve as service resources for endpoint services. Before you use PrivateLink to establish private connections to access services across VPCs, you must create a CLB instance that supports PrivateLink and configure listening settings for the CLB instance.
  2. Step 2: Create an endpoint service
    After you create an endpoint service in a VPC, you can use an endpoint that is deployed in another VPC to access the endpoint service over a private connection.
  3. Step 3: Configure a private connection to the Alibaba Cloud Elasticsearch cluster
    In the Elasticsearch console, associate the Alibaba Cloud Elasticsearch cluster with the endpoint service that is created in Step 2.
  4. Step 4: Obtain the domain name of the endpoint
    After the Alibaba Cloud Elasticsearch cluster is associated with the endpoint service, you can obtain the domain name of the associated endpoint.
  5. Step 5: Configure a remote reindex whitelist for the Alibaba Cloud Elasticsearch cluster
    In the Elasticsearch console, add the domain name that is obtained in Step 4 to the remote reindex whitelist of the Alibaba Cloud Elasticsearch cluster for authorization.
  6. Step 6: Migrate data
    After you complete the preceding steps, you can migrate data from the self-managed Elasticsearch cluster to the Alibaba Cloud Elasticsearch cluster.

Step 1: Configure a CLB instance that supports PrivateLink

  1. Create a CLB instance.
    Make sure that the CLB instance and the ECS instances that act as backend servers are deployed in the same region. For more information, see Create a CLB instance that supports PrivateLink.
    Note Only some regions support PrivateLink. For more information, see Regions and zones that support PrivateLink. If the zone where your Alibaba Cloud Elasticsearch cluster resides does not support PrivateLink, you cannot use the reindex API to migrate data between the two clusters.
  2. Configure protocol and listening settings. Set Select Listener Protocol to TCP and Listening Port to 9200.
    For more information, see Configure protocol and listening settings.
  3. Configure backend servers. Add the ECS instances that host the self-managed Elasticsearch cluster as backend servers and specify port 9200 for the ECS instances.
    For more information, see Configure backend servers.
  4. Click Next. In the Health Check step, configure the parameters based on your business requirements. In this example, the default values of the parameters are used.
  5. Click Next to go to the Confirm step. After you confirm the configuration information, click Submit.
  6. In the Configure Server Load Balancer message, click OK. The Instances page appears.

    If the health check status of an ECS instance is Normal, the ECS instance is ready to process requests.

Step 2: Create an endpoint service

  1. Log on to the VPC console.
  2. In the left-side navigation pane, click Endpoints Service.
  3. In the top navigation bar, select the region in which you want to create an endpoint service. In this example, the China (Hangzhou) region is selected.
  4. On the Endpoints Service page, click Create Endpoint Service.
  5. On the Create Endpoint Service page, configure the parameters based on your business requirements.
    Parameters and their descriptions:
    • Select Service Resource: Select a zone to which you want to distribute network traffic. Then, select the CLB instance that you want to associate with the endpoint service. The zone where an endpoint service is deployed must be the same as the primary zone where the CLB instance you want to associate with the endpoint service is deployed. CLB instances serve as service resources and can be associated with endpoint services. The CLB instances that are associated with endpoint services receive requests from clients.
      CLB instances can serve as service resources only if they meet the following requirements:
      • Network Type is set to VPC.
      • Feature is set to Support PrivateLink.
    • Automatically Accept Endpoint Connections: Specifies whether to automatically accept connection requests from endpoints. Valid values:
      • Yes: The endpoint service accepts all connection requests from an endpoint that is associated with the endpoint service. In this case, you can use the associated endpoint to access the endpoint service. We recommend that you set this parameter to Yes.
      • No: The endpoint connection of the endpoint service is in the Disconnected state. In this case, endpoint connection requests to the endpoint service must be manually accepted or denied by the service administrator.
        • If the service administrator accepts endpoint connection requests from the associated endpoint, you can use the associated endpoint to access the endpoint service.
        • If the service administrator denies endpoint connection requests from the associated endpoint, you cannot use the associated endpoint to access the endpoint service.
      Note
      • If you set Automatically Accept Endpoint Connections to Yes, the value of Endpoint Connection Status in the Configure Private Connection panel of the Elasticsearch console is Connected. In this case, you can click Deny Connection in the Actions column.
      • If you set Automatically Accept Endpoint Connections to No, the value of Endpoint Connection Status in the Configure Private Connection panel of the Elasticsearch console is Disconnected. In this case, you can click Allow Connection in the Actions column.
    • Whether to Enable Zone Affinity: We recommend that you set this parameter to Yes.
    • Description: Enter a description for the endpoint service. The description must be 2 to 256 characters in length and cannot start with http:// or https://.
  6. Click OK.
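
If you generate endpoint service descriptions in a script, the rule for the Description parameter can be checked before you submit the form. The following sketch only encodes the rule stated above; the function name valid_description is hypothetical, and the length is counted in characters of the current shell locale, which may differ from how the console counts multi-byte characters.

```shell
#!/usr/bin/env bash

# Return 0 if the text is 2 to 256 characters long and does not start
# with http:// or https://.
valid_description() {
  local text="$1"
  case "$text" in
    http://*|https://*) return 1 ;;
  esac
  [ "${#text}" -ge 2 ] && [ "${#text}" -le 256 ]
}

# Example:
#   valid_description "Endpoint service for Elasticsearch migration" && echo "ok"
```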

Step 3: Configure a private connection to the Alibaba Cloud Elasticsearch cluster

  1. Log on to the Elasticsearch console.
  2. In the left-side navigation pane, click Elasticsearch Clusters.
  3. Navigate to the desired cluster.
    1. In the top navigation bar, select the resource group to which the cluster belongs and the region where the cluster resides.
    2. On the Elasticsearch Clusters page, find the cluster and click its ID.
  4. In the left-side navigation pane of the Basic Information page, click Security.
  5. In the Network Settings section, click Edit on the right side of Configure Private Connection.
  6. In the Configure Private Connection panel, click Add Private Connection. In the Create Private Connection dialog box, select the endpoint service that is created in Step 2 and select a zone. Then, select the check box.
  7. Click OK. Then, the endpoint service attempts to connect to the associated endpoint. If the value of Endpoint Connection Status is Connected, the endpoint service is connected to the associated endpoint.

Step 4: Obtain the domain name of the endpoint

After the preceding steps are performed, you must obtain the domain name of the associated endpoint to configure a remote reindex whitelist.

  1. In the Configure Private Connection panel, click the ID of the endpoint in the Endpoint ID column.
  2. On the Endpoint Connections tab of the page that appears, click the Expand icon next to the ID of the endpoint. Then, you can view the domain name of the endpoint.

Step 5: Configure a remote reindex whitelist for the Alibaba Cloud Elasticsearch cluster

Notice After you configure a remote reindex whitelist for the Alibaba Cloud Elasticsearch cluster, the system restarts the cluster. We recommend that you perform this operation during off-peak hours.
  1. Log on to the Elasticsearch console.
  2. In the left-side navigation pane, click Elasticsearch Clusters.
  3. Navigate to the desired cluster.
    1. In the top navigation bar, select the resource group to which the cluster belongs and the region where the cluster resides.
    2. On the Elasticsearch Clusters page, find the cluster and click its ID.
  4. In the left-side navigation pane of the page that appears, click Cluster Configuration.
  5. On the page that appears, click Modify Configuration on the right side of YML Configuration.
  6. In the YML File Configuration panel, specify the domain name that is obtained in Step 4 in Other Configurations.
    Sample code:
    reindex.remote.whitelist: 'ep-bp1nitq0krp8yhcf****-cn-hangzhou-i.epsrv-bp1zczi0fgoc5qtv****.cn-hangzhou.privatelink.aliyuncs.com:9200'
  7. Click OK.
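
A whitelist entry has the form host:port, without a scheme or path. If you copy the endpoint address from a URL, a helper like the following can normalize it before you paste it into Other Configurations. This is a sketch; the function names whitelist_entry and whitelist_yml are hypothetical, and the domain name in the example is a placeholder.

```shell
#!/usr/bin/env bash

# Turn an endpoint URL or a bare domain name into a host:port entry.
# Arguments: URL or domain name, optional port (defaults to 9200).
whitelist_entry() {
  local host="$1" port="${2:-9200}"
  host="${host#http://}"    # drop the scheme if present
  host="${host#https://}"
  host="${host%%/*}"        # drop any path
  host="${host%%:*}"        # drop any port that is already present
  printf '%s:%s' "$host" "$port"
}

# Print the configuration line in the same format as the sample code above.
whitelist_yml() {
  printf "reindex.remote.whitelist: '%s'\n" "$(whitelist_entry "$1" "${2:-9200}")"
}

# Example with a placeholder domain name:
#   whitelist_yml 'http://ep-example.epsrv-example.cn-hangzhou.privatelink.aliyuncs.com:9200/'
```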

Step 6: Migrate data

  1. On the Dev Tools page in the Kibana console of the Alibaba Cloud Elasticsearch cluster, run the following command to migrate data.
    Note For more information about how to log on to the Kibana console, see Log on to the Kibana console.
    POST /_reindex?pretty
    {
      "source": {
        "remote": {
          "host": "http://ep-bp1nitq0krp8yhcf****-cn-hangzhou-i.epsrv-bp1zczi0fgoc5qtv****.cn-hangzhou.privatelink.aliyuncs.com:9200",
          "username": "elastic",
          "password": "Elastic@123***"
        },
        "index": "source",
        "size": 5000
      },
      "dest": {
        "index": "dest"
      }
    }

    For more information, see the reindex API.

  2. Optional: If you want to obtain detailed information about all running reindex requests during data migration, run the following command:
    GET _tasks?detailed=true&actions=*reindex
  3. View data migration results.
    After the data migration is complete, you can run the following command to view the data migration results:
    GET _cat/indices?v
    In the following figure, the test index is the destination index. If the health status and data volume of the index are normal, the data migration is successful.
    Figure: View data migration results
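
In addition to checking the index list, you can compare the document counts of the source and destination indexes by using the _count API. The following is a minimal sketch that you can run on a server connected to both clusters. The function names are hypothetical, and the URLs, credentials, and index names in the example are placeholders.

```shell
#!/usr/bin/env bash

# Extract the "count" value from a _count response read from standard input.
extract_count() {
  sed -n 's/.*"count"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Fetch the document count of an index.
# Arguments: base URL, "user:password" credentials, index name.
doc_count() {
  curl -sS -f -u "$2" "$1/$3/_count" | extract_count
}

# Return 0 if both counts are non-empty and equal.
same_count() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}

# Example with placeholder values:
#   src="$(doc_count 'http://<source-host>:9200' 'elastic:<password>' source)"
#   dst="$(doc_count 'http://<dest-host>:9200' 'elastic:<password>' dest)"
#   same_count "$src" "$dst" && echo "document counts match"
```

If the refresh interval of the destination index was set to -1 during the migration, restore it or refresh the index before you compare, because documents that are not yet refreshed are not counted.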

FAQ

Problem: What do I do if the source index stores large volumes of data and the data migration is slow?

Solution:
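  • If you use the reindex API to migrate data, data is migrated in scroll mode. To improve the efficiency of data migration, you can increase the scroll batch size, which is the size parameter in source. A sliced scroll can parallelize a reindex operation within a single cluster, but open source Elasticsearch does not support manual or automatic slicing for reindex from a remote cluster, so slicing does not apply to the remote reindex request in Step 6. For more information, see the reindex API.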
  • If the self-managed Elasticsearch cluster stores large volumes of data, we recommend that you use snapshots stored in Object Storage Service (OSS) to migrate data. For more information, see Use OSS to migrate data from a self-managed Elasticsearch cluster to an Alibaba Cloud Elasticsearch cluster.
  • If the source index stores large volumes of data, you can set the number of replica shards to 0 and the refresh interval to -1 for the destination index before you migrate data to accelerate data migration. After data is migrated, restore the settings to the original values.
    # Set the number of replica shards to 0 and disable refresh to accelerate the data migration.
    # The Content-Type header is required by Elasticsearch 6.0 and later.
    curl -u user:password -H 'Content-Type: application/json' -XPUT 'http://<host:port>/indexName/_settings' -d '{
            "number_of_replicas" : 0,
            "refresh_interval" : "-1"
    }'
    # After data is migrated, set the number of replica shards to 1 and the refresh interval to 1s, which is the default value.
    curl -u user:password -H 'Content-Type: application/json' -XPUT 'http://<host:port>/indexName/_settings' -d '{
            "number_of_replicas" : 1,
            "refresh_interval" : "1s"
    }'