Elasticsearch: Migrate data from a self-managed Elasticsearch cluster to an Alibaba Cloud Elasticsearch cluster deployed in the new network architecture

Last Updated: Apr 17, 2024

This topic describes how to migrate data from a self-managed Elasticsearch cluster that runs on Elastic Compute Service (ECS) instances to an Alibaba Cloud Elasticsearch cluster that is deployed in the new network architecture. You can use PrivateLink to establish a private connection to the Alibaba Cloud Elasticsearch cluster and use the reindex API to migrate data. A migration that uses the reindex API involves two operations: creating the destination index and migrating the data.

Prerequisites

  • The self-managed Elasticsearch cluster meets the following requirements:

    • The ECS instances that host the self-managed Elasticsearch cluster are deployed in the same VPC as the Alibaba Cloud Elasticsearch cluster. You cannot use ECS instances that are connected to VPCs over ClassicLink connections.

    • The IP addresses of nodes in the Alibaba Cloud Elasticsearch cluster are added to the security groups of the ECS instances that host the self-managed Elasticsearch cluster. You can query the IP addresses of the nodes in the Kibana console of the Alibaba Cloud Elasticsearch cluster. In addition, port 9200 is open in these security groups.

    • The self-managed Elasticsearch cluster is connected to the Alibaba Cloud Elasticsearch cluster. You can test the connectivity by running the curl -XGET http://<host>:9200 command on the server where you run scripts.

      Note

      You can run all scripts provided in this topic on a server that can be connected to both the self-managed Elasticsearch cluster and Alibaba Cloud Elasticsearch cluster over port 9200.

    • The source index is prepared. In this example, the source index shown in the following figure is used.

  • The Alibaba Cloud Elasticsearch cluster meets the following requirements:

    • The Auto Indexing feature is enabled for the cluster, or a destination index is created in the cluster.

    • Default whitelists are used.
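The connectivity prerequisite above can be checked with a small script. The following is a minimal sketch, not an official tool: the function names es_url and check_es are hypothetical, the hosts in the usage comments are placeholders, and it assumes that curl is installed on the server where you run scripts. Note that curl's --fail option also reports an error if the cluster rejects the request, for example because credentials are missing.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: check that an Elasticsearch HTTP endpoint answers on port 9200.

# Build the base URL for a host. The port defaults to 9200, the port used in this topic.
es_url() {
  local host="$1" port="${2:-9200}"
  printf 'http://%s:%s' "$host" "$port"
}

# Return 0 if the endpoint responds successfully within 5 seconds.
# Arguments: host, port (optional), credentials as "user:password" (optional).
check_es() {
  local url auth="${3:-}"
  url="$(es_url "$1" "${2:-9200}")"
  if [ -n "$auth" ]; then
    curl -sS --fail --max-time 5 -u "$auth" -XGET "$url" >/dev/null
  else
    curl -sS --fail --max-time 5 -XGET "$url" >/dev/null
  fi
}

# Example usage (placeholders, not real hosts):
#   check_es "<self-managed-host>" && echo "source reachable"
#   check_es "<alibaba-cloud-es-host>" 9200 "elastic:<password>" && echo "destination reachable"
```

Run both checks from the same server. If either check fails, review the VPC and security group settings before you continue.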

Limits

The network architecture of Alibaba Cloud Elasticsearch was adjusted in October 2020. In the new network architecture, the cross-cluster reindex operation is limited. You need to use the PrivateLink service to establish private connections between VPCs before you perform the operation. The following table provides data migration solutions in different scenarios.

Note

Alibaba Cloud Elasticsearch clusters created before October 2020 are deployed in the original network architecture. Alibaba Cloud Elasticsearch clusters created in October 2020 or later are deployed in the new network architecture.

| Scenario | Network architecture | Solution |
| --- | --- | --- |
| Migrate data between Alibaba Cloud Elasticsearch clusters | Both clusters are deployed in the original network architecture. | reindex API. For more information, see Use the reindex API to migrate data between Alibaba Cloud Elasticsearch clusters. |
| | One of the clusters is deployed in the original network architecture. Note: The other cluster can be deployed in the original or new network architecture. | |
| Migrate data from a self-managed Elasticsearch cluster that runs on ECS instances to an Alibaba Cloud Elasticsearch cluster | The Alibaba Cloud Elasticsearch cluster is deployed in the original network architecture. | reindex API. For more information, see Use the reindex API to migrate data from a self-managed Elasticsearch cluster to an Alibaba Cloud Elasticsearch cluster. |
| | The Alibaba Cloud Elasticsearch cluster is deployed in the new network architecture. | reindex API. For more information, see Migrate data from a self-managed Elasticsearch cluster to an Alibaba Cloud Elasticsearch cluster deployed in the new network architecture. |

Procedure

  1. Step 1: Configure a CLB instance that supports PrivateLink

    Only Classic Load Balancer (CLB) instances that support PrivateLink can serve as service resources for endpoint services. Before you use PrivateLink to establish private connections to access services across VPCs, you must create a CLB instance that supports PrivateLink and configure listening settings for the CLB instance.

  2. Step 2: Create an endpoint service

    Endpoint services are used for establishing private connections. A VPC can use an endpoint to connect to the endpoint service in another VPC. After you configure the CLB instance, you must create an endpoint service.

  3. Step 3: Configure a private connection to the Alibaba Cloud Elasticsearch cluster

    In the Elasticsearch console, associate the Alibaba Cloud Elasticsearch cluster with the endpoint service that is created in Step 2.

  4. Step 4: Obtain the domain name of the endpoint

    After the Alibaba Cloud Elasticsearch cluster is associated with the endpoint service, you can obtain the domain name of the associated endpoint.

  5. Step 5: Configure a remote reindex whitelist for the Alibaba Cloud Elasticsearch cluster

    In the Elasticsearch console, add the domain name that is obtained in Step 4 to the remote reindex whitelist of the Alibaba Cloud Elasticsearch cluster for authorization.

  6. Step 6: Migrate data

    After you complete the preceding steps, you can migrate data from the self-managed Elasticsearch cluster to the Alibaba Cloud Elasticsearch cluster.

Step 1: Configure a CLB instance that supports PrivateLink

  1. Create a CLB instance.

    Make sure that the CLB instance and the ECS instances that act as backend servers are deployed in the same region. For more information, see Create a CLB instance that supports PrivateLink.

  2. Configure protocol and listening settings. Set Select Listener Protocol to TCP and Listening Port to 9200.

    For more information, see Configure protocol and listening settings.

  3. Configure backend servers. Add the ECS instances that host the self-managed Elasticsearch cluster as backend servers and specify port 9200 for the ECS instances.

    For more information, see Configure backend servers.

  4. Click Next. In the Health Check step, configure the parameters based on your business requirements. In this example, the default values of the parameters are used.

  5. After the configuration is complete, click Submit. In the dialog box that appears, click OK. The Instances page appears. On the Instances page, view the health check states of the ECS instances.

    If the health check states of the ECS instances are Normal, the ECS instances are ready to process requests forwarded by the CLB instance.

Step 2: Create an endpoint service

  1. Log on to the endpoint service console.

  2. In the top navigation bar, select the region where you want to create an endpoint service.

    In this example, the China (Hangzhou) region is selected.

  3. On the Endpoints Service page, click Create Endpoint Service. On the page that appears, configure the parameters based on your business requirements.

    For more information, see Create and manage endpoint services. The following table describes some of the parameters. Configure parameters that are not listed in the following table based on your business requirements or retain default values for the parameters.

    Parameter

    Description

    Select Service Resource

    Select a zone to which you want to distribute network traffic. Then, select the CLB instance that you want to associate with the endpoint service.

    CLB instances serve as service resources and can be associated with endpoint services. The CLB instances that are associated with endpoint services receive requests from clients. The zone where an endpoint service is deployed must be the same as the primary zone where the CLB instance you want to associate with the endpoint service is deployed.

    CLB instances can serve as service resources only if they meet the following requirements:

    • Network Type is set to VPC.

    • Feature is set to Support PrivateLink.

    Automatically Accept Endpoint Connections

    Specifies whether to automatically accept connection requests from endpoints. Valid values:

    • Yes: The endpoint service accepts all connection requests from the endpoint that is associated with the endpoint service. In this case, you can use the endpoint to access the endpoint service. We recommend that you set this parameter to Yes.

    • No: The endpoint connection of the endpoint service is in the Disconnected state. In this case, endpoint connection requests to the endpoint service must be manually accepted or denied by the service administrator.

      • If the service administrator accepts endpoint connection requests from the associated endpoint, you can use the associated endpoint to access the endpoint service.

      • If the service administrator denies endpoint connection requests from the associated endpoint, you cannot use the associated endpoint to access the endpoint service.

    Enable Zone Affinity

    We recommend that you set this parameter to Yes.

  4. Click OK.

Step 3: Configure a private connection to the Alibaba Cloud Elasticsearch cluster

  1. Log on to the Alibaba Cloud Elasticsearch console.
  2. In the left-side navigation pane, click Elasticsearch Clusters.
  3. Navigate to the desired cluster.
    1. In the top navigation bar, select the resource group to which the cluster belongs and the region where the cluster resides.
    2. On the Elasticsearch Clusters page, find the cluster and click its ID.
  4. In the left-side navigation pane of the page that appears, choose Configuration and Management > Security.

  5. In the Network Settings section, click Edit on the right side of Configure Private Connection.

  6. In the Configure Private Connection panel, click Add Private Connection. In the Create Private Connection dialog box, select the endpoint service that is created in Step 2 and select a zone. Then, select the check box.

  7. Click OK. Then, the endpoint service attempts to connect to the associated endpoint. If the value of Endpoint Connection Status is Connected, the endpoint service is connected to the associated endpoint.

Step 4: Obtain the domain name of the endpoint

After the preceding steps are performed, you must obtain the domain name of the associated endpoint to configure a remote reindex whitelist.

  1. In the Configure Private Connection panel, click the ID of the endpoint in the Endpoint ID column.

  2. On the Endpoint Connections tab of the page that appears, click the expand icon next to the ID of the endpoint. Then, you can view the domain name of the endpoint.

Step 5: Configure a remote reindex whitelist for the Alibaba Cloud Elasticsearch cluster

Important

After you configure a remote reindex whitelist for the Alibaba Cloud Elasticsearch cluster, the system restarts the cluster. We recommend that you perform this operation during off-peak hours.

  1. Log on to the Alibaba Cloud Elasticsearch console.
  2. In the left-side navigation pane, click Elasticsearch Clusters.
  3. Navigate to the desired cluster.
    1. In the top navigation bar, select the resource group to which the cluster belongs and the region where the cluster resides.
    2. On the Elasticsearch Clusters page, find the cluster and click its ID.
  4. In the left-side navigation pane of the page that appears, choose Configuration and Management > Cluster Configuration.

  5. On the page that appears, click Modify Configuration on the right side of YML Configuration.

  6. In the YML File Configuration panel, specify the domain name that is obtained in Step 4 in Other Configurations.

    Sample code:

    reindex.remote.whitelist: 'ep-bp1nitq0krp8yhcf****-cn-hangzhou-i.epsrv-bp1zczi0fgoc5qtv****.cn-hangzhou.privatelink.aliyuncs.com:9200'

  7. Click OK.
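The whitelist value in the sample above has the form host:port without a scheme, whereas the reindex request in Step 6 uses a full URL. The following sketch derives the whitelist entry from either form. The function names whitelist_entry and whitelist_line are hypothetical, the default port 9200 is taken from this topic, and IPv6 literals are not handled.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: turn an endpoint domain name or URL into a host:port whitelist entry.

whitelist_entry() {
  local host="$1" port="${2:-9200}"
  host="${host#http://}"     # drop the scheme if present
  host="${host#https://}"
  host="${host%%/*}"         # drop any trailing path
  case "$host" in
    *:*) printf '%s' "$host" ;;             # a port is already present, keep it
    *)   printf '%s:%s' "$host" "$port" ;;  # otherwise append the default port
  esac
}

# Print the line to paste into Other Configurations.
whitelist_line() {
  printf "reindex.remote.whitelist: '%s'\n" "$(whitelist_entry "$@")"
}

# Example usage (placeholder domain name):
#   whitelist_line "http://<endpoint-domain-name>:9200"
```

Keeping the entry and the host value in the reindex request derived from the same domain name helps avoid a mismatch that would cause the remote reindex request to be rejected.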

Step 6: Migrate data

  1. On the Dev Tools page in the Kibana console of the Alibaba Cloud Elasticsearch cluster, run the following command to migrate data.

    Note

    For more information about how to log on to the Kibana console, see Log on to the Kibana console.

    POST /_reindex?pretty
    {
      "source": {
        "remote": {
          "host": "http://ep-bp1nitq0krp8yhcf****-cn-hangzhou-i.epsrv-bp1zczi0fgoc5qtv****.cn-hangzhou.privatelink.aliyuncs.com:9200",
          "username": "elastic",
          "password": "Elastic@123***"
        },
        "index": "source",
        "size": 5000
      },
      "dest": {
        "index": "dest"
      }
    }

    For more information, see the reindex API.

  2. Optional: If you want to obtain detailed information about all running reindex requests during data migration, run the following command:

    GET _tasks?detailed=true&actions=*reindex

  3. View data migration results.

    After the data migration is complete, you can run the following command to view the data migration results:

    GET _cat/indices?v

    If the health status of the destination index is normal and its document count matches that of the source index, the data migration is successful.
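For large indexes, a request sent from Dev Tools may time out in the browser before the reindex finishes. The following sketch submits the same request with curl and wait_for_completion=false, so that Elasticsearch returns a task ID that you can poll. The variables DEST_URL and DEST_AUTH and all function names are hypothetical. The sketch assumes that the values passed to reindex_body contain no characters that require JSON escaping.

```shell
#!/usr/bin/env bash
# Sketch: submit the remote reindex asynchronously and poll the resulting task.
# DEST_URL (for example, http://<alibaba-cloud-es-host>:9200) and DEST_AUTH (user:password)
# are hypothetical environment variables for the destination cluster.

# Print the reindex request body.
# Arguments: remote host URL, remote username, remote password,
#            source index, destination index, batch size (default 5000).
reindex_body() {
  local batch="${6:-5000}"
  cat <<EOF
{
  "source": {
    "remote": {
      "host": "$1",
      "username": "$2",
      "password": "$3"
    },
    "index": "$4",
    "size": $batch
  },
  "dest": {
    "index": "$5"
  }
}
EOF
}

# Submit the reindex without waiting for it to finish. The response contains a task ID.
submit_reindex() {
  reindex_body "$@" | curl -sS -u "$DEST_AUTH" -H 'Content-Type: application/json' \
    -XPOST "$DEST_URL/_reindex?wait_for_completion=false" --data-binary @-
}

# Fetch the status of a task by the ID returned by submit_reindex.
task_status() {
  curl -sS -u "$DEST_AUTH" -XGET "$DEST_URL/_tasks/$1"
}

# Example usage (placeholders):
#   submit_reindex "http://<endpoint-domain-name>:9200" elastic "<password>" source dest
#   task_status "<task-id>"
```

The body builder is kept separate from the network calls so that you can print and review the request before you send it.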

FAQ

Problem: What do I do if the source index stores large volumes of data and the data migration is slow?

Solution:

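  • If you use the reindex API to migrate data, data is migrated in scroll mode. To improve the efficiency of data migration, you can increase the scroll size, which is the size parameter in source. A sliced scroll can parallelize the reindex process, but the open source Elasticsearch documentation states that reindexing from a remote cluster does not support manual or automatic slicing, so a sliced scroll applies only to reindex operations within a single cluster. For more information, see the reindex API.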

  • If the self-managed Elasticsearch cluster stores large volumes of data, we recommend that you use snapshots stored in Object Storage Service (OSS) to migrate data. For more information, see Use OSS to migrate data from a self-managed Elasticsearch cluster to an Alibaba Cloud Elasticsearch cluster.

  • If the source index stores large volumes of data, you can set the number of replica shards to 0 and the refresh interval to -1 for the destination index before you migrate data to accelerate data migration. After data is migrated, restore the settings to the original values.

    # Set the number of replica shards to 0 and disable the refresh feature to accelerate the data migration.
    curl -u user:password -H 'Content-Type: application/json' -XPUT 'http://<host>:<port>/indexName/_settings' -d '{
            "number_of_replicas" : 0,
            "refresh_interval" : "-1"
    }'
    # After data is migrated, set the number of replica shards to 1 and the refresh interval to 1s, which is the default value.
    curl -u user:password -H 'Content-Type: application/json' -XPUT 'http://<host>:<port>/indexName/_settings' -d '{
            "number_of_replicas" : 1,
            "refresh_interval" : "1s"
    }'
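When you increase the scroll size, keep in mind that, according to the open source Elasticsearch documentation, reindexing from a remote cluster uses an on-heap buffer that defaults to a maximum size of 100 MB, so indexes with very large documents need a smaller batch size. The following is a rough heuristic sketch, not an official formula: the function name max_batch_size is hypothetical, the average document size is a value that you estimate yourself, and the cap of 5000 is only the batch size used in this topic.

```shell
#!/usr/bin/env bash
# Heuristic sketch: largest batch size whose documents fit in the remote reindex buffer.
# Arguments: average document size in bytes,
#            buffer size in MB (default 100), upper cap (default 5000).
max_batch_size() {
  local avg_bytes="$1" buffer_mb="${2:-100}" cap="${3:-5000}"
  if [ "$avg_bytes" -le 0 ]; then
    echo "average document size must be positive" >&2
    return 1
  fi
  local fit=$(( buffer_mb * 1024 * 1024 / avg_bytes ))
  if [ "$fit" -lt 1 ]; then
    fit=1            # a single document already exceeds the buffer
  fi
  if [ "$fit" -gt "$cap" ]; then
    fit="$cap"       # do not exceed the chosen upper cap
  fi
  echo "$fit"
}

# Example usage:
#   max_batch_size 1048576    # documents of about 1 MB each
```

Because the estimate is based on an average, leave headroom for unusually large documents and reduce the batch size further if the reindex request reports buffer-related errors.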