Elasticsearch: Select a data migration solution

Last Updated: Feb 21, 2024

You can use Logstash, the reindex API, or Object Storage Service (OSS) to migrate data between Alibaba Cloud Elasticsearch clusters, from a self-managed Elasticsearch cluster to an Alibaba Cloud Elasticsearch cluster, or from a third-party Elasticsearch source to an Alibaba Cloud Elasticsearch cluster. This topic describes the use scenarios and limits of each data migration solution. You can select a solution based on your business requirements.

Important
  • The network architecture of Alibaba Cloud Elasticsearch was adjusted in October 2020. Clusters created before October 2020 are deployed in the original network architecture, and clusters created in October 2020 or later are deployed in the new network architecture. Cross-cluster operations, such as reindex, search, and replication, are not supported between a cluster in the original network architecture and a cluster in the new network architecture. To perform these operations between two clusters, make sure that both clusters are deployed in the same network architecture. In the China (Zhangjiakou) region and in regions outside China, the date on which the network architecture was adjusted is not fixed. To perform these operations in such a region between a cluster created before October 2020 and a cluster created in October 2020 or later, submit a ticket to ask Alibaba Cloud technical support whether the clusters can be connected.

  • We recommend that you do not migrate system indexes whose names start with a period (.), such as .monitoring, .kibana, and .security indexes. If you migrate these indexes, Kibana may fail.
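
The recommendation about system indexes can be applied mechanically when you build the list of indexes to migrate. The following is a minimal sketch; the function name `user_indexes` and the sample index names are illustrative assumptions and are not part of any Alibaba Cloud or Elasticsearch API.

```python
def user_indexes(index_names):
    """Return only the indexes that are reasonable to migrate.

    Index names that start with a period (for example .kibana or .security)
    are treated as system indexes and skipped.
    """
    return [name for name in index_names if not name.startswith(".")]


# Hypothetical index list, for illustration only.
candidates = [".kibana_1", "orders-2024", ".monitoring-es-7", "products", ".security-7"]
migratable = user_indexes(candidates)
print(migratable)
```

You could pass the filtered list to whichever migration solution you select, instead of using a wildcard that also matches system indexes.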

Data migration between Alibaba Cloud Elasticsearch clusters

Migration solution

Use scenario

Usage note

Example

OSS snapshots

  • The source stores gigabytes, terabytes, or petabytes of data.

  • You want to migrate data in snapshots between Alibaba Cloud Elasticsearch clusters that belong to the same Alibaba Cloud account or different Alibaba Cloud accounts and reside in the same region or different regions.

    Note

    To migrate data between Alibaba Cloud Elasticsearch clusters that reside in different regions, you can combine the Elasticsearch snapshot and restore commands with the cross-region replication (CRR) feature of OSS.

  • If you want to use this solution to migrate incremental data, you may need to stop your Elasticsearch service and must disable the destination index before data migration.

  • Before you configure a shared OSS repository, you must make sure that the source and destination Elasticsearch clusters meet the following requirements:

    • The clusters reside in the same region.

    • The clusters belong to the same Alibaba Cloud account or RAM user.

    • The version of the source cluster is earlier than or the same as that of the destination cluster. For more information about version compatibility, see Version compatibility of data restoration from snapshots.

  • When you use the Elasticsearch snapshot and restore commands to migrate data between Alibaba Cloud Elasticsearch clusters that belong to different Alibaba Cloud accounts, both clusters must use the same AccessKey pair. That is, when you configure the shared OSS repository on each cluster, you must specify the AccessKey pair of the account to which the OSS bucket belongs.
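
The snapshot-based workflow can be sketched as the REST requests that each cluster would receive. This is a minimal sketch that only builds the requests and does not send them. The repository name `migration_repo`, the endpoint, the bucket name, and the AccessKey placeholders are hypothetical. The `oss` repository type and its setting names are written as the elasticsearch-repository-oss plug-in is commonly configured; treat them as assumptions and verify them against your plug-in version.

```python
import json


def register_oss_repo(repo, endpoint, bucket, access_key_id, access_key_secret,
                      base_path="snapshot/"):
    """Build the request that registers an OSS bucket as a snapshot repository.

    Setting names are assumptions based on the elasticsearch-repository-oss
    plug-in; check them against the plug-in version you use.
    """
    body = {
        "type": "oss",
        "settings": {
            "endpoint": endpoint,
            "access_key_id": access_key_id,
            "secret_access_key": access_key_secret,
            "bucket": bucket,
            "base_path": base_path,
            "compress": True,
        },
    }
    return ("PUT", f"/_snapshot/{repo}", body)


def create_snapshot(repo, snapshot, indices):
    """Build the request that snapshots the given indexes on the source cluster."""
    body = {"indices": ",".join(indices), "include_global_state": False}
    return ("PUT", f"/_snapshot/{repo}/{snapshot}?wait_for_completion=true", body)


def restore_snapshot(repo, snapshot, indices):
    """Build the request that restores the given indexes on the destination cluster."""
    body = {"indices": ",".join(indices), "include_global_state": False}
    return ("POST", f"/_snapshot/{repo}/{snapshot}/_restore", body)


# Hypothetical values. Both clusters register the SAME bucket with the SAME
# AccessKey pair, which is what makes the repository shared.
repo_args = dict(
    repo="migration_repo",
    endpoint="http://oss-cn-hangzhou-internal.aliyuncs.com",
    bucket="example-bucket",
    access_key_id="<AccessKey ID>",
    access_key_secret="<AccessKey secret>",
)
source_requests = [register_oss_repo(**repo_args),
                   create_snapshot("migration_repo", "snap_1", ["orders"])]
destination_requests = [register_oss_repo(**repo_args),
                        restore_snapshot("migration_repo", "snap_1", ["orders"])]

for method, path, body in source_requests + destination_requests:
    print(method, path)
    print(json.dumps(body, indent=2))
```

Setting `include_global_state` to `False` keeps cluster-wide state out of the snapshot and the restore, which fits the recommendation not to migrate system indexes.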

Logstash

  • You want to migrate full or incremental data and do not have high requirements for real-time performance of data migration.

    Important

    If you use a Logstash cluster to migrate data between Elasticsearch clusters that belong to different Alibaba Cloud accounts and reside in different regions, and the Elasticsearch clusters and the Logstash cluster reside in different VPCs, you must configure Network Address Translation (NAT) gateways for the Logstash cluster. The Logstash cluster then uses the gateways to connect to the Elasticsearch clusters over the Internet.

  • You want to migrate only query results.

  • The data that you want to migrate needs to be filtered.

  • You want to migrate data between clusters of different major versions. For example, you want to migrate data from a V5.X cluster to a V6.X or V7.X cluster. For more information about version compatibility, see Compatibility matrixes.

  • The source Elasticsearch cluster, Logstash cluster, and destination Elasticsearch cluster must reside in the same VPC. If they reside in different VPCs, you must configure NAT gateways for the Logstash cluster and use the gateways to connect the Logstash cluster to the Elasticsearch clusters over the Internet. For more information, see Configure a NAT gateway for data transmission over the Internet.

  • The versions of the source Elasticsearch cluster, Logstash cluster, and destination Elasticsearch cluster must meet compatibility requirements. For more information, see Compatibility matrixes.

  • If you want to migrate incremental data, you must make sure that the source index ID is the same as the destination index ID and must configure a scheduled migration task.
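
A Logstash migration pipeline of the kind described in this row can be sketched as follows. The Python helper `render_pipeline` is hypothetical and only renders the pipeline text. The host names, the index pattern, and the `@timestamp` field in the sample query are assumptions. The `%{[@metadata][_index]}` and `%{[@metadata][_id]}` references assume that the elasticsearch input plug-in writes document metadata to `@metadata`, which depends on the plug-in version and its ECS compatibility setting. Authentication options are omitted for brevity.

```python
def render_pipeline(source_host, dest_host, index_pattern,
                    query='{"query": {"match_all": {}}}', schedule=None):
    """Render a Logstash pipeline that copies documents between two clusters."""
    lines = [
        "input {",
        "  elasticsearch {",
        f'    hosts => ["{source_host}"]',
        f'    index => "{index_pattern}"',
        # The query filters the source data, so only matching results migrate.
        f"    query => '{query}'",
        # docinfo exposes the source index name and document ID as metadata.
        "    docinfo => true",
    ]
    if schedule:
        # A cron-style schedule re-runs the input periodically, which is one
        # way to express a scheduled (incremental) migration task.
        lines.append(f'    schedule => "{schedule}"')
    lines += [
        "  }",
        "}",
        "output {",
        "  elasticsearch {",
        f'    hosts => ["{dest_host}"]',
        # Reuse the source index name and document ID so that repeated runs
        # overwrite documents instead of duplicating them.
        '    index => "%{[@metadata][_index]}"',
        '    document_id => "%{[@metadata][_id]}"',
        "  }",
        "}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical incremental run: every 5 minutes, copy recently changed documents.
config = render_pipeline(
    "http://source.example.com:9200",
    "http://dest.example.com:9200",
    "orders-*",
    query='{"query": {"range": {"@timestamp": {"gte": "now-5m"}}}}',
    schedule="*/5 * * * *",
)
print(config)
```

Keeping the index name and document ID identical on both sides is one way to satisfy the ID requirement for incremental migration, because a rerun then updates existing documents in place.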

reindex API

  • The source stores small volumes of data, and you do not have high requirements for the speed of data migration.

  • You want to migrate only query results that are obtained by executing query statements in the Kibana console.

The source and destination Elasticsearch clusters must be deployed in the same network architecture. For more information, see Use the reindex API to migrate data in a multi-type index of an earlier version.

Use the reindex API to migrate data
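
A request body for a remote reindex that migrates only query results might look like the following minimal sketch. The helper name `remote_reindex_body`, the host, the index names, and the `status` field are hypothetical. The sketch also assumes that the destination cluster already allows the remote host, for example through the `reindex.remote.whitelist` setting.

```python
import json


def remote_reindex_body(remote_host, source_index, dest_index,
                        query=None, batch_size=1000,
                        username=None, password=None):
    """Build the body of a POST _reindex request that pulls from a remote cluster."""
    remote = {"host": remote_host}
    if username is not None:
        remote["username"] = username
        remote["password"] = password
    # "size" under "source" is the number of documents fetched per batch.
    source = {"remote": remote, "index": source_index, "size": batch_size}
    if query is not None:
        # Only documents that match the query are copied.
        source["query"] = query
    return {"source": source, "dest": {"index": dest_index}}


# Hypothetical example: copy only paid orders into the destination index.
body = remote_reindex_body(
    "http://source.example.com:9200",
    "orders",
    "orders",
    query={"term": {"status": "paid"}},
)
print("POST _reindex")
print(json.dumps(body, indent=2))
```

You would run the request on the destination cluster, for example in the Kibana console, because the destination pulls the data from the source.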

elasticsearch-dump

You want to migrate small volumes of data in scenarios with a small number of indexes.

Network connections must be established among the source Elasticsearch cluster, the destination Elasticsearch cluster, and the server on which the elasticsearch-dump tool is installed.

Use elasticsearch-dump to migrate data
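
For a small number of indexes, the elasticsearch-dump invocations can be generated per index. The following minimal sketch only builds the command lines and does not run them. The helper name `elasticdump_commands`, the URLs, and the index name are hypothetical, and copying the mapping before the data is an assumed ordering rather than a requirement stated in this topic.

```python
import shlex


def elasticdump_commands(source_url, dest_url, index):
    """Build elasticdump invocations that copy the mapping first, then the data."""
    commands = []
    for dump_type in ("mapping", "data"):
        commands.append([
            "elasticdump",
            f"--input={source_url}/{index}",
            f"--output={dest_url}/{index}",
            f"--type={dump_type}",
        ])
    return commands


# Hypothetical endpoints. The server that runs these commands must be able
# to reach both clusters over the network.
commands = elasticdump_commands(
    "http://source.example.com:9200",
    "http://dest.example.com:9200",
    "orders",
)
for argv in commands:
    print(shlex.join(argv))
```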

OSS snapshots and reindex API

  • You want to migrate full and incremental data.

  • You want to migrate data between Alibaba Cloud Elasticsearch clusters of different major versions. You cannot use OSS snapshots alone to migrate data between such clusters because the data formats may be incompatible.

    For example, to migrate data from a V6.X cluster to a V8.X cluster, you can use the snapshots of the V6.X cluster to migrate the data from the V6.X cluster to a V7.X cluster, call the reindex API in the V7.X cluster to reindex the data, and use the snapshots of the V7.X cluster to migrate data from the V7.X cluster to the V8.X cluster.

The intermediate-version cluster must be compatible with both the source cluster and the destination cluster. For more information, see Elastic version changes and compatibility.
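
The V6.X to V8.X example generalizes to a sequence of hops. The following minimal sketch assumes that each snapshot hop advances exactly one major version and that data is reindexed in every intermediate cluster. That rule is inferred from the example in this topic, so check the version compatibility documentation before you rely on it. The function name `plan_migration` is hypothetical.

```python
def plan_migration(source_major, dest_major):
    """Return the ordered steps to move data between major versions.

    Each step is a tuple (action, from_major, to_major). The plan assumes
    that a snapshot can be restored into a cluster of the same or the next
    major version only.
    """
    if dest_major < source_major:
        raise ValueError("the destination version must not be earlier than the source version")
    if dest_major == source_major:
        return [("snapshot", source_major, dest_major)]
    steps = []
    for current in range(source_major, dest_major):
        # Restore the snapshot of the current version into the next version.
        steps.append(("snapshot", current, current + 1))
        if current + 1 < dest_major:
            # Reindex in the intermediate cluster so that the indexes are
            # rewritten in a format the following version can read.
            steps.append(("reindex", current + 1, current + 1))
    return steps


plan = plan_migration(6, 8)
for action, src, dst in plan:
    print(f"{action}: V{src}.X -> V{dst}.X")
```

For a V6.X source and a V8.X destination, the plan contains a snapshot hop to V7.X, a reindex in the V7.X cluster, and a snapshot hop to V8.X, which matches the example described in this row.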

Data migration from a self-managed Elasticsearch cluster to an Alibaba Cloud Elasticsearch cluster

Migration solution

Use scenario

Usage note

Example

OSS snapshots

  • The source stores gigabytes, terabytes, or petabytes of data.

  • You want to migrate data in snapshots between a self-managed Elasticsearch cluster and an Alibaba Cloud Elasticsearch cluster that belong to the same Alibaba Cloud account or different Alibaba Cloud accounts and reside in the same region or different regions.

  • The elasticsearch-repository-oss plug-in must be installed on each node of the Alibaba Cloud Elasticsearch cluster. The version of the plug-in must be the same as the version of the plug-in that is installed on each node of the self-managed Elasticsearch cluster.

  • If you want to use this solution to migrate incremental data, you may need to stop your Elasticsearch service and must disable the destination index before data migration.

Use OSS to migrate data from a self-managed Elasticsearch cluster to an Alibaba Cloud Elasticsearch cluster

Logstash

  • You do not have high requirements for real-time performance of data migration.

  • You want to migrate only query results.

  • The data that you want to migrate needs to be filtered.

  • You want to migrate data between clusters of different major versions. For example, you want to migrate data from a V5.X cluster to a V6.X or V7.X cluster. For more information about version compatibility, see Compatibility matrixes.

  • The source Elasticsearch cluster, Logstash cluster, and destination Elasticsearch cluster must reside in the same VPC. If they reside in different VPCs, you must configure NAT gateways for the Logstash cluster and use the gateways to connect the Logstash cluster to the Elasticsearch clusters over the Internet. For more information, see Configure a NAT gateway for data transmission over the Internet.

  • The versions of the source Elasticsearch cluster, Logstash cluster, and destination Elasticsearch cluster must meet compatibility requirements. For more information, see Compatibility matrixes.

  • If you want to migrate incremental data, you must make sure that the source index ID is the same as the destination index ID and must configure a scheduled migration task.

reindex API

  • The source stores small volumes of data, and you do not have high requirements for the speed of data migration.

  • You want to migrate only query results that are obtained by executing query statements in the Kibana console.

  • You want to migrate data from a self-managed Elasticsearch cluster of an earlier version (such as V6.X) to an Alibaba Cloud Elasticsearch cluster of a later version (such as V8.X). In this case, use PrivateLink to establish a private connection between the VPCs where the clusters reside, and then use the reindex API to migrate data.

The source and destination Elasticsearch clusters must be deployed in the same network architecture. For more information, see Use the reindex API to migrate data in a multi-type index of an earlier version.

Use the reindex API to migrate data from a self-managed Elasticsearch cluster to an Alibaba Cloud Elasticsearch cluster

elasticsearch-dump

You want to migrate small volumes of data in scenarios with a small number of indexes.

Network connections must be established among the source Elasticsearch cluster, the destination Elasticsearch cluster, and the server on which the elasticsearch-dump tool is installed.

Use elasticsearch-dump to migrate data

Data migration from a third-party Elasticsearch source to an Alibaba Cloud Elasticsearch cluster

Migration solution

Use scenario

Usage note

Example

OSS snapshots

The source stores gigabytes, terabytes, or petabytes of data.

If you want to use this solution to migrate incremental data, you may need to stop your Elasticsearch service and must disable the destination index before data migration.

Migrate Elasticsearch index data from Amazon OpenSearch Service to Alibaba Cloud Elasticsearch

Note

This solution does not depend on a specific cloud service provider. It relies only on the Elasticsearch snapshot mechanism. For example, you can use the Data Online Migration service provided by Alibaba Cloud to migrate Elasticsearch snapshot data from a Tencent Cloud Object Storage (COS) bucket to an Alibaba Cloud OSS bucket, and then restore the data to the destination cluster. For information about how to migrate data from a Tencent Cloud COS bucket to an Alibaba Cloud OSS bucket, see Migrate data. For information about how to restore data from snapshots, see Create manual snapshots and restore data from manual snapshots.

The version of the destination Elasticsearch cluster must be the same as or later than the version of the third-party Elasticsearch source. For information about version compatibility, see Version compatibility of data restoration from snapshots.

elasticsearch-dump

You want to migrate small volumes of data in scenarios with a small number of indexes.

Network connections must be established among the third-party Elasticsearch source, the destination Elasticsearch cluster, and the server on which the elasticsearch-dump tool is installed.

Use elasticsearch-dump to migrate data