You can use the reindex operation to migrate data from an Elasticsearch cluster of an earlier version to an Elasticsearch cluster of the current version. You can also use this operation to replicate data from an existing index in a local cluster to a new index in a remote cluster. This topic describes the procedure in detail.
Precautions
Due to an adjustment to the Alibaba Cloud Elasticsearch network architecture, clusters created after October 2020 do not support the X-Pack Watcher or LDAP authentication features. You cannot reindex, search for, or replicate data between a cluster created before October 2020 and a cluster created after October 2020. You can perform these operations only between clusters that were both created before October 2020 or both created after October 2020. Cross-architecture support will be available soon.
Prerequisites
- Two Alibaba Cloud Elasticsearch clusters are created. One is used as the local cluster, and the other is used as the remote cluster. For more information about how to create an Elasticsearch cluster, see Create an Alibaba Cloud Elasticsearch cluster. The two clusters must belong to the same virtual private cloud (VPC) and vSwitch. This topic uses an Elasticsearch V6.7.0 cluster as the local cluster and an Elasticsearch V6.3.2 cluster as the remote cluster.
- Test data is prepared.
- Local cluster
Create a destination index in the local cluster.
PUT dest
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1
  }
}
- Remote cluster
Prepare the data that you want to migrate. This topic uses the data in the "Quick start" topic as the test data. For more information, see Create an index and Create a document and insert data.
Notice: If you use a cluster of V7.0 or later, you must set the index type to _doc.
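The preparation steps above can be sketched as request bodies. The following is a minimal sketch in Python; the cluster endpoints are hypothetical placeholders, not values from this topic, and the bodies are only serialized here rather than sent.

```python
import json

# Hypothetical endpoints -- replace with your own cluster domains (assumptions,
# not values from this topic).
LOCAL_ES = "http://es-local.example.com:9200"
REMOTE_ES = "http://es-remote.example.com:9200"

def dest_index_request():
    """Body for `PUT dest` on the local cluster: 5 primary shards, 1 replica."""
    return {
        "settings": {
            "number_of_shards": 5,
            "number_of_replicas": 1,
        }
    }

def sample_doc_request():
    """A sample document for the `source` index on the remote cluster.

    On V7.0 or later clusters, the index type must be _doc.
    """
    return {"test": "data"}

# To apply, send the JSON with curl or an Elasticsearch client, e.g.
# PUT {LOCAL_ES}/dest with dest_index_request() as the body.
print(json.dumps(dest_index_request()))
print(json.dumps(sample_doc_request()))
```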
Background information
You can use the reindex operation in the following scenarios:
- Migrate data between Elasticsearch clusters.
- Reindex data in an index whose shards are inappropriately configured. For example, an index stores a large volume of data but is configured with only a few shards, which slows down data write operations.
- Replicate data in an index if the index stores a large volume of data and you want to change the mapping configuration of the index. The reindex operation requires only a short period of time. You can also import the data into a new index from the source, but this method is time-consuming.
Note: After you define the mapping configuration for an index in an Elasticsearch cluster and import data into the index, you cannot change the mapping configuration of the existing fields. To change the mapping, you must reindex the data into a new index.
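The mapping-change scenario above can be sketched as two request bodies: create a new index with the desired mapping, then reindex the old index into it within the same cluster. This is a minimal sketch; the index and field names are hypothetical examples.

```python
import json

# Hypothetical: change the "title" field's type by creating a new index with
# the desired mapping. (On V6 clusters, wrap "properties" in a type name such
# as "_doc"; the flat form below matches V7.0 and later.)
new_index_mapping = {
    "mappings": {
        "properties": {
            "title": {"type": "keyword"}  # e.g. changed from "text"
        }
    }
}

# Body for `POST _reindex` within the same (local) cluster: copy every
# document from the old index into the newly mapped index.
reindex_body = {
    "source": {"index": "old_index"},
    "dest": {"index": "new_index"},
}

print(json.dumps(new_index_mapping))
print(json.dumps(reindex_body))
```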
Procedure
Summary
| Cluster type | Configuration of the reindex whitelist | Configuration of the host parameter |
| --- | --- | --- |
| Single-zone cluster | <Domain name of the cluster>:9200 | https://<Domain name of the cluster>:9200 |
| Multi-zone cluster | IP addresses of all data nodes in the cluster, each followed by the port number | https://<IP address of a data node in the cluster>:9200 |
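The table above can be tied together in a single remote reindex request. The following is a minimal sketch; the domain is a hypothetical placeholder. Note that the reindex whitelist entry uses only host and port, while the host parameter in the request body includes the scheme.

```python
import json

# Hypothetical placeholder for a single-zone cluster's domain. The same
# host:port (without the scheme) must also appear in the local cluster's
# reindex whitelist setting, for example:
#   reindex.remote.whitelist: "es-remote.example.com:9200"
remote_host = "https://es-remote.example.com:9200"

# Body for `POST _reindex` on the local cluster, pulling the `source` index
# from the whitelisted remote cluster into the local `dest` index.
reindex_body = {
    "source": {
        "remote": {"host": remote_host},
        "index": "source",
    },
    "dest": {"index": "dest"},
}

print(json.dumps(reindex_body, indent=2))
```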
Additional information
- Batch size
A reindex operation from a remote cluster buffers index data in an on-heap buffer whose maximum size defaults to 100 MB. If an index in the remote cluster contains large documents, you must set the batch size to a smaller value.
In the following example, size is set to 10.
POST _reindex
{
  "source": {
    "remote": {
      "host": "http://otherhost:9200"
    },
    "index": "source",
    "size": 10,
    "query": {
      "match": {
        "test": "data"
      }
    }
  },
  "dest": {
    "index": "dest"
  }
}
- Timeout periods
Use socket_timeout to set a timeout period for socket reads, and use connect_timeout to set a timeout period for connections. Both default to 30 seconds.
In the following example, the timeout period for socket reads is set to 1 minute, and the timeout period for connections is set to 10 seconds.
POST _reindex
{
  "source": {
    "remote": {
      "host": "http://otherhost:9200",
      "socket_timeout": "1m",
      "connect_timeout": "10s"
    },
    "index": "source",
    "query": {
      "match": {
        "test": "data"
      }
    }
  },
  "dest": {
    "index": "dest"
  }
}