This topic describes how to call the Reindex operation to recreate indexes in the current Alibaba Cloud Elasticsearch cluster by using data from a remote Elasticsearch cluster. You can use this method to migrate index data from a cluster of an earlier Elasticsearch version to a cluster of a later Elasticsearch version.

Configuration example

To recreate indexes in the current Elasticsearch cluster from a remote cluster by calling the Reindex operation, you must configure the reindex.remote.whitelist setting in the elasticsearch.yml file of the current cluster. This setting adds the access address of the remote Elasticsearch cluster (the source cluster) to the remote access whitelist of the current cluster.

Each entry in the whitelist is a combination of host and port, and entries can contain the * wildcard. Separate multiple entries with commas (,), for example: otherhost:9200,another:9200,127.0.10.*:9200,localhost:*. The whitelist ignores the protocol, so only the host and port are used to enforce the security policy.
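To make the matching rule concrete, the following Python sketch checks whether a remote host URL would be accepted by a whitelist string. It is an approximation for illustration, not the Elasticsearch implementation. It assumes that each comma-separated entry is a host:port pattern in which * matches any run of characters, and it drops the protocol before matching. The function names are hypothetical.

```python
import re
from urllib.parse import urlsplit


def _entry_to_regex(entry: str) -> "re.Pattern[str]":
    # Escape every character except '*', which matches any sequence of characters.
    parts = [re.escape(part) for part in entry.strip().split("*")]
    return re.compile(".*".join(parts))


def is_whitelisted(remote_url: str, whitelist: str) -> bool:
    """Hypothetical helper: True if host:port of remote_url matches a whitelist entry.

    The protocol (http or https) is discarded because the whitelist ignores it.
    """
    parts = urlsplit(remote_url)
    if parts.hostname is None or parts.port is None:
        raise ValueError(f"URL must include a host and a port: {remote_url!r}")
    host_port = f"{parts.hostname}:{parts.port}"
    return any(
        _entry_to_regex(entry).fullmatch(host_port)
        for entry in whitelist.split(",")
        if entry.strip()
    )


whitelist = "otherhost:9200,another:9200,127.0.10.*:9200,localhost:*"
print(is_whitelisted("http://otherhost:9200", whitelist))    # True
print(is_whitelisted("https://127.0.10.5:9200", whitelist))  # True
print(is_whitelisted("http://otherhost:9300", whitelist))    # False
```

Because the scheme is stripped first, http://otherhost:9200 and https://otherhost:9200 produce the same result, which mirrors the statement that only the host and port are checked.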
Notice If the remote Elasticsearch cluster is deployed in a single zone, add <Elasticsearch cluster domain>:9200 to the whitelist. If the remote cluster is deployed across zones, add an <IP address>:9200 entry for every data node in the remote cluster.
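The rule in the preceding notice can be expressed as a small helper that produces the reindex.remote.whitelist line for elasticsearch.yml. This is a sketch: the function names are hypothetical, the domain and IP addresses in the usage example are placeholders, and the helper assumes you already know whether the remote cluster spans multiple zones and which IP addresses its data nodes use.

```python
from typing import List


def whitelist_entries(
    domain: str, data_node_ips: List[str], multi_zone: bool, port: int = 9200
) -> List[str]:
    """Return the host:port entries to add to reindex.remote.whitelist."""
    if not multi_zone:
        # Single-zone remote cluster: the cluster domain is enough.
        return [f"{domain}:{port}"]
    if not data_node_ips:
        raise ValueError("a multi-zone cluster needs the IP address of every data node")
    # Multi-zone remote cluster: list every data node explicitly.
    return [f"{ip}:{port}" for ip in data_node_ips]


def render_whitelist_setting(entries: List[str]) -> str:
    """Format the entries as one quoted, comma-separated elasticsearch.yml line."""
    return 'reindex.remote.whitelist: "' + ",".join(entries) + '"'


print(render_whitelist_setting(whitelist_entries("remote-es.example.com", [], multi_zone=False)))
# reindex.remote.whitelist: "remote-es.example.com:9200"
print(render_whitelist_setting(
    whitelist_entries("remote-es.example.com", ["192.168.0.1", "192.168.0.2"], multi_zone=True)
))
# reindex.remote.whitelist: "192.168.0.1:9200,192.168.0.2:9200"
```

The value is quoted so that the colons and any * wildcards are read as part of a single YAML string.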

After you configure the whitelist, you can call the Reindex operation to recreate indexes. The sample code is as follows:

POST _reindex
{
  "source": {
    "remote": {
      "host": "http://otherhost:9200",
      "username": "user",
      "password": "pass"
    },
    "index": "source",
    "query": {
      "match": {
        "test": "data"
      }
    }
  },
  "dest": {
    "index": "dest"
  }
}
  • host is the address of the remote cluster. The address must include the protocol, host name, and port, for example, https://otherhost:9200.
    Notice
    • If the remote Elasticsearch cluster is deployed in a single zone, set host in the format of <protocol>://<Elasticsearch cluster domain>:9200 and perform the operations described in Connect two Elasticsearch clusters.
    • If the remote Elasticsearch cluster is deployed across zones, set host in the format of <protocol>://<IP address of any data node in the remote cluster>:9200 and perform the operations described in Connect two Elasticsearch clusters.
  • The username and password parameters are optional. If the remote Elasticsearch cluster uses basic authentication, specify these parameters in the request. Use HTTPS with basic authentication; otherwise, the password is sent in plaintext. For more information about other parameters, see Reindex API.
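The request body in the sample above can also be assembled in code. The following Python sketch builds the same body, checks that host includes a protocol, host name, and port, and emits a warning when a username and password would be sent to a plain-HTTP host. The function name and the warning behavior are illustrative choices, not part of the Reindex API.

```python
import warnings
from typing import Any, Dict, Optional
from urllib.parse import urlsplit


def build_remote_reindex_body(
    host: str,
    source_index: str,
    dest_index: str,
    query: Optional[Dict[str, Any]] = None,
    username: Optional[str] = None,
    password: Optional[str] = None,
) -> Dict[str, Any]:
    """Build the JSON body of a POST _reindex request that reads from a remote cluster."""
    parts = urlsplit(host)
    # host must include the protocol, the host name, and the port.
    if parts.scheme not in ("http", "https") or not parts.hostname or parts.port is None:
        raise ValueError(f"host must look like https://otherhost:9200, got {host!r}")
    if (username is None) != (password is None):
        raise ValueError("username and password must be provided together")

    remote: Dict[str, Any] = {"host": host}
    if username is not None:
        if parts.scheme != "https":
            # Basic authentication over plain HTTP exposes the password.
            warnings.warn("basic authentication over HTTP sends the password in plaintext", stacklevel=2)
        remote["username"] = username
        remote["password"] = password

    source: Dict[str, Any] = {"remote": remote, "index": source_index}
    if query is not None:
        source["query"] = query
    return {"source": source, "dest": {"index": dest_index}}


body = build_remote_reindex_body(
    "https://otherhost:9200", "source", "dest",
    query={"match": {"test": "data"}}, username="user", password="pass",
)
```

A warning rather than an error keeps the helper usable with the HTTP host shown in the sample while still flagging the plaintext-password risk.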
Note
  • After the access address of a remote Elasticsearch cluster is added to the whitelist of the current cluster, the current cluster sends requests directly to the remote cluster without validating or modifying the request parameters.
  • Reindexing from a remote Elasticsearch cluster does not support manual or automatic slicing. For more information, see Manual slicing and Automatic slicing.
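If you prefer to submit the request from a script rather than from a console, the following Python standard-library sketch sends POST _reindex to the current (destination) cluster, which then pulls the data from the remote cluster. The cluster URL, the elastic user name, and the password are placeholders. The request sets no slices parameter because reindexing from a remote cluster does not support slicing.

```python
import base64
import json
import urllib.request
from typing import Any, Dict, Optional


def make_reindex_request(
    cluster_url: str, body: Dict[str, Any], username: str, password: str
) -> urllib.request.Request:
    """Build a POST _reindex request addressed to the current (destination) cluster."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    # No slices parameter is added: reindexing from a remote cluster does not support slicing.
    return urllib.request.Request(
        url=cluster_url.rstrip("/") + "/_reindex",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: Optional[float] = None) -> Dict[str, Any]:
    """Send the request and return the parsed JSON response.

    timeout=None blocks until the cluster answers; a large reindex can take a long time.
    urlopen raises urllib.error.HTTPError for 4xx and 5xx responses.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


body = {"source": {"remote": {"host": "http://otherhost:9200"}, "index": "source"}, "dest": {"index": "dest"}}
request = make_reindex_request("http://es-current.example.com:9200", body, "elastic", "your-password")
# result = send(request)  # contacts the current cluster
```

Building the request and sending it are kept separate so that the request can be inspected before any network call is made.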

Set the batch size

Reindexing from a remote Elasticsearch cluster uses an on-heap buffer whose default maximum size is 100 MB. If the index in the remote cluster contains large documents, set the batch size (the size parameter in source, which specifies the number of documents in each batch) to a smaller value.

In the following example, size is set to 10.
POST _reindex
{
  "source": {
    "remote": {
      "host": "http://otherhost:9200"
    },
    "index": "source",
    "size": 10,
    "query": {
      "match": {
        "test": "data"
      }
    }
  },
  "dest": {
    "index": "dest"
  }
}
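To choose a value for size, one rough approach is to keep the largest possible batch (size multiplied by the size of the largest document) well below the 100 MB buffer. The following Python sketch performs that arithmetic. It is a heuristic, not a formula from Elasticsearch: it ignores per-document overhead, and the 50% headroom default and the function name are assumptions.

```python
BUFFER_BYTES = 100 * 1024 * 1024  # default maximum on-heap buffer size: 100 MB


def max_batch_size(largest_doc_bytes: int, buffer_bytes: int = BUFFER_BYTES, headroom: float = 0.5) -> int:
    """Hypothetical heuristic for source.size, returning at least 1.

    headroom is the fraction of the buffer that one batch is allowed to use.
    """
    if largest_doc_bytes <= 0:
        raise ValueError("largest_doc_bytes must be positive")
    usable = int(buffer_bytes * headroom)
    return max(1, usable // largest_doc_bytes)


print(max_batch_size(5 * 1024 * 1024))  # 10: documents of up to 5 MB
print(max_batch_size(1024))             # 51200: documents of up to 1 KB
```

With documents of up to 5 MB, this heuristic suggests a size of 10, which matches the value used in the example.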

Set timeout periods

Use socket_timeout to set the socket read timeout period. Default value: 30s. Use connect_timeout to set the connection timeout period. Default value: 1s.

In the following example, the socket read timeout period is set to one minute and the connection timeout period is set to 10 seconds.

POST _reindex
{
  "source": {
    "remote": {
      "host": "http://otherhost:9200",
      "socket_timeout": "1m",
      "connect_timeout": "10s"
    },
    "index": "source",
    "query": {
      "match": {
        "test": "data"
      }
    }
  },
  "dest": {
    "index": "dest"
  }
}
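Both timeouts are written in Elasticsearch time units, such as 30s or 1m. The following Python sketch adds the two timeouts to a remote object and converts them to seconds so that obvious mistakes, such as a missing unit, are caught before the request is sent. It supports only the d, h, m, s, and ms units, and the helper names are hypothetical; neither helper is part of the Reindex API.

```python
import re
from typing import Any, Dict

# Subset of Elasticsearch time units, expressed in seconds.
_UNITS = {"d": 86400.0, "h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}
_TIME_RE = re.compile(r"(\d+)(ms|d|h|m|s)")


def to_seconds(value: str) -> float:
    """Convert a value such as '1m' or '10s' to seconds."""
    match = _TIME_RE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"unsupported time value: {value!r}")
    amount, unit = match.groups()
    return int(amount) * _UNITS[unit]


def with_timeouts(remote: Dict[str, Any], socket_timeout: str, connect_timeout: str) -> Dict[str, Any]:
    """Return a copy of the remote object with socket_timeout and connect_timeout set."""
    for name, value in (("socket_timeout", socket_timeout), ("connect_timeout", connect_timeout)):
        if to_seconds(value) <= 0:
            raise ValueError(f"{name} must be positive")
    return {**remote, "socket_timeout": socket_timeout, "connect_timeout": connect_timeout}


remote = with_timeouts({"host": "http://otherhost:9200"}, socket_timeout="1m", connect_timeout="10s")
```

The original strings are kept in the request as written, and the conversion is used only for validation.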