Migrate a multi-type index to a single-type index using the reindex API - Elasticsearch

Limits

Alibaba Cloud Elasticsearch network architecture was adjusted in October 2020:

Clusters created before October 2020 use the original network architecture.
Clusters created in October 2020 or later use the new network architecture.

In the new network architecture, cross-cluster reindex requires PrivateLink to establish private connections between virtual private clouds (VPCs). The following table maps your scenario to the appropriate data migration solution.

Scenario	Network architecture	Solution
Migrate between Alibaba Cloud Elasticsearch clusters	Both clusters in the original architecture	Use the reindex API to migrate data between Alibaba Cloud Elasticsearch clusters
Migrate between Alibaba Cloud Elasticsearch clusters	At least one cluster in the new architecture	Use NLB and PrivateLink to establish a private connection between Alibaba Cloud Elasticsearch clusters (reindex API) or Use Alibaba Cloud Logstash to migrate data from a self-managed Elasticsearch cluster to an Alibaba Cloud Elasticsearch cluster
Migrate from a self-managed Elasticsearch cluster on ECS to Alibaba Cloud Elasticsearch	Alibaba Cloud Elasticsearch in the original architecture	Use the reindex API to migrate data from a self-managed Elasticsearch cluster to an Alibaba Cloud Elasticsearch cluster
Migrate from a self-managed Elasticsearch cluster on ECS to Alibaba Cloud Elasticsearch	Alibaba Cloud Elasticsearch in the new architecture	Migrate data from a self-managed Elasticsearch cluster to an Alibaba Cloud Elasticsearch cluster deployed in the new network architecture
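The table reduces to a small decision rule. The Python sketch below encodes that rule. It reads the table so that the reindex API applies between two Alibaba Cloud Elasticsearch clusters only when both use the original architecture. The `Arch` enum, the function names, and the returned labels are hypothetical. They only illustrate the rule and are not part of any Alibaba Cloud SDK.

```python
from datetime import date
from enum import Enum
from typing import Optional


class Arch(Enum):
    """Network architecture of an Alibaba Cloud Elasticsearch cluster."""
    ORIGINAL = "original"
    NEW = "new"


def arch_for(created: date) -> Arch:
    # Clusters created before October 2020 use the original architecture;
    # clusters created in October 2020 or later use the new one.
    return Arch.ORIGINAL if created < date(2020, 10, 1) else Arch.NEW


def pick_solution(target: Arch, source: Optional[Arch] = None) -> str:
    """Return a label for the recommended migration path.

    source is None when the data comes from a self-managed cluster on ECS.
    """
    if source is None:
        # Self-managed Elasticsearch on ECS -> Alibaba Cloud Elasticsearch
        if target is Arch.ORIGINAL:
            return "reindex API"
        return "new-architecture migration guide"
    # Between two Alibaba Cloud Elasticsearch clusters
    if source is Arch.ORIGINAL and target is Arch.ORIGINAL:
        return "reindex API"
    # At least one cluster uses the new architecture: reindex needs PrivateLink, or use Logstash
    return "NLB + PrivateLink (reindex API) or Alibaba Cloud Logstash"
```

For example, `pick_solution(Arch.NEW, arch_for(date(2019, 5, 1)))` returns the PrivateLink/Logstash label, because the target cluster uses the new architecture.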

Prerequisites

Before you begin, make sure you have:

An Alibaba Cloud Elasticsearch V5.5.3 cluster that contains a multi-type index with data, for example, a twitter index with the tweet and user types. For more information, see Create an Alibaba Cloud Elasticsearch cluster.
An Alibaba Cloud Elasticsearch V6.7.0 cluster in the same VPC as the V5.5.3 cluster.
An Alibaba Cloud Logstash cluster in the same VPC as the Elasticsearch clusters. For more information, see Create a Logstash cluster.

Step 1: Convert the multi-type index into single-type indexes

Choose one of the following conversion methods based on your data model:

Method	When to use
Combine types	Your application can distinguish document types using a custom field. All documents are merged into one index.
Split into separate indexes	Each type maps cleanly to an independent use case, and you want to keep the data separated.

Method 1: Combine types

This method merges all document types into a single index. A Painless script adds a custom type field to each document to preserve the original type information, and prepends the original _type value to each document's _id to prevent ID collisions across types.

Enable Auto Indexing on the Elasticsearch V5.5.3 cluster so that the reindex request can create the destination index automatically.
1. Log on to the Elasticsearch console.
2. In the left-side navigation pane, click Elasticsearch Clusters.
3. In the top navigation bar, select a resource group and a region.
4. On the Elasticsearch Clusters page, find the V5.5.3 cluster and click its ID.
5. In the left-side navigation pane, click Cluster Configuration.
6. Click Modify Configuration next to YML File Configuration.
7. In the YML File Configuration panel, set Auto Indexing to Enable.
   Warning: This operation restarts the cluster. Make sure the restart does not affect your services before proceeding.
8. Select This operation will restart the cluster. Continue? and click OK.
Log on to the Kibana console of the V5.5.3 cluster. For more information, see Log on to the Kibana console.
In the left-side navigation pane, click Dev Tools.
On the Console tab, run the following command to combine all types into a single index:
```
POST _reindex
{
  "source": {
    "index": "twitter"
  },
  "dest": {
    "index": "new1"
  },
  "script": {
    "inline": """
    ctx._id = ctx._type + "-" + ctx._id;
    ctx._source.type = ctx._type;
    ctx._type = "doc";
    """,
    "lang": "painless"
  }
}
```
The script does the following for each document:
- Sets ctx._id to <original_type>-<original_id> to avoid ID collisions between types.
- Adds a type field to ctx._source that holds the original type value, so your application can still filter by type.
- Sets ctx._type to "doc" so that all documents share one type, as V6.x requires.
Run GET new1/_mapping to verify the mapping of the new index.
Run the following command to confirm the merged data looks correct:
```
GET new1/_search
{
  "query": {
    "match_all": {}
  }
}
```
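The Python model below makes the per-document effect of the script concrete. It is not how Elasticsearch executes Painless. It reproduces only the three assignments in the script, and the hit dictionaries and sample documents are hypothetical. The example also shows why the _type prefix matters: in the V5.5.3 index, a tweet and a user can share the same _id.

```python
import copy


def combine_types(hit: dict) -> dict:
    """Apply the Method 1 script logic to one document.

    `hit` is a hypothetical dict with "_type", "_id", and "_source" keys.
    The input is copied, so the caller's dict is left unchanged.
    """
    doc = copy.deepcopy(hit)
    original_type = doc["_type"]
    # ctx._id = ctx._type + "-" + ctx._id  (uses the type before it is replaced)
    doc["_id"] = original_type + "-" + doc["_id"]
    # ctx._source.type = ctx._type
    doc["_source"]["type"] = original_type
    # ctx._type = "doc"
    doc["_type"] = "doc"
    return doc


# In the V5.5.3 index, a tweet and a user can both have _id "1".
hits = [
    {"_type": "tweet", "_id": "1", "_source": {"message": "hello"}},
    {"_type": "user", "_id": "1", "_source": {"name": "kimchy"}},
]
merged = [combine_types(h) for h in hits]
print([d["_id"] for d in merged])  # ['tweet-1', 'user-1']
```

Without the prefix, both documents would be written to new1 with _id "1", and the second write would overwrite the first.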

Method 2: Split into separate indexes

This method creates a dedicated index for each type. Run one POST _reindex request per type, and set "type" in source so that each request copies only the documents of that type.
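Conceptually, the per-type requests partition the documents of twitter by _type. The source.size value in the requests below changes only how many documents each scroll batch fetches, not which documents are copied. The Python sketch below models both points. The function names, the sample hits, and the 25,000-document count in the usage note are hypothetical.

```python
import math
from collections import defaultdict


def split_by_type(hits: list, prefix: str = "twitter") -> dict:
    """Group document IDs by _type into per-type index names such as twitter_tweet."""
    indexes = defaultdict(list)
    for hit in hits:
        indexes[f"{prefix}_{hit['_type']}"].append(hit["_id"])
    return dict(indexes)


def scroll_batches(doc_count: int, batch_size: int = 1000) -> int:
    """Number of scroll batches needed to read doc_count documents.

    1000 is the default reindex batch size; source.size overrides it.
    """
    return math.ceil(doc_count / batch_size)


hits = [
    {"_type": "tweet", "_id": "1"},
    {"_type": "user", "_id": "1"},
    {"_type": "tweet", "_id": "2"},
]
print(split_by_type(hits))
```

For a type with 25,000 documents, `scroll_batches(25000, 10000)` gives 3 batches, compared with 25 at the default size of 1000. Larger batches mean fewer round trips but more memory per batch.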

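Make sure Auto Indexing is enabled on the V5.5.3 cluster, as described at the start of Method 1, so that the reindex requests can create the new indexes. Then log on to the Kibana console of the V5.5.3 cluster and open Dev Tools.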

On the Console tab, run the following commands to split the twitter index into twitter_tweet and twitter_user:

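```
POST _reindex
{
  "source": {
    "index": "twitter",
    "type": "tweet",
    "size": 10000
  },
  "dest": {
    "index": "twitter_tweet"
  }
}

POST _reindex
{
  "source": {
    "index": "twitter",
    "type": "user",
    "size": 10000
  },
  "dest": {
    "index": "twitter_user"
  }
}
```

In source, "size": 10000 sets the scroll batch size, that is, the number of documents each request fetches per batch (the default is 1000). It does not limit the total number of documents that are reindexed.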

Run the following commands to verify the data in the new indexes:

```
GET twitter_tweet/_search
{
  "query": {
    "match_all": {}
  }
}

GET twitter_user/_search
{
  "query": {
    "match_all": {}
  }
}
```
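A match_all search returns only the first 10 hits by default, so comparing document counts gives a more complete check. On the V5.5.3 cluster, you can get per-type source counts with GET twitter/tweet/_count and GET twitter/user/_count. After the new indexes refresh, you can get destination counts with GET <index>/_count. The Python helpers below compare those numbers. The function names and the sample counts are hypothetical.

```python
def combine_ok(type_counts: dict, new_index_count: int) -> bool:
    """Method 1: new1 should hold exactly one document per source document."""
    return sum(type_counts.values()) == new_index_count


def split_mismatches(type_counts: dict, index_counts: dict, prefix: str = "twitter") -> list:
    """Method 2: list the types whose per-type index count differs from the source."""
    return sorted(
        t for t, n in type_counts.items()
        if index_counts.get(f"{prefix}_{t}") != n
    )


# Hypothetical counts gathered with the _count API
type_counts = {"tweet": 120, "user": 30}
print(combine_ok(type_counts, 150))
print(split_mismatches(type_counts, {"twitter_tweet": 120, "twitter_user": 30}))
```

An empty list from `split_mismatches` means every per-type index holds the same number of documents as the matching type in twitter.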

Step 2: Use Logstash to migrate data

Go to the Logstash Clusters page of the Alibaba Cloud Elasticsearch console.
In the top navigation bar, select the region where the Logstash cluster resides.
On the Logstash Clusters page, find the cluster and click its ID.
In the left-side navigation pane, click Pipelines, then click Create Pipeline.

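In the Create wizard, enter a pipeline ID and configure the pipeline. The following example reads the converted index from the V5.5.3 cluster and writes it to the V6.7.0 cluster. It keeps the original index names and document IDs:

```
input {
  elasticsearch {
    # Source: the V5.5.3 cluster
    hosts => ["http://es-cn-0pp1f1y5g000h****.elasticsearch.aliyuncs.com:9200"]
    user => "elastic"
    password => "your_password"
    # Read only the single-type index created in Step 1 (Method 1).
    # For Method 2, use "twitter_tweet,twitter_user". Avoid "*", which also
    # matches the original multi-type index and system indexes such as .kibana.
    index => "new1"
    # Store each document's _index, _type, and _id in [@metadata]
    docinfo => true
  }
}
filter {
}
output {
  elasticsearch {
    # Destination: the V6.7.0 cluster
    hosts => ["http://es-cn-mp91cbxsm000c****.elasticsearch.aliyuncs.com:9200"]
    user => "elastic"
    password => "your_password"
    # Keep the source index name and document ID
    index => "%{[@metadata][_index]}"
    document_id => "%{[@metadata][_id]}"
  }
}
```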

For details on pipeline configuration syntax, see Logstash configuration files.
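With docinfo => true, the elasticsearch input stores each document's source _index, _type, and _id under [@metadata]. An output can reference those fields with Logstash's sprintf syntax, for example index => "%{[@metadata][_index]}" and document_id => "%{[@metadata][_id]}". This keeps the source index names and IDs instead of writing every document to one fixed index. The Python sketch below only illustrates how such references resolve against an event. It is not Logstash's implementation, and the sample event is hypothetical. Like Logstash, it leaves a reference to a missing field as literal text.

```python
import re

# Matches a field reference such as %{[@metadata][_index]}
FIELD_REF = re.compile(r"%\{((?:\[[^\]]+\])+)\}")
KEY = re.compile(r"\[([^\]]+)\]")


def resolve(template: str, event: dict) -> str:
    """Replace %{[a][b]} references in template with values from a nested event dict."""

    def lookup(match):
        value = event
        for key in KEY.findall(match.group(1)):
            if not isinstance(value, dict) or key not in value:
                # Leave an unresolvable reference as literal text
                return match.group(0)
            value = value[key]
        return str(value)

    return FIELD_REF.sub(lookup, template)


event = {
    "@metadata": {"_index": "new1", "_type": "doc", "_id": "tweet-1"},
    "message": "hello",
    "type": "tweet",
}
print(resolve("%{[@metadata][_index]}", event))  # new1
```

A fixed value such as "test" contains no references, so every event goes to the same index regardless of where it came from.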

Click Next to configure pipeline parameters.

Warning

Deploying the pipeline restarts the Logstash cluster. Make sure the restart does not affect your business before proceeding.

Parameter	Description	Default
Pipeline Workers	Number of worker threads that run filter and output plugins in parallel. Increase this value if CPU resources are underutilized or events are backing up.	Number of vCPUs
Pipeline Batch Size	Maximum number of events a single worker thread collects from input plugins before running filter and output plugins. Higher values can increase throughput but require more memory, so you may need to increase the JVM heap size of the Logstash cluster.	125
Pipeline Batch Delay	Time, in milliseconds, to wait for each new event before dispatching an undersized batch to pipeline workers.	50 ms
Queue Type	Internal queue model for buffering events. MEMORY: traditional memory-based queue. PERSISTED: disk-based ACKed queue (persistent).	MEMORY
Queue Max Bytes	Maximum size of the queue when using persistent queues. Must be less than your total disk capacity.	1024 MB
Queue Checkpoint Writes	Maximum number of events written before a checkpoint is enforced when using persistent queues. Set to `0` for no limit.	1024
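Pipeline Workers and Pipeline Batch Size together bound how many events are in flight at once. The Logstash performance-tuning guidance describes that number as workers multiplied by batch size. The Python sketch below turns this into a rough memory estimate. The average event size is an assumption you supply. Actual JVM memory per event is typically higher than the serialized size, so treat the result as a rough lower bound when sizing the heap.

```python
def inflight_events(workers: int, batch_size: int) -> int:
    """Maximum number of events held by pipeline workers at the same time."""
    return workers * batch_size


def inflight_memory_mb(workers: int, batch_size: int, avg_event_kb: float) -> float:
    """Rough memory, in MB, for in-flight events of an assumed average size in KB."""
    return inflight_events(workers, batch_size) * avg_event_kb / 1024


# Defaults on a hypothetical 4-vCPU cluster: 4 workers x 125 events
print(inflight_events(4, 125))  # 500
```

Raising the batch size to 1000 on a hypothetical 8-worker cluster with 4 KB events gives `inflight_memory_mb(8, 1000, 4.0)`, or about 31 MB of event data before JVM overhead.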


Click Save or Save and Deploy:
- Save: The settings are stored but do not take effect. To apply them, go to the Pipelines page, find the pipeline, and click Deploy Now in the Actions column.
- Save and Deploy: The settings are saved, the Logstash cluster restarts immediately, and the settings take effect.

Step 3: Verify the migration results

Log on to the Kibana console of the Elasticsearch V6.7.0 cluster. For more information, see Log on to the Kibana console.
In the left-side navigation pane, click Dev Tools.
On the Console tab, run the following command to list all indexes and confirm the migrated data is present:
```
GET _cat/indices?v
```
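To confirm the migration numerically, run the same command on the V5.5.3 cluster and compare the docs.count column for each index. The Python sketch below parses the tabular ?v output by header name and reports indexes whose counts differ. It assumes the destination keeps the source index names. If your pipeline writes everything to a single index, compare totals instead. The docs.count value in _cat/indices also counts nested documents, so GET <index>/_count gives the more exact figure for indexes with nested fields. The function names and the sample output are hypothetical.

```python
def parse_cat_indices(text: str) -> dict:
    """Map index name to docs.count from `GET _cat/indices?v` output.

    Rows with missing columns (for example, closed indexes) are skipped.
    """
    rows = [line.split() for line in text.strip().splitlines()]
    header, body = rows[0], rows[1:]
    name_col, count_col = header.index("index"), header.index("docs.count")
    return {
        row[name_col]: int(row[count_col])
        for row in body
        if len(row) == len(header)
    }


def count_mismatches(source: dict, dest: dict) -> dict:
    """Return {index: (source_count, dest_count)} for indexes whose counts differ."""
    return {
        name: (count, dest.get(name))
        for name, count in source.items()
        if dest.get(name) != count
    }


# Hypothetical output from the V6.7.0 cluster
sample = (
    "health status index uuid pri rep docs.count docs.deleted store.size pri.store.size\n"
    "green open new1 abc123 5 1 150 0 80kb 40kb\n"
)
print(parse_cat_indices(sample))
```

An empty result from `count_mismatches(source_counts, dest_counts)` means every source index has a destination index with the same document count.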
