If performance issues arise from an insufficient or imbalanced number of primary shards in your Elasticsearch indices (for example, too much data per shard), the _split API allows you to increase the number of primary shards by creating a new index from an existing one. The source index remains available for reads during the operation, and splitting is significantly faster than reindexing, which can be critical for large datasets. This guide describes how to use the _split API to create a new index with more primary shards from an existing index.
Prerequisites
Cluster health: Your Elasticsearch cluster must be healthy and operating under normal load.
Write block: The source index must be set to read-only before the split. Example:
PUT /my_old_index/_settings
{
  "settings": {
    "index.blocks.write": true
  }
}
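To confirm that the write block is in place, you can query the index settings (an optional sanity check, using the same example index name):
GET /my_old_index/_settings
The response should include "blocks": { "write": "true" } under the index settings.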
Usage notes
Before using the _split API, ensure the following:
Target shard count: The number of primary shards in the target index must be a factor of the index.number_of_routing_shards parameter defined when the source index was created. It must also be a multiple of the source index's primary shard count.
Example: If the source index has 2 primary shards and number_of_routing_shards is 24, the target index can have 4, 6, 8, 12, or 24 primary shards. A sample index creation request that sets this parameter is shown after these notes.
For more guidance on shard evaluation, see Assess shards.
Target index name: The Elasticsearch cluster must not already contain an index with the same name as your intended target index.
Disable write operations (source index): Data write operations must be disabled for the source index before splitting.
Sufficient disk space: The Elasticsearch cluster must have sufficient disk space to accommodate the target index.
Off-peak operation: We highly recommend performing index split operations during off-peak hours due to the resource consumption involved in segment merging.
Version compatibility: For Elasticsearch V7.0 and later, if index.number_of_routing_shards was not explicitly configured when the source index was created, the index can be split by factors of 2, up to a maximum of 1,024 primary shards.
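As a concrete illustration of the target shard count constraint, the following hypothetical request creates a source index with 2 primary shards and number_of_routing_shards set to 24, which would permit later splits to 4, 6, 8, 12, or 24 primary shards (the index name and values are examples, not part of the procedure below):
PUT /my_old_index
{
  "settings": {
    "index.number_of_shards": 2,
    "index.number_of_routing_shards": 24
  }
}
Note that index.number_of_routing_shards can only be set at index creation time; it cannot be changed on an existing index.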
Procedure
Log on to the Kibana console of your Elasticsearch cluster and go to the Kibana homepage.
In the left navigation menu, click Dev Tools.
Execute the _split API:
Split my_old_index into my_new_index with your desired number of primary shards, and enable write operations for the target index in the same request:
POST my_old_index/_split/my_new_index
{
  "settings": {
    "index.number_of_shards": 12,
    "index.blocks.write": null
  }
}
Replace my_old_index with the name of your existing index.
Replace my_new_index with the desired name for the target index.
Set index.number_of_shards to your target primary shard count (for example, 12). Ensure that this value adheres to the target shard count constraint in the usage notes.
Setting index.blocks.write to null removes the write block from the target index, which would otherwise inherit the read-only block from the source index.
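After the request returns, you can optionally confirm that the target index was created with the expected number of primary shards (assuming the example names above):
GET _cat/shards/my_new_index?v
The output should list 12 primary shards (p in the prirep column) for my_new_index.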
Verify split progress and cluster health
After initiating the split, monitor its progress and ensure your cluster remains healthy.
Query split progress:
Use the _cat recovery API to check for ongoing shard recoveries related to the split:
GET _cat/recovery?v&active_only
If the response lists no active recoveries for the target index, the split is likely complete.
Query cluster health:
Confirm your Elasticsearch cluster's health status:
GET _cluster/health
A response containing "status" : "green" indicates a healthy cluster.
Troubleshooting and FAQ
Reference: Understanding the _split API
The _split API is designed to address scenarios where an index's initial primary shard count becomes insufficient for its growing data volume or query load. Unlike the time-consuming _reindex API, which copies every document into the new index, _split creates the target shards by hard-linking segments from the source index (or copying them if the file system does not support hard links) and then removing the documents that do not belong to each target shard, allowing for a much faster expansion of primary shards.
Key capability: Split an existing index into a new index with a greater number of primary shards.
Availability: Elasticsearch V6.X and later versions.
Official documentation: For detailed information, see the Split index API.
_split vs. _reindex performance comparison
Here's a comparison of the _split API's performance against the _reindex API in a test environment:
Test environment:
Nodes: five data nodes (8 vCPUs, 16 GiB memory each)
Data volume: 183 GiB in the source index
Shards: 5 primary shards (source), 20 primary shards (target), 0 replica shards (both)
Test results:

| Method   | Consumed time | Resource usage |
| -------- | ------------- | -------------- |
| _reindex | 2.5 hours     | Excessively high write QPS; high data node resource utilization. |
| _split   | 3 minutes     | CPU utilization ~78% per data node; minute-average load ~10 per data node. |