If inappropriate shard configurations cause performance issues in your Elasticsearch cluster, you can use the _split API to split indexes in the cluster into new indexes with more primary shards in online mode. For example, if an index has only a few primary shards, each primary shard may store a large amount of data, which can degrade cluster performance. This topic describes how to use the _split API to split an existing index into a new index with more primary shards.

Background information

After an index is created, you cannot change the number of primary shards for the index. In most cases, if you want to change the number of primary shards for an existing index, you must call the reindex API to reindex the data, which is time-consuming. To resolve this issue, Elasticsearch V6.X and later versions provide the _split API, which you can use to split an existing index into a new index with more primary shards in online mode. For more information about the API, see Split index API.

The following descriptions provide information about performance tests performed on the reindex API and _split API:
  • Test environment:
    • Nodes: five data nodes, each of which offers 8 vCPUs and 16 GiB of memory
    • Data volume: 183 GiB of data stored in an index
    • Number of shards: five primary shards for the original index, 20 primary shards for the new index, and no replica shards for both indexes
  • Test results:
    • reindex API: the reindex operation takes approximately 2.5 hours. The write QPS in the cluster is excessively high, and the resource usage of the data nodes is high.
    • _split API: the split operation takes approximately 3 minutes. The CPU utilization of each data node is approximately 78%, and the minute-average load of each data node is approximately 10.

Prerequisites

  • The Elasticsearch cluster is healthy, and the load of the cluster is normal.
  • The number of primary shards that can be obtained after the index is split is evaluated based on the number of data nodes in the Elasticsearch cluster and the disk space of the cluster. For more information, see Shard evaluation.
  • Data write operations are disabled for the original index, and the Elasticsearch cluster does not contain an index that has the same name as the new index.
  • The Elasticsearch cluster has sufficient disk space to store the new index.
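You can check the cluster status, node loads, and disk usage in the Kibana console before you split an index. The following commands are examples, and the column names that are passed to the h parameter can be adjusted based on your business requirements:
  GET _cluster/health
  GET _cat/nodes?v&h=name,cpu,load_1m
  GET _cat/allocation?v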

Procedure

  1. Log on to the Kibana console of your Elasticsearch cluster and go to the homepage of the Kibana console as prompted.
    For more information about how to log on to the Kibana console, see Log on to the Kibana console.
    Note In this example, an Elasticsearch V7.10.0 cluster is used. Operations on clusters of other versions may differ. The actual operations in the console prevail.
  2. In the upper-right corner of the page that appears, click Dev tools.
  3. On the Console tab of the page that appears, run the following command to create an index. In the command, configure the index.number_of_routing_shards parameter to specify the number of routing shards and the index.number_of_shards parameter to specify the number of primary shards.
    The number of primary shards that can be obtained after an index is split is a factor of the value of the index.number_of_routing_shards parameter and a multiple of the value of the index.number_of_shards parameter. In this example, an index named dest1 is created in an Elasticsearch V7.10 cluster. The index.number_of_routing_shards parameter is set to 24, and the index.number_of_shards parameter is set to 2. In this case, the number of primary shards that can be obtained after the index is split is 4, 6, 8, 12, or 24.
    Note You must replace dest1 in the following command based on your business requirements.
    PUT /dest1
    {
      "settings": {
        "index": {
          "number_of_routing_shards": 24,
          "number_of_shards":2
        }
      }
    }
    The following list describes the parameters in the preceding command:
    • number_of_routing_shards: the number of routing shards. This parameter determines how many times the original index can be split and therefore which numbers of primary shards can be obtained after the split. When you create the index, make sure that the number of primary shards that you configure for the index is a factor of the value of this parameter.
      Note
      • If you want to split an index in an Elasticsearch cluster of a version earlier than V7.0, you must configure the index.number_of_routing_shards parameter in the command that is used to create the index. The maximum value of this parameter is 1024. For an Elasticsearch cluster of V7.0 or later, the default value of the index.number_of_routing_shards parameter depends on the number of primary shards for the index that you want to split. If this parameter is not configured in the command used to create the index, the index can be split only by factors of 2, and a maximum of 1,024 primary shards can be obtained after the split. For example, if the original index has one primary shard, the number of primary shards after the split can be 2, 4, 8, and so on, up to 1,024. If the original index has five primary shards, the number of primary shards after the split can be 10, 20, 40, 80, 160, 320, or 640, and the maximum is 640.
      • If you split an index whose number of primary shards has been reduced by using the _shrink API, the number of primary shards after the split must be a multiple of the number of primary shards before the split. For example, if the index has five primary shards before the split, the index must have a multiple of 5 primary shards after the split, such as 10, 15, 20, 25, or 30. The number of primary shards after the split cannot exceed 1,024.
    • number_of_shards: the number of primary shards for the index.
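    After you create the index, you can optionally run a command similar to the following one to confirm the values of the two parameters. The filter_path parameter only trims the response and can be omitted. In this example, the expected values are 2 and 24.
    GET /dest1/_settings?filter_path=**.number_of_shards,**.number_of_routing_shards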
  4. Insert data.
    Note The following data is used only for testing.
    POST /dest1/_bulk
    {"index":{}}
    {"productName":"Product A","annual_rate":"3.2200%","describe":"A product that allows you to select whether to receive push messages for returns."}
    {"index":{}}
    {"productName":"Product B","annual_rate":"3.1100%","describe":"A product that daily pushes messages for returns credited to your account."}
    {"index":{}}
    {"productName":"Product C","annual_rate":"3.3500%","describe":"A product that daily pushes messages for returns immediately credited to your account."}
  5. Disable data write operations for the index.
    PUT /dest1/_settings
    {
      "settings": {
        "index.blocks.write": true
      }
    }
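    You can optionally run a command similar to the following one to confirm that the write block takes effect. In the response, the value of index.blocks.write is expected to be true. After the block takes effect, write requests to the dest1 index are rejected, but read requests are still allowed.
    GET /dest1/_settings?filter_path=**.blocks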
  6. Split the original index into a new index with more primary shards and enable data write operations for the new index.
    POST dest1/_split/dest3
    {
      "settings": {
        "index.number_of_shards": 12,
        "index.blocks.write": null
      }
    }
    In this example, the original index dest1 is split into the new index dest3 by using the _split API. The number of primary shards for the new index is 12, and data write operations are enabled for the new index.
    Notice
    • The number of primary shards for the original index is 2, and the index.number_of_routing_shards parameter is set to 24. In this case, the number of primary shards for the new index must be a multiple of 2 and cannot exceed 24. Otherwise, an error is reported in the Kibana console.
    • During the split, the system merges segments on the nodes. This operation consumes the computing resources of the Elasticsearch cluster and increases the load on the cluster. Therefore, before you split an index, make sure that your Elasticsearch cluster has sufficient disk space and computing resources. We recommend that you split indexes during off-peak hours.
    • You must replace dest1 and dest3 in the preceding commands based on your business requirements.
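    You can run a command similar to the following one to confirm that the new index contains the expected number of primary shards. In this example, the value in the pri column of the response is expected to be 12.
    GET _cat/indices/dest3?v&h=index,health,pri,rep,docs.count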
  7. View the result.
    Call the _cat/recovery API to query the progress of the index split. If no recovery tasks related to the split are returned and the Elasticsearch cluster is healthy, the index split is complete.
    • Query the index split progress
      GET _cat/recovery?v&active_only

      If the index column in the returned result does not contain the index that is being split, no recovery tasks related to the split are in progress.

    • Query the health status of the Elasticsearch cluster
      GET _cluster/health

      If the returned result contains "status" : "green", the Elasticsearch cluster is healthy.

FAQ

Q: Why are the CPU utilization and minute-average load of each data node in my Elasticsearch cluster not reduced after the index split operation is complete?

A: When you split an index, the system reroutes the documents in the index, and the shards of the new index contain a large number of deleted documents (docs.deleted). If you run the GET _nodes/hot_threads command, you can see that merge operations are being performed to reclaim these deleted documents. These merge operations consume a large amount of computing resources in the Elasticsearch cluster. We recommend that you split indexes during off-peak hours.
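For example, you can run the following commands to view the hottest threads on each node and the numbers of deleted documents in the segments of the new index. The threads and interval parameters are optional and are used only to limit the output. In this example, dest3 is the new index that is generated by the split.
  GET _nodes/hot_threads?threads=3&interval=1s
  GET _cat/segments/dest3?v&h=index,segment,docs.count,docs.deleted,size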