During off-peak hours, or when the volume of data stored in your cluster decreases, you can remove data nodes from the cluster to scale it in. This topic describes how to remove data nodes from a cluster.

Prerequisites

The following operations are performed:

Log on to the Kibana console and check whether your cluster stores indexes in the close state. If your cluster stores such indexes, you must open them. Otherwise, the scale-in fails.
  • Run the following command to view the statuses of indexes:
    GET /_cat/indices?v
  • Run the following command to open an index in the close state:
    POST /<index_name>/_open
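The check above can also be scripted. The following is a minimal sketch, assuming the index list was fetched with GET /_cat/indices?format=json (the JSON form of the same cat API) and parsed into a list of dictionaries. The function names and the sample rows are hypothetical and only illustrate the filtering logic.

```python
from typing import Dict, List


def closed_indexes(cat_rows: List[Dict[str, str]]) -> List[str]:
    """Return the names of indexes whose status is "close"."""
    return [row["index"] for row in cat_rows if row.get("status") == "close"]


def open_requests(index_names: List[str]) -> List[str]:
    """Build one Kibana console request per index that must be opened."""
    return [f"POST /{name}/_open" for name in index_names]


# Hypothetical rows, shaped like the output of GET /_cat/indices?format=json.
sample_rows = [
    {"health": "green", "status": "open", "index": "logs-2024"},
    {"status": "close", "index": "archive-2019"},
]

for request in open_requests(closed_indexes(sample_rows)):
    print(request)
```

Each printed line can be pasted into the Kibana console to open the corresponding index.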

Precautions

  • After you remove data nodes from a cluster, the system restarts the cluster. The time required for the restart varies based on the size, data volume, and load of your cluster. We recommend that you remove data nodes during off-peak hours.
  • If the indexes of your cluster have replica shards and the load of your cluster is normal, your cluster can still provide services during a restart. The load of a cluster is normal if the CPU utilization of each node in the cluster is about 60%, the heap memory usage of each node in the cluster is about 50%, and the value of NodeLoad_1m for each node is less than the number of vCPUs for the node.
  • When you make a change to a multi-zone cluster, make sure that the number of replica shards of each index in the cluster is less than the number of zones in which the cluster is deployed. After the change is complete, you can manually increase the number of replica shards based on your business requirements. For more information about how to change the number of replica shards of indexes in a cluster, see Index Templates.
  • If the indexes of your cluster do not have replica shards, the load of the cluster is excessively high, and large amounts of data are written to or queried in your cluster, the access to your cluster may time out when you remove data nodes from the cluster. Before you remove data nodes from your cluster, we recommend that you configure a retry mechanism for your client to reduce the impact on your business.
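The load and replica conditions described above can be expressed as simple checks. This sketch assumes that "about 60%" and "about 50%" are treated as upper bounds, which is an interpretation rather than something the precautions state exactly. The function names and the metric values passed to them are hypothetical.

```python
def is_load_normal(cpu_pct: float, heap_pct: float, load_1m: float, vcpus: int) -> bool:
    """Check one node against the load thresholds from the precautions.

    Assumption: the "about 60%" CPU and "about 50%" heap figures are
    treated as inclusive upper bounds.
    """
    return cpu_pct <= 60 and heap_pct <= 50 and load_1m < vcpus


def replicas_ok_for_multi_zone(number_of_replicas: int, zones: int) -> bool:
    """For a multi-zone cluster, replicas per index must be fewer than zones."""
    return number_of_replicas < zones


# Hypothetical node metrics: 45% CPU, 40% heap, NodeLoad_1m of 2.5 on 4 vCPUs.
print(is_load_normal(45, 40, 2.5, 4))
# Hypothetical index with 2 replica shards in a 3-zone cluster.
print(replicas_ok_for_multi_zone(2, 3))
```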

Remove data nodes

  1. Log on to the Elasticsearch console.
  2. In the left-side navigation pane, click Elasticsearch Clusters.
  3. Navigate to the desired cluster.
    1. In the top navigation bar, select the resource group to which the cluster belongs and the region where the cluster resides.
    2. In the left-side navigation pane, click Elasticsearch Clusters. On the Elasticsearch Clusters page, find the cluster and click its ID.
  4. In the lower-right corner of the Basic Information page, choose Configuration Update > Remove Data Nodes.
  5. In the Remove Data Nodes section of the page that appears, configure the Node Type parameter.
  6. Select the data nodes that you want to remove.
    After you select the data nodes that you want to remove, the system checks whether the conditions of removing the data nodes are met. If one or more conditions are not met, a message appears. You must handle the related exceptions as prompted. After the exceptions are handled, you can remove the data nodes.
    Check item | Expected result
    Cluster status | The cluster is in the Active state (indicated by the color green).
    Index allocation configuration | The cluster.routing.allocation.enable parameter is set to all. This value indicates that all types of shards can be allocated to all data nodes in the cluster.
    Distribution of replica shards for each index | The replica shards of each index are distributed on different data nodes.
    Number of remaining data nodes after the scale-in | The number of remaining data nodes after the scale-in is greater than or equal to two and is greater than half the current number of data nodes. For a multi-zone cluster, the number of data nodes in each zone is greater than or equal to two, and the numbers of remaining data nodes in all zones are the same.
    Disk usage of the destination data node for data migration | If you want to migrate data during the scale-in, the disk usage of the destination data node after the scale-in is no more than 75%.
    Memory usage of the destination data node for data migration | If you want to migrate data during the scale-in, the memory usage of the destination data node after the scale-in is no more than 70%.
    Number of shards on each data node that you want to remove | No shards are stored on each data node that you want to remove.
  7. Migrate data.
    For security purposes, make sure that no data is stored on the data nodes you want to remove. If these data nodes store data, the system prompts you to migrate the data. After the data is migrated, no index data is stored on the data nodes, and no index data is written to the data nodes.
    1. Click Data Migration Tool in the message that appears.
    2. In the Migrate Data dialog box, select a data migration method.
      Parameter | Description
      Smart Migration | The system selects the data nodes whose data is to be migrated.
      Custom Migration | You must select the data nodes whose data you want to migrate.
    3. Read the terms of data migration and select the check box.
    4. Click OK.
      Then, the system restarts the cluster. During the restart, you can view the data migration progress in the Tasks dialog box. After the cluster is restarted, the data stored on the selected data nodes is migrated.
      Note: During data migration, you can click Pause in the Tasks dialog box to pause the migration.
  8. In the lower-right corner of the Basic Information page, choose Configuration Update > Remove Data Nodes again.
  9. In the Remove Data Nodes section of the page that appears, select the data nodes whose data is migrated and click OK.
    Then, the system restarts the cluster. During the restart, you can view the scale-in progress in the Tasks dialog box. After the cluster is restarted, the data nodes are removed from the cluster.
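Before you open the console, you can estimate whether a planned scale-in would pass two of the checks listed in step 6. The following sketch only restates the documented rules for a single-zone cluster. The function names are hypothetical, and the console performs the authoritative check.

```python
def remaining_nodes_ok(current_nodes: int, nodes_to_remove: int) -> bool:
    """At least two data nodes must remain, and more than half of the current count."""
    remaining = current_nodes - nodes_to_remove
    return remaining >= 2 and remaining > current_nodes / 2


def destination_ok(disk_pct_after: float, memory_pct_after: float) -> bool:
    """Destination nodes must stay within 75% disk usage and 70% memory usage."""
    return disk_pct_after <= 75 and memory_pct_after <= 70


# Hypothetical plan: remove 2 of 6 data nodes.
print(remaining_nodes_ok(6, 2))
# Hypothetical plan: remove 3 of 6 data nodes (exactly half remain, which is not enough).
print(remaining_nodes_ok(6, 3))
```

For a multi-zone cluster, the same idea would have to be applied per zone, together with the requirement that all zones keep the same number of data nodes.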

Roll back data migration

Data migration is time-consuming. Cluster status changes or data modifications may result in a data migration failure. You can view detailed information in the Tasks dialog box. To roll back data migration, perform the following steps:

  1. Log on to the Kibana console of your cluster.
    For more information, see Log on to the Kibana console.
    Note: In this example, an Elasticsearch V6.7.0 cluster is used. Operations on clusters of other versions may differ. The actual operations in the console prevail.
  2. In the left-side navigation pane, click Dev Tools.
  3. On the Console tab of the page that appears, run the following command to obtain the IP addresses of the data nodes whose data is migrated:
    GET _cluster/settings
    If the command is successfully run, the following result is returned:
    {
      "transient": {
        "cluster": {
          "routing": {
            "allocation": {
              "exclude": {
                "_ip": "192.168.xx.xx,192.168.xx.xx,192.168.xx.xx"
              }
            }
          }
        }
      }
    }                        
  4. Roll back data.
    • Roll back the data on some data nodes. In the exclude parameter, keep only the IP addresses of the data nodes whose data you do not want to roll back. The data nodes that you omit from the list can receive shards again.
      PUT _cluster/settings
      {
        "transient": {
          "cluster": {
            "routing": {
              "allocation": {
                "exclude": {
                  "_ip": "192.168.xx.xx,192.168.xx.xx"
                }
              }
            }
          }
        }
      }
    • Roll back the data on all data nodes.
      PUT _cluster/settings
      {
        "transient": {
          "cluster": {
            "routing": {
              "allocation": {
                "exclude": {
                  "_ip": null
                }
              }
            }
          }
        }
      }                            
  5. Run the following command to check whether the data is rolled back:
    GET _cluster/settings

    If the command output does not contain the IP addresses of the data nodes whose data is rolled back, the rollback is successful. You can also check the rollback progress based on whether shards are reallocated to the data nodes.

    Note: You can run the GET _cat/shards?v command to check the status of a data migration or rollback task.
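The rollback request body can be derived from the current settings instead of being edited by hand. The following is a minimal sketch that assumes the GET _cluster/settings response has the nested shape shown in step 3. The function names and the sample IP addresses are hypothetical.

```python
import json
from typing import List, Optional

# Path to the exclusion list inside the cluster settings response.
EXCLUDE_PATH = ["transient", "cluster", "routing", "allocation", "exclude", "_ip"]


def excluded_ips(settings: dict) -> List[str]:
    """Return the IP addresses currently excluded from shard allocation."""
    node = settings
    for key in EXCLUDE_PATH:
        if not isinstance(node, dict) or key not in node:
            return []
        node = node[key]
    if not node:
        return []
    return [ip.strip() for ip in node.split(",") if ip.strip()]


def rollback_body(settings: dict, ips_to_roll_back: List[str]) -> dict:
    """Build the PUT _cluster/settings body that rolls back the given nodes.

    IP addresses that are not rolled back stay in the exclusion list.
    If nothing remains, "_ip" is set to None (null in JSON), which clears it.
    """
    remaining = [ip for ip in excluded_ips(settings) if ip not in ips_to_roll_back]
    new_value: Optional[str] = ",".join(remaining) if remaining else None
    return {"transient": {"cluster": {"routing": {"allocation": {"exclude": {"_ip": new_value}}}}}}


# Hypothetical current settings with three excluded data nodes.
current = {"transient": {"cluster": {"routing": {"allocation": {
    "exclude": {"_ip": "192.168.0.1,192.168.0.2,192.168.0.3"}}}}}}

# Roll back only 192.168.0.3; the other two nodes stay excluded.
print(json.dumps(rollback_body(current, ["192.168.0.3"]), indent=2))
```

The printed JSON is the body for PUT _cluster/settings. After the request is applied, running excluded_ips on a fresh GET _cluster/settings response should no longer list the rolled-back addresses.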

FAQ

  • What do I do if the "This operation may cause a shard distribution error or insufficient storage, CPU, or memory resources." message appears?
    Cause
    • Insufficient resources

      After data nodes are removed, the cluster does not have sufficient resources to store system data or handle workloads. The resources include disks, memory, and vCPUs.

    • Shard allocation errors

      Elasticsearch does not allocate two or more copies of the same shard (a primary shard and its replica shards) to the same data node. Therefore, if the number of replica shards of an index is greater than or equal to the number of data nodes that remain after the removal, some replica shards cannot be allocated. This results in shard allocation errors.

    Solution
    • Insufficient resources

      Run the GET _cat/indices?v command to check whether the resource usage of your cluster, such as disk usage, is greater than the related threshold. Make sure that the cluster has sufficient resources to store data and process requests. If these requirements are not met, upgrade the configuration of the cluster. For more information, see Upgrade the configuration of a cluster.

    • Shard allocation errors
      Run the GET _cat/indices?v command to check whether the number of replica shards of each index is less than the number of data nodes that remain after the removal. If this requirement is not met, change the number of replica shards. For more information, see Index Templates. The following sample code shows how to change the number of replica shards to 2 in an index template:
      PUT _template/template_1
      {
        "template": "*",
        "settings": {
          "number_of_replicas": 2
        }
      }  
  • What do I do if the "The cluster is running tasks or in an error status. Try again later." message appears?

    Solution: Run the GET _cluster/health command to check the status of the cluster, or go to the pages under Intelligent Maintenance to view the cause.

  • What do I do if the "The nodes in the cluster contain data. You must migrate the data first." message appears?

    Solution: Migrate data. For more information, see Remove data nodes.

  • What do I do if the "The number of nodes that you reserve must be more than two and more than half the current number of nodes." message appears?

    Cause: The number of reserved data nodes does not meet requirements. To ensure cluster reliability and stability, at least two data nodes must be reserved, and the number of data nodes selected for removal or data migration must be no more than half the current number of data nodes.

    Solution: Adjust the data nodes to remove, or upgrade the configuration of the cluster. For more information about how to upgrade the configuration of a cluster, see Upgrade the configuration of a cluster.

  • What do I do if the "The current Elasticsearch cluster configuration does not support this operation. Check the Elasticsearch cluster configuration first." message appears?

    Solution: Run the GET _cluster/settings command to view the cluster configuration. Then, check whether the configuration contains settings that prevent shard allocation.

  • What do I do if data nodes fail to be removed or data fails to be migrated due to the auto_expand_replicas index setting?

    Cause: You may use the access control feature provided by the X-Pack plug-in. In earlier Elasticsearch versions, this feature applies the "index.auto_expand_replicas" : "0-all" setting to the .security index by default. This causes errors when you migrate data or remove data nodes.

    Solution:
    1. Query index settings.
      GET .security/_settings
      The following result is returned:
      {
        ".security-6" : {
          "settings" : {
            "index" : {
              "number_of_shards" : "1",
              "auto_expand_replicas" : "0-all",
              "provided_name" : ".security-6",
              "format" : "6",
              "creation_date" : "1555142250367",
              "priority" : "1000",
              "number_of_replicas" : "9",
              "uuid" : "9t2hotc7S5OpPuKEIJ****",
              "version" : {
                "created" : "6070099"
              }
            }
          }
        }
      }
    2. Use one of the following methods to modify the auto_expand_replicas index setting:
      • Method 1:
        PUT .security/_settings
        {
          "index" : {
            "auto_expand_replicas" : "0-1"
          }
        }
      • Method 2:
        PUT .security/_settings
        {
          "index" : {
            "auto_expand_replicas" : "false",
            "number_of_replicas" : "1"
          }
        }
        Notice: The number_of_replicas parameter specifies the number of replica shards for each primary shard of the index. You can configure this parameter based on your business requirements. Make sure that the value of this parameter is greater than or equal to one and less than the number of available data nodes. Otherwise, some replica shards cannot be allocated.
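To see why the "0-all" value interferes with a scale-in, it helps to model how auto_expand_replicas chooses the replica count. The following sketch is a simplified model of the documented Elasticsearch behavior, in which the replica count tracks the number of data nodes minus one within the configured range. It is not the actual implementation, and the function name is hypothetical.

```python
from typing import Optional


def expanded_replicas(setting: str, data_nodes: int) -> Optional[int]:
    """Estimate the replica count that auto_expand_replicas would choose.

    Simplified model: the target is data_nodes - 1, clamped to the
    "min-max" range, where "all" means no upper limit other than
    data_nodes - 1. Returns None when the setting is "false".
    """
    if setting == "false":
        return None
    low_text, high_text = setting.split("-", 1)
    low = int(low_text)
    ceiling = data_nodes - 1
    high = ceiling if high_text == "all" else min(int(high_text), ceiling)
    return max(low, min(ceiling, high))


# With "0-all", every data node holds a copy, so no node can be emptied.
print(expanded_replicas("0-all", 10))
# With "0-1", at most one replica exists, which leaves room to drain nodes.
print(expanded_replicas("0-1", 10))
```

Under this model, the value 9 for number_of_replicas in the sample response would correspond to a cluster with ten data nodes, although the example does not state the cluster size.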