Imbalanced loads on an Alibaba Cloud Elasticsearch cluster can have several causes, including inappropriate shard allocation, uneven segment sizes, hot and cold data that are not separated across nodes, and persistent connections used with Server Load Balancer (SLB) instances in a multi-zone architecture. This topic describes how to analyze and resolve imbalanced loads on an Elasticsearch cluster.

Causes

  • Shard allocation is inappropriate.
    Notice In most cases, imbalanced loads are caused by inappropriate shard allocation. We recommend that you first check shard allocation; a quick check is sketched after this list.
  • Segment sizes are uneven.
  • Hot data and cold data are not separated on nodes.
    Notice For example, if queries use the routing parameter or target only hot data, imbalanced loads may occur.
  • Persistent connections are used with SLB instances in a multi-zone architecture. In this case, traffic may be unevenly distributed across nodes. However, this rarely occurs. For more information, see Multi-zone architecture.
Notice If imbalanced loads occur for other reasons, contact Alibaba Cloud technical support engineers for troubleshooting.
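
The following console commands provide a quick first check for shard allocation. The index name test matches the scenario described later in this topic; the sample output in the comments is illustrative only.

    GET _cat/shards?v
    # Sample output (abridged). Look for nodes that host far more shards
    # or far more data than the other nodes:
    # index shard prirep state   docs   store ip        node
    # test  0     p      STARTED 29000  40gb  10.0.0.1  node-1
    # test  1     p      STARTED 28000  38gb  10.0.0.1  node-1
    # test  2     p      STARTED 29500  41gb  10.0.0.2  node-2

    GET _cat/indices?v
    # Check whether the pri and rep values fit the number of data nodes,
    # and whether docs.deleted is large relative to docs.count.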

Inappropriate shard allocation

  • Scenario

    Company A has purchased an Alibaba Cloud Elasticsearch cluster. The cluster contains three dedicated master nodes and nine data nodes. Each dedicated master node offers 16 vCPUs and 32 GiB of memory. Each data node offers 32 vCPUs and 64 GiB of memory. Most data is stored in the test index. During peak hours (16:21 to 18:00), the cluster handles about 2,000 read QPS and about 1,000 write QPS, and both cold and hot data are queried. The CPU utilization of two nodes reaches 100%, which affects the Elasticsearch service.

  • Analysis
    1. Check the Elastic Compute Service (ECS) instances and the network. If the ECS instances are normal, view the network monitoring data.

      The network monitoring data shows that the number of network requests increases during the peak hours. At the same time, the QPS of read operations increases, and the CPU utilization of related nodes significantly increases. Based on the preceding information, you can conclude that the nodes with high loads are mainly used to process query requests.

    2. Run the GET _cat/shards?v command to query the shards of the index.

      The command output shows that the nodes with high loads contain a large number of shards, which indicates that shards are not evenly allocated across nodes. In addition, the disk usage monitoring data shows that the disk usage of the high-load nodes is greater than that of other nodes. You can conclude that the uneven shard allocation results in uneven storage, so the nodes that store more data handle most of the query and write workloads.

    3. Run the GET _cat/indices?v command to query information about the index.

      The command output shows that the index has five primary shards and one replica shard for each primary shard, which is 10 shards in total. Because the cluster has nine data nodes, the 10 shards cannot be evenly distributed: at least one node must host two shards of the index. The output also shows that a large number of documents in the index are marked as deleted. When Elasticsearch searches for data, it must still scan and filter out documents marked with .del, which significantly reduces search efficiency and consumes additional resources. We recommend that you call the force merge operation during off-peak hours.

      Figure: Deleted documents in the index
    4. View cluster logs and search slow logs.

      The logs show that the queries are all normal term queries and that the cluster logs contain no errors. Therefore, the high loads are not caused by cluster errors or by query statements that consume excessive CPU resources.

  • Summary
    The preceding analysis indicates that the uneven CPU utilization is mainly caused by uneven shard allocation. You must re-allocate shards for the index and make sure that the total number of primary and replica shards is the same as or a multiple of the number of data nodes in the cluster. After the optimization, the CPU utilization no longer differs significantly between nodes, as shown in the following figure.
    Figure: CPU utilization after optimization
  • Solution

    Plan shards appropriately when you create indexes. For more information, see Shard evaluation guidelines. To rebalance an existing index, reindex it into a properly sharded index, as shown in the sketch that follows.
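
    The number of primary shards of an existing index cannot be changed in place, so re-allocating shards typically means reindexing into a new index. The following console sketch assumes the nine-data-node scenario above; the index name test_v2 is a hypothetical example. Nine primary shards with one replica each yield 18 shards in total, which is a multiple of the nine data nodes.

      # Create a new index whose shard layout fits the nine data nodes.
      PUT test_v2
      {
        "settings": {
          "number_of_shards": 9,
          "number_of_replicas": 1
        }
      }

      # Copy the data during off-peak hours.
      POST _reindex
      {
        "source": { "index": "test" },
        "dest": { "index": "test_v2" }
      }

      # Delete the old index and point the name "test" at the new index.
      POST _aliases
      {
        "actions": [
          { "remove_index": { "index": "test" } },
          { "add": { "index": "test_v2", "alias": "test" } }
        ]
      }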

Shard evaluation guidelines

Both the number of shards and the size of each shard contribute to the stability and performance of an Elasticsearch cluster. Properly plan the shards for each index, even when business scenarios are difficult to predict; this prevents an excessive number of shards from degrading cluster performance. This section provides the following guidelines:
Note In versions earlier than Elasticsearch V7.X, one index has five primary shards and one replica shard for each primary shard by default. In Elasticsearch V7.X and later, one index has one primary shard and one replica shard by default.
  • For nodes with low specifications, the size of each shard is no more than 30 GB. For nodes with high specifications, the size of each shard is no more than 50 GB.
  • For log analysis or extremely large indexes, the size of each shard is no more than 100 GB.
  • The total number of primary shards and replica shards is the same as or a multiple of the number of data nodes.
    Note The more shards, the more performance overheads of your Elasticsearch cluster.
  • You can determine the maximum number of shards on a single node based on the node's memory size multiplied by 30. If you plan too many shards, file handle exhaustion can easily occur and result in cluster failures.
  • On each node, configure a maximum of five primary shards for a single index.
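
As a sketch of these guidelines, the following legacy index template applies suitable shard settings to new log indexes on the nine-data-node cluster from the preceding scenario. The template name log_template and the index pattern log-* are hypothetical examples; in Elasticsearch V7.8 and later, you can use the _index_template API instead.

    # Legacy template: 9 primary shards with 1 replica each, 18 shards
    # in total, which is a multiple of the 9 data nodes.
    PUT _template/log_template
    {
      "index_patterns": ["log-*"],
      "settings": {
        "number_of_shards": 9,
        "number_of_replicas": 1
      }
    }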

Uneven segment sizes

  • Scenario
    A node in the Elasticsearch cluster of Company A experiences an abrupt increase in CPU utilization, which affects query performance. Queries are mainly performed on the test index. The index has three primary shards and one replica shard for each primary shard, and the shards are evenly allocated across nodes. The index contains a large number of documents that are marked with docs.deleted, and the ECS instances are confirmed to be normal.
    Figure: Imbalanced loads caused by large segment sizes
  • Analysis
    1. Add "profile": true to the query body.

      The query results show that Elasticsearch requires a longer time to query Shard 1 of the test index than other shards.

    2. Send a query request with preference set to _primary and "profile": true added to the query body, and view the time required to query the primary shard. Then, send another query request with preference set to _replica and "profile": true added to the query body, and view the time required to query the replica shard.

      The time required to query Shard 1 (primary shard) is longer than that required to query its replica shard. This indicates that imbalanced loads are caused by Shard 1.

    3. Run the GET _cat/segments/test?v&h=shard,segment,size,size.memory,ip and GET _cat/shards?v commands to query information about Shard 1.
      The command outputs show that Shard 1 contains large segments and the number of documents in the shard is greater than that in its replica shard. Based on the preceding information, you can determine that imbalanced loads are caused by uneven segment sizes.
      Note The inconsistency in the number of documents is caused by many different reasons. Examples:
      • A latency exists in data synchronization between primary and replica shards. If documents are continuously written to the primary shard, the document counts may differ temporarily. After you stop writing documents, the counts become the same between the primary shard and its replica shard.
      • A primary shard forwards a write request to its replica shards only after the data is written to the primary shard. If you write documents with automatically generated document IDs and delete one of them while the write is still being replicated (for example, by sending a Delete by Query request for a document that you have just written), the delete operation is performed on both the primary shard and the replica shard before the forwarded write request reaches the replica shard. Because the document ID is automatically generated, the replica shard then writes the forwarded document without version verification. As a result, the number of documents in the replica shard differs from that in the primary shard, and the primary shard contains a large number of documents that are marked with docs.deleted.
  • Solution
    • Solution 1: During off-peak hours, call the force merge operation to merge small segments and remove documents that are marked with docs.deleted. A sketch is provided after this list.
    • Solution 2: Restart the node where the primary shard resides so that the replica shard is promoted to a primary shard and a new replica shard is generated from it. This ensures that the segments in the new primary and replica shards are consistent.
    The following figure shows the loads after optimization.
    Figure: Loads after optimization
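
    The following console sketch reproduces the preceding analysis steps on the test index. The term field user_id and its value are hypothetical examples. Note that the preference values _primary and _replica are removed in Elasticsearch V7.0 and later; in those versions, you can combine the _shards preference with node-based preferences to compare shard copies.

      # Step 1: profile the query to find the slow shard.
      GET test/_search
      {
        "profile": true,
        "query": { "term": { "user_id": "user001" } }
      }

      # Step 2: compare the primary shard with its replica.
      GET test/_search?preference=_primary
      {
        "profile": true,
        "query": { "term": { "user_id": "user001" } }
      }

      GET test/_search?preference=_replica
      {
        "profile": true,
        "query": { "term": { "user_id": "user001" } }
      }

      # Step 3: inspect segment sizes and per-shard document counts.
      GET _cat/segments/test?v&h=shard,segment,size,size.memory,ip
      GET _cat/shards?v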
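
    The following is a minimal sketch of Solution 1, assuming the test index from this scenario. Force merge is I/O-intensive, so run it during off-peak hours.

      # Merge each shard down to one segment and drop deleted documents.
      POST test/_forcemerge?max_num_segments=1

      # Alternatively, only expunge deleted documents without a full merge.
      POST test/_forcemerge?only_expunge_deletes=true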

Multi-zone architecture

  • Scenario
    To deploy an Elasticsearch cluster across zones, Company A deploys the multi-zone architecture in both Zone B and Zone C. When the cluster provides services, the loads of nodes in Zone C are higher than the loads of nodes in Zone B. You have checked that the imbalanced loads are not caused by hardware issues or uneven data distribution.
    Figure: Imbalanced loads caused by multi-zone architecture
  • Analysis
    1. View the CPU utilization of nodes in the two zones within the last four days.
      The monitoring data shows that the CPU utilization differs significantly between the nodes in the two zones.
      Figure: CPU utilization within the last four days
    2. View the TCP connections to the nodes.
      The monitoring data shows that the number of TCP connections to nodes in the two zones differs significantly. This indicates that the imbalanced loads are caused by network connections. A sketch that shows how to check connection distribution is provided after this list.
      Figure: Number of TCP connections to nodes
    3. Check client connections.

      The client uses persistent connections and establishes few new connections. This scenario is susceptible to the independent scheduling of a multi-zone network: network services are independently scheduled based on the number of connections, and each scheduling unit selects the optimal node to create a connection. Independent scheduling provides higher performance. However, if few new connections are established, most scheduling units may choose the same node for their connections. Because a node of an Elasticsearch cluster first forwards requests to other nodes in the same zone, this causes imbalanced loads between zones.

  • Solution
    • Solution 1: Use independent client nodes to forward traffic. In this case, even if the client nodes are heavily loaded, data nodes are not affected. This reduces the risk of load imbalance.
    • Solution 2: Configure two independent domain names on the client so that the client balances traffic between the zones. However, this solution cannot ensure high availability: when you upgrade the configuration of the Elasticsearch cluster, access failures may occur because nodes are not removed from the SLB instance during the upgrade.
    The following figure shows the loads after optimization.
    Figure: Loads after optimization
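
    To verify how client connections are distributed across nodes, you can compare the per-node HTTP connection counts, as in the following sketch.

      GET _nodes/stats/http?filter_path=nodes.*.name,nodes.*.http
      # In the response, compare current_open and total_opened across nodes.
      # A node that holds a disproportionate share of the connections shows
      # where the persistent connections are concentrated.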