Imbalanced loads on an Alibaba Cloud Elasticsearch cluster can have several causes, including inappropriate shard allocation, uneven segment sizes, hot and cold data that are not separated across nodes, and persistent connections used with Server Load Balancer (SLB) instances in a multi-zone architecture. This topic describes how to analyze and resolve imbalanced loads on an Elasticsearch cluster.

Causes

  • Shard allocation is inappropriate.
    Notice In most cases, imbalanced loads are caused by inappropriate shard allocation. We recommend that you first check shard allocation; a quick check is sketched after this list.
  • Segment sizes are uneven.
  • Hot data and cold data are not separated on nodes.
    Notice For example, if queries use the routing parameter or target only hot data, imbalanced loads may occur.
  • Persistent connections are used with SLB instances in a multi-zone architecture. In this case, traffic may be unevenly distributed across nodes. However, this rarely occurs. For more information, see Multi-zone architecture.
Notice If imbalanced loads occur for other reasons, contact Alibaba Cloud technical support engineers for troubleshooting.
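
The following console commands provide a quick first check for shard allocation. The index name test matches the scenario described later in this topic; the sample output in the comments is illustrative only.

    GET _cat/shards?v
    # Sample output (abridged). Look for nodes that host far more shards
    # or far more data than the other nodes:
    # index shard prirep state   docs   store ip        node
    # test  0     p      STARTED 29000  40gb  10.0.0.1  node-1
    # test  1     p      STARTED 28000  38gb  10.0.0.1  node-1
    # test  2     p      STARTED 29500  41gb  10.0.0.2  node-2

    GET _cat/indices?v
    # Check whether the pri and rep values fit the number of data nodes,
    # and whether docs.deleted is large relative to docs.count.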

Inappropriate shard allocation

  • Scenario

    Company A has purchased an Alibaba Cloud Elasticsearch cluster. The cluster contains three dedicated master nodes and nine data nodes. Each dedicated master node offers 16 vCPUs and 32 GiB of memory. Each data node offers 32 vCPUs and 64 GiB of memory. Most data is stored in the test index. During peak hours (16:21 to 18:00), the cluster handles about 2,000 read QPS and about 1,000 write QPS, and both cold and hot data are queried. The CPU utilization of two nodes reaches 100%, which affects the Elasticsearch service.

  • Analysis
    1. Check the Elastic Compute Service (ECS) instances and the network. If the ECS instances are normal, view the network monitoring data.

      The network monitoring data shows that the number of network requests increases during the peak hours. At the same time, the QPS of read operations increases, and the CPU utilization of related nodes significantly increases. Based on the preceding information, you can conclude that the nodes with high loads are mainly used to process query requests.

    2. Run the GET _cat/shards?v command to query the shards of the index.

      The command output shows that the nodes with high loads contain a large number of shards, which indicates that shards are not evenly allocated across nodes. In addition, the disk usage monitoring data shows that the disk usage of the high-load nodes is greater than that of other nodes. You can conclude that the uneven shard allocation results in uneven storage, so the nodes that store more data handle most of the query and write workloads.

    3. Run the GET _cat/indices?v command to query information about the index.

      The command output shows that the index has five primary shards and one replica shard for each primary shard, which is 10 shards in total. Because the cluster has nine data nodes, the 10 shards cannot be evenly distributed: at least one node must host two shards of the index. The output also shows that a large number of documents in the index are marked as deleted. When Elasticsearch searches for data, it must still scan and filter out documents marked with .del, which significantly reduces search efficiency and consumes additional resources. We recommend that you call the force merge operation during off-peak hours.

      Figure: Deleted documents in the index
    4. View cluster logs and search slow logs.

      The logs show that the queries are all normal term queries and that the cluster logs contain no errors. Therefore, the high loads are not caused by cluster errors or by query statements that consume excessive CPU resources.

  • Summary
    The preceding analysis indicates that the uneven CPU utilization is mainly caused by uneven shard allocation. You must re-allocate shards for the index and make sure that the total number of primary and replica shards is the same as or a multiple of the number of data nodes in the cluster. After the optimization, the CPU utilization no longer differs significantly between nodes, as shown in the following figure.
    Figure: CPU utilization after optimization
  • Solution

    Plan shards appropriately when you create indexes. For more information, see Shard evaluation guidelines. To rebalance an existing index, reindex it into a properly sharded index, as shown in the sketch that follows.
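
    The number of primary shards of an existing index cannot be changed in place, so re-allocating shards typically means reindexing into a new index. The following console sketch assumes the nine-data-node scenario above; the index name test_v2 is a hypothetical example. Nine primary shards with one replica each yield 18 shards in total, which is a multiple of the nine data nodes.

      # Create a new index whose shard layout fits the nine data nodes.
      PUT test_v2
      {
        "settings": {
          "number_of_shards": 9,
          "number_of_replicas": 1
        }
      }

      # Copy the data during off-peak hours.
      POST _reindex
      {
        "source": { "index": "test" },
        "dest": { "index": "test_v2" }
      }

      # Delete the old index and point the name "test" at the new index.
      POST _aliases
      {
        "actions": [
          { "remove_index": { "index": "test" } },
          { "add": { "index": "test_v2", "alias": "test" } }
        ]
      }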

Shard evaluation guidelines

Both the number of shards and the size of each shard contribute to the stability and performance of an Elasticsearch cluster. Properly plan the shards for each index, even when business scenarios are difficult to predict; this prevents an excessive number of shards from degrading cluster performance. This section provides the following guidelines:
Note In versions earlier than Elasticsearch V7.X, one index has five primary shards and one replica shard for each primary shard by default. In Elasticsearch V7.X and later, one index has one primary shard and one replica shard by default.
  • For nodes with low specifications, the size of each shard is no more than 30 GB. For nodes with high specifications, the size of each shard is no more than 50 GB.
  • For log analysis or extremely large indexes, the size of each shard is no more than 100 GB.
  • The total number of primary shards and replica shards is the same as or a multiple of the number of data nodes.
    Note The more shards, the more performance overheads of your Elasticsearch cluster.
  • You can determine the maximum number of shards on a single node based on the node's memory size multiplied by 30. If you plan too many shards, file handle exhaustion can easily occur and result in cluster failures.
  • On each node, configure a maximum of five primary shards for a single index.
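
As a sketch of these guidelines, the following legacy index template applies suitable shard settings to new log indexes on the nine-data-node cluster from the preceding scenario. The template name log_template and the index pattern log-* are hypothetical examples; in Elasticsearch V7.8 and later, you can use the _index_template API instead.

    # Legacy template: 9 primary shards with 1 replica each, 18 shards
    # in total, which is a multiple of the 9 data nodes.
    PUT _template/log_template
    {
      "index_patterns": ["log-*"],
      "settings": {
        "number_of_shards": 9,
        "number_of_replicas": 1
      }
    }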

Uneven segment sizes

  • Scenario
    A node in the Elasticsearch cluster of Company A experiences an abrupt increase in CPU utilization, which affects query performance. Queries are mainly performed on the test index. The index has three primary shards and one replica shard for each primary shard, and the shards are evenly allocated across nodes. The index contains a large number of documents that are marked with docs.deleted, and the ECS instances are confirmed to be normal.
    Figure: Imbalanced loads caused by large segment sizes
  • Analysis
    1. Add "profile": true to the query body.

      The query results show that Elasticsearch requires a longer time to query Shard 1 of the test index than other shards.

    2. Send a query request with preference set to _primary and "profile": true added to the query body, and view the time required to query the primary shard. Then, send another query request with preference set to _replica and "profile": true added to the query body, and view the time required to query the replica shard.

      The time required to query Shard 1 (primary shard) is longer than that required to query its replica shard. This indicates that imbalanced loads are caused by Shard 1.

    3. Run the GET _cat/segments/test?v&h=shard,segment,size,size.memory,ip and GET _cat/shards?v commands to query information about Shard 1.
      The command outputs show that Shard 1 contains large segments and the number of documents in the shard is greater than that in its replica shard. Based on the preceding information, you can determine that imbalanced loads are caused by uneven segment sizes.
      Note The inconsistency in the number of documents is caused by many different reasons. Examples:
      • A latency exists in data synchronization between primary and replica shards. If documents are continuously written to the primary shard, the document counts may differ temporarily. After you stop writing documents, the counts become the same between the primary shard and its replica shard.
      • A primary shard forwards a write request to its replica shards only after the data is written to the primary shard. If you write documents with automatically generated document IDs and delete one of them while the write is still being replicated (for example, by sending a Delete by Query request for a document that you have just written), the delete operation is performed on both the primary shard and the replica shard before the forwarded write request reaches the replica shard. Because the document ID is automatically generated, the replica shard then writes the forwarded document without version verification. As a result, the number of documents in the replica shard differs from that in the primary shard, and the primary shard contains a large number of documents that are marked with docs.deleted.
  • Solution
    • Solution 1: During off-peak hours, call the force merge operation to merge small segments and remove documents that are marked with docs.deleted. A sketch is provided after this list.
    • Solution 2: Restart the node where the primary shard resides so that the replica shard is promoted to a primary shard and a new replica shard is generated from it. This ensures that the segments in the new primary and replica shards are consistent.
    The following figure shows the loads after optimization.
    Figure: Loads after optimization
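
    The following console sketch reproduces the preceding analysis steps on the test index. The term field user_id and its value are hypothetical examples. Note that the preference values _primary and _replica are removed in Elasticsearch V7.0 and later; in those versions, you can combine the _shards preference with node-based preferences to compare shard copies.

      # Step 1: profile the query to find the slow shard.
      GET test/_search
      {
        "profile": true,
        "query": { "term": { "user_id": "user001" } }
      }

      # Step 2: compare the primary shard with its replica.
      GET test/_search?preference=_primary
      {
        "profile": true,
        "query": { "term": { "user_id": "user001" } }
      }

      GET test/_search?preference=_replica
      {
        "profile": true,
        "query": { "term": { "user_id": "user001" } }
      }

      # Step 3: inspect segment sizes and per-shard document counts.
      GET _cat/segments/test?v&h=shard,segment,size,size.memory,ip
      GET _cat/shards?v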
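
    The following is a minimal sketch of Solution 1, assuming the test index from this scenario. Force merge is I/O-intensive, so run it during off-peak hours.

      # Merge each shard down to one segment and drop deleted documents.
      POST test/_forcemerge?max_num_segments=1

      # Alternatively, only expunge deleted documents without a full merge.
      POST test/_forcemerge?only_expunge_deletes=true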

Multi-zone architecture

  • Scenario
    To deploy an Elasticsearch cluster across zones, Company A deploys the multi-zone architecture in both Zone B and Zone C. When the cluster provides services, the loads of nodes in Zone C are higher than the loads of nodes in Zone B. You have checked that the imbalanced loads are not caused by hardware issues or uneven data distribution.
    Figure: Imbalanced loads caused by multi-zone architecture
  • Analysis
    1. View the CPU utilization of nodes in the two zones within the last four days.
      The monitoring data shows that the CPU utilization differs significantly between the nodes in the two zones.
      Figure: CPU utilization within the last four days
    2. View the TCP connections to the nodes.
      The monitoring data shows that the number of TCP connections to nodes in the two zones differs significantly. This indicates that the imbalanced loads are caused by network connections. A sketch that shows how to check connection distribution is provided after this list.
      Figure: Number of TCP connections to nodes
    3. Check client connections.

      The client uses persistent connections and establishes few new connections. This scenario is susceptible to the independent scheduling of a multi-zone network: network services are independently scheduled based on the number of connections, and each scheduling unit selects the optimal node to create a connection. Independent scheduling provides higher performance. However, if few new connections are established, most scheduling units may choose the same node for their connections. Because a node of an Elasticsearch cluster first forwards requests to other nodes in the same zone, this causes imbalanced loads between zones.

  • Solution
    • Solution 1: Use independent client nodes to forward traffic. In this case, even if the client nodes are heavily loaded, data nodes are not affected. This reduces the risk of load imbalance.
    • Solution 2: Configure two independent domain names on the client so that the client balances traffic between the zones. However, this solution cannot ensure high availability: when you upgrade the configuration of the Elasticsearch cluster, access failures may occur because nodes are not removed from the SLB instance during the upgrade.
    The following figure shows the loads after optimization.
    Figure: Loads after optimization
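
    To verify how client connections are distributed across nodes, you can compare the per-node HTTP connection counts, as in the following sketch.

      GET _nodes/stats/http?filter_path=nodes.*.name,nodes.*.http
      # In the response, compare current_open and total_opened across nodes.
      # A node that holds a disproportionate share of the connections shows
      # where the persistent connections are concentrated.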