Evaluate specifications and storage capacity for an Alibaba Cloud Elasticsearch cluster

Before you purchase an Alibaba Cloud Elasticsearch cluster or upgrade or downgrade the configuration of an Alibaba Cloud Elasticsearch cluster, you can follow the common evaluation methods provided in this topic to initially evaluate the required resource specifications and storage capacity for the cluster, including the node specifications, node storage space, and number of nodes. Before you create an index or if you encounter issues such as significant differences in disk usage and uneven CPU loads among nodes, you can evaluate the storage capacity of shards and the number of shards for the index.

Precautions

The evaluation methods provided in this topic are obtained based on actual test results and user experience. Different users may have different requirements on the data structure, query complexity, data volume, performance, and data changes. This topic is used for reference only. We recommend that you evaluate the specifications and storage capacity for your Elasticsearch cluster based on actual data and business scenarios.

Storage space evaluation

The storage space of an Elasticsearch cluster is determined by the following factors:

Volume of source data.
Number of replica shards: Each primary shard must have at least one replica shard.
Indexing overheads: In most cases, indexed data occupies about 10% more storage space than the source data. This figure does not include the additional overheads that are generated when the _all parameter is enabled.
For example, monitoring indexes that are used by X-Pack for exception analysis generate indexing overheads. The monitoring indexes include the following types:
- .monitoring-es-6-*: This type of index consumes a large amount of storage space. By default, Elasticsearch retains only index data that is created in the last seven days.
- .monitoring-kibana-6-*: The amount of storage space consumed by this type of index increases with the number of indexes. By default, Elasticsearch retains only index data that is created in the last seven days.
- .watcher-history-3-*: This type of index consumes only a small amount of storage space. If such indexes are no longer required, you can manually delete the indexes.
Internal overheads of the Elasticsearch cluster: Internal operations such as segment merging and logging generate internal overheads. 20% of storage space is reserved for such overheads.
The amount of storage space consumed by cluster logs increases with the number of queries and data pushes that the Elasticsearch cluster receives. The cluster logs include run logs, access logs, and slow logs. By default, Elasticsearch retains only logs generated in the last seven days. The duration cannot be changed.

Storage space reserved by the operating system: By default, the operating system reserves 5% of storage space for critical processes, system recovery, and disk fragments.
Security threshold overheads: Elasticsearch reserves at least 15% of storage space as the security threshold.

The recommended storage space is calculated by using the following formula:

Recommended storage space of the cluster = Volume of source data × (1 + Number of replica shards) × Indexing overheads/(1 - Storage space reserved by the operating system)/(1 - Internal overheads of the cluster)/(1 - Security threshold overheads)
                 = Volume of source data × (1 + Number of replica shards) × 1.1/(1 - 0.05)/(1 - 0.2)/(1 - 0.15)
                 = Volume of source data × (1 + Number of replica shards) × 1.7
                 = Volume of source data × 3.4

Note

In the preceding formula, the number of replica shards is 1. When you calculate the storage space, you must use the actual number of replica shards of your Elasticsearch cluster in the formula.
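The formula above can be sketched in Python as follows. This is a minimal illustration, not an official sizing tool: the function name `recommended_storage_gib` and its parameter names are hypothetical, and the default percentages are the ones listed in this topic (10% indexing overheads, 5% reserved by the operating system, 20% internal overheads, and 15% security threshold).

```python
def recommended_storage_gib(source_gib, replicas=1, indexing_overhead=0.10,
                            os_reserved=0.05, internal_overhead=0.20,
                            security_threshold=0.15):
    """Return the recommended cluster storage space in GiB.

    All names are illustrative; the defaults mirror the percentages
    given in this topic.
    """
    # Indexing overheads inflate the data; each reserve shrinks the usable disk.
    factor = ((1 + indexing_overhead)
              / (1 - os_reserved)
              / (1 - internal_overhead)
              / (1 - security_threshold))
    # Every replica shard stores a full extra copy of the data.
    return source_gib * (1 + replicas) * factor


if __name__ == "__main__":
    # 1,000 GiB of source data with one replica shard per primary shard.
    print(f"{recommended_storage_gib(1000):.0f} GiB")
```

The unrounded factor is approximately 1.703, so the result is slightly higher than the value obtained with the rounded factor of 1.7 (3.4 × the volume of source data for one replica shard).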

Indexes for which the _all parameter is enabled consume more storage space. Based on test results and user experience, we recommend that you increase the storage space for your Elasticsearch cluster to 1.5 times the original value. The recommended storage space is calculated by using the following formula:

Total storage space of all nodes = Volume of source data × (1 + Number of replica shards) × 1.7 × 1.5
                 = Volume of source data × (1 + Number of replica shards) × 2.55

Note

We recommend that you disable the _all parameter unless this parameter is required for your business.
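As a quick check of the formula for indexes that have the _all parameter enabled, the calculation can be sketched as follows. The constant and function names are hypothetical; the factors 1.7 and 1.5 are the rounded values used in this topic.

```python
BASE_FACTOR = 1.7           # rounded overhead factor from this topic
ALL_FIELD_MULTIPLIER = 1.5  # extra headroom when the _all parameter is enabled


def storage_with_all_gib(source_gib, replicas=1):
    """Return the recommended storage space in GiB when _all is enabled."""
    return source_gib * (1 + replicas) * BASE_FACTOR * ALL_FIELD_MULTIPLIER


if __name__ == "__main__":
    # 1,000 GiB of source data with one replica shard per primary shard.
    print(f"{storage_with_all_gib(1000):.0f} GiB")
```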

Evaluation of node specifications and the number of nodes

Maximum number of nodes per cluster = Number of vCPUs per node × 5.
The maximum volume of data that each node in an Elasticsearch cluster can store varies based on the business scenario:
- General scenarios: Maximum storage space per node = Memory size per node (GiB) × 30.
- Query scenarios such as acceleration or aggregation on data queries: Maximum storage space per node = Memory size per node (GiB) × 10.
- Logging scenarios such as log data import or offline analytics: Maximum storage space per node = Memory size per node (GiB) × 50.

The following table lists the maximum number of nodes and the maximum storage space per node for different node specifications.

Specifications	Maximum number of nodes	Maximum storage space per node (general scenario)	Maximum storage space per node (query scenario)	Maximum storage space per node (logging scenario)
2 vCPUs and 4 GiB of memory	10	120 GiB	40 GiB	200 GiB
2 vCPUs and 8 GiB of memory	10	240 GiB	80 GiB	400 GiB
4 vCPUs and 16 GiB of memory	20	480 GiB	160 GiB	800 GiB
8 vCPUs and 32 GiB of memory	40	960 GiB	320 GiB	1.5 TiB
16 vCPUs and 64 GiB of memory	80	1.9 TiB	640 GiB	3 TiB

The total storage space of an Elasticsearch cluster is calculated by using the following formula: Total storage space of an Elasticsearch cluster = Storage space per node × Number of nodes. You can determine the specifications of each node based on the maximum storage space of each node and the maximum number of nodes.
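The node-count and per-node storage rules above can be combined into a rough feasibility check. The following is a sketch under stated assumptions: the helper names and scenario keys are hypothetical, rounding the node count up is an assumption, and the multipliers (5 nodes per vCPU; 30, 10, and 50 GiB of storage per GiB of memory) are the ones given in this topic.

```python
import math

# GiB of storage per GiB of node memory, by scenario (values from this topic).
STORAGE_FACTOR = {"general": 30, "query": 10, "logging": 50}


def max_nodes(vcpus):
    """Maximum number of nodes per cluster for a node specification."""
    return vcpus * 5


def max_storage_per_node_gib(memory_gib, scenario="general"):
    """Maximum storage space per node in GiB for the given scenario."""
    return memory_gib * STORAGE_FACTOR[scenario]


def nodes_needed(total_storage_gib, vcpus, memory_gib, scenario="general"):
    """Return the node count for the required storage, or None if the
    specification cannot hold it within the maximum number of nodes."""
    per_node = max_storage_per_node_gib(memory_gib, scenario)
    count = math.ceil(total_storage_gib / per_node)
    return count if count <= max_nodes(vcpus) else None


if __name__ == "__main__":
    # 3,400 GiB of recommended storage in a general scenario.
    print(nodes_needed(3400, vcpus=4, memory_gib=16))  # fits on this spec
    print(nodes_needed(3400, vcpus=2, memory_gib=8))   # exceeds the node limit
```

A `None` result suggests moving to a larger node specification rather than adding more nodes of the same specification.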

For aggregated queries, we recommend that you select specifications with a vCPU-to-memory ratio of 1:2 for data nodes and enable client nodes.
If you enable client nodes, we recommend that you configure client nodes and data nodes based on the 1:5 ratio and select specifications with a vCPU-to-memory ratio of 1:4 or 1:8 for the client nodes. You must purchase at least two client nodes. For example, if you configure 10 data nodes whose specifications are 8 vCPUs and 32 GiB of memory, we recommend that you configure 2 client nodes whose specifications are 8 vCPUs and 32 GiB of memory.
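The client node recommendation can be expressed as a one-line calculation. This sketch assumes that the 1:5 ratio is rounded up to the next whole node, which this topic does not state explicitly; the function name is hypothetical.

```python
import math


def recommended_client_nodes(data_nodes):
    """Return the client node count for a 1:5 client-to-data-node ratio,
    with the minimum of two client nodes required by this topic."""
    # Rounding up is an assumption; the source only gives the ratio.
    return max(2, math.ceil(data_nodes / 5))


if __name__ == "__main__":
    for n in (3, 10, 23):
        print(n, "data nodes ->", recommended_client_nodes(n), "client nodes")
```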

Note

If you use independent client nodes, the reduce operation on query results is performed on the client nodes. In this case, even if severe garbage collection (GC) occurs in the reduce stage, data nodes are not affected.
The number of data nodes affects the total number of shards. Before you determine node specifications, you must also perform a shard evaluation.

Shard evaluation

The number of shards and the size of each shard affect the stability and performance of an Elasticsearch cluster. You must properly plan shards for all indexes in an Elasticsearch cluster. This helps prevent numerous shards from affecting cluster performance or causing uneven loads in complex business scenarios. For example, if shard planning for an index in an Elasticsearch cluster is inappropriate, significant differences in disk usage of nodes and uneven CPU loads among nodes may occur.

Note

Shards are the distributed storage units of indexes in an Elasticsearch cluster. Shards are classified into primary shards and replica shards. For more information, see shard and replica shard.

Before you plan shards, take note of the following items:

Volume of data stored on each index
Whether the volume keeps increasing
Node specifications
Whether to delete or merge temporary indexes on a regular basis

Alibaba Cloud provides the following guidelines for you to plan shards. These guidelines are for reference only.

Volume of data stored on each shard
- We recommend that you store no more than 30 GiB of data on each shard. In special cases, you can store no more than 50 GiB of data on each shard.
- In log analytics scenarios or scenarios that require extremely large indexes, make sure that each shard stores no more than 100 GiB of data.
Number of shards
- Before you allocate shards for your Elasticsearch cluster, we recommend that you evaluate the volume of data that you want to store.
  - If the total data volume is large, keep the write volume low to reduce the load on your Elasticsearch cluster. In this case, we recommend that you configure multiple primary shards for each index and only one replica shard for each primary shard.
  - If both the total data volume and the volume of data that you want to write are small, we recommend that you configure one primary shard for each index and one or more replica shards for each primary shard.
  Note
  - By default, an Elasticsearch cluster of V7.X or later is configured with one primary shard for each index and one replica shard for each primary shard. By default, an Elasticsearch cluster earlier than V7.X is configured with five primary shards for each index and one replica shard for each primary shard.
  - If the volume of data that you need to store is less than 30 GiB, you can configure one primary shard for each index and multiple replica shards for the primary shard. This achieves load balancing. For example, the size of each index is 20 GiB, and your Elasticsearch cluster contains five data nodes. In this case, you can configure one primary shard for each index and four replica shards for each primary shard.
- We recommend that you keep the number of shards the same as the number of data nodes or an integral multiple of the number of data nodes.
- We recommend that you configure a maximum of five shards for an index on a node.
- We recommend that you calculate the total number of shards for all indexes on a single node by using one of the following formulas:
  - For clusters with small specifications: Number of shards on a single data node = Memory size of the data node (GiB) × 30
  - For clusters with large specifications: Number of shards on a single data node = Memory size of the data node (GiB) × 50
  Note
  - When you calculate the number of shards, you must also take data volume into account. If the data volume is less than 1 TiB, we recommend that you calculate the number of shards by using the formula for clusters with small specifications.
  - By default, the maximum number of shards on a single node in an Elasticsearch V7.X cluster is 1,000. We recommend that you do not change this limit. If you need more shards than the limit allows, add nodes to the cluster instead.
  - We recommend that you configure shards based on your business requirements. More primary shards generate more performance overheads. If you configure an excessively large number of shards for each index in your Elasticsearch cluster, file handles may be exhausted. As a result, faults may occur on your Elasticsearch cluster.
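The per-index guidelines above can be sketched as a small planning helper. This is only one possible reading of the guidelines: the function name is hypothetical, the 30 GiB default is the recommended shard size from this topic, and rounding the number of primary shards up to a multiple of the number of data nodes is an assumption about how to apply the "integral multiple" recommendation.

```python
import math


def plan_index_shards(index_size_gib, data_nodes, max_shard_gib=30):
    """Return a suggested primary and replica shard count for one index."""
    if index_size_gib < max_shard_gib:
        # Small index: one primary shard, with replicas spread over the
        # remaining data nodes for load balancing (at least one replica).
        return {"primaries": 1, "replicas": max(1, data_nodes - 1)}
    # Large index: enough primaries to keep each shard within the size limit,
    # rounded up to a multiple of the number of data nodes.
    minimum = math.ceil(index_size_gib / max_shard_gib)
    primaries = math.ceil(minimum / data_nodes) * data_nodes
    return {"primaries": primaries, "replicas": 1}


if __name__ == "__main__":
    print(plan_index_shards(20, data_nodes=5))   # the 20 GiB example above
    print(plan_index_shards(200, data_nodes=5))
```

For the 20 GiB index on five data nodes, the sketch returns one primary shard and four replica shards, which matches the example in the preceding note. Treat the output as a starting point and check it against the recommendation of at most five shards for an index on a node.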

For more information about shard evaluation, see How to size your shards.
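The per-node shard formulas in the guidelines above can be sketched as follows. This topic does not define the boundary between small and large specifications, so the caller passes that decision in through the hypothetical `large_specifications` flag. The 1 TiB rule and the default limit of 1,000 shards per node (Elasticsearch V7.X) are taken from the notes above.

```python
TIB_IN_GIB = 1024


def shard_budget_per_node(memory_gib, data_volume_gib,
                          large_specifications=False, node_limit=1000):
    """Return the suggested total number of shards on one data node."""
    # Data volumes below 1 TiB use the small-specification formula.
    use_large = large_specifications and data_volume_gib >= TIB_IN_GIB
    factor = 50 if use_large else 30
    # Never plan for more shards than the per-node default limit.
    return min(memory_gib * factor, node_limit)


if __name__ == "__main__":
    print(shard_budget_per_node(16, data_volume_gib=500))
    print(shard_budget_per_node(32, data_volume_gib=2048,
                                large_specifications=True))
```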

References

You can purchase an Elasticsearch cluster or view the node specifications that are supported in different regions and Elasticsearch versions on the buy page of Elasticsearch.
You can refer to the results of the stress test on Elasticsearch clusters with different specifications and of different versions to learn about the performance of different node specifications. For more information, see topics in the Performance directory.
For information about the differences between the Standard Edition and Kernel-enhanced Edition cluster types, and the feature changes in each cluster version, see Version features.
You can adjust items such as the node specifications, node storage space, and number of nodes for an existing Elasticsearch cluster based on the evaluation result. For information about how to perform the operations and the related precautions, see Upgrade the configuration of a cluster and Downgrade the configuration of a cluster.
You can specify the number of primary shards for an index only when you create the index. After the index is created, you cannot change the number for the index. For information about how to create an index, see Create an index.
If the volume of data stored on each shard for an existing index exceeds the recommended volume, we recommend that you reindex data for the index. For more information, see Use the reindex API to migrate data.
Note
Data reindexing can ensure service continuity but is time-consuming.
For information about how to resolve unbalanced loads on an Elasticsearch cluster, see Unbalanced loads on a cluster.
For information about how to resolve the uneven distribution of hot data on nodes, see Uneven distribution of hot data on nodes.

If you enable the Auto Indexing feature, we recommend that you use the index lifecycle management (ILM) feature or an Elasticsearch API script to delete outdated indexes. For more information about the ILM feature, see Use ILM to manage Heartbeat indexes.
We recommend that you delete small indexes in a timely manner to free up heap memory.