Plan storage capacity, node specifications, and shard layout for an Alibaba Cloud Elasticsearch cluster before purchase or configuration changes.
Quick reference: With one replica per primary shard, provision approximately 3.4x your source data volume in total storage. For example, 100 GiB of source data requires about 340 GiB of cluster storage.
The evaluation methods in this document are based on real-world test results and operational experience. Actual requirements may differ depending on data structure, query complexity, data volume, data changes, and performance goals. Validate estimates with representative workloads before finalizing your configuration.
Storage capacity formula
With the default of one replica shard per primary shard, total cluster storage is roughly 3.4x the source data volume. This multiplier accounts for the following overhead factors:
| Factor | Overhead | Description |
|---|---|---|
| Replica shards | 2x (with 1 replica) | Each primary shard has at least one replica shard |
| Indexing overhead | Typically 10% | Space consumed by index structures beyond source data |
| Internal overhead | 20% reserved | Segment merging, logging, and other internal operations |
| OS reserved space | 5% reserved | Critical processes, system recovery, and disk fragments |
| Security threshold | At least 15% reserved | Minimum free space maintained by Elasticsearch |
Simplified formula:
Cluster storage = Source data x (1 + Number of replicas) x 1.7
= Source data x 3.4 (when replicas = 1)

Full formula:
Cluster storage = Source data
x (1 + Number of replicas)
x Indexing overhead factor
/ (1 - OS reserved)
/ (1 - Internal overhead)
/ (1 - Security threshold)
= Source data x (1 + Number of replicas) x 1.1 / 0.95 / 0.80 / 0.85
= Source data x (1 + Number of replicas) x 1.7

The 3.4x multiplier assumes one replica. Adjust the formula with your actual replica count.
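The full formula can be sketched as a small helper. This is an illustrative function, not part of any Elasticsearch API; the overhead factors are the fixed percentages from the table above.

```python
def cluster_storage_gib(source_gib: float, replicas: int = 1) -> float:
    """Estimate total cluster storage from source data volume."""
    indexing_overhead = 1.1    # ~10% index structure overhead
    os_reserved = 0.05         # 5% reserved for the OS
    internal_overhead = 0.20   # 20% reserved for merges, logging, etc.
    security_threshold = 0.15  # at least 15% free space kept by Elasticsearch
    return (source_gib * (1 + replicas) * indexing_overhead
            / (1 - os_reserved)
            / (1 - internal_overhead)
            / (1 - security_threshold))

print(round(cluster_storage_gib(100)))  # prints 341, i.e. ~3.4x the source data
```

The exact result (341 GiB for 100 GiB of source data) is slightly above the rounded 3.4x rule of thumb because 1.1 / 0.95 / 0.80 / 0.85 is approximately 1.703, not exactly 1.7.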
Worked example: 200 GiB source data
Scenario: 200 GiB of source data, one replica shard per primary shard.
Cluster storage = 200 GiB x (1 + 1) x 1.7
= 200 x 2 x 1.7
= 680 GiB

Storage consumed outside the formula
Beyond the factors in the formula, these items also consume storage:
X-Pack monitoring indexes -- Used for exception analysis:
.monitoring-es-6-*: Consumes significant storage. Retains the last 7 days of data by default.
.monitoring-kibana-6-*: Grows with the number of indexes. Retains the last 7 days of data by default.
.watcher-history-3-*: Consumes minimal storage. Delete manually when no longer needed.
Cluster logs -- Include run logs, access logs, and slow logs. Retained for the last 7 days by default. This retention period cannot be changed. Log volume increases with the number of queries and data pushes the cluster receives.
Node specifications and count
Data nodes
Two rules determine the maximum scale per data node:
Maximum nodes per cluster = vCPUs per node x 5
Maximum storage per node = Memory per node (GiB) x a scenario-specific multiplier
| Scenario | Multiplier | Typical use |
|---|---|---|
| General | Memory x 30 | Mixed read/write workloads |
| Query | Memory x 10 | Acceleration, aggregation |
| Logging | Memory x 50 | Log import, offline analytics |
The following table shows the maximum node count and maximum storage per node for each specification:
| Specification | Max nodes | General | Query | Logging |
|---|---|---|---|---|
| 2 vCPUs, 4 GiB | 10 | 120 GiB | 40 GiB | 200 GiB |
| 2 vCPUs, 8 GiB | 10 | 240 GiB | 80 GiB | 400 GiB |
| 4 vCPUs, 16 GiB | 20 | 480 GiB | 160 GiB | 800 GiB |
| 8 vCPUs, 32 GiB | 40 | 960 GiB | 320 GiB | 1.5 TiB |
| 16 vCPUs, 64 GiB | 80 | 1.9 TiB | 640 GiB | 3 TiB |
Total cluster storage = Storage per node x Number of nodes
Select a node specification whose maximum storage per node, multiplied by its maximum node count, covers your required cluster storage.
The number of data nodes affects the total shard count. Complete the shard evaluation below before finalizing node specifications.
For aggregation-heavy queries, select specifications with a 1:2 vCPU-to-memory ratio and enable client nodes.
Worked example: 2 TiB log data
Scenario: 2 TiB of log data, logging use case, one replica.
Calculate required storage: 2 TiB x 3.4 = 6.8 TiB
Select node specification: 8 vCPUs, 32 GiB (logging max: 1.5 TiB per node)
Calculate node count: 6.8 TiB / 1.5 TiB ≈ 4.5, round up to 5 nodes
Verify node limit: 5 nodes < 40 max nodes. Valid.
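The data-node sizing steps above can be sketched as a helper. This is an illustrative function, assuming the per-GiB-of-memory multipliers (30/10/50) and the vCPUs x 5 node limit from the tables in this document.

```python
import math

# Storage multipliers per GiB of node memory, by scenario (from the table above).
STORAGE_PER_GIB_MEMORY = {"general": 30, "query": 10, "logging": 50}

def plan_data_nodes(required_storage_gib: float, vcpus: int,
                    memory_gib: int, scenario: str = "general"):
    """Return (nodes_needed, max_storage_per_node_gib) for a node specification."""
    per_node = memory_gib * STORAGE_PER_GIB_MEMORY[scenario]
    nodes = math.ceil(required_storage_gib / per_node)
    max_nodes = vcpus * 5  # maximum nodes per cluster = vCPUs per node x 5
    if nodes > max_nodes:
        raise ValueError(f"{nodes} nodes exceeds the {max_nodes}-node limit")
    return nodes, per_node

# 2 TiB of logs -> 2048 GiB x 3.4 ≈ 6963 GiB on 8 vCPU / 32 GiB logging nodes
print(plan_data_nodes(2048 * 3.4, vcpus=8, memory_gib=32, scenario="logging"))
# prints (5, 1600): 5 nodes with up to 1,600 GiB of storage each
```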
Dedicated master nodes
Enable dedicated master nodes for clusters with many data nodes to maintain cluster stability. Select the specification based on your data node count:
| Data node count | Dedicated master node specification |
|---|---|
| Default | 2 vCPUs, 8 GiB |
| More than 10 | 4 vCPUs, 16 GiB |
| More than 30 | 8 vCPUs, 32 GiB |
| More than 50 | 16 vCPUs, 64 GiB |
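The table above maps directly to a lookup. This is an illustrative helper with the thresholds taken from the table; it does not account for the index count or change frequency mentioned below.

```python
def master_node_spec(data_nodes: int) -> str:
    """Return the recommended dedicated master node specification."""
    if data_nodes > 50:
        return "16 vCPUs, 64 GiB"
    if data_nodes > 30:
        return "8 vCPUs, 32 GiB"
    if data_nodes > 10:
        return "4 vCPUs, 16 GiB"
    return "2 vCPUs, 8 GiB"  # default

print(master_node_spec(12))  # prints "4 vCPUs, 16 GiB"
```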
If the cluster has many indexes and shards, or data changes frequently, select higher specifications for dedicated master nodes.
Client nodes
Client nodes (called coordinating nodes in Elasticsearch) handle the reduce phase of distributed queries. Dedicated client nodes isolate garbage collection (GC) impact from data nodes.
| Guideline | Value |
|---|---|
| Client-to-data node ratio | 1:5 |
| Client node vCPU-to-memory ratio | 1:4 or 1:8 |
| Minimum client nodes | 2 |
Example: For 10 data nodes at 8 vCPUs, 32 GiB each, configure 2 client nodes at 8 vCPUs, 32 GiB each.
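The client node guideline above reduces to a one-line calculation: one client node per five data nodes, with a minimum of two. This is an illustrative sketch of that rule.

```python
import math

def client_node_count(data_nodes: int) -> int:
    """One client node per five data nodes, minimum of two."""
    return max(2, math.ceil(data_nodes / 5))

print(client_node_count(10))  # prints 2, matching the example above
```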
Shard evaluation
Shards are the basic storage units of Elasticsearch indexes, classified into primary shards and replica shards. For more information, see Shard and replica shard.
Proper shard planning prevents performance degradation, uneven disk usage, and imbalanced CPU loads across nodes. Plan shards based on data volume per index, expected data growth, node specifications, and whether temporary indexes need regular deletion or merging.
Shard size guidelines
| Scenario | Maximum shard size |
|---|---|
| General workloads | 30 GiB (up to 50 GiB in special cases) |
| Log analytics or very large indexes | 100 GiB |
Number of shards per index
Determine the shard count based on data volume:
Large data volume, high write throughput: Configure multiple primary shards per index with one replica per primary shard.
Small data volume, low write throughput: Configure one primary shard per index with one or more replica shards.
Default shard configuration varies by version:
V7.X and later: 1 primary shard, 1 replica shard per index
Earlier than V7.X: 5 primary shards, 1 replica shard per index
Load balancing with small indexes: If the data volume per index is less than 30 GiB, use one primary shard with multiple replicas to distribute load across nodes. For example, a 20 GiB index on a 5-node cluster can use 1 primary shard and 4 replica shards.
Shard distribution guidelines
Keep the total shard count equal to the data node count, or an integer multiple of it.
Place a maximum of 5 shards per index on a single node.
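The shard-count guidelines above can be sketched as a hypothetical planner: small indexes get one primary with replicas spread across the nodes, and larger indexes are split into primaries capped at the recommended shard size. This is an assumption-laden illustration, not an Elasticsearch API.

```python
import math

def plan_shards(index_size_gib: float, data_nodes: int,
                max_shard_gib: int = 30):
    """Return (primary_shards, replica_shards) for a single index."""
    if index_size_gib < max_shard_gib:
        # Small index: one primary; replicas fill the remaining nodes
        # to distribute load (the 20 GiB / 5-node example above).
        return 1, data_nodes - 1
    # Large index: split into primaries no larger than max_shard_gib,
    # with one replica per primary.
    return math.ceil(index_size_gib / max_shard_gib), 1

print(plan_shards(20, 5))   # prints (1, 4)
print(plan_shards(300, 5))  # prints (10, 1)
```

Note that the primary shard count can only be set at index creation; check the result against the distribution guidelines above (total shards as a multiple of the node count, at most 5 shards of one index per node) before creating the index.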
Total shards per node
Calculate the maximum number of shards a single data node can hold:
| Cluster size | Formula |
|---|---|
| Small specifications (or data volume < 1 TiB) | Shards per node = Memory (GiB) x 30 |
| Large specifications | Shards per node = Memory (GiB) x 50 |
The default maximum shard count per node in V7.X clusters is 1,000. Do not change this limit. If more shards are needed, add more nodes instead.
Excessive shards increase performance overhead and may exhaust file handles, leading to cluster faults. Configure shards based on actual business requirements.
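The per-node ceiling above can be sketched as follows. This is an illustrative helper; the interpretation that the memory-based formula is capped by the fixed V7.X limit of 1,000 shards per node is an assumption based on the two rules stated above.

```python
def max_shards_per_node(memory_gib: int, large_spec: bool = False) -> int:
    """Estimate the maximum shard count a single data node can hold."""
    multiplier = 50 if large_spec else 30  # large vs. small specifications
    # Assumption: the formula result never exceeds the V7.X default
    # limit of 1,000 shards per node, which should not be raised.
    return min(memory_gib * multiplier, 1000)

print(max_shards_per_node(16))              # prints 480
print(max_shards_per_node(64, large_spec=True))  # prints 1000 (capped)
```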
For more guidance, see How to size your shards.
Sizing and maintenance best practices
Start with estimates, then iterate. The formulas in this document provide initial sizing estimates. Validate with representative workloads and adjust as needed.
Delete outdated indexes. If Auto Indexing is enabled, use index lifecycle management (ILM) or an Elasticsearch API script to remove outdated indexes. For details, see Use ILM to manage Heartbeat indexes.
Free heap memory. Delete small or unused indexes promptly to free heap memory.
Monitor shard health. If the data volume per shard on an existing index exceeds the recommended limit, reindex the data. For details, see Use the reindex API to migrate data. Data reindexing maintains service continuity but is time-consuming.
References
Buy page: View supported node specifications by region and Elasticsearch version.
Performance: Stress test results for clusters of different specifications and versions.
Version features: Differences between Standard Edition and Kernel-enhanced Edition, and feature changes across versions.
Upgrade the configuration of a cluster: Adjust node specifications, storage, and node count.
Downgrade the configuration of a cluster: Reduce cluster configuration.
Create an index: The number of primary shards can only be set when an index is created and cannot be changed afterward.
Unbalanced loads on a cluster: Troubleshoot load imbalance.
Uneven distribution of hot data on nodes: Resolve hot data distribution issues.