Plan storage capacity, node specifications, and shard layout for an Alibaba Cloud Elasticsearch cluster before purchase or configuration changes.
Quick reference: With one replica per primary shard, provision approximately 3.4x your source data volume in total storage. For example, 100 GiB of source data requires about 340 GiB of cluster storage.
The evaluation methods in this document are based on real-world test results and operational experience. Actual requirements may differ depending on data structure, query complexity, data volume, data changes, and performance goals. Validate estimates with representative workloads before finalizing your configuration.
Storage capacity formula
With the default of one replica shard per primary shard, total cluster storage is roughly 3.4x the source data volume. This multiplier accounts for the following overhead factors:
| Factor | Overhead | Description |
|---|---|---|
| Replica shards | 2x (with 1 replica) | Each primary shard has at least one replica shard |
| Indexing overhead | Typically 10% | Space consumed by index structures beyond source data |
| Internal overhead | 20% reserved | Segment merging, logging, and other internal operations |
| OS reserved space | 5% reserved | Critical processes, system recovery, and disk fragments |
| Security threshold | At least 15% reserved | Minimum free space maintained by Elasticsearch |
Simplified formula:
Cluster storage = Source data x (1 + Number of replicas) x 1.7
= Source data x 3.4 (when replicas = 1)

Full formula:
Cluster storage = Source data
x (1 + Number of replicas)
x Indexing overhead factor
/ (1 - OS reserved)
/ (1 - Internal overhead)
/ (1 - Security threshold)
= Source data x (1 + Number of replicas) x 1.1 / 0.95 / 0.80 / 0.85
= Source data x (1 + Number of replicas) x 1.7

The 3.4x multiplier assumes one replica. Adjust the formula with your actual replica count.
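The full formula can be sketched as a small helper. This is an illustrative function, not part of any Elasticsearch API; the overhead factors are the fixed percentages from the table above.

```python
def cluster_storage_gib(source_gib: float, replicas: int = 1) -> float:
    """Estimate total cluster storage from source data volume."""
    indexing_overhead = 1.1    # ~10% index structure overhead
    os_reserved = 0.05         # 5% reserved for the OS
    internal_overhead = 0.20   # 20% reserved for merges, logging, etc.
    security_threshold = 0.15  # at least 15% free space kept by Elasticsearch
    return (source_gib * (1 + replicas) * indexing_overhead
            / (1 - os_reserved)
            / (1 - internal_overhead)
            / (1 - security_threshold))

print(round(cluster_storage_gib(100)))  # prints 341, i.e. ~3.4x the source data
```

The exact result (341 GiB for 100 GiB of source data) is slightly above the rounded 3.4x rule of thumb because 1.1 / 0.95 / 0.80 / 0.85 is approximately 1.703, not exactly 1.7.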
Worked example: 200 GiB source data
Scenario: 200 GiB of source data, one replica shard per primary shard.
Cluster storage = 200 GiB x (1 + 1) x 1.7
= 200 x 2 x 1.7
= 680 GiB

Storage consumed outside the formula
Beyond the factors in the formula, these items also consume storage:
X-Pack monitoring indexes -- Used for exception analysis:
.monitoring-es-6-*: Consumes significant storage. Retains the last 7 days of data by default.
.monitoring-kibana-6-*: Grows with the number of indexes. Retains the last 7 days of data by default.
.watcher-history-3-*: Consumes minimal storage. Delete manually when no longer needed.
Cluster logs -- Include run logs, access logs, and slow logs. Retained for the last 7 days by default. This retention period cannot be changed. Log volume increases with the number of queries and data pushes the cluster receives.
Node specifications and count
Data nodes
Two rules determine the maximum scale per data node:
Maximum nodes per cluster = vCPUs per node x 5
Maximum storage per node = Memory per node (GiB) x a scenario-specific multiplier
| Scenario | Multiplier | Typical use |
|---|---|---|
| General | Memory x 30 | Mixed read/write workloads |
| Query | Memory x 10 | Acceleration, aggregation |
| Logging | Memory x 50 | Log import, offline analytics |
The following table shows the maximum node count and maximum storage per node for each specification:
| Specification | Max nodes | General | Query | Logging |
|---|---|---|---|---|
| 2 vCPUs, 4 GiB | 10 | 120 GiB | 40 GiB | 200 GiB |
| 2 vCPUs, 8 GiB | 10 | 240 GiB | 80 GiB | 400 GiB |
| 4 vCPUs, 16 GiB | 20 | 480 GiB | 160 GiB | 800 GiB |
| 8 vCPUs, 32 GiB | 40 | 960 GiB | 320 GiB | 1.5 TiB |
| 16 vCPUs, 64 GiB | 80 | 1.9 TiB | 640 GiB | 3 TiB |
Total cluster storage = Storage per node x Number of nodes
Select a node specification whose maximum storage per node, multiplied by its maximum node count, covers your required cluster storage.
The number of data nodes affects the total shard count. Complete the shard evaluation below before finalizing node specifications.
For aggregation-heavy queries, select specifications with a 1:2 vCPU-to-memory ratio and enable client nodes.
Worked example: 2 TiB log data
Scenario: 2 TiB of log data, logging use case, one replica.
Calculate required storage: 2 TiB x 3.4 = 6.8 TiB
Select node specification: 8 vCPUs, 32 GiB (logging max: 1.5 TiB per node)
Calculate node count: 6.8 TiB / 1.5 TiB ≈ 4.5, round up to 5 nodes
Verify node limit: 5 nodes < 40 max nodes. Valid.
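The data-node sizing steps above can be sketched as a helper. This is an illustrative function, assuming the per-GiB-of-memory multipliers (30/10/50) and the vCPUs x 5 node limit from the tables in this document.

```python
import math

# Storage multipliers per GiB of node memory, by scenario (from the table above).
STORAGE_PER_GIB_MEMORY = {"general": 30, "query": 10, "logging": 50}

def plan_data_nodes(required_storage_gib: float, vcpus: int,
                    memory_gib: int, scenario: str = "general"):
    """Return (nodes_needed, max_storage_per_node_gib) for a node specification."""
    per_node = memory_gib * STORAGE_PER_GIB_MEMORY[scenario]
    nodes = math.ceil(required_storage_gib / per_node)
    max_nodes = vcpus * 5  # maximum nodes per cluster = vCPUs per node x 5
    if nodes > max_nodes:
        raise ValueError(f"{nodes} nodes exceeds the {max_nodes}-node limit")
    return nodes, per_node

# 2 TiB of logs -> 2048 GiB x 3.4 ≈ 6963 GiB on 8 vCPU / 32 GiB logging nodes
print(plan_data_nodes(2048 * 3.4, vcpus=8, memory_gib=32, scenario="logging"))
# prints (5, 1600): 5 nodes with up to 1,600 GiB of storage each
```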
Dedicated master nodes
Enable dedicated master nodes for clusters with many data nodes to maintain cluster stability. Select the specification based on your data node count:
| Data node count | Dedicated master node specification |
|---|---|
| Default | 2 vCPUs, 8 GiB |
| More than 10 | 4 vCPUs, 16 GiB |
| More than 30 | 8 vCPUs, 32 GiB |
| More than 50 | 16 vCPUs, 64 GiB |
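The table above maps directly to a lookup. This is an illustrative helper with the thresholds taken from the table; it does not account for the index count or change frequency mentioned below.

```python
def master_node_spec(data_nodes: int) -> str:
    """Return the recommended dedicated master node specification."""
    if data_nodes > 50:
        return "16 vCPUs, 64 GiB"
    if data_nodes > 30:
        return "8 vCPUs, 32 GiB"
    if data_nodes > 10:
        return "4 vCPUs, 16 GiB"
    return "2 vCPUs, 8 GiB"  # default

print(master_node_spec(12))  # prints "4 vCPUs, 16 GiB"
```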
If the cluster has many indexes and shards, or data changes frequently, select higher specifications for dedicated master nodes.
Client nodes
Client nodes (called coordinating nodes in Elasticsearch) handle the reduce phase of distributed queries. Dedicated client nodes isolate garbage collection (GC) impact from data nodes.
| Guideline | Value |
|---|---|
| Client-to-data node ratio | 1:5 |
| Client node vCPU-to-memory ratio | 1:4 or 1:8 |
| Minimum client nodes | 2 |
Example: For 10 data nodes at 8 vCPUs, 32 GiB each, configure 2 client nodes at 8 vCPUs, 32 GiB each.
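The client node guideline above reduces to a one-line calculation: one client node per five data nodes, with a minimum of two. This is an illustrative sketch of that rule.

```python
import math

def client_node_count(data_nodes: int) -> int:
    """One client node per five data nodes, minimum of two."""
    return max(2, math.ceil(data_nodes / 5))

print(client_node_count(10))  # prints 2, matching the example above
```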
Shard evaluation
Shards are the basic storage units of Elasticsearch indexes, classified into primary shards and replica shards. For more information, see Shard and replica shard.
Proper shard planning prevents performance degradation, uneven disk usage, and imbalanced CPU loads across nodes. Plan shards based on data volume per index, expected data growth, node specifications, and whether temporary indexes need regular deletion or merging.
Shard size guidelines
| Scenario | Maximum shard size |
|---|---|
| General workloads | 30 GiB (up to 50 GiB in special cases) |
| Log analytics or very large indexes | 100 GiB |
Number of shards per index
Determine the shard count based on data volume:
Large data volume, high write throughput: Configure multiple primary shards per index with one replica per primary shard.
Small data volume, low write throughput: Configure one primary shard per index with one or more replica shards.
Default shard configuration varies by version:
V7.X and later: 1 primary shard, 1 replica shard per index
Earlier than V7.X: 5 primary shards, 1 replica shard per index
Load balancing with small indexes: If the data volume per index is less than 30 GiB, use one primary shard with multiple replicas to distribute load across nodes. For example, a 20 GiB index on a 5-node cluster can use 1 primary shard and 4 replica shards.
Shard distribution guidelines
Keep the total shard count equal to the data node count, or an integer multiple of it.
Place a maximum of 5 shards per index on a single node.
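The shard-count guidelines above can be sketched as a hypothetical planner: small indexes get one primary with replicas spread across the nodes, and larger indexes are split into primaries capped at the recommended shard size. This is an assumption-laden illustration, not an Elasticsearch API.

```python
import math

def plan_shards(index_size_gib: float, data_nodes: int,
                max_shard_gib: int = 30):
    """Return (primary_shards, replica_shards) for a single index."""
    if index_size_gib < max_shard_gib:
        # Small index: one primary; replicas fill the remaining nodes
        # to distribute load (the 20 GiB / 5-node example above).
        return 1, data_nodes - 1
    # Large index: split into primaries no larger than max_shard_gib,
    # with one replica per primary.
    return math.ceil(index_size_gib / max_shard_gib), 1

print(plan_shards(20, 5))   # prints (1, 4)
print(plan_shards(300, 5))  # prints (10, 1)
```

Note that the primary shard count can only be set at index creation; check the result against the distribution guidelines above (total shards as a multiple of the node count, at most 5 shards of one index per node) before creating the index.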
Total shards per node
Calculate the maximum number of shards a single data node can hold:
| Cluster size | Formula |
|---|---|
| Small specifications (or data volume < 1 TiB) | Shards per node = Memory (GiB) x 30 |
| Large specifications | Shards per node = Memory (GiB) x 50 |
The default maximum shard count per node in V7.X clusters is 1,000. Do not change this limit. If more shards are needed, add more nodes instead.
Excessive shards increase performance overhead and may exhaust file handles, leading to cluster faults. Configure shards based on actual business requirements.
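The per-node ceiling above can be sketched as follows. This is an illustrative helper; the interpretation that the memory-based formula is capped by the fixed V7.X limit of 1,000 shards per node is an assumption based on the two rules stated above.

```python
def max_shards_per_node(memory_gib: int, large_spec: bool = False) -> int:
    """Estimate the maximum shard count a single data node can hold."""
    multiplier = 50 if large_spec else 30  # large vs. small specifications
    # Assumption: the formula result never exceeds the V7.X default
    # limit of 1,000 shards per node, which should not be raised.
    return min(memory_gib * multiplier, 1000)

print(max_shards_per_node(16))              # prints 480
print(max_shards_per_node(64, large_spec=True))  # prints 1000 (capped)
```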
For more guidance, see How to size your shards.
Sizing and maintenance best practices
Start with estimates, then iterate. The formulas in this document provide initial sizing estimates. Validate with representative workloads and adjust as needed.
Delete outdated indexes. If Auto Indexing is enabled, use index lifecycle management (ILM) or an Elasticsearch API script to remove outdated indexes. For details, see Use ILM to manage Heartbeat indexes.
Free heap memory. Delete small or unused indexes promptly to free heap memory.
Monitor shard health. If the data volume per shard on an existing index exceeds the recommended limit, reindex the data. For details, see Use the reindex API to migrate data. Data reindexing maintains service continuity but is time-consuming.
References
Buy page: View supported node specifications by region and Elasticsearch version.
Performance: Stress test results for clusters of different specifications and versions.
Version features: Differences between Standard Edition and Kernel-enhanced Edition, and feature changes across versions.
Upgrade the configuration of a cluster: Adjust node specifications, storage, and node count.
Downgrade the configuration of a cluster: Reduce cluster configuration.
Create an index: The number of primary shards can only be set when an index is created and cannot be changed afterward.
Unbalanced loads on a cluster: Troubleshoot load imbalance.
Uneven distribution of hot data on nodes: Resolve hot data distribution issues.