Lindorm: Plan the cluster capacity

Last Updated: Mar 28, 2026

LindormSearch is a distributed search engine service built on a multi-node cluster. Before purchasing, estimate the storage capacity, node configuration, and shard layout that your workload requires.

Estimate storage capacity

The following factors determine how much storage your cluster needs:

  • Replicas: 0 (default) or 1. LindormSearch uses distributed shared storage, so if a node fails, its data automatically migrates to other nodes. Set replicas to 1 only when your workload requires high reliability.

  • Index bloat: ~20%. Indexes typically expand source data by about 20%.

  • Internal overhead: 20%. Reserved for transaction log recording and compaction.

  • OS reserve: 5%. Reserved by the operating system by default.

  • Safety buffer: 20%. Keeps the cluster stable; an alert is sent automatically when storage usage reaches 80%.

These factors combine into a single formula (assuming the default of 0 replicas; with 1 replica, double the result):

Required storage = source data size × 1.9
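The ×1.9 multiplier can be reproduced from the factors above under one plausible interpretation (an assumption, not an official derivation): index bloat and internal overhead inflate the stored data, while the OS reserve and safety buffer reduce the usable fraction of the disk. A sketch:

```python
def required_storage_gb(source_gb: float, replicas: int = 0) -> float:
    """Estimate required disk, interpreting the listed factors as:
    bloat and overhead inflate the data; OS reserve and safety
    buffer are fractions of the disk that must stay free."""
    stored = source_gb * (1 + replicas)  # each replica is a full copy
    stored *= 1.20                       # ~20% index bloat
    stored *= 1.20                       # 20% internal overhead
    usable_fraction = (1 - 0.05) * (1 - 0.20)  # OS reserve, safety buffer
    return stored / usable_fraction

# 1.2 * 1.2 / 0.95 / 0.80 ≈ 1.89, which rounds to the 1.9 multiplier.
print(round(required_storage_gb(1000)))  # → 1895
```

For 1 TB of source data, this yields roughly 1.9 TB of required storage with 0 replicas, and about twice that with 1 replica.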

Choose node count and specifications

  • Deploy a minimum of two nodes to eliminate single points of failure.

  • Start with high-specification nodes (16 CPU cores, 64 GB memory) rather than adding more low-specification nodes. Fewer, larger nodes improve both performance and cluster stability.

  • After running a load test with real data, confirm whether the nodes meet your requirements.

  • If the test shows insufficient compute capacity, scale up first — for example, upgrade nodes from 4 CPU cores and 16 GB memory to 8 CPU cores and 32 GB memory. Only consider scaling out (adding nodes) after reaching the limits of vertical scaling.

Configure shards

Each index is divided into shards. When data is written, LindormSearch uses a hash algorithm to distribute documents across shards based on document IDs.
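The routing described above can be sketched as follows; the specific hash function is an illustrative assumption, since LindormSearch's internal algorithm is not documented here:

```python
import hashlib

def shard_for(doc_id: str, num_shards: int) -> int:
    # Hash the document ID and take the remainder, so the same ID
    # always maps to the same shard and IDs spread evenly.
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards
```

Because routing depends only on the document ID and the shard count, changing the shard count would remap documents, which is why the count must be chosen up front.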

Keep each shard between 20 GB and 50 GB.

Set the total shard count to an integer multiple of the number of nodes. For example, with two nodes, set the shard count to 2, 4, or another multiple of 2.
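A minimal way to combine the two rules above, keeping each shard at or below 50 GB while staying an integer multiple of the node count (the helper name and defaults are illustrative):

```python
import math

def choose_shard_count(index_size_gb: float, nodes: int,
                       max_shard_gb: float = 50) -> int:
    # Smallest multiple of the node count that keeps every shard
    # at or below max_shard_gb.
    multiple = max(1, math.ceil(index_size_gb / (max_shard_gb * nodes)))
    return multiple * nodes

print(choose_shard_count(200, 2))  # → 4 (shards of ~50 GB each)
```

For very small indexes this returns one shard per node even if shards fall below 20 GB, which is the practical floor for a two-node cluster.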

Manage time-series data

For data with a time attribute — such as log data or order data — use the alias feature. The alias continuously generates new indexes and periodically deletes old ones, keeping storage predictable and bounded. For details, see Use sharding (aliases).
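The rollover pattern that the alias automates can be sketched as follows; the index-name format and retention policy here are hypothetical, not LindormSearch's actual scheme:

```python
from datetime import date, timedelta

def daily_index(prefix: str, day: date) -> str:
    # Hypothetical per-day index name behind a stable alias.
    return f"{prefix}-{day:%Y.%m.%d}"

def indexes_to_keep(prefix: str, today: date, retention_days: int) -> set:
    # Writes always target daily_index(prefix, today); any index
    # outside this window is eligible for deletion.
    return {daily_index(prefix, today - timedelta(days=i))
            for i in range(retention_days)}
```

With a 7-day retention, storage stays bounded at roughly seven days of ingest no matter how long the cluster runs.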