Before you use Alibaba Cloud Elasticsearch, you must evaluate the total amount of required resources, such as the disk space, node specifications, number of shards, and size of each shard. Based on test results and user feedback, Alibaba Cloud offers some common methods for evaluation.
Supported disk types
This topic is suitable for Elasticsearch clusters that use standard SSDs.
Disk space evaluation
- Number of replicas: Each shard must have at least one replica.
- Indexing overheads: In most cases, indexing overheads are about 10% of the volume of the source data. The overheads of the _all parameter are not included.
- Space reserved by the Linux operating system: By default, the Linux operating system reserves 5% of the disk space for critical processes, system recovery, and disk fragmentation.
- Elasticsearch overheads: Elasticsearch reserves 20% of the disk space for internal operations such as segment merging and logging.
- Security threshold overheads: Elasticsearch reserves at least 15% of the disk space as the security threshold.
Based on these factors, the minimum required disk space is calculated as follows:
Minimum required disk space = Volume of source data × (1 + Number of replicas) × (1 + Indexing overheads)/(1 - Linux reserved space)/(1 - Elasticsearch overheads)/(1 - Security threshold overheads) = Volume of source data × (1 + Number of replicas) × 1.7. With one replica per shard, this equals Volume of source data × 3.4.
- We recommend that you do not enable the _all parameter unless it is required by your business.
- Indexes that have the _all parameter enabled incur larger overheads on disk usage. Based on test results and user feedback, we recommend that you calculate the disk space of such an Elasticsearch cluster as follows:
Minimum required disk space = Volume of source data × (1 + Number of replicas) × 1.7 × (1 + 0.5). With one replica per shard, this equals Volume of source data × 5.1.
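The two formulas above can be combined into a short sketch. The function name is hypothetical; the overhead ratios are the ones listed in this topic.

```python
def min_disk_space_gib(source_data_gib, replicas=1, all_enabled=False):
    """Estimate the minimum required disk space for an Elasticsearch cluster.

    Overhead factors from this topic:
    - indexing overheads: 10% of the source data
    - Linux reserved space: 5%
    - Elasticsearch internal overheads: 20%
    - security threshold: 15%
    - _all parameter, if enabled: an extra 50%
    """
    space = (source_data_gib * (1 + replicas) * (1 + 0.10)
             / (1 - 0.05) / (1 - 0.20) / (1 - 0.15))
    if all_enabled:
        space *= 1 + 0.5
    return space

# With one replica, the combined multiplier is about 3.4 (or 5.1 with _all).
print(round(min_disk_space_gib(1000), 1))                    # prints 3405.6
print(round(min_disk_space_gib(1000, all_enabled=True), 1))  # prints 5108.4
```

Note that the multipliers 3.4 and 5.1 in the formulas above are rounded values of the exact products computed here.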
Node specification evaluation
- Maximum number of nodes per cluster:
Maximum number of nodes per cluster = Number of vCPUs per node × 5
- Maximum volume of data per node:
The maximum volume of data that a node in an Elasticsearch cluster can store depends on the scenario. Examples:
- Acceleration or aggregation on data queries:
Maximum volume of data per node = Memory per node (GiB) × 10
- Log data importing or offline analytics:
Maximum volume of data per node = Memory per node (GiB) × 50
- In general scenarios:
Maximum volume of data per node = Memory per node (GiB) × 30
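A minimal sketch of the node formulas above, using the ratios listed in this topic (the helper names are hypothetical):

```python
# Data volume per GiB of memory, by scenario, as listed in this topic.
DATA_RATIO_PER_GIB_MEMORY = {"query": 10, "logging": 50, "general": 30}

def max_nodes_per_cluster(vcpus_per_node):
    # Maximum number of nodes per cluster = vCPUs per node x 5
    return vcpus_per_node * 5

def max_data_per_node_gib(memory_gib, scenario="general"):
    # Maximum volume of data per node = memory (GiB) x scenario ratio
    return memory_gib * DATA_RATIO_PER_GIB_MEMORY[scenario]

print(max_nodes_per_cluster(8))              # prints 40
print(max_data_per_node_gib(32, "logging"))  # prints 1600
```

Note that the table below caps or rounds some values, so its entries can differ slightly from the raw formula results.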
The following table lists some node specifications.
|Specification|Maximum number of nodes|Maximum disk space per node in query scenarios|Maximum disk space per node in logging scenarios|Maximum disk space per node in general scenarios|
|---|---|---|---|---|
|2 vCPUs and 4 GiB of memory|10|40 GiB|200 GiB|100 GiB|
|2 vCPUs and 8 GiB of memory|10|80 GiB|400 GiB|200 GiB|
|4 vCPUs and 16 GiB of memory|20|160 GiB|800 GiB|512 GiB|
|8 vCPUs and 32 GiB of memory|40|320 GiB|1.5 TiB|1 TiB|
|16 vCPUs and 64 GiB of memory|50|640 GiB|2 TiB|2 TiB|
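Using the per-node limits from the table, a rough node count can be sketched as follows. This is a hypothetical helper: it simply divides the total stored data volume (including replicas and overheads) by the per-node limit and enforces a minimum of two data nodes.

```python
import math

def required_data_nodes(total_data_gib, max_disk_per_node_gib, min_nodes=2):
    """Rough estimate of the number of data nodes needed to store
    total_data_gib, given the per-node limit from the table above."""
    return max(min_nodes, math.ceil(total_data_gib / max_disk_per_node_gib))

# For example, 6 TiB of stored data on 8 vCPU / 32 GiB nodes in a general
# scenario (1 TiB per node) requires 6 data nodes.
print(required_data_nodes(6144, 1024))  # prints 6
```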
Shard evaluation
Before you plan shards for your indexes, take note of the following items:
- Volume of data stored on each index
- Whether the volume will increase
- Node specifications
- Whether you will delete or merge temporary indexes on a regular basis
Based on the preceding items, Alibaba Cloud provides the following guidelines for planning shards. These guidelines are for reference only.
- Before shard allocation, evaluate the volume of data that you want to store. If the total data volume is large but only a small amount of data is written at a time, configure multiple shards for each index and one replica for each shard to reduce the workloads of your Elasticsearch cluster. If both the total data volume and the volume of data that you want to write are small, configure one shard for each index and one or more replicas for each shard.
- Ensure that the size of each shard is no more than 30 GiB. In certain cases, the size can be up to 50 GiB. If the evaluation result exceeds the limit, properly plan shards before you create indexes and perform a reindex operation in the future. This operation ensures the normal running of your Elasticsearch cluster but is time-consuming.
Note If the evaluated data volume is less than 30 GiB, you can configure one shard and multiple replicas for each index to implement load balancing. For example, the size of each index is 20 GiB and your Elasticsearch cluster has five data nodes. In this case, you can configure one shard and four replicas for each index.
- For log analysis or extremely large indexes, ensure that the size of each shard is no more than 100 GiB.
- Make sure that the total number of shards and replicas is the same as or a multiple of the number of data nodes.
Note A larger number of shards causes higher performance overheads for your Elasticsearch cluster.
- Configure a maximum of five shards for an index on a node.
- Configure a maximum of 100 shards for all indexes on a node to improve the performance of your Elasticsearch cluster.
- Add at least two independent client nodes. The ratio of client nodes to data nodes must be 1:5, and the vCPU-to-memory ratio of each client node must be 1:4 or 1:8. For example, if your Elasticsearch cluster contains 10 data nodes and each data node offers 8 vCPUs and 32 GiB of memory, you can configure two independent client nodes, each of which offers 8 vCPUs and 32 GiB of memory.
Note After you add independent client nodes, reduce operations are performed on the client nodes. If severe garbage collection (GC) occurs in the reduce stage, data nodes are not affected.
- If the auto indexing feature is enabled, enable index lifecycle management or call an Elasticsearch API operation to delete automatically created indexes.
- Delete small indexes in a timely manner. These indexes also occupy heap memory.
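The shard guidelines above can be sketched as a small planning helper. The names are hypothetical; the helper targets shard sizes of at most 30 GiB, assigns one shard with replicas on all remaining nodes for small indexes (as in the note above), and rounds the shard count up so that the total number of shard copies is a multiple of the data node count.

```python
import math

def plan_shards(index_size_gib, data_nodes, target_shard_gib=30):
    """Suggest a shard and replica count for one index, following the
    guidelines in this topic."""
    if index_size_gib <= target_shard_gib:
        # Small index: one shard, with replicas on the remaining data nodes
        # for load balancing.
        return {"shards": 1, "replicas": data_nodes - 1}
    # Keep each shard at or below the target size.
    shards = math.ceil(index_size_gib / target_shard_gib)
    # With one replica per shard, round the shard count up until the total
    # number of shard copies is a multiple of the data node count.
    while (shards * 2) % data_nodes != 0:
        shards += 1
    return {"shards": shards, "replicas": 1}

# A 20 GiB index on a 5-node cluster: 1 shard and 4 replicas.
print(plan_shards(20, 5))   # prints {'shards': 1, 'replicas': 4}
# A 200 GiB index on a 4-node cluster: 8 shards of 25 GiB each, 1 replica.
print(plan_shards(200, 4))  # prints {'shards': 8, 'replicas': 1}
```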
Different users may have different requirements for data schemas, query complexity, data sizes, performance, and data changes. This topic is for reference only. We recommend that you test the specifications and storage capacity of your Elasticsearch cluster based on actual data and business scenarios whenever possible.
You can use this topic as a reference when you choose the specifications and storage capacity of your Elasticsearch cluster. If your workloads become heavier, you can use the elastic scaling feature of Elasticsearch to resize disks, add nodes, or upgrade node specifications.