Before you use Alibaba Cloud Elasticsearch, you must estimate the total amount of required resources. Based on testing results and user feedback, Alibaba Cloud offers some common methods to estimate and calculate the amount of Elasticsearch resources. These methods are for reference only.

Applicable disk types

This best practice applies to Alibaba Cloud Elasticsearch instances whose Disk Type is set to SSD Cloud Disk.

Disk sizing

The disk capacity of an Alibaba Cloud Elasticsearch instance is determined by the following factors:

  • Number of replicas. Each index must have a minimum of one replica.
  • Indexing overheads. Typically, an index takes up about 10% more space than the source data. This figure does not include the overheads of the _all parameter.
  • Operating system reserved space. By default, the operating system reserves 5% of the disk space for critical processes, system recovery, and protection against disk fragmentation.
  • Alibaba Cloud Elasticsearch overheads. Alibaba Cloud Elasticsearch reserves 20% of the disk space for segment merges, logs, and other internal operations.
  • Security threshold overheads. A minimum of 15% of the disk space must be reserved as the security threshold.

Based on these factors, the minimum disk space required for an index with one replica is calculated as: Minimum disk space = Size of source data × 3.4

This figure is derived as follows:

Total disk space = Size of source data × (1 + Number of replicas) × (1 + Indexing overheads) / (1 - Operating system reserved space) / (1 - Elasticsearch overheads) / (1 - Security threshold overheads)
= Size of source data × (1 + Number of replicas) × (1 + 0.1) / (1 - 0.05) / (1 - 0.2) / (1 - 0.15)
≈ Size of source data × (1 + Number of replicas) × 1.7
= Size of source data × 3.4 (with one replica)
Notice
  • We recommend that you do not enable the _all parameter unless it is required by your business.
  • Indexes that have this parameter enabled use more disk space. Based on testing results and our practices, we recommend that you add an extra 50% to the estimated disk space.
    Total disk space = Size of source data × (1 + Number of replicas) × 1.7 × (1 + 0.5)
    = Size of source data × 5.1 (with one replica)
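
The disk sizing formula can be sketched in a few lines of Python. This is only an illustration of the arithmetic above: the function name total_disk_space and its parameter names are hypothetical, and the default values are the percentages listed in this topic.

```python
def total_disk_space(source_gb, replicas=1, indexing_overhead=0.10,
                     os_reserved=0.05, es_overhead=0.20,
                     security_threshold=0.15, all_enabled=False):
    """Estimate the total disk space in GB (hypothetical helper).

    Defaults follow the factors listed in this topic.
    """
    if replicas < 1:
        raise ValueError("each index must have at least one replica")
    # Data written to disk: source data, copied once per replica, plus index overhead.
    size = source_gb * (1 + replicas) * (1 + indexing_overhead)
    # Each reservation removes a share of the usable disk, so divide it out.
    size /= (1 - os_reserved)
    size /= (1 - es_overhead)
    size /= (1 - security_threshold)
    if all_enabled:
        # Extra 50% recommended when the _all parameter is enabled.
        size *= 1.5
    return size


print(round(total_disk_space(100), 1))                    # 340.6
print(round(total_disk_space(100, all_enabled=True), 1))  # 510.8
```

For 100 GB of source data and one replica, the sketch returns about 340.6 GB, which matches the rounded × 3.4 rule. The small difference comes from rounding the combined factor to 1.7.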

Choose cluster specifications

The performance of an Alibaba Cloud Elasticsearch cluster is determined by the specifications of the Elasticsearch nodes in the cluster. Before you use Elasticsearch, we recommend that you estimate the required cluster size. You can then add nodes or upgrade node specifications as your workload requires. Based on our testing results and practices, we provide the following suggestions:
  • Maximum number of nodes per cluster = Number of CPU cores per node × 5
  • The maximum amount of data that an Elasticsearch node can store varies depending on different scenarios.
    • Acceleration and aggregation on data queries: Maximum amount of data per node = Memory per node (GB) × 10
    • Log data importing and offline analytics: Maximum amount of data per node = Memory per node (GB) × 50
    • In most scenarios: Maximum amount of data per node = Memory per node (GB) × 30
Table 1. Recommended cluster specifications
| Node type | Maximum number of cluster nodes | Maximum disk space per node (query) | Maximum disk space per node (log) | Maximum disk space per node (common) |
|---|---|---|---|---|
| 2-core 4 GB | 10 | 40 GB | 200 GB | 100 GB |
| 2-core 8 GB | 10 | 80 GB | 400 GB | 200 GB |
| 4-core 16 GB | 20 | 160 GB | 800 GB | 512 GB |
| 8-core 32 GB | 40 | 320 GB | 1.5 TB | 1 TB |
| 16-core 64 GB | 50 | 640 GB | 2 TB | 2 TB |
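
The rules of thumb above can be combined into a rough node-count estimate. The following Python sketch is an illustration only: the function names and the scenario keys "query", "log", and "common" are hypothetical labels for the three scenarios in this topic. The raw formulas do not always reproduce Table 1. For example, the table lists 50 nodes rather than 80 for 16-core nodes, and 512 GB rather than 480 GB for 4-core 16 GB nodes in common scenarios. Where the two differ, treat the table as the recommendation.

```python
import math

# GB of data per GB of node memory, by scenario (multipliers from this topic).
DATA_PER_GB_MEMORY = {"query": 10, "log": 50, "common": 30}


def max_nodes(cpu_cores):
    """Maximum number of nodes per cluster = CPU cores per node x 5."""
    return cpu_cores * 5


def max_data_per_node_gb(memory_gb, scenario="common"):
    """Maximum amount of data per node = memory per node (GB) x multiplier."""
    return memory_gb * DATA_PER_GB_MEMORY[scenario]


def nodes_needed(total_disk_gb, cpu_cores, memory_gb, scenario="common"):
    """Smallest node count that holds total_disk_gb (hypothetical helper)."""
    per_node = max_data_per_node_gb(memory_gb, scenario)
    count = math.ceil(total_disk_gb / per_node)
    if count > max_nodes(cpu_cores):
        raise ValueError("data does not fit; choose a larger node specification")
    return count


# 3,400 GB of total disk space on 8-core 32 GB nodes, common scenario.
print(nodes_needed(3400, cpu_cores=8, memory_gb=32))  # 4
```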

Shard sizing

Both the number of shards and the size of each shard contribute to the stability and performance of an Alibaba Cloud Elasticsearch cluster. Each index in Elasticsearch is split into a certain number of shards. In Elasticsearch versions earlier than 7.0, an index is split into five primary shards by default. In version 7.0 and later, the default is one primary shard.

  • For small Elasticsearch nodes, we recommend that the size of each shard is smaller than or equal to 30 GB. For large Elasticsearch nodes, we recommend that the size of each shard is smaller than or equal to 50 GB.
  • For log analysis or extremely large indexes, we recommend that the size of each shard is smaller than or equal to 100 GB.
  • The total number of shards, including replica shards, should be equal to the number of nodes or a multiple of the number of nodes, so that shards can be distributed evenly across nodes.
  • We recommend that a single node hold a maximum of five shards of the same index.
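
The shard recommendations above can be turned into a small planning sketch. This is one possible reading of the rules, not an official algorithm. The names ShardPlan and plan_shards are hypothetical, and index_size_gb is assumed to be the size of the primary data of one index, excluding replicas.

```python
import math
from typing import NamedTuple


class ShardPlan(NamedTuple):
    primaries: int               # number of primary shards
    total_shards: int            # primaries plus all replica shards
    shards_per_node: int         # shards of this index on each node
    within_per_node_limit: bool  # True if shards_per_node <= limit


def plan_shards(index_size_gb, node_count, replicas=1,
                max_shard_gb=30, max_shards_per_node=5):
    """Pick a primary shard count that follows the recommendations above."""
    if index_size_gb <= 0 or node_count < 1 or replicas < 0:
        raise ValueError("invalid sizing input")
    # Enough primary shards to keep each shard at or below max_shard_gb.
    primaries = max(1, math.ceil(index_size_gb / max_shard_gb))
    # Add primaries until the total shard count is a multiple of the node count.
    while (primaries * (1 + replicas)) % node_count != 0:
        primaries += 1
    total = primaries * (1 + replicas)
    per_node = total // node_count
    return ShardPlan(primaries, total, per_node,
                     per_node <= max_shards_per_node)


print(plan_shards(120, node_count=4))
print(plan_shards(300, node_count=3))
```

In the second call, a 300 GB index on three nodes with 30 GB shards ends up with eight shards per node, which exceeds the recommended five. In that case, consider adding nodes or, on larger nodes, raising max_shard_gb to 50.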
Note
  • Different users may have different requirements on data schema, query complexity, data size, performance, and data changes. This topic only provides a reference for Elasticsearch sizing.
  • If possible, we recommend that you size your Elasticsearch cluster based on your actual data and service scenarios.
  • When you need to deal with heavy workloads, you can use the elastic scaling feature of Alibaba Cloud Elasticsearch to expand disks, add nodes, or upgrade nodes.