Elasticsearch:Overview

Last Updated:Aug 23, 2023

This topic describes a stress test performed on Alibaba Cloud Elasticsearch V5.5.3 clusters that have different specifications and reside in the China (Hangzhou) region. The test uses a Rally script. Rally is provided by open source Elasticsearch for benchmarking Elasticsearch clusters. This topic also describes the metrics and the operation parameter used in the stress test.

Overview

Rally is a stress test tool provided by open source Elasticsearch. In this example, Rally is used to perform a stress test on the Alibaba Cloud Elasticsearch clusters that have different specifications. You can view the stress test results in the following topics:

The stress test result of an Elasticsearch cluster with three 4-vCPU 16-GiB data nodes and that of an Elasticsearch cluster with three 2-vCPU 8-GiB data nodes are compared. For more information, see Comparison of stress testing results between an Elasticsearch cluster with three 4-vCPU 16-GiB data nodes and an Elasticsearch cluster with three 2-vCPU 8-GiB data nodes.

To understand the metrics and the operation parameter used in the stress test, see the Metrics used in the stress test and Description of the operation parameter sections in this topic.

Metrics used in the stress test

Before you perform a stress test on an Elasticsearch cluster, you can refer to the following table to understand the related metrics.
Note: The following table describes only the most important metrics. You can infer the meanings of the other metrics from the metrics described in the table. For more information about other metrics, see the Rally documentation on stress test metrics.
| Metric type | Metric name | Description |
|---|---|---|
| Metrics related to indexing of primary shards | Cumulative indexing time of primary shards | The cumulative time used for indexing on all primary shards. Note: This is not wall-clock time. It is the sum of the CPU time consumed by all indexing threads. For example, if M threads are used for indexing and each thread runs for N minutes, the metric reports M × N minutes. |
| | Min cumulative indexing time across primary shards | The minimum cumulative indexing time across primary shards. |
| | Median cumulative indexing time across primary shards | The median cumulative indexing time across primary shards. |
| | Max cumulative indexing time across primary shards | The maximum cumulative indexing time across primary shards. |
| | Cumulative indexing throttle time of primary shards | The cumulative time for which indexing on all primary shards is throttled. Note: This is not wall-clock time. It is the sum of the CPU time consumed by all indexing threads while indexing is throttled. |
| | Min cumulative indexing throttle time across primary shards | The minimum cumulative time for which indexing is throttled across primary shards. |
| | Median cumulative indexing throttle time across primary shards | The median cumulative time for which indexing is throttled across primary shards. |
| | Max cumulative indexing throttle time across primary shards | The maximum cumulative time for which indexing is throttled across primary shards. |
| | Cumulative merge time of primary shards | The cumulative time used for merge operations on primary shards. This is the sum of the CPU time consumed by all threads. |
| | Cumulative merge count of primary shards | The cumulative number of merges on primary shards. Note: Some primary shards may not be merged. |
| | Min cumulative merge time across primary shards | The minimum cumulative time used for merge operations across primary shards. |
| | Median cumulative merge time across primary shards | The median cumulative time used for merge operations across primary shards. |
| | Max cumulative merge time across primary shards | The maximum cumulative time used for merge operations across primary shards. |
| | Cumulative merge throttle time of primary shards | The cumulative time for which merge operations on primary shards are throttled. This is the sum of the CPU time consumed by all threads. |
| | Min cumulative merge throttle time across primary shards | The minimum cumulative time for which merge operations are throttled across primary shards. This is the sum of the CPU time consumed by all threads. |
| | Median cumulative merge throttle time across primary shards | The median cumulative time for which merge operations are throttled across primary shards. This is the sum of the CPU time consumed by all threads. |
| | Max cumulative merge throttle time across primary shards | The maximum cumulative time for which merge operations are throttled across primary shards. This is the sum of the CPU time consumed by all threads. |
| | Cumulative refresh time of primary shards | The cumulative time used for index refreshes on primary shards. This is the sum of the CPU time consumed by all threads. |
| | Cumulative refresh count of primary shards | The cumulative number of refreshes on primary shards. |
| | Min cumulative refresh time across primary shards | The minimum cumulative time used for index refreshes across primary shards. |
| | Median cumulative refresh time across primary shards | The median cumulative time used for index refreshes across primary shards. |
| | Max cumulative refresh time across primary shards | The maximum cumulative time used for index refreshes across primary shards. |
| | Cumulative flush time of primary shards | The cumulative time used to flush the indexing data of primary shards from the cache to disk. This is the sum of the CPU time consumed by all threads. |
| | Cumulative flush count of primary shards | The cumulative number of times that the indexing data of primary shards is flushed from the cache to disk. |
| | Min cumulative flush time across primary shards | The minimum cumulative time used to flush indexing data from the cache to disk across primary shards. This is the sum of the CPU time consumed by all threads. |
| | Median cumulative flush time across primary shards | The median cumulative time used to flush indexing data from the cache to disk across primary shards. This is the sum of the CPU time consumed by all threads. |
| | Max cumulative flush time across primary shards | The maximum cumulative time used to flush indexing data from the cache to disk across primary shards. This is the sum of the CPU time consumed by all threads. |
| | Store size | The size of the data stored in indexes. The size does not include translogs or the data stored in replica shards. |
| | Translog size | The size of translogs. |
| | Heap used for segments | The size of the heap memory occupied by the segments of all primary shards. |
| | Heap used for doc values | The size of the heap memory occupied by the doc values of all primary shards. |
| | Heap used for terms | The size of the heap memory occupied by the terms of all primary shards. |
| | Heap used for norms | The size of the heap memory occupied by the norms of all primary shards. |
| | Heap used for points | The size of the heap memory occupied by the points of all primary shards. |
| | Heap used for stored fields | The size of the heap memory occupied by the stored fields of all primary shards. |
| | Segment count | The number of segments in the indexes of all primary shards. |
| Metrics related to garbage collectors | Total Young Gen GC | The total runtime of the young-generation garbage collector in the entire cluster. |
| | Total Old Gen GC | The total runtime of the old-generation garbage collector in the entire cluster. |
| Metrics related to throughput | Min Throughput | The minimum queries per second (QPS) for each task. |
| | Median Throughput | The median QPS for each task. |
| | Max Throughput | The maximum QPS for each task. |
| Metrics related to latency | 50th percentile latency | The maximum latency for the fastest 50% of all requests. |
| | 90th percentile latency | The maximum latency for the fastest 90% of all requests. |
| | 99.9th percentile latency | The maximum latency for the fastest 99.9% of all requests. |
| | 100th percentile latency | The maximum latency for all requests. |
| Metrics related to service time | 50th percentile service time | The maximum service time for the fastest 50% of all requests. |
| | 90th percentile service time | The maximum service time for the fastest 90% of all requests. |
| | 99.9th percentile service time | The maximum service time for the fastest 99.9% of all requests. |
| | 100th percentile service time | The maximum service time for all requests. |
| Metrics related to error rates | error rate | The ratio of responses that contain errors to all responses. |
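The cumulative metrics and their min, median, and max variants can be illustrated with a minimal Python sketch. It is not how Rally computes its report: the function names `cumulative_time` and `summarize`, the shard names, and all numbers are hypothetical and chosen only to show the arithmetic.

```python
from statistics import median


def cumulative_time(per_thread_minutes):
    """Sum the run time of every thread (M x N when M threads each run N minutes)."""
    return sum(per_thread_minutes)


def summarize(per_shard_minutes):
    """Return the min, median, and max of per-shard cumulative times."""
    values = list(per_shard_minutes.values())
    return {"min": min(values), "median": median(values), "max": max(values)}


# Hypothetical run: 4 indexing threads that each run for 10 minutes.
total = cumulative_time([10, 10, 10, 10])
print(total)  # 40 minutes, although only about 10 minutes pass on the wall clock

# Hypothetical cumulative indexing time per primary shard, in minutes.
per_shard = {"shard-0": 12, "shard-1": 9, "shard-2": 30}
summary = summarize(per_shard)
mean = sum(per_shard.values()) / len(per_shard)
print(summary)  # {'min': 9, 'median': 12, 'max': 30}
print(mean)     # 17.0
```

In this made-up sample, one slow shard pulls the mean up to 17 minutes while the median stays at 12, which is why the median and the average should not be read as the same value.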
Note
  • Latency is the time from when a request is submitted until the complete response is received. It includes the time the request waits before Elasticsearch starts to process it.
  • Service time is the time from when Elasticsearch starts to process a request until the response is received. It does not include the waiting time.
  • Error rate is the ratio of responses that contain errors to all responses.
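The difference between latency and service time, and the way a percentile is read, can be sketched in Python. This is an illustration under stated assumptions, not Rally's implementation: the `Request` record, its field names, the nearest-rank `percentile` helper, and the timestamps (in milliseconds) are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    submitted: int   # time the request was submitted (ms)
    started: int     # time processing of the request began (ms)
    finished: int    # time the complete response was received (ms)
    ok: bool = True  # False if the response contained an error


def latency(r):
    # Latency includes the wait before processing starts.
    return r.finished - r.submitted


def service_time(r):
    # Service time covers only the processing itself.
    return r.finished - r.started


def percentile(values, p):
    # Nearest-rank percentile: the largest value among the fastest p% of samples.
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def error_rate(requests):
    # Ratio of responses that contain errors to all responses.
    return sum(not r.ok for r in requests) / len(requests)


# Hypothetical sample: requests arrive every 100 ms but are processed one at a
# time, so later requests wait longer before processing starts.
requests = [
    Request(0, 0, 200),
    Request(100, 200, 400),
    Request(200, 400, 600),
    Request(300, 600, 1000, ok=False),
]
latencies = [latency(r) for r in requests]            # [200, 300, 400, 700]
service_times = [service_time(r) for r in requests]   # [200, 200, 200, 400]

print(percentile(latencies, 50), percentile(service_times, 50))    # 300 200
print(percentile(latencies, 100), percentile(service_times, 100))  # 700 400
print(error_rate(requests))                                        # 0.25
```

In this sample the service time stays flat for most requests while latency grows, which is the pattern to expect when requests queue up faster than the cluster can process them.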

Description of the operation parameter

Metrics such as throughput, latency, service time, and error rate are reported for each operation. The following table describes the values of the operation parameter so that you can analyze the data collected for these metrics.
| Value | Description |
|---|---|
| index-append | The document indexing operation, in which documents are appended to an index. |
| index-stats | The retrieval of index statistics. |
| node-stats | The retrieval of node statistics. |
| default | The default query. |
| term | A term query. |
| phrase | An exact phrase query. |
| country_agg_uncached | An aggregation whose results are not cached. |
| country_agg_cached | An aggregation whose results are cached. |
| scroll | A scroll operation. |
| expression | A query that uses an expression script. |
| painless_static | A query that uses a static Painless script. |
| painless_dynamic | A query that uses a dynamic Painless script. |
| large_terms | A query that combines multiple terms. |
| large_filtered_terms | A query that combines multiple filtered terms. |
| large_prohibited_terms | A query that combines multiple prohibited terms. |
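Because the metrics are reported per operation, results from two clusters are most easily compared operation by operation. The following Python sketch shows one way to do that. The record layout (`operation`, `metric`, `value` keys), the metric names, and the helper functions are assumptions for illustration and do not reflect Rally's report format. The numbers are placeholders, not measured results.

```python
from collections import defaultdict


def by_operation(records):
    """Group flat result records into {operation: {metric: value}}."""
    grouped = defaultdict(dict)
    for rec in records:
        grouped[rec["operation"]][rec["metric"]] = rec["value"]
    return dict(grouped)


def ratio(run_a, run_b, metric):
    """For each operation present in both runs, return run_b / run_a for one metric."""
    return {
        op: run_b[op][metric] / run_a[op][metric]
        for op in run_a
        if op in run_b and metric in run_a[op] and metric in run_b[op]
    }


# Placeholder values for two hypothetical runs; these are not measured results.
small = by_operation([
    {"operation": "index-append", "metric": "median_throughput", "value": 1000.0},
    {"operation": "term", "metric": "median_throughput", "value": 100.0},
    {"operation": "term", "metric": "p90_latency_ms", "value": 8.0},
])
large = by_operation([
    {"operation": "index-append", "metric": "median_throughput", "value": 2000.0},
    {"operation": "term", "metric": "median_throughput", "value": 150.0},
])

print(ratio(small, large, "median_throughput"))
# {'index-append': 2.0, 'term': 1.5}
```

Operations or metrics that are missing from either run are skipped rather than reported as zero, so a gap in one report does not distort the comparison.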