Overview of Elasticsearch performance

Rally is used to perform stress testing on Alibaba Cloud Elasticsearch clusters of different specifications and versions. This topic describes the main metrics and task parameters involved in the stress testing.

Background information

Rally is the open source benchmarking tool that Elastic provides for stress testing Elasticsearch. For more information about stress testing and how to use Rally, visit the official website of Rally.

Main metrics involved in stress testing

The following table describes only the most important metrics. You can infer the meanings of the other metrics from these descriptions. For the complete list, see the Rally documentation on stress testing metrics.

Note

Latency is the time between the submission of a request and the receipt of the complete response. It includes the time that the request waits before Elasticsearch starts to process it.
Service time is the time between the start of request processing and the receipt of the response. It does not include the waiting time.
Error rate is the ratio of responses that contain errors to all responses.
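
The three definitions above can be illustrated with a minimal Python sketch. The `RequestTiming` class, its field names, and the sample timestamps are hypothetical and exist only for this illustration. They are not part of Rally.

```python
from dataclasses import dataclass


@dataclass
class RequestTiming:
    """Hypothetical timestamps (in seconds) for one request."""
    submitted_at: float  # the request is submitted
    started_at: float    # Elasticsearch starts to process the request
    received_at: float   # the complete response is received
    ok: bool             # False if the response contains an error


def latency(t: RequestTiming) -> float:
    # Includes the waiting period before processing starts.
    return t.received_at - t.submitted_at


def service_time(t: RequestTiming) -> float:
    # Excludes the waiting period.
    return t.received_at - t.started_at


def error_rate(timings: list) -> float:
    # Ratio of responses that contain errors to all responses.
    if not timings:
        return 0.0
    return sum(1 for t in timings if not t.ok) / len(timings)


# Made-up sample values for illustration only.
samples = [
    RequestTiming(0.0, 0.0, 0.25, True),
    RequestTiming(1.0, 1.5, 2.0, True),   # waited 0.5 s before processing
    RequestTiming(2.0, 2.0, 2.5, False),  # response contained an error
    RequestTiming(3.0, 3.25, 3.5, True),
]

print(latency(samples[1]), service_time(samples[1]), error_rate(samples))
```

For the second sample, the latency exceeds the service time by exactly the waiting period. When latency is much higher than service time, requests are queuing before they are processed.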

Metric type	Metric name	Description
Metrics related to indexing of primary shards	Cumulative indexing time of primary shards	The cumulative time used for indexing of all primary shards. Note The time is not wall-clock time. It is the sum of the CPU time consumed by all threads used for indexing. For example, if M threads are used for indexing and each thread runs for N minutes, this metric reports M × N minutes, not N minutes.
	Min cumulative indexing time across primary shards	The minimum cumulative time used for indexing across primary shards.
	Median cumulative indexing time across primary shards	The median cumulative time used for indexing across primary shards.
	Max cumulative indexing time across primary shards	The maximum cumulative time used for indexing across primary shards.
	Cumulative indexing throttle time of primary shards	The cumulative time that indexing of all primary shards is throttled. Note The time is not wall-clock time. It is the sum of the CPU time consumed by multiple threads used for indexing when indexing is throttled.
	Min cumulative indexing throttle time across primary shards	The minimum cumulative time that indexing across primary shards is throttled.
	Median cumulative indexing throttle time across primary shards	The median cumulative time that indexing across primary shards is throttled.
	Max cumulative indexing throttle time across primary shards	The maximum cumulative time that indexing across primary shards is throttled.
	Cumulative merge time of primary shards	The cumulative runtime used for merge operations for primary shards. The time also indicates the sum of the CPU time consumed by all threads.
	Cumulative merge count of primary shards	The cumulative number of merges of primary shards. Note Some primary shards may not be merged.
	Min cumulative merge time across primary shards	The minimum cumulative time used for merge operations across primary shards.
	Median cumulative merge time across primary shards	The median cumulative time used for merge operations across primary shards.
	Max cumulative merge time across primary shards	The maximum cumulative time used for merge operations across primary shards.
	Cumulative merge throttle time of primary shards	The cumulative time that merge operations for primary shards are throttled. The time also indicates the sum of the CPU time consumed by all threads.
	Min cumulative merge throttle time across primary shards	The minimum cumulative time that merge operations across primary shards are throttled. The time also indicates the sum of the CPU time consumed by all threads.
	Median cumulative merge throttle time across primary shards	The median cumulative time that merge operations across primary shards are throttled. The time also indicates the sum of the CPU time consumed by all threads.
	Max cumulative merge throttle time across primary shards	The maximum cumulative time that merge operations across primary shards are throttled. The time also indicates the sum of the CPU time consumed by all threads.
	Cumulative refresh time of primary shards	The cumulative time used for index refresh of primary shards. The time also indicates the sum of the CPU time consumed by all threads.
	Cumulative refresh count of primary shards	The cumulative number of refreshes of primary shards.
	Min cumulative refresh time across primary shards	The minimum cumulative time used for index refresh across primary shards.
	Median cumulative refresh time across primary shards	The median cumulative time used for index refresh across primary shards.
	Max cumulative refresh time across primary shards	The maximum cumulative time used for index refresh across primary shards.
	Cumulative flush time of primary shards	The cumulative time used for flushing transactional data of indexing of primary shards from the cache to a disk. The time also indicates the sum of the CPU time consumed by all threads.
	Cumulative flush count of primary shards	The cumulative number of flushes for transactional data of indexing of primary shards from the cache to a disk.
	Min cumulative flush time across primary shards	The minimum cumulative time used for flushing transactional data of indexing across primary shards from the cache to a disk. The time also indicates the sum of the CPU time consumed by all threads.
	Median cumulative flush time across primary shards	The median cumulative time used for flushing transactional data of indexing across primary shards from the cache to a disk. The time also indicates the sum of the CPU time consumed by all threads.
	Max cumulative flush time across primary shards	The maximum cumulative time used for flushing transactional data of indexing across primary shards from the cache to a disk. The time also indicates the sum of the CPU time consumed by all threads.
	Store size	The size of data stored in indexes. The size does not include the size of translogs and that of data stored in replica shards.
	Translog size	The size of translogs.
	Heap used for segments	The size of heap memory occupied by the segments of all primary shards.
	Heap used for doc values	The size of heap memory occupied by the doc values of all primary shards.
	Heap used for terms	The size of heap memory occupied by the terms of all primary shards.
	Heap used for norms	The size of heap memory occupied by the norms of all primary shards.
	Heap used for points	The size of heap memory occupied by the points of all primary shards.
	Heap used for stored fields	The size of heap memory occupied by the stored fields of all primary shards.
	Segment count	The number of segments of all primary shards.
Metrics related to garbage collectors	Total Young Gen GC	The total runtime of the young-generation garbage collector in the entire cluster.
	Total Old Gen GC	The total runtime of the old-generation garbage collector in the entire cluster.
Metrics related to throughput	Min Throughput	The minimum throughput for each task, in operations per second. For query tasks, this is the queries per second (QPS).
	Median Throughput	The median throughput for each task.
	Max Throughput	The maximum throughput for each task.
Metrics related to latency	50th percentile latency	The latency within which the fastest 50% of all requests are completed.
	90th percentile latency	The latency within which the fastest 90% of all requests are completed.
	99.9th percentile latency	The latency within which the fastest 99.9% of all requests are completed.
	100th percentile latency	The maximum latency across all requests.
Metrics related to service time	50th percentile service time	The service time within which the fastest 50% of all requests are completed.
	90th percentile service time	The service time within which the fastest 90% of all requests are completed.
	99.9th percentile service time	The service time within which the fastest 99.9% of all requests are completed.
	100th percentile service time	The maximum service time across all requests.
Metrics related to error rates	error rate	The ratio of responses that contain errors to all responses.
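
Two calculations in the table can be sketched in Python: the M × N rule for cumulative times and the reading of percentile metrics. The sketch assumes the nearest-rank percentile method, which may differ from the method that Rally uses internally. The function names and the sample values are hypothetical.

```python
import math


def cumulative_thread_minutes(threads: int, minutes_per_thread: float) -> float:
    """Cumulative (non-wall-clock) time: M threads x N minutes each."""
    return threads * minutes_per_thread


def percentile(values: list, p: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Made-up latencies in milliseconds, for illustration only.
latencies_ms = [12, 15, 11, 30, 14, 13, 18, 95, 16, 17]

print(cumulative_thread_minutes(4, 10))  # 4 threads x 10 minutes each
print(percentile(latencies_ms, 50))      # fastest 50% finished within this
print(percentile(latencies_ms, 90))      # fastest 90% finished within this
print(percentile(latencies_ms, 100))     # the slowest request
```

In this made-up sample, a single slow request dominates the 100th percentile while the 50th percentile stays low, which is why the percentile metrics are best read together.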

Task parameters involved in stress testing

You can view metrics such as throughput, latency, service time, and error rate of a cluster for each task. The following table describes the tasks.

Operation	Description
index-append	The indexing operation that appends documents to an index.
index-stats	The query for the statistics of an index.
node-stats	The query for the statistics of a node.
default	The default query.
term	The term query.
phrase	The exact-match phrase query.
country_agg_uncached	The aggregation whose results are not cached.
country_agg_cached	The aggregation whose results are cached.
scroll	The scroll operation.
expression	The query that uses an expression script.
painless_static	The query that uses a static Painless script.
painless_dynamic	The query that uses a dynamic Painless script.
large_terms	The combination of multiple term queries.
large_filtered_terms	The combination of multiple filtered term queries.
large_prohibited_terms	The combination of multiple prohibited term queries.
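
Viewing metrics by task can be sketched as a simple grouping step. The following Python sketch assumes that results are available as (task, metric name, value) rows. This row layout and all values are hypothetical and do not represent the output format of Rally.

```python
from collections import defaultdict


def group_by_task(rows):
    """Collect metric values under the task that produced them."""
    grouped = defaultdict(dict)
    for task, metric, value in rows:
        grouped[task][metric] = value
    return dict(grouped)


# Made-up values for illustration only, not measured results.
rows = [
    ("index-append", "error rate", 0.0),
    ("term", "50th percentile latency", 4.2),
    ("term", "90th percentile latency", 6.8),
    ("phrase", "50th percentile latency", 5.1),
]

by_task = group_by_task(rows)
for task, metrics in by_task.items():
    print(task, metrics)
```

Grouping by task keeps indexing tasks such as index-append separate from query tasks such as term and phrase, so that their throughput and latency values are not compared directly.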

References

Performance test of an Elasticsearch cluster with 4-vCPU 16-GiB data nodes
Performance test of an Elasticsearch cluster with 8-vCPU 32-GiB data nodes
Performance test of an Elasticsearch cluster with 16-vCPU 64-GiB data nodes
Comparison of performance test results between an Elasticsearch V6.8 cluster and an Elasticsearch V8.9 cluster
Comparison of performance test results between an Elasticsearch cluster whose CPU type is Intel and an Elasticsearch cluster whose CPU type is AMD
For information about how to purchase an Alibaba Cloud Elasticsearch cluster, see Create an Alibaba Cloud Elasticsearch cluster.
For information about how to evaluate the specifications and storage capacity of an Elasticsearch cluster, see Evaluate specifications and storage capacity.