
Elasticsearch: Metrics and task parameters for stress testing

Last Updated: Mar 17, 2026

Rally is used to perform stress testing on Alibaba Cloud Elasticsearch clusters of different specifications and versions. This topic describes the main metrics and task parameters involved in the stress testing.

Background information

Rally is the open-source benchmarking tool that Elastic provides for stress testing Elasticsearch. For more information about stress testing and how to use Rally, see the official Rally documentation.

Understanding the metrics

Before reading the metric tables, note the following:

  • Metrics are reported per task, not as a single cluster-wide aggregate. The task parameters section lists the available tasks.

  • Check metrics in this order: error rate first. If the error rate is not 0%, the other metrics are not meaningful. Once errors are confirmed absent, evaluate throughput, then latency percentiles.

  • Latency and service time use Rally-specific definitions that differ from tools such as JMeter:

    • Latency: the period from when a request is submitted to when the complete response is received. This includes any waiting time before Elasticsearch begins processing the request.

    • Service time: the period from when Elasticsearch starts processing a request to when the response is received. Service time does not include queue waiting time.

    • A large gap between service time and latency indicates that requests are spending significant time waiting in the queue before being processed.

  • Error rate: the rate of responses that contain errors relative to all responses.

  • Cumulative timing metrics are not wall-clock time. They represent the sum of CPU time across all indexing threads. For example, if M threads each run for N minutes, the metric value is M × N minutes.
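The check order and the latency/service-time comparison described above can be sketched in a few lines of Python. This is only an illustration: the `TaskResult` class, its field names, and the 50 ms queueing threshold are assumptions made for this example and are not part of Rally.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Hypothetical container for one task's headline numbers (not a Rally type)."""
    task: str
    error_rate_pct: float
    median_throughput: float
    p90_latency_ms: float
    p90_service_time_ms: float


def queue_wait_ms(result: TaskResult) -> float:
    """Approximate waiting time before processing: latency minus service time."""
    return result.p90_latency_ms - result.p90_service_time_ms


def evaluate(result: TaskResult, max_wait_ms: float = 50.0) -> str:
    """Apply the check order: error rate first, then throughput and latency."""
    if result.error_rate_pct > 0:
        # With errors present, the remaining metrics are not meaningful.
        return f"{result.task}: INVALID (error rate {result.error_rate_pct}%)"
    wait = queue_wait_ms(result)
    # max_wait_ms is an arbitrary threshold chosen for this sketch.
    note = "requests are queueing" if wait > max_wait_ms else "little queueing"
    return (f"{result.task}: {result.median_throughput} ops/s, "
            f"p90 latency {result.p90_latency_ms} ms, {note}")


print(evaluate(TaskResult("term", 0.0, 200.0, 120.0, 20.0)))
print(evaluate(TaskResult("term", 1.5, 200.0, 120.0, 20.0)))
```

In the first call, the 100 ms gap between latency and service time suggests that requests wait in the queue. In the second call, the non-zero error rate short-circuits the evaluation.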

For the complete list of Rally metrics, see the Rally metrics documentation.

Metrics reference

The following tables cover the key metrics reported by Rally. You can infer the meanings of unlisted metrics from similar entries in each table.

Indexing performance metrics

These metrics track how efficiently Elasticsearch indexes data into primary shards, including time spent on background operations such as merging, refreshing, and flushing.

Note

All cumulative timing metrics in this section are not wall-clock time. They represent the sum of CPU time consumed by multiple threads. For example, if M threads each run for N minutes, the value is M × N minutes.
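As a worked example of the note above, the following sketch computes a cumulative value from a thread count and a per-thread runtime, and then recovers a rough per-thread figure from it. The function names and the sample numbers are hypothetical.

```python
def cumulative_minutes(threads: int, minutes_per_thread: float) -> float:
    """Cumulative time as described in the note: M threads x N minutes each."""
    return threads * minutes_per_thread


def average_minutes_per_thread(cumulative: float, threads: int) -> float:
    """Rough per-thread time recovered from a cumulative value."""
    return cumulative / threads


# Hypothetical run: 8 indexing threads, each busy for 5 minutes.
total = cumulative_minutes(8, 5.0)
print(total)                                 # 40.0
print(average_minutes_per_thread(total, 8))  # 5.0
```

A cumulative value of 40 minutes can therefore come from a run that lasted only about 5 minutes of wall-clock time.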

| Metric type | Metric name | Description |
| --- | --- | --- |
| Metrics related to indexing of primary shards | Cumulative indexing time of primary shards | The total CPU time consumed by all indexing threads across all primary shards. Note: This is not wall-clock time. It is the sum of the CPU time consumed by all indexing threads. For example, if M threads each run for N minutes, the reported value is M × N minutes. |
| | Min cumulative indexing time across primary shards | The minimum cumulative indexing time among all primary shards. |
| | Median cumulative indexing time across primary shards | The median cumulative indexing time among all primary shards. |
| | Max cumulative indexing time across primary shards | The maximum cumulative indexing time among all primary shards. |
| | Cumulative indexing throttle time of primary shards | The total CPU time during which indexing was throttled across all primary shards. Note: This is not wall-clock time. It is the sum of the CPU time consumed by all indexing threads while indexing is throttled. |
| | Min cumulative indexing throttle time across primary shards | The minimum cumulative throttle time among all primary shards. |
| | Median cumulative indexing throttle time across primary shards | The median cumulative throttle time among all primary shards. |
| | Max cumulative indexing throttle time across primary shards | The maximum cumulative throttle time among all primary shards. |
| | Cumulative merge time of primary shards | The total CPU time consumed by merge operations across all primary shards, summed over all threads. |
| | Cumulative merge count of primary shards | The total number of merge operations across all primary shards. Note: Not every shard is necessarily merged. |
| | Min cumulative merge time across primary shards | The minimum cumulative merge time among all primary shards. |
| | Median cumulative merge time across primary shards | The median cumulative merge time among all primary shards. |
| | Max cumulative merge time across primary shards | The maximum cumulative merge time among all primary shards. |
| | Cumulative merge throttle time of primary shards | The total CPU time during which merge operations were throttled across all primary shards, summed over all threads. |
| | Min cumulative merge throttle time across primary shards | The minimum cumulative merge throttle time among all primary shards, summed over all threads. |
| | Median cumulative merge throttle time across primary shards | The median cumulative merge throttle time among all primary shards, summed over all threads. |
| | Max cumulative merge throttle time across primary shards | The maximum cumulative merge throttle time among all primary shards, summed over all threads. |
| | Cumulative refresh time of primary shards | The total CPU time consumed by refresh operations across all primary shards, summed over all threads. |
| | Cumulative refresh count of primary shards | The total number of refresh operations across all primary shards. |
| | Min cumulative refresh time across primary shards | The minimum cumulative refresh time among all primary shards. |
| | Median cumulative refresh time across primary shards | The median cumulative refresh time among all primary shards. |
| | Max cumulative refresh time across primary shards | The maximum cumulative refresh time among all primary shards. |
| | Cumulative flush time of primary shards | The total CPU time spent flushing indexing transaction data from the cache to disk across all primary shards, summed over all threads. |
| | Cumulative flush count of primary shards | The total number of flushes of indexing transaction data from the cache to disk across all primary shards. |
| | Min cumulative flush time across primary shards | The minimum cumulative time spent flushing indexing transaction data from the cache to disk among all primary shards, summed over all threads. |
| | Median cumulative flush time across primary shards | The median cumulative time spent flushing indexing transaction data from the cache to disk among all primary shards, summed over all threads. |
| | Max cumulative flush time across primary shards | The maximum cumulative time spent flushing indexing transaction data from the cache to disk among all primary shards, summed over all threads. |
| | Store size | The size of data stored in indexes. This does not include translog size or data stored in replica shards. |
| | Translog size | The size of translogs. |
| | Heap used for segments | The heap memory used by segments across all primary shards. |
| | Heap used for doc values | The heap memory used by doc values across all primary shards. |
| | Heap used for terms | The heap memory used by terms data across all primary shards. |
| | Heap used for norms | The heap memory used by norms data across all primary shards. |
| | Heap used for points | The heap memory used by points data across all primary shards. |
| | Heap used for stored fields | The heap memory used by stored fields across all primary shards. |
| | Segment count | The total number of segments across all primary shards. |
| Metrics related to garbage collectors | Total Young Gen GC | The total runtime of the young-generation garbage collector across the entire cluster. |
| | Total Old Gen GC | The total runtime of the old-generation garbage collector across the entire cluster. |
| Metrics related to throughput | Min Throughput | The minimum throughput observed during the task. |
| | Median Throughput | The median throughput observed during the task. |
| | Max Throughput | The maximum throughput observed during the task. A Max Throughput that is high relative to Median Throughput may indicate burst behaviour rather than sustained capacity. |
| Metrics related to latency | 50th percentile latency | The latency that 50% of all requests did not exceed. |
| | 90th percentile latency | The latency that 90% of all requests did not exceed. |
| | 99.9th percentile latency | The latency that 99.9% of all requests did not exceed. |
| | 100th percentile latency | The maximum latency observed across all requests. |
| Metrics related to service time | 50th percentile service time | The service time that 50% of all requests did not exceed. |
| | 90th percentile service time | The service time that 90% of all requests did not exceed. |
| | 99.9th percentile service time | The service time that 99.9% of all requests did not exceed. |
| | 100th percentile service time | The maximum service time observed across all requests. |
| Metrics related to error rates | error rate | The rate of responses that contain errors relative to all responses. An error rate above 0% means the benchmark results are not valid. Disregard all other metrics until the cause of the errors is resolved. |
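To make the percentile rows above concrete, the following sketch computes percentiles over a small list of latency samples using the nearest-rank method. The method and the sample values are assumptions chosen for illustration. Rally may use a different interpolation, so its reported values can differ slightly on small samples.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical latency samples in milliseconds.
latencies_ms = [10, 12, 13, 15, 18, 20, 25, 40, 90, 400]

print(percentile(latencies_ms, 50))    # 18
print(percentile(latencies_ms, 90))    # 90
print(percentile(latencies_ms, 99.9))  # 400
print(percentile(latencies_ms, 100))   # 400
```

In this sample, half of the requests finished within 18 ms, but a single slow request pushes the 99.9th and 100th percentiles to 400 ms. This is why the higher percentiles are worth checking alongside the median.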

Task parameters

You can view metrics such as throughput, latency, service time, and error rate broken down by task. The following table describes each task (Rally operation) included in the stress test.

| Operation | Description |
| --- | --- |
| index-append | Indexes new documents in append-only mode. Tests write throughput and indexing performance. |
| index-stats | Retrieves index statistics. Tests the overhead of the stats API. |
| node-stats | Retrieves node-level statistics. Tests the overhead of the node stats API. |
| default | Runs the default search query (a match_all query in Rally's geonames track). |
| term | Runs a term query. Tests single-value keyword search performance. |
| phrase | Runs a phrase query. Tests exact phrase matching performance. |
| country_agg_uncached | Runs a country-level aggregation without using the cache. Simulates cold-start aggregation queries, as seen in dashboards on first load. |
| country_agg_cached | Runs a country-level aggregation with the cache enabled. Tests repeated aggregation performance after the first load. |
| scroll | Runs a scroll operation. Tests the performance of paginating through large result sets. |
| expression | Runs a script query using an expression. Tests expression-based scripted query performance. |
| painless_static | Runs a Painless script query with a statically compiled script. Tests compiled script execution performance. |
| painless_dynamic | Runs a Painless script query with a dynamically compiled script. Tests dynamic script compilation and execution performance. |
| large_terms | Runs a query combining multiple term clauses. Tests performance of high-cardinality term queries. |
| large_filtered_terms | Runs a query combining multiple filtered term clauses. Tests performance of filtered high-cardinality term queries. |
| large_prohibited_terms | Runs a query combining multiple prohibited (must-not) term clauses. Tests performance of exclusion-heavy term queries. |
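Because metrics are reported per task, it can help to group a report by task before comparing runs. The sketch below assumes a CSV export with the columns Metric, Task, Value, and Unit. That layout, the `metrics_by_task` helper, and the sample numbers are assumptions for illustration, so adjust the column names to match your actual report.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in an assumed Metric,Task,Value,Unit layout.
# The numbers are made up and are not benchmark results.
SAMPLE_REPORT = """Metric,Task,Value,Unit
Median Throughput,index-append,52000,docs/s
error rate,index-append,0,%
Median Throughput,term,199.8,ops/s
90th percentile latency,term,4.2,ms
error rate,term,0,%
"""


def metrics_by_task(report_csv: str) -> dict:
    """Group metric values by task name, skipping rows that have no task."""
    grouped = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(report_csv)):
        if row["Task"]:
            grouped[row["Task"]][row["Metric"]] = float(row["Value"])
    return dict(grouped)


by_task = metrics_by_task(SAMPLE_REPORT)
for task, metrics in by_task.items():
    print(task, metrics)
```

Rows without a task name, such as cluster-wide cumulative metrics, are skipped so that only the per-task throughput, latency, service time, and error rate values remain.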

References