This document describes how to use the OpenSearch Benchmark tool to run a performance benchmark on PolarSearch. You can use this guide to conduct your own tests and evaluate the search and data ingestion performance of different products under real-world workloads.
Test tool
OpenSearch Benchmark (OSB) is an open-source benchmark framework for search engines from the OpenSearch project, derived from Elasticsearch Rally. It includes standardized industry workloads and supports uniform, repeatable performance benchmarks for any search engine compatible with the Elasticsearch/OpenSearch REST APIs, which makes it well suited to side-by-side product comparisons.
Project homepage: https://github.com/opensearch-project/opensearch-benchmark
Documentation: https://docs.opensearch.org/docs/latest/benchmark/
Test environment
Test tool: OpenSearch Benchmark 2.1.0
Setup: The ECS instance and the PolarSearch cluster must be in the same region, availability zone, and VPC network.
ECS instance:
Instance type: ecs.c9i.8xlarge (32 cores, 128 GiB)
Operating system: Ubuntu 22.04
Python version: 3.10+
PolarSearch cluster:
Node type: 8 cores, 32 GB
Number of nodes: 2
Parameter configuration: All tests use the default, out-of-the-box configuration without any adjustments to cluster parameters.
Products and versions
Product | PolarSearch version | Node type | Number of nodes |
PolarSearch | 3.0 | 8 cores, 32 GB | 2 |
PolarSearch | 1.0 | 8 cores, 32 GB | 2 |
Workloads
Workload | Description | Test operations |
http_logs | Based on real web server access logs from the 1998 FIFA World Cup. The dataset contains approximately 247 million log records. | index-append data ingestion, term query, range time query, aggregation |
nyc_taxis | Based on trip data from New York City taxis in 2015. | index-append data ingestion, term query, range time query, geodistance query, aggregation |
geonames | Based on the GeoNames geographical database. The dataset uses the global gazetteer export (allCountries) from April 2017, containing approximately 11.4 million records of global points of interest. | term and phrase (exact/full-text) queries, aggregation, decay_geo_gauss_function_score, painless_static script scoring, desc_sort_population sort query |
Test scenarios
Indexing performance test
This test bulk-writes the complete dataset to the cluster and measures the write throughput of each version with different numbers of concurrent indexing clients (bulk_indexing_clients).
The test uses the append-no-conflicts-index-only test procedure, which creates an index and ingests data without performing search operations. Each test run starts with an empty index.
The number of concurrent indexing clients is tested sequentially at 1, 2, 4, 8, 16, and 32.
Search performance test
After data ingestion, specify search tasks with the --include-tasks parameter to run search tests on existing data without re-importing it.
The number of concurrent search clients (search_clients) is tested sequentially at 1, 2, 4, 8, 16, and 32, with target_throughput set to 0 (full-speed stress test mode).
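The two sweeps above can be sketched as a small shell helper that prints one command per client count instead of running anything. This is only an illustration: the function name print_sweep, the results-file naming scheme, and the example workload and task names passed to it are assumptions, and the printed commands omit --client-options, which you would add with your own credentials.

```shell
# Client counts used by both the indexing and the search sweep.
CLIENT_COUNTS="1 2 4 8 16 32"

# print_sweep <workload> <endpoint> <param_name> [extra args...]
# <param_name> is bulk_indexing_clients or search_clients.
# Prints one opensearch-benchmark command per client count (dry run only).
print_sweep() {
  sweep_workload="$1"
  sweep_endpoint="$2"
  sweep_param="$3"
  shift 3
  for sweep_n in $CLIENT_COUNTS; do
    # The results-file name is a hypothetical convention: one file per run.
    echo "opensearch-benchmark run" \
      "--workload=${sweep_workload}" \
      "--target-hosts=${sweep_endpoint}" \
      "--pipeline=benchmark-only" \
      "--results-file=results_${sweep_param}_${sweep_n}.md" \
      "--workload-params=${sweep_param}:${sweep_n},target_throughput:0" \
      "$@"
  done
}
```

For example, `print_sweep <workload> <endpoint> bulk_indexing_clients --test-procedure=append-no-conflicts-index-only` lists the six indexing runs, and passing `search_clients` with an `--include-tasks=<task_names>` argument lists the six search runs. Review the printed commands before executing them one at a time.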
Install OpenSearch Benchmark
OpenSearch Benchmark requires Python 3.8 or later.
Run the following command to install OpenSearch Benchmark.
pip install opensearch-benchmark
Run the following command to verify the installation.
opensearch-benchmark --version
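Because the tool requires Python 3.8 or later, it can help to check the interpreter version before installing. The following is a minimal sketch, assuming python3 is on the PATH; the helper names version_ge and check_python are hypothetical, and the comparison only looks at the major and minor components of a dotted version such as 3.10.12.

```shell
# version_ge <have> <need>: succeeds when dotted version <have> is >= <need>.
# Only major and minor are compared; both arguments need a "major.minor" prefix.
version_ge() {
  have_major=${1%%.*}
  have_rest=${1#*.}
  have_minor=${have_rest%%.*}
  need_major=${2%%.*}
  need_rest=${2#*.}
  need_minor=${need_rest%%.*}
  [ "$have_major" -gt "$need_major" ] && return 0
  [ "$have_major" -lt "$need_major" ] && return 1
  [ "$have_minor" -ge "$need_minor" ]
}

# check_python: reports whether the local python3 meets the 3.8 minimum.
check_python() {
  py_version=$(python3 -c 'import platform; print(platform.python_version())') || return 1
  if version_ge "$py_version" 3.8; then
    echo "Python $py_version meets the 3.8 minimum"
  else
    echo "Python $py_version is older than 3.8" >&2
    return 1
  fi
}
```

Calling `check_python` before `pip install opensearch-benchmark` surfaces an outdated interpreter early instead of failing partway through the installation.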
Test procedure
Prerequisites
Obtain the endpoint, username, and password for your PolarSearch cluster and verify network connectivity:
curl -u <user>:<password> "http://<endpoint>/_cluster/health?pretty"
Response status descriptions:
"status": "green": The cluster is healthy.
"status": "yellow": All primary shards are allocated, but some replica shards are not (for example, a node is offline). The cluster can still process read and write requests, but its high availability is reduced. Use the following commands to investigate:
# Check shard allocation status
curl -u <user>:<password> -XGET "http://<endpoint>/_cat/shards?v"
# View details about unassigned shards
curl -u <user>:<password> -XGET "http://<endpoint>/_cluster/allocation/explain"
"status": "red": At least one primary shard is unassigned, making some data unavailable. Do not run performance tests when the cluster is in this state.
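The status rules above can be turned into a simple gate that runs before each benchmark. The sketch below is one possible approach, not part of the tool: the function names health_status and ok_to_benchmark are hypothetical, and treating yellow as "proceed with a warning" is an assumption you may want to tighten.

```shell
# health_status: read a _cluster/health JSON body on stdin and print the
# value of its "status" field (green, yellow, or red).
health_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

# ok_to_benchmark <status>: green passes, yellow passes with a warning
# (an assumption of this sketch), anything else fails.
ok_to_benchmark() {
  case "$1" in
    green) return 0 ;;
    yellow)
      echo "warning: some replica shards are unassigned" >&2
      return 0 ;;
    *)
      echo "cluster status is '${1}', not running the benchmark" >&2
      return 1 ;;
  esac
}
```

A typical use is to pipe the health response into the first function, for example `curl -s -u <user>:<password> "http://<endpoint>/_cluster/health" | health_status`, and pass the result to `ok_to_benchmark` before starting a run.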
Run the performance test
Run the following command to start the performance test.
opensearch-benchmark run \
--workload="<workload>" \
--client-options="basic_auth_user:<user>,basic_auth_password:<password>,verify_certs:false" \
--target-hosts="<endpoint>" \
--pipeline=benchmark-only \
--results-file="path/to/result_file.md" \
--kill-running-processes \
--workload-params="number_of_replicas:<num_replicas>,number_of_shards:<num_shards>,bulk_indexing_clients:<num_indexing_clients>,search_clients:<num_search_clients>,target_throughput:0"
# Optional parameters
# --test-procedure="<test_procedure_name>"
# --include-tasks="<task_names>"
General command-line parameters:
Parameter | Description |
--workload | The name of the test workload, such as |
--test-procedure | The name of the test procedure. For write-only tests, use Note You can run the |
--include-tasks | When specified, this option runs only the listed tasks. It skips the index deletion, creation, and data ingestion steps, allowing you to run search tests on existing data. |
--client-options | Authentication credentials for the cluster. |
--target-hosts | The endpoint of the cluster under test. |
--pipeline=benchmark-only | Instructs OpenSearch Benchmark to use an external PolarSearch cluster. |
--results-file | The output path for the test results, which are saved in Markdown format. |
--kill-running-processes | Automatically terminates any lingering OpenSearch Benchmark processes from previous runs. |
--workload-params | A comma-separated list of key-value pairs used to inject runtime parameters into the workload's Jinja2 template, overriding default values. |
Workload parameters:
Parameter | Description |
number_of_replicas | The number of replica shards for the index. |
number_of_shards | The number of primary shards for the index. |
bulk_indexing_clients | The number of concurrent indexing clients. |
search_clients | The number of concurrent search clients; applies during search tests. |
target_throughput:0 | Disables rate limiting, allowing the test to run at full speed to measure the cluster's maximum search throughput. |
Test results
After each test completes, OpenSearch Benchmark prints a results summary to the console and writes detailed results to the file specified by the --results-file parameter.
Metric | Description |
Mean throughput | The average throughput during the test, measured in ops/s (operations per second). In full-speed stress testing mode where |
p50 latency | The 50th percentile latency (median). Half of all requests completed faster than this value, reflecting the typical request latency. |
p90 / p99 latency | The 90th and 99th percentile latencies. These values reflect the latency of long-tail requests and are key indicators of service stability. Lower values indicate less latency variance under high load. |
Service time | The actual processing time from when a request is sent until a response is received, excluding queueing time. It reflects the cluster's pure processing time. This metric also includes percentile values such as p50, p90, and p99. |
Error rate | The percentage of failed requests during the test. A valid test result should have an error rate of 0%. A non-zero value indicates that the cluster is encountering errors under the current load (such as circuit breaking or timeouts), making the data from that test run unreliable. |
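To compare runs, the metrics above can be pulled out of the Markdown results file with awk. This sketch assumes the file contains a pipe-delimited table with the columns Metric, Task, Value, and Unit, and that the metric labels are spelled "Mean Throughput" and "error rate"; verify both against your own results file. The function names metric_rows and has_errors are hypothetical.

```shell
# metric_rows <metric> <file>: print "task value unit" for every table row
# whose Metric cell equals <metric> exactly.
metric_rows() {
  awk -F'|' -v want="$1" '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    trim($2) == want { print trim($3), trim($4), trim($5) }
  ' "$2"
}

# has_errors <file>: succeeds (exit status 0) when any "error rate" row
# reports a non-zero value, meaning the run should not be trusted.
has_errors() {
  metric_rows "error rate" "$1" |
    awk '$2 + 0 != 0 { found = 1 } END { exit (found ? 0 : 1) }'
}
```

For example, `metric_rows "Mean Throughput" path/to/result_file.md` lists the mean throughput per task, and `has_errors path/to/result_file.md && echo "discard this run"` flags runs with a non-zero error rate.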
For detailed test results, see PolarSearch 1.0 Performance Test Results and PolarSearch 3.0 Performance Test Results.