PolarSearch performance benchmark

This document describes how to use the OpenSearch Benchmark tool to run a performance benchmark on PolarSearch. You can use this guide to conduct your own tests and evaluate the search and data ingestion performance of different products under real-world workloads.

Test tool

OpenSearch Benchmark (OSB) is the OpenSearch project's open-source benchmarking framework for search engines, originally forked from Elastic's Rally. It ships with standardized workloads and supports uniform, repeatable benchmarks against any search engine compatible with the Elasticsearch/OpenSearch REST APIs, which makes it well suited to side-by-side product comparisons.

Project homepage: https://github.com/opensearch-project/opensearch-benchmark
Documentation: https://docs.opensearch.org/docs/latest/benchmark/

Test environment

Test tool: OpenSearch Benchmark 2.1.0
Setup: The ECS instance and the PolarSearch cluster must be in the same region, availability zone, and VPC network.
ECS instance:
- Instance type: ecs.c9i.8xlarge (32 cores, 128 GiB)
- Operating system: Ubuntu 22.04
- Python version: 3.10+
PolarSearch cluster:
- Node type: 8 cores, 32 GB
- Number of nodes: 2
- Parameter configuration: All tests use the default, out-of-the-box configuration without any adjustments to cluster parameters.

Products and versions

Product	PolarSearch version	Node type	Number of nodes
PolarSearch	3.0	8 cores, 32 GB	2
PolarSearch	1.0	8 cores, 32 GB	2

Workloads

Workload	Description	Test operations
HTTP logs	Based on real web server access logs from the 1998 FIFA World Cup. The dataset contains approximately 247 million log records.	index-append data ingestion, term query, range time query, aggregation
NYC taxis	Based on trip data from New York City taxis in 2015.	index-append data ingestion, term query, range time query, geodistance query, aggregation
Geonames	Based on the GeoNames geographical database. The dataset uses the global gazetteer export (allCountries) from April 2017, containing approximately 11.4 million records of global points of interest.	term and phrase (exact/full-text) queries, aggregation, decay_geo_gauss_function_score, painless_static script scoring, desc_sort_population sort query

Test scenarios

Indexing performance test

This test bulk-writes the complete dataset to the cluster and measures the write throughput of each product version at different numbers of concurrent indexing clients (bulk_indexing_clients).
The test uses the append-no-conflicts-index-only test procedure, which creates an index and ingests data without performing search operations. Each test run starts with an empty index.
The number of concurrent indexing clients is tested sequentially at 1, 2, 4, 8, 16, and 32.

Search performance test

After data ingestion, specify search tasks with the --include-tasks parameter to run search tests on existing data without re-importing it.
The number of concurrent search clients (search_clients) is tested sequentially at 1, 2, 4, 8, 16, and 32, with target_throughput set to 0 (unthrottled, full-speed stress test mode).

Install OpenSearch Benchmark

Note: OpenSearch Benchmark requires Python 3.8 or later.

Run the following command to install OpenSearch Benchmark.
```
pip install opensearch-benchmark
```
Run the following command to verify the installation.
```
opensearch-benchmark --version
```
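Because the installation depends on the Python version, it can help to check the interpreter before running `pip`. The following is a minimal sketch, not part of OpenSearch Benchmark: `version_at_least` is a hypothetical helper introduced here, and it compares only the major and minor components of a dotted version string.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of OSB): succeeds if version $1 is at least
# version $2. Only the major and minor components are compared.
version_at_least() {
    local have_major have_rest have_minor want_major want_rest want_minor
    have_major=${1%%.*}; have_rest=${1#*.}; have_minor=${have_rest%%.*}
    want_major=${2%%.*}; want_rest=${2#*.}; want_minor=${want_rest%%.*}
    if [ "$have_major" -ne "$want_major" ]; then
        # Major versions differ, so the major component alone decides.
        [ "$have_major" -gt "$want_major" ]
        return
    fi
    [ "$have_minor" -ge "$want_minor" ]
}

# Usage (assumes python3 is on PATH):
#   py_version="$(python3 -c 'import platform; print(platform.python_version())')"
#   version_at_least "$py_version" 3.8 || echo "Python $py_version is too old" >&2
```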

Test procedure

Prerequisites

Obtain the endpoint, username, and password for your PolarSearch cluster and verify network connectivity:

curl -u <user>:<password> "http://<endpoint>/_cluster/health?pretty"

Response status descriptions:

"status": "green": The cluster is healthy.
"status": "yellow": All primary shards are allocated, but some replica shards are not (for example, a node is offline). The cluster can still process read and write requests, but its high availability is reduced. Use the following commands to investigate:
```
# Check shard allocation status
curl -u <user>:<password> -XGET "http://<endpoint>/_cat/shards?v"
```
```
# View details about unassigned shards
curl -u <user>:<password> -XGET "http://<endpoint>/_cluster/allocation/explain"
```
"status": "red": At least one primary shard is unassigned, making some data unavailable. Do not run performance tests when the cluster is in this state.

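The status rules above can be turned into a small gate that runs before each benchmark. The sketch below is illustrative only: `health_status` and `ready_for_benchmark` are hypothetical helper names, and the `sed` expression assumes that `"status"` appears once in the health response. It uses `sed` rather than a JSON parser only to avoid an extra dependency.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the "status" value from a _cluster/health
# response body passed as $1. Assumes "status" appears once in the body.
health_status() {
    printf '%s\n' "$1" |
        sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' |
        head -n 1
}

# Hypothetical helper: returns 0 for green or yellow (with a warning for
# yellow) and 1 for red or an unreadable response.
ready_for_benchmark() {
    case "$(health_status "$1")" in
        green)
            return 0 ;;
        yellow)
            echo "warning: cluster is yellow (some replica shards are unassigned)" >&2
            return 0 ;;
        *)
            echo "error: cluster is red or its status is unreadable; do not benchmark" >&2
            return 1 ;;
    esac
}

# Usage (placeholders as in the commands above):
#   body="$(curl -s -u '<user>:<password>' 'http://<endpoint>/_cluster/health')"
#   ready_for_benchmark "$body" || exit 1
```
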
Run the performance test

Run the following command to start the performance test.

opensearch-benchmark run \
    --workload="<workload>" \
    --client-options="basic_auth_user:<user>,basic_auth_password:<password>,verify_certs:false" \
    --target-hosts="<endpoint>" \
    --pipeline=benchmark-only \
    --results-file="path/to/result_file.md" \
    --kill-running-processes \
    --workload-params="number_of_replicas:<num_replicas>,number_of_shards:<num_shards>,bulk_indexing_clients:<num_indexing_clients>,search_clients:<num_search_clients>,target_throughput:0"
    
    
# Optional parameters
#   --test-procedure="<test_procedure_name>"
#   --include-tasks="<task_names>"

General command-line parameters:

Parameter	Description
--workload	The name of the test workload, such as `http_logs` or `nyc_taxis`.
--test-procedure	The name of the test procedure. For write-only tests, use `append-no-conflicts-index-only`. For search tests, use `append-no-conflicts`, a complete procedure that includes the search tasks; combine it with `--include-tasks` to run only the search part. Note: Run `opensearch-benchmark info --workload=<workload_name>` to view the available test procedures.
--include-tasks	When specified, this option runs only the listed tasks. It skips the index deletion, creation, and data ingestion steps, allowing you to run search tests on existing data.
--client-options	Client connection options, including the authentication credentials for the cluster. Note: `verify_certs:false` skips SSL certificate verification.
--target-hosts	The endpoint of the cluster under test.
--pipeline=benchmark-only	Instructs OpenSearch Benchmark to run against an existing, externally managed cluster (here, the PolarSearch cluster) instead of provisioning one itself.
--results-file	The output path for the test results, which are saved in Markdown format.
--kill-running-processes	Automatically terminates any lingering OpenSearch Benchmark processes from previous runs.
--workload-params	A comma-separated list of key-value pairs used to inject runtime parameters into the workload's Jinja2 template, overriding default values.

Workload parameters:

Parameter	Description
number_of_replicas	The number of replica shards for the index.
number_of_shards	The number of primary shards for the index.
bulk_indexing_clients	The number of concurrent indexing clients.
search_clients	The number of concurrent search clients; applies during search tests.
target_throughput:0	Disables rate limiting, allowing the test to run at full speed to measure the cluster's maximum search throughput.
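
The two test scenarios repeat the same command at client counts of 1, 2, 4, 8, 16, and 32, so the runs are easy to script. The following dry-run sketch only prints the commands for inspection and does not start any benchmark. `WORKLOAD`, `ENDPOINT`, `OSB_USER`, `OSB_PASSWORD`, and `SEARCH_TASKS` are hypothetical environment variables introduced for this example, and the results-file naming is likewise an assumption. Unset variables fall back to the placeholders used in this guide.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints one opensearch-benchmark command per client count.
# All environment variable names below are hypothetical, not OSB settings.

CLIENT_COUNTS=(1 2 4 8 16 32)

# build_cmd MODE CLIENTS   (MODE is "index" or "search")
build_cmd() {
    local mode="$1" clients="$2"
    local params extra=()
    case "$mode" in
        index)
            params="bulk_indexing_clients:${clients},target_throughput:0"
            extra=(--test-procedure=append-no-conflicts-index-only)
            ;;
        search)
            params="search_clients:${clients},target_throughput:0"
            extra=(--test-procedure=append-no-conflicts
                   "--include-tasks=${SEARCH_TASKS:-<task_names>}")
            ;;
        *)
            echo "unknown mode: $mode" >&2
            return 1
            ;;
    esac
    local cmd=(
        opensearch-benchmark run
        "--workload=${WORKLOAD:-<workload>}"
        "${extra[@]}"
        "--target-hosts=${ENDPOINT:-<endpoint>}"
        --pipeline=benchmark-only
        "--client-options=basic_auth_user:${OSB_USER:-<user>},basic_auth_password:${OSB_PASSWORD:-<password>},verify_certs:false"
        "--workload-params=${params}"
        "--results-file=results_${mode}_${clients}.md"
        --kill-running-processes
    )
    # Join the arguments with single spaces and print them as one line.
    printf '%s\n' "${cmd[*]}"
}

# sweep MODE: print the command for every client count in CLIENT_COUNTS.
sweep() {
    local mode="$1" clients
    for clients in "${CLIENT_COUNTS[@]}"; do
        build_cmd "$mode" "$clients" || return 1
    done
}
```

Calling `sweep index` or `sweep search` prints six commands each. The printed lines contain the password in clear text and are not shell-quoted, so review them before copying them into a terminal.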

Test results

After each test completes, OpenSearch Benchmark prints a results summary to the console and writes detailed results to the file specified by the --results-file parameter.

Metric	Description
Mean throughput	The average throughput during the test, measured in ops/s (operations per second). In full-speed stress testing mode where `target_throughput=0`, this value directly reflects the actual maximum processing capacity of the cluster at the current concurrency and is a core metric for measuring search engine performance. The higher the value, the more requests can be processed per unit of time, and the better the performance.
p50 latency	The 50th percentile latency (median). Half of all requests completed faster than this value, reflecting the typical request latency.
p90 / p99 latency	The 90th and 99th percentile latencies. These values reflect the latency of long-tail requests and are key indicators of service stability. Lower values indicate less latency variance under high load.
Service time	The actual processing time from when a request is sent until a response is received, excluding queueing time. It reflects the cluster's pure processing time. This metric also includes percentile values such as p50, p90, and p99.
Error rate	The percentage of failed requests during the test. A valid test result should have an error rate of 0%. A non-zero value indicates that the cluster is encountering errors under the current load (such as circuit breaking or timeouts), making the data from that test run unreliable.
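
To compare runs, the metrics above can be pulled out of each results file. The sketch below assumes the Markdown file contains table rows of the form `| Metric | Task | Value | Unit |`. Both the layout and the exact metric spellings are assumptions to verify against your own output, and `metric_rows` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "task value unit" for every row of a results
# file whose Metric column equals $1. Assumes pipe-delimited rows of the
# form "| Metric | Task | Value | Unit |".
metric_rows() {
    local metric="$1" file="$2"
    awk -F'|' -v want="$metric" '
        # Strip leading and trailing blanks from a table cell.
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        NF >= 5 && trim($2) == want { print trim($3), trim($4), trim($5) }
    ' "$file"
}

# Usage (metric name spelling is an assumption):
#   metric_rows "Mean Throughput" path/to/result_file.md
```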

For detailed test results, see PolarSearch 1.0 Performance Test Results and PolarSearch 3.0 Performance Test Results.