Benchmark vector retrieval over MySQL with VectorDBBench - PolarDB

This topic describes how to use VectorDBBench to benchmark the vector retrieval performance of PolarDB over the MySQL protocol and presents test results for various datasets and concurrency levels.

Test environment

PolarDB cluster specifications

Node role	Specification	Configuration	Quantity
read-write node (RW)	polar.mysql.x4.large	4 vCPU, 16 GB	1
read-only node (hot standby)	polar.mysql.x4.large	4 vCPU, 16 GB	1
columnar index read-only node	polar.mysql.x4.4xlarge	32 vCPU, 128 GB	1

Note

Vector data is created on an InnoDB table by using COMMENT 'COLUMNAR=1'. The columnar index read-only node automatically synchronizes the data and builds an HNSW_FLAT vector index. Vector retrieval requests are sent to the columnar index read-only node.

Test client (ECS)

The test client is an ecs.g9i.4xlarge (16 vCPU / 64 GB memory) ECS instance. It is located in the same availability zone as the PolarDB cluster.

Test tool and datasets

VectorDBBench is an open-source vector database benchmark tool from Zilliz. It supports end-to-end performance evaluation for mainstream vector databases, covering data ingestion, index building, and vector retrieval. VectorDBBench provides the following subcommands for PolarDB, each mapped to a different vector index type:

polardbhnswflat: runs tests using the HNSW_FLAT index type.
polardbhnswpq: runs tests using the HNSW_PQ index type.
polardbhnswsq: runs tests using the HNSW_SQ index type.

This topic uses polardbhnswflat (HNSW_FLAT index) to run tests on the following three datasets:

Dataset	Number of vectors	Dimension	Case type
Cohere 768D 1M	1,000,000	768	Performance768D1M
Cohere 768D 10M	10,000,000	768	Performance768D10M
OpenAI 1536D 5M	5,000,000	1536	Performance1536D5M
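
For a rough sense of scale, the raw size of each dataset can be estimated as number of vectors x dimension x bytes per component. The sketch below assumes 4-byte float32 components, which this topic does not state, and it ignores the HNSW graph and any storage overhead, so treat the figures as a lower bound rather than the actual footprint on the columnar index read-only node. The helper name raw_bytes is hypothetical.

```python
# (dataset, number of vectors, dimension) from the datasets table
DATASETS = [
    ("Cohere 768D 1M", 1_000_000, 768),
    ("Cohere 768D 10M", 10_000_000, 768),
    ("OpenAI 1536D 5M", 5_000_000, 1536),
]

BYTES_PER_COMPONENT = 4  # assumption: float32 components


def raw_bytes(num_vectors, dimension):
    """Size of the uncompressed vectors only, without index structures."""
    return num_vectors * dimension * BYTES_PER_COMPONENT


for name, n, dim in DATASETS:
    print(f"{name}: {raw_bytes(n, dim) / 1e9:.2f} GB of raw vectors")
```

Under this assumption, the 10M and 5M datasets hold the same amount of raw vector data, because the 5M dataset has twice the dimension.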

Test steps

Install VectorDBBench

On the test client, run the following commands to install VectorDBBench:

git clone https://github.com/zilliztech/VectorDBBench.git
cd VectorDBBench

# Create and activate a venv
python3 -m venv .venv
source .venv/bin/activate

# Upgrade pip
pip install --upgrade pip

# Install VectorDBBench and the PolarDB dependencies
pip install -e '.[polardb]'

Run the benchmark

The following example command runs a benchmark on the Cohere 768D 1M dataset:

DATASET_LOCAL_DIR=/root/ \
DATASET_SOURCE=AliyunOSS \
NUM_PER_BATCH=64 \
vectordbbench polardbhnswflat \
    --case-type Performance768D1M \
    --username <user_name> \
    --password <password> \
    --host <host> \
    --port 3306 \
    --m 16 \
    --ef-construction 256 \
    --ef-search 256 \
    --insert-workers 64 \
    --num-concurrency '20,40,60,80,100' \
    --concurrency-duration 60 \
    --task-label test_ecs \
    --db-label ecs_test \
    --post-load-index

Parameter descriptions

Parameter	Description
`NUM_PER_BATCH`	The batch size for a single INSERT statement during data ingestion.
`--case-type`	Specifies the test dataset. Valid values: Performance768D1M, Performance768D10M, and Performance1536D5M.
`--m` / `--ef-construction`	HNSW graph construction parameters. `--m` controls the maximum number of neighbors for each node, and `--ef-construction` controls the search width during index building.
`--ef-search`	The search width during retrieval. A larger value usually increases recall but also increases latency.
`--insert-workers`	The number of concurrent threads for data import.
`--num-concurrency`	A comma-separated list of concurrency levels for the retrieval phase.
`--concurrency-duration`	The duration in seconds for each concurrency level.
`--post-load-index`	Imports all data first, then builds the vector index in a single operation.

Note

To test other datasets, change the --case-type value and adjust the --m, --ef-construction, and --ef-search parameters based on the dataset's characteristics.
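
The per-dataset runs can also be scripted. The following is a minimal sketch and not part of VectorDBBench: it only assembles and prints the command lines, reusing the flags from the example command and the per-dataset HNSW parameters listed under Test results. The helper name build_command and the CASES and ENV dictionaries are hypothetical.

```python
import shlex

# Environment variables from the example command
ENV = {"DATASET_LOCAL_DIR": "/root/", "DATASET_SOURCE": "AliyunOSS", "NUM_PER_BATCH": "64"}

# Per-dataset HNSW parameters, copied from the "Test parameters" lines under Test results
CASES = {
    "Performance768D1M": {"m": 16, "ef_construction": 256, "ef_search": 256},
    "Performance768D10M": {"m": 16, "ef_construction": 500, "ef_search": 300},
    "Performance1536D5M": {"m": 16, "ef_construction": 256, "ef_search": 256},
}


def build_command(case_type, user, password, host, port=3306):
    """Return the vectordbbench argument list for one dataset."""
    p = CASES[case_type]
    return [
        "vectordbbench", "polardbhnswflat",
        "--case-type", case_type,
        "--username", user,
        "--password", password,
        "--host", host,
        "--port", str(port),
        "--m", str(p["m"]),
        "--ef-construction", str(p["ef_construction"]),
        "--ef-search", str(p["ef_search"]),
        "--insert-workers", "64",
        "--num-concurrency", "20,40,60,80,100",
        "--concurrency-duration", "60",
        "--task-label", "test_ecs",
        "--db-label", "ecs_test",
        "--post-load-index",
    ]


# Print one shell-ready command per dataset; nothing is executed here.
env_prefix = " ".join(f"{k}={shlex.quote(v)}" for k, v in ENV.items())
for case in CASES:
    argv = build_command(case, "<user_name>", "<password>", "<host>")
    print(env_prefix, shlex.join(argv))
```

Printing rather than executing keeps the sketch safe to run anywhere. To launch the runs, the same argument list could be passed to subprocess.run together with the environment variables.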

Test results

Cohere 768D 1M dataset

Test parameters: --m 16 --ef-construction 256 --ef-search 256.

Key metrics:

Metric	Value
Recall@100	0.9612
Single-thread average latency	2.6 ms
Single-thread p95 latency	3.0 ms
Single-thread p99 latency	3.3 ms
Index building time (optimize)	76.80 s
Peak QPS (concurrency=100)	13060.41

Performance at different concurrency levels:

Concurrency	QPS	Average latency (ms)	p95 latency (ms)	p99 latency (ms)
20	5771.52	3.46	4.80	6.48
40	8901.77	4.48	7.29	9.55
60	11932.17	5.01	9.19	12.51
80	12589.50	6.33	11.97	15.96
100	13060.41	7.62	14.33	19.50
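
One way to sanity-check the concurrency figures is Little's law: when each client issues its next query as soon as the previous one returns, the number of requests in flight is approximately QPS x average latency. The sketch below applies this to the Cohere 768D 1M results. It assumes that the concurrent clients run as such a closed loop, which this topic does not state. The helper name implied_concurrency is hypothetical.

```python
# (concurrency, QPS, average latency in ms) from the Cohere 768D 1M results
ROWS = [
    (20, 5771.52, 3.46),
    (40, 8901.77, 4.48),
    (60, 11932.17, 5.01),
    (80, 12589.50, 6.33),
    (100, 13060.41, 7.62),
]


def implied_concurrency(qps, avg_latency_ms):
    """Little's law: requests in flight = throughput x time in system."""
    return qps * avg_latency_ms / 1000.0


for concurrency, qps, latency_ms in ROWS:
    in_flight = implied_concurrency(qps, latency_ms)
    print(f"{concurrency:>3} clients -> {in_flight:6.2f} requests in flight")
```

Each implied value lands within 1% of the configured concurrency, so the reported QPS and average latency are mutually consistent under the closed-loop assumption.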

Cohere 768D 10M dataset

Test parameters: --m 16 --ef-construction 500 --ef-search 300.

Key metrics:

Metric	Value
Recall@100	0.9551
Single-thread average latency	3.1 ms
Single-thread p95 latency	3.7 ms
Single-thread p99 latency	4.2 ms
Index building time (optimize)	1625.95 s
Peak QPS (concurrency=100)	10174.35

Performance at different concurrency levels:

Concurrency	QPS	Average latency (ms)	p95 latency (ms)	p99 latency (ms)
20	4859.82	4.11	5.78	7.72
40	6556.74	6.09	9.95	12.55
60	9293.53	6.44	12.25	16.45
80	10063.69	7.93	15.65	21.15
100	10174.35	9.80	18.99	26.17

OpenAI 1536D 5M dataset

Test parameters: --m 16 --ef-construction 256 --ef-search 256.

Key metrics:

Metric	Value
Recall@100	0.9676
Single-thread average latency	3.4 ms
Single-thread p95 latency	3.9 ms
Single-thread p99 latency	4.4 ms
Index building time (optimize)	1508.41 s
Peak QPS (concurrency=100)	9130.94

Performance at different concurrency levels:

Concurrency	QPS	Average latency (ms)	p95 latency (ms)	p99 latency (ms)
20	4597.60	4.34	5.84	7.49
40	6539.36	6.11	9.70	12.34
60	8438.65	7.09	13.17	17.72
80	8996.89	8.86	17.42	23.38
100	9130.94	10.91	21.08	28.63

Summary and comparison

Dataset	Number of vectors	Vector dimension	Recall@100	Single-thread p99 latency (ms)	Peak QPS (concurrency=100)	Index building time (s)
Cohere 768D 1M	1,000,000	768	0.9612	3.3	13060.41	76.80
Cohere 768D 10M	10,000,000	768	0.9551	4.2	10174.35	1625.95
OpenAI 1536D 5M	5,000,000	1536	0.9676	4.4	9130.94	1508.41
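
The index building times are easier to compare as a build rate in vectors per second. The sketch below derives that rate from the summary table. The three runs differ in dimension and, for the 10M dataset, in --ef-construction, so the rates are not a like-for-like comparison. The helper name build_rate is hypothetical.

```python
# (dataset, number of vectors, index building time in seconds) from the summary table
BUILDS = [
    ("Cohere 768D 1M", 1_000_000, 76.80),
    ("Cohere 768D 10M", 10_000_000, 1625.95),
    ("OpenAI 1536D 5M", 5_000_000, 1508.41),
]


def build_rate(num_vectors, seconds):
    """Average number of vectors indexed per second."""
    return num_vectors / seconds


for name, n, secs in BUILDS:
    print(f"{name}: {build_rate(n, secs):,.0f} vectors/s")
```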

Conclusion

High recall with low latency
Across all three datasets, Recall@100 exceeds 0.95, with single-thread p99 latency between 3.3 ms and 4.4 ms. This demonstrates that the system achieves millisecond-level response times without sacrificing high recall.
Excellent concurrency scalability
As concurrency increases from 20 to 100, QPS rises at every step on all three datasets, with gains tapering off above 60 concurrency. With the Cohere 768D 1M dataset, QPS grows from 5,771.52 at 20 concurrency to 13,060.41 at 100 concurrency (about 2.3 times), and p99 latency at 100 concurrency is 19.50 ms.
High throughput for high-dimensional, large-scale workloads
For the OpenAI 1536D 5M dataset (1536 dimensions, 5 million vectors), the benchmark reaches about 9,131 QPS at 100 concurrency with a p99 latency of 28.63 ms. This indicates that PolarDB delivers stable retrieval performance in high-dimensional, large-scale scenarios.
Efficient index building
Building a vector index for 1 million vectors takes only 76.80 s. For 10 million vectors, it takes 1,625.95 s (about 27 minutes). This index building speed meets the demands of production environments.
Engineering-friendly design
PolarDB uses a columnar index read-only node to automatically synchronize data and build the vector index. You do not need to deploy a standalone vector database. Vector retrieval over the standard MySQL protocol reduces operational and integration overhead.
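
To put a number on throughput scaling, the QPS at 100 concurrency can be divided by the QPS at 20 concurrency and compared with the 5x increase in concurrency. The following sketch uses only the values reported under Test results. The helper name speedup is hypothetical.

```python
# QPS at 20 and at 100 concurrency, from the per-dataset results
QPS = {
    "Cohere 768D 1M": (5771.52, 13060.41),
    "Cohere 768D 10M": (4859.82, 10174.35),
    "OpenAI 1536D 5M": (4597.60, 9130.94),
}

CONCURRENCY_RATIO = 100 / 20  # concurrency grows 5x between the two rows


def speedup(qps_low, qps_high):
    """Throughput gain between the lowest and highest concurrency level."""
    return qps_high / qps_low


for name, (q20, q100) in QPS.items():
    s = speedup(q20, q100)
    # s / CONCURRENCY_RATIO is the fraction of ideal linear scaling achieved
    print(f"{name}: {s:.2f}x QPS for 5x concurrency ({s / CONCURRENCY_RATIO:.0%} of linear)")
```

The ratio alone does not show where the limit lies. Per-node CPU utilization on the columnar index read-only node and on the test client would be needed to attribute it.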
