This topic shows how to benchmark ApsaraDB RDS for PostgreSQL vector search performance using Hierarchical Navigable Small World (HNSW) indexes. The test uses the ANN-Benchmarks tool to measure recall rate, queries per second (QPS), and index build time across different parameter combinations. ANN-Benchmarks tests single-threaded performance by default. For concurrency performance testing, see Use the pgvector extension to test performance based on IVF indexes.
Test environment
Place the RDS instance and the Elastic Compute Service (ECS) instance in the same virtual private cloud (VPC) and vSwitch to avoid network-induced variance in test results.
| Component | Specification |
|---|---|
| RDS instance | PostgreSQL 16, RDS High-availability Edition, pg.x8.2xlarge.2c (16 cores, 128 GB memory), pgvector 0.8.0 |
| ECS instance | ecs.c6.xlarge (4 cores, 8 GiB memory), Alibaba Cloud Linux 3 |
| Test tool | ANN-Benchmarks (single-threaded by default) |
Prerequisites
Before you begin, ensure that you have:
- An RDS for PostgreSQL instance with a privileged account named `ann_testuser` and a database named `ann_testdb`. See Create a database and an account.
- The pgvector extension (named `vector` in the system) installed in `ann_testdb`. See Manage extensions.
- Docker installed on the ECS instance. See Install Docker.
Set up the ECS instance
Run the following commands on the ECS instance to download ANN-Benchmarks and create a Python environment:
```shell
# Install Git and Conda
yum install -y git conda

# Download ANN-Benchmarks
cd ~
git clone https://github.com/erikbern/ann-benchmarks.git

# Create and activate the test environment
conda create -n ann_test python=3.10.6
conda init bash
source /usr/etc/profile.d/conda.sh
conda activate ann_test

# Install dependencies
cd ~/ann-benchmarks/
pip install -r requirements.txt
```

All subsequent steps run inside the `ann_test` virtual environment. If your session times out and you are logged out, run `conda activate ann_test` to re-enter the environment.
Run the benchmark
Step 1: Configure the RDS connection
Append the following settings to ~/ann-benchmarks/ann_benchmarks/algorithms/pgvector/module.py, replacing the placeholder values with your actual RDS connection details:
```python
# RDS instance connection settings
import os

os.environ['ANN_BENCHMARKS_PG_USER'] = 'ann_testuser'
os.environ['ANN_BENCHMARKS_PG_PASSWORD'] = 'testPassword'  # Replace with your password
os.environ['ANN_BENCHMARKS_PG_DBNAME'] = 'ann_testdb'
os.environ['ANN_BENCHMARKS_PG_HOST'] = 'pgm-****.pg.rds.aliyuncs.com'  # Replace with your internal endpoint
os.environ['ANN_BENCHMARKS_PG_PORT'] = '5432'
os.environ['ANN_BENCHMARKS_PG_START_SERVICE'] = 'false'  # Disable automatic startup
```

Step 2: Configure HNSW test parameters
Edit ~/ann-benchmarks/ann_benchmarks/algorithms/pgvector/config.yml to define the parameter combinations to test. This test covers three groups: M-16(100), M-16(200), and M-24(200).
```yaml
float:
  any:
  - base_args: ['@metric']
    constructor: PGVector
    disabled: false
    docker_tag: ann-benchmarks-pgvector
    module: ann_benchmarks.algorithms.pgvector
    name: pgvector
    run_groups:
      M-16(100):
        arg_groups: [{M: 16, efConstruction: 100}]
        args: {}
        query_args: [[10, 20, 40, 80, 120, 200, 400, 800]]
      M-16(200):
        arg_groups: [{M: 16, efConstruction: 200}]
        args: {}
        query_args: [[10, 20, 40, 80, 120, 200, 400, 800]]
      M-24(200):
        arg_groups: [{M: 24, efConstruction: 200}]
        args: {}
        query_args: [[10, 20, 40, 80, 120, 200, 400, 800]]
```

Parameter descriptions:
| Parameter | Description |
|---|---|
| `M` (in `arg_groups`) | Maximum number of neighboring nodes per node at each graph layer. Higher values produce a denser graph, which increases recall rate and index build time. |
| `efConstruction` (in `arg_groups`) | Size of the candidate set during index construction. Higher values improve recall rate at the cost of longer build time. |
| `ef_search` (in `query_args`) | Size of the candidate set during queries. Higher values improve recall rate but increase query latency. |
Tuning direction at a glance:

- To increase recall rate: raise `M`, `efConstruction`, or `ef_search`.
- To reduce query latency: lower `ef_search`.
- To shorten index build time: lower `efConstruction` or increase parallel workers (see Appendix: Parameter impact on index build time).
- Default values (`M=16`, `ef_construction=64`, `ef_search=40`) often produce suboptimal recall. Tune from these defaults based on your requirements.
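For reference, ANN-Benchmarks applies these parameters through pgvector's standard SQL interface: `M` and `efConstruction` become options in the `CREATE INDEX ... WITH (...)` clause, and `ef_search` is the per-session `hnsw.ef_search` setting. The sketch below maps one run group to that SQL; the `items` table and `embedding` column names are illustrative, not part of the benchmark:

```python
def hnsw_sql(m: int, ef_construction: int, ef_search: int,
             table: str = "items", column: str = "embedding") -> tuple[str, str]:
    """Map one ANN-Benchmarks run group to pgvector's SQL interface.

    m/ef_construction are index build options; ef_search is a query-time
    session setting. vector_cosine_ops matches the angular datasets used here.
    """
    create = (f"CREATE INDEX ON {table} USING hnsw ({column} vector_cosine_ops) "
              f"WITH (m = {m}, ef_construction = {ef_construction});")
    search = f"SET hnsw.ef_search = {ef_search};"
    return create, search

# The M-16(100) group at ef_search = 40:
create, search = hnsw_sql(16, 100, 40)
print(create)
print(search)
```

This is why the three run groups each build one index but test eight `ef_search` values: `ef_search` can be changed per query session without rebuilding anything.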
Step 3: Build the test Docker image
(Optional) Modify `~/ann-benchmarks/ann_benchmarks/algorithms/pgvector/Dockerfile` to skip the built-in PostgreSQL setup and use only the psycopg and pgvector packages:

```dockerfile
FROM ann-benchmarks

USER root
RUN pip install psycopg[binary] pgvector
```

Build the Docker image:

```shell
cd ~/ann-benchmarks/
python install.py --algorithm pgvector
```

Run `python install.py --help` to see all available options.
Step 4: Download the dataset
When you run the test script, ANN-Benchmarks automatically downloads the specified dataset. This test uses the nytimes-256-angular dataset. For details on available datasets, see ANN-Benchmarks datasets.
| Dataset | Dimensions | Rows | Test vectors | Nearest neighbors | Distance |
|---|---|---|---|---|---|
| NYTimes | 256 | 290,000 | 10,000 | 100 | Angular |
If the public datasets do not match your workload, convert your vector data to Hierarchical Data Format version 5 (HDF5) format and use it as a custom dataset. See Custom datasets.
Step 5: Run the test and collect results
Run the benchmark:
| Parameter | Description |
|---|---|
| `--dataset` | Dataset to test. |
| `-k` | LIMIT value in the SQL query, that is, the number of results to return. |
| `--algorithm` | Algorithm to benchmark (`pgvector` in this test). |
| `--runs` | Number of runs; the best result set is selected. |
| `--parallelism` | Number of parallel workers (default: 1). |

```shell
cd ~/ann-benchmarks
nohup python run.py --dataset nytimes-256-angular -k 10 --algorithm pgvector --runs 1 > ann_benchmark_test.log 2>&1 &
tail -f ann_benchmark_test.log
```

Generate result plots:

```shell
cd ~/ann-benchmarks
python plot.py --dataset nytimes-256-angular --recompute
```

(Optional) Export detailed results, including recall rate, QPS, response time (RT), index build time, and index size:

```shell
cd ~/ann-benchmarks
python data_export.py --out res.csv
```
Test results
Recall rate and QPS
The following table shows results from the nytimes-256-angular dataset. All three groups use the same query_args values (ef_search = 10, 20, 40, 80, 120, 200, 400, 800).
| m | ef_construction | ef_search | Recall rate | QPS |
|---|---|---|---|---|
| 16 | 100 | 10 | 0.630 | 1423.985 |
| 16 | 100 | 20 | 0.741 | 1131.941 |
| 16 | 100 | 40 | 0.820 | 836.017 |
| 16 | 100 | 80 | 0.871 | 574.733 |
| 16 | 100 | 120 | 0.894 | 440.076 |
| 16 | 100 | 200 | 0.918 | 297.267 |
| 16 | 100 | 400 | 0.945 | 162.759 |
| 16 | 100 | 800 | 0.969 | 84.268 |
| 16 | 200 | 10 | 0.683 | 1299.667 |
| 16 | 200 | 20 | 0.781 | 1094.968 |
| 16 | 200 | 40 | 0.849 | 790.838 |
| 16 | 200 | 80 | 0.895 | 533.826 |
| 16 | 200 | 120 | 0.914 | 405.975 |
| 16 | 200 | 200 | 0.933 | 272.591 |
| 16 | 200 | 400 | 0.956 | 148.688 |
| 16 | 200 | 800 | 0.977 | 76.555 |
| 24 | 200 | 10 | 0.767 | 1182.747 |
| 24 | 200 | 20 | 0.840 | 922.770 |
| 24 | 200 | 40 | 0.887 | 639.899 |
| 24 | 200 | 80 | 0.920 | 411.140 |
| 24 | 200 | 120 | 0.936 | 303.323 |
| 24 | 200 | 200 | 0.953 | 199.752 |
| 24 | 200 | 400 | 0.973 | 105.506 |
| 24 | 200 | 800 | 0.988 | 53.904 |
Index build time
| m | ef_construction | Build time (seconds) |
|---|---|---|
| 16 | 100 | 33.35 |
| 16 | 200 | 57.66 |
| 24 | 200 | 87.23 |
Conclusions
Increasing m, efConstruction, and ef_search consistently improves recall rate at the cost of lower QPS and longer build time. Specifically:

- Raising `ef_search` improves recall rate but reduces QPS.
- Raising `m` and `efConstruction` improves recall rate, reduces QPS, and extends build time.
- If your application requires high recall, avoid the default parameter values (`m=16`, `ef_construction=64`, `ef_search=40`), which are optimized for speed rather than accuracy.
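One practical way to act on these numbers is to pick the configuration that maximizes QPS subject to a recall floor. The sketch below hard-codes the high-recall rows (recall ≥ 0.94) from the results table above and selects the best one for a 0.95 recall target:

```python
# Measured (m, ef_construction, ef_search, recall, qps) rows with recall >= 0.94,
# taken from the nytimes-256-angular results table above.
results = [
    (16, 100, 400, 0.945, 162.759),
    (16, 100, 800, 0.969, 84.268),
    (16, 200, 400, 0.956, 148.688),
    (16, 200, 800, 0.977, 76.555),
    (24, 200, 200, 0.953, 199.752),
    (24, 200, 400, 0.973, 105.506),
    (24, 200, 800, 0.988, 53.904),
]

target_recall = 0.95
# Among configurations meeting the recall target, keep the one with the highest QPS.
eligible = [r for r in results if r[3] >= target_recall]
best = max(eligible, key=lambda r: r[4])
print(best)  # (24, 200, 200, 0.953, 199.752)
```

For this dataset, the denser graph (m=24) with a moderate `ef_search` beats the sparser graphs running at very high `ef_search`, at the price of the longer build time shown in the next table.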
Appendix: Parameter impact on index build time
Effect of maintenance_work_mem
maintenance_work_mem sets the maximum memory for maintenance operations such as VACUUM and CREATE INDEX (unit: KB). Increasing this value shortens build time — but only up to the size of the dataset. Once maintenance_work_mem exceeds the dataset size, build time stops improving.
The following results use the pg.x8.2xlarge.2c instance type (16 cores, 128 GB memory), max_parallel_maintenance_workers=8, and the nytimes-256-angular dataset (~324 MB).
| maintenance_work_mem | Build time (seconds) |
|---|---|
| 64 MB (65,536 KB) | 52.82 |
| 128 MB (131,072 KB) | 46.79 |
| 256 MB (262,144 KB) | 36.40 |
| 512 MB (524,288 KB) | 18.90 |
| 1 GB (1,048,576 KB) | 19.06 |
Build time plateaus between 512 MB and 1 GB because both values exceed the ~324 MB dataset size.
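The ~324 MB figure can be sanity-checked from the dataset's dimensions. This back-of-the-envelope estimate assumes 4-byte floats and ignores per-row and page overhead, which accounts for the difference:

```python
# nytimes-256-angular: 290,000 rows of 256-dimensional float4 vectors
rows, dims, bytes_per_float = 290_000, 256, 4
raw_mb = rows * dims * bytes_per_float / (1024 ** 2)
print(f"{raw_mb:.0f} MiB of raw vector data")  # ~283 MiB; tuple overhead brings the table to ~324 MB
```

The same arithmetic gives a starting point for sizing `maintenance_work_mem` on your own data: once the setting comfortably exceeds the estimate plus overhead, further increases stop helping.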
Effect of max_parallel_maintenance_workers
max_parallel_maintenance_workers controls the number of parallel workers for a single CREATE INDEX operation. Build time decreases as this value increases.
The following results use the same instance type, maintenance_work_mem=2048 (2 GB), and the nytimes-256-angular dataset.
| max_parallel_maintenance_workers | Build time (seconds) |
|---|---|
| 1 | 76.00 |
| 2 | 51.34 |
| 4 | 32.49 |
| 8 | 19.66 |
| 12 | 14.44 |
| 16 | 13.07 |
| 24 | 13.15 |
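Scaling efficiency drops off well before 24 workers. Computing speedup and per-worker efficiency from the table above shows where adding workers stops paying:

```python
import numpy as np

# Build times from the max_parallel_maintenance_workers table above.
workers = np.array([1, 2, 4, 8, 12, 16, 24])
build_s = np.array([76.00, 51.34, 32.49, 19.66, 14.44, 13.07, 13.15])

speedup = build_s[0] / build_s          # relative to the single-worker build
efficiency = speedup / workers          # ideal scaling would keep this at 1.0
for w, s, e in zip(workers, speedup, efficiency):
    print(f"{w:2d} workers: speedup {s:.2f}x, efficiency {e:.2f}")
```

Past 16 workers the build time is essentially flat (13.07 s vs 13.15 s), so on this instance type there is no benefit to going higher.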
Effect of vector dimensions
The following results use the GloVe dataset (1,183,514 rows), m=16, efConstruction=64, and ef_search=40, with maintenance_work_mem=8 GB and max_parallel_maintenance_workers=16.
As vector dimension increases, index build time increases, recall rate decreases, QPS decreases, and query latency increases.
| Dimension | Build time (seconds) | Recall rate | QPS | P99 (ms) |
|---|---|---|---|---|
| 25 | 195.10 | 0.99985 | 192.94 | 7.84 |
| 50 | 236.92 | 0.99647 | 152.36 | 9.69 |
| 100 | 319.36 | 0.97231 | 126.89 | 11.14 |
| 200 | 529.33 | 0.93186 | 95.05 | 15.11 |
P99 is the 99th percentile latency — 99% of all query requests complete within this time.
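If you collect raw per-query latencies yourself, P99 is a one-liner with numpy. The sketch below uses synthetic gamma-distributed latencies purely for illustration:

```python
import numpy as np

# Synthetic latencies (ms) for illustration: most queries fast, a few slow.
rng = np.random.default_rng(0)
latencies_ms = rng.gamma(shape=2.0, scale=4.0, size=10_000)

p99 = np.percentile(latencies_ms, 99)  # 99% of queries complete within this time
print(f"P99 = {p99:.2f} ms")
```

P99 is usually a more useful service-level metric than the mean, because a skewed tail (visible in the gamma example above) barely moves the mean but dominates worst-case user experience.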
Effect of dataset size
The following results use the dbpedia-openai-{n}k-angular dataset with m=48, efConstruction=256, and ef_search=200.
As the number of rows increases, index build time increases nonlinearly, recall rate decreases, QPS decreases, and query latency increases.
| Dataset size | Rows (10,000s) | Build time (seconds) | Recall rate | QPS | P99 (ms) |
|---|---|---|---|---|---|
| 100k | 10 | 54.05 | 0.9993 | 171.74 | 8.93 |
| 200k | 20 | 137.23 | 0.99901 | 146.78 | 10.81 |
| 500k | 50 | 436.68 | 0.999 | 118.55 | 13.94 |
| 1,000k | 100 | 957.26 | 0.99879 | 101.60 | 16.35 |
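The nonlinearity can be quantified with a log-log fit over the build times in the table above; for these four points the slope is about 1.25, i.e. build time grows roughly as rows^1.25. This is an empirical fit to this dataset and parameter set, not a general law:

```python
import numpy as np

# Rows and build times from the dbpedia-openai dataset-size table above.
rows = np.array([100_000, 200_000, 500_000, 1_000_000])
build_s = np.array([54.05, 137.23, 436.68, 957.26])

# Fit build_time ~ c * rows**alpha via least squares in log-log space.
alpha, log_c = np.polyfit(np.log(rows), np.log(build_s), 1)
print(f"alpha = {alpha:.2f}")  # ~1.25: clearly superlinear
```

A superlinear exponent means doubling the dataset more than doubles the build time, which is worth budgeting for when planning index rebuilds on growing tables.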
Custom datasets
If the public datasets do not represent your workload, generate a custom HDF5 dataset from your own vector data.
This example requires the rds_ai extension. For installation, see Use the AI capabilities provided by the rds_ai extension.
Run the following script to export your vector data and ground-truth neighbors to an HDF5 file:
```python
import h5py
import numpy as np
import psycopg2
import pgvector.psycopg2

conn_info = {
    'host': 'pgm-****.rds.aliyuncs.com',
    'user': 'ann_testuser',
    'password': '****',
    'port': '5432',
    'dbname': 'ann_testdb'
}
embedding_len = 1024
distance_top_n = 100
query_batch_size = 100

try:
    with psycopg2.connect(**conn_info) as connection:
        pgvector.psycopg2.register_vector(connection)
        with connection.cursor() as cur:
            # Fetch training vectors
            cur.execute("select count(1) from test_rag")
            count = cur.fetchone()[0]
            train_embeddings = []
            for start in range(0, count, query_batch_size):
                query = f"SELECT embedding FROM test_rag ORDER BY id OFFSET {start} LIMIT {query_batch_size}"
                cur.execute(query)
                res = [embedding[0] for embedding in cur.fetchall()]
                train_embeddings.extend(res)
            train = np.array(train_embeddings)

            # Generate query embeddings using rds_ai
            with open('query.txt', 'r', encoding='utf-8') as file:
                queries = [query.strip() for query in file]
            test = []
            for query in queries:
                cur.execute(f"SELECT rds_ai.embed('{query}')::vector(1024)")
                test.extend([cur.fetchone()[0]])
            test = np.array(test)

            # Compute ground-truth nearest neighbors using angular distance
            dot_product = np.dot(test, train.T)
            norm_test = np.linalg.norm(test, axis=1, keepdims=True)
            norm_train = np.linalg.norm(train, axis=1, keepdims=True)
            similarity = dot_product / (norm_test * norm_train.T)
            distance_matrix = 1 - similarity
            neighbors = np.argsort(distance_matrix, axis=1)[:, :distance_top_n]
            distances = np.take_along_axis(distance_matrix, neighbors, axis=1)

            # Save to HDF5
            with h5py.File('custom_dataset.hdf5', 'w') as f:
                f.create_dataset('distances', data=distances)
                f.create_dataset('neighbors', data=neighbors)
                f.create_dataset('test', data=test)
                f.create_dataset('train', data=train)
                f.attrs.update({
                    "type": "dense",
                    "distance": "angular",
                    "dimension": embedding_len,
                    "point_type": "float"
                })
            print("The HDF5 file is created and the dataset is added.")
except (Exception, psycopg2.DatabaseError) as error:
    print(f"Error: {error}")
```

Register the custom dataset in the `DATASETS` section of `~/ann-benchmarks/ann_benchmarks/datasets.py`:

```python
DATASETS: Dict[str, Callable[[str], None]] = {
    ......,
    "<custom_dataset>": None,
}
```

Upload `custom_dataset.hdf5` to the `~/ann-benchmarks` directory and pass its name to `run.py` with `--dataset <custom_dataset>`.
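The ground-truth computation in the export script can be checked in isolation before running it against your database. The sketch below applies the same cosine-distance math to tiny random data (sizes chosen arbitrarily for the test) and confirms that a query vector which also appears in the training set is its own nearest neighbor:

```python
import numpy as np

rng = np.random.default_rng(42)
train = rng.normal(size=(50, 8)).astype(np.float32)
test = train[:5]  # queries drawn from the training set itself

# Same angular (cosine) distance as in the export script above.
similarity = (test @ train.T) / (
    np.linalg.norm(test, axis=1, keepdims=True) * np.linalg.norm(train, axis=1)
)
distance_matrix = 1 - similarity
neighbors = np.argsort(distance_matrix, axis=1)[:, :10]

# Each query's nearest neighbor should be its own row in train (distance ~0).
print(neighbors[:, 0])  # [0 1 2 3 4]
```

If this invariant fails on your real data pipeline, the `neighbors` dataset in the HDF5 file will be wrong and every recall figure ANN-Benchmarks reports against it will be meaningless.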
What's next
Use the pgvector extension to test performance based on IVF indexes — test concurrency performance using IVF (Inverted File) indexes
Use the AI capabilities provided by the rds_ai extension — generate embeddings directly in the database