Create an HNSW vector index to accelerate vector search - AnalyticDB

Vector indexes accelerate approximate nearest neighbor (ANN) searches on large datasets. AnalyticDB for PostgreSQL uses the FastANN vector search engine, which implements the Hierarchical Navigable Small World (HNSW) algorithm on top of PostgreSQL's segmented page storage. FastANN stores only pointers to vector columns in the index, keeping storage overhead low. It also supports product quantization (PQ) to compress high-dimensional vectors and minimize table lookups during inserts and queries.

Prerequisites

Before you begin, ensure that you have:

An AnalyticDB for PostgreSQL instance with Vector Search Engine Optimization enabled. When this feature is enabled, the FastANN plugin is automatically installed.
A table with a vector column of type real[], float4[], float2[], or smallint[]

If you get the error You must specify an operator class or define a default operator class for the data type, Vector Search Engine Optimization is not enabled. Enable it and retry. For more information, see Enable or disable vector search engine optimization.

Syntax

CREATE INDEX [INDEX_NAME]
ON [SCHEMA_NAME].[TABLE_NAME]
USING ANN(COLUMN_NAME)
WITH (DIM=<DIMENSION>,
      DISTANCEMEASURE=<MEASURE>,
      HNSW_M=<M>,
      HNSW_EF_CONSTRUCTION=<EF_CONSTRUCTION>,
      PQ_ENABLE=<PQ_ENABLE>,
      PQ_SEGMENTS=<PQ_SEGMENTS>,
      PQ_CENTERS=<PQ_CENTERS>,
      EXTERNAL_STORAGE=<EXTERNAL_STORAGE>);

Where:

INDEX_NAME: The index name.
SCHEMA_NAME: The schema (namespace) name.
TABLE_NAME: The table name.
COLUMN_NAME: The vector column to index.

The following table describes the index parameters.

Parameter	Required	Default	Range	Description
`DIM`	Yes	0	[1, 8192]	Vector dimensionality. Inserts that don't match this dimension return an error.
`DISTANCEMEASURE`	No	L2	L2, IP, COSINE	Distance measure for the index. See Choose a distance measure.
`HNSW_M`	No	32	[10, 1000]	Maximum number of neighbors per node. A higher value improves recall at the cost of more memory and slower index builds.
`HNSW_EF_CONSTRUCTION`	No	600	[40, 4000]	Candidate set size during index construction. A higher value improves index quality at the cost of build time.
`PQ_ENABLE`	No	0	0, 1	Enables product quantization (PQ) for dimension reduction. Don't enable PQ if the table has fewer than 500,000 rows.
`PQ_SEGMENTS`	No	0	[1, 256]	Number of k-means segments for PQ. Required only when `PQ_ENABLE=1` and `DIM` is not divisible by 8. `DIM` must be divisible by `PQ_SEGMENTS`.
`PQ_CENTERS`	No	2048	[256, 8192]	Number of k-means centroids for PQ. Takes effect only when `PQ_ENABLE=1`.
`EXTERNAL_STORAGE`	No	0	0, 1	Storage mode for the index: `0` uses segmented page storage (supports PostgreSQL `shared_buffers` caching, and full delete/update support); `1` uses mmap (from v6.6.2.3, supports mark-for-delete for limited deletes and updates). Supported only in version 6.0; not supported in version 7.0.

Choose a distance measure

Pick the distance measure that matches your data and use case. The query operator you use must match the index's distance measure — mismatches cause the query to fall back to an exact (brute-force) search.

Distance measure	Parameter value	Query operator	Use case
Euclidean (L2)	`L2`	`<->`	Image similarity search
Inner product	`IP`	`<#>`	Substitute for cosine similarity after vector normalization
Cosine	`COSINE`	`<=>`	Text similarity search

IP and COSINE require engine version 6.3.10.18 or later. For version requirements, see Upgrade versions.

Best performance tip: Normalize your vectors before import, then use IP as the distance measure. An IP index lets queries also return Euclidean distance and cosine similarity directly. For details, see Vector search.

Create a vector index

The following example uses a text knowledge base: articles are split into chunks, converted to embedding vectors, and stored in a chunks table.

Step 1: Create the table

CREATE TABLE chunks (
    id SERIAL PRIMARY KEY,
    chunk VARCHAR(1024),
    intime TIMESTAMP,
    url VARCHAR(1024),
    feature REAL[]
) DISTRIBUTED BY (id);

Column	Type	Description
`id`	serial	Row ID
`chunk`	varchar(1024)	Text chunk
`intime`	timestamp	Import timestamp
`url`	varchar(1024)	Source article URL
`feature`	real[]	Embedding vector

Step 2: Set the vector column storage to PLAIN

ALTER TABLE chunks ALTER COLUMN feature SET STORAGE PLAIN;

Setting the storage mode to PLAIN avoids column decompression on each row scan, which reduces overhead and improves query performance.

Step 3: Create the vector index

Create an index that matches the distance measure you plan to use in queries. The index type must match the query operator — a mismatch causes the query to fall back to an exact (brute-force) search.

L2 (Euclidean) distance — use <-> in queries:

CREATE INDEX idx_chunks_feature_l2
ON chunks USING ann(feature)
WITH (dim=1536, distancemeasure=l2, hnsw_m=64, pq_enable=1);

Inner product distance — use <#> in queries:

CREATE INDEX idx_chunks_feature_ip
ON chunks USING ann(feature)
WITH (dim=1536, distancemeasure=ip, hnsw_m=64, pq_enable=1);

Cosine distance — use <=> in queries:

CREATE INDEX idx_chunks_feature_cosine
ON chunks USING ann(feature)
WITH (dim=1536, distancemeasure=cosine, hnsw_m=64, pq_enable=1);

Create vector indexes only as needed. Multiple vector columns per table and multiple indexes per column are supported, but unused indexes add overhead.

Step 4: Index structured columns for hybrid queries

CREATE INDEX ON chunks(intime);

Indexing structured columns like intime improves performance when combining vector similarity with filters.

Index building methods

Two approaches are available for building vector indexes. For large-scale imports, asynchronous building is faster: import all data without a vector index first, then build the index in one pass on the complete dataset.

Method	When to use	Trade-off
Stream building	Create the index before importing data	Index is built in real time as data arrives; slows import throughput
Asynchronous building (recommended)	Import data first, then create the index	Faster overall for large datasets; no index overhead during import

Asynchronous building with parallel processing

AnalyticDB for PostgreSQL supports concurrent index building via the fastann.build_parallel_processes Grand Unified Configuration (GUC) parameter. Set it to match the number of cores on your compute nodes.

Instance type	Recommended value
4-core 16 GB	4
8-core 32 GB	8
16-core 64 GB	16

Important

If the instance is serving live traffic, don't set fastann.build_parallel_processes to the maximum. A lower value leaves capacity for ongoing workloads.

Build with DMS:

On DMS, all SET statements and the CREATE INDEX statement must be on the same line to take effect together.

SET statement_timeout = 0;SET idle_in_transaction_session_timeout = 0;SET fastann.build_parallel_processes = 8;CREATE INDEX ON chunks USING ann(feature) WITH (dim=1536, distancemeasure=cosine, hnsw_m=64, pq_enable=1);

Build with psql:

Save the following to create_index.sql:

SET statement_timeout = 0;
SET idle_in_transaction_session_timeout = 0;
SET fastann.build_parallel_processes = 8;
CREATE INDEX ON chunks USING ann(feature)
WITH (dim=1536, distancemeasure=cosine, hnsw_m=64, pq_enable=1);

Run the file:

psql -h gp-xxxx-master.gpdb.rds.aliyuncs.com -p 5432 -d dbname -U username -f create_index.sql

Vector functions reference

Supported functions

Category	Function	Return type	Description	Supported data types
Distance	`l2_distance`	double precision	Euclidean distance (square root). Note In v6.3.10.17 and earlier, returns squared Euclidean distance. In v6.3.10.18 and later, returns the square root value.	smallint[], float2[], float4[], real[]
Distance	`inner_product_distance`	double precision	Inner product distance. Equals cosine similarity after normalization. Formula:	smallint[], float2[], float4[], real[]
Distance	`cosine_similarity`	double precision	Cosine similarity. Range: [-1, 1]. Measures directional similarity regardless of vector length. Formula:	smallint[], float2[], float4[], real[]
Distance	`dp_distance`	double precision	Dot product distance. Identical to inner product distance. Formula:	smallint[], float2[], float4[], real[]
Distance	`hm_distance`	integer	Hamming distance (bitwise).	int[]
Sorting	`l2_squared_distance`	double precision	Squared Euclidean distance. Skips the square root calculation for faster distance-based sorting. Formula:	smallint[], float2[], float4[], real[]
Sorting	`negative_inner_product_distance`	double precision	Negated inner product distance. Used for descending sort by inner product. Formula:	smallint[], float2[], float4[], real[]
Sorting	`cosine_distance`	double precision	Cosine distance. Range: [0, 2]. Used for descending sort by cosine similarity. Formula:	smallint[], float2[], float4[], real[]
Arithmetic	`vector_add`	smallint[], float2[], or float4[]	Sum of two vector arrays.	smallint[], float2[], float4[], real[]
Arithmetic	`vector_sub`	smallint[], float2[], or float4[]	Difference between two vector arrays.	smallint[], float2[], float4[], real[]
Arithmetic	`vector_mul`	smallint[], float2[], or float4[]	Element-wise product of two vector arrays.	smallint[], float2[], float4[], real[]
Arithmetic	`vector_div`	smallint[], float2[], or float4[]	Element-wise quotient of two vector arrays.	smallint[], float2[], float4[], real[]
Aggregation	`vector_sum`	int or double precision	Sum of all elements in a vector.	smallint[], int[], float2[], float4[], real[], float8[]
Aggregation	`vector_min`	int or double precision	Minimum element value.	smallint[], int[], float2[], float4[], real[], float8[]
Aggregation	`vector_max`	int or double precision	Maximum element value.	smallint[], int[], float2[], float4[], real[], float8[]
Aggregation	`vector_avg`	int or double precision	Average element value.	smallint[], int[], float2[], float4[], real[], float8[]
Aggregation	`vector_norm`	double precision	Norm (length) of a vector.	smallint[], int[], float2[], float4[], real[], float8[]
Aggregation	`vector_angle`	double precision	Angle between two vectors.	smallint[], float2[], float4[], real[]

Important

vector_add, vector_sub, vector_mul, vector_div, vector_sum, vector_min, vector_max, vector_avg, vector_norm, and vector_angle are supported only in v6.6.1.0 and later. To check your version, see View the minor engine version.

Examples:

-- Euclidean distance
SELECT l2_distance(array[1,1,1,1]::real[], array[2,2,2,2]::real[]);
-- Inner product distance
SELECT inner_product_distance(array[1,1,1,1]::real[], array[2,2,2,2]::real[]);
-- Cosine similarity
SELECT cosine_similarity(array[1,1,1,1]::real[], array[2,2,2,2]::real[]);
-- Dot product distance
SELECT dp_distance(array[1,1,1,1]::real[], array[2,2,2,2]::real[]);
-- Hamming distance
SELECT hm_distance(array[1,0,1,0]::int[], array[0,1,0,1]::int[]);
-- Vector addition
SELECT vector_add(array[1,1,1,1]::real[], array[2,2,2,2]::real[]);
-- Vector subtraction
SELECT vector_sub(array[1,1,1,1]::real[], array[2,2,2,2]::real[]);
-- Vector multiplication
SELECT vector_mul(array[1,1,1,1]::real[], array[2,2,2,2]::real[]);
-- Vector division
SELECT vector_div(array[1,1,1,1]::real[], array[2,2,2,2]::real[]);
-- Element sum
SELECT vector_sum(array[1,1,1,1]::real[]);
-- Minimum element
SELECT vector_min(array[1,1,1,1]::real[]);
-- Maximum element
SELECT vector_max(array[1,1,1,1]::real[]);
-- Average element
SELECT vector_avg(array[1,1,1,1]::real[]);
-- Vector norm
SELECT vector_norm(array[1,1,1,1]::real[]);
-- Angle between vectors
SELECT vector_angle(array[1,1,1,1]::real[], array[2,2,2,2]::real[]);

Supported operators

All distance operators use the ANN access method and trigger index-accelerated search when a matching vector index exists. If no matching index exists, the query falls back to an exact search.

Operator	Calculation	Sorting behavior	Equivalent function	Supported data types
`<->`	Squared Euclidean distance	Ascending	`l2_squared_distance`	smallint[], float2[], float4[], real[]
`<#>`	Negative inner product	Descending by dot product	`negative_inner_product_distance`	smallint[], float2[], float4[], real[]
`<=>`	Cosine distance	Descending by cosine similarity	`cosine_distance`	smallint[], float2[], float4[], real[]
`+`	Vector sum	—	—	smallint[], float2[], float4[], real[]
`-`	Vector difference	—	—	smallint[], float2[], float4[], real[]
`*`	Element-wise product	—	—	smallint[], float2[], float4[], real[]
`/`	Element-wise quotient	—	—	smallint[], float2[], float4[], real[]

Important

The +, -, *, and / operators are supported only in v6.6.1.0 and later. To check your version, see View the minor engine version.

Examples:

-- Euclidean distance
SELECT array[1,1,1,1]::real[] <-> array[2,2,2,2]::real[] AS score;
-- Inner product distance
SELECT array[1,1,1,1]::real[] <#> array[2,2,2,2]::real[] AS score;
-- Cosine distance
SELECT array[1,1,1,1]::real[] <=> array[2,2,2,2]::real[] AS score;

-- Addition
SELECT array[1,1,1,1]::real[] + array[2,2,2,2]::real[] AS value;
-- Subtraction
SELECT array[1,1,1,1]::real[] - array[2,2,2,2]::real[] AS value;
-- Multiplication
SELECT array[1,1,1,1]::real[] * array[2,2,2,2]::real[] AS value;
-- Division
SELECT array[1,1,1,1]::real[] / array[2,2,2,2]::real[] AS value;

What's next

Vector search — Run ANN queries against your index
Upgrade versions — Check and upgrade the engine version
Enable or disable vector search engine optimization — Manage the FastANN plugin