Perform vector search using exact and approximate methods - AnalyticDB

AnalyticDB for PostgreSQL supports two vector search methods: exact search and approximate index search. Exact search compares every vector and guarantees 100% recall. Approximate index search uses Hierarchical Navigable Small World (HNSW) indexes to return results faster at a recall rate above 99%.

Pagination on vector search results is not supported. The OFFSET parameter and cursor variables cannot be used.

Choose a search method

Method	How it works	Recall rate	Use when
Exact search	Brute-force comparison of all vectors	100%	Your dataset is small, you require perfect recall, or query volume is low
Approximate index search	HNSW index scan	>99%	You need fast results and can accept a small margin of approximation

Operator quick reference

Each distance metric has a dedicated operator. The operator in your query must match the distance metric used when the index was created.

Operator	Distance metric	Score retrieval formula
`<->`	Euclidean distance (L2)	`sqrt(raw_score)`
`<#>`	Inner product distance	`-1 * raw_score`
`<=>`	Cosine similarity	`1.0 - raw_score`

<#> returns the negative inner product because PostgreSQL only supports ascending index scans. Apply -1 * raw_score to recover the actual inner product value. <=> returns cosine distance; apply 1.0 - raw_score to get cosine similarity.

Exact search

Exact search scans every row in the table and ranks results by the chosen distance metric. Because it compares all vectors, response speed is lower than approximate index search. Use it when recall accuracy is critical.

Three distance metrics are supported:

Euclidean distance — results sorted in ascending order by distance (smaller = more similar)
Inner product distance — results sorted in descending order (larger = more similar)
Cosine similarity — results sorted in descending order (larger = more similar)

-- Euclidean distance
SELECT id, l2_distance(<VECTOR_COLUMN_NAME>, array[1,2,3...N]::float4[]) AS score
FROM <TABLE_NAME>
ORDER BY l2_squared_distance(<VECTOR_COLUMN_NAME>, array[1,2,3...N]::float4[])
LIMIT <TOPK>;

-- Inner product distance
SELECT id, inner_product_distance(<VECTOR_COLUMN_NAME>, array[1,2,3...N]::float4[]) AS score
FROM <TABLE_NAME>
ORDER BY negative_inner_product_distance(<VECTOR_COLUMN_NAME>, array[1,2,3...N]::float4[])
LIMIT <TOPK>;

-- Cosine similarity
SELECT id, cosine_similarity(<VECTOR_COLUMN_NAME>, array[1,2,3...N]::float4[]) AS score
FROM <TABLE_NAME>
ORDER BY cosine_distance(<VECTOR_COLUMN_NAME>, array[1,2,3...N]::float4[])
LIMIT <TOPK>;

Replace these placeholders with actual values:

Placeholder	Description
`<VECTOR_COLUMN_NAME>`	Name of the vector column
`<TABLE_NAME>`	Name of the table
`<TOPK>`	Number of top results to return

Approximate index search

Approximate index search uses HNSW indexes to scan a subset of vectors rather than the full table. Results are returned faster than exact search at a recall rate above 99%.

Inner product distance and cosine similarity are supported only on instances running version V6.3.10.18 or later. To check or update your instance version, see Update the minor version of an instance.

-- Euclidean distance
SELECT id, l2_distance(<VECTOR_COLUMN_NAME>, array[1,2,3...N]::float4[]) AS score
FROM <TABLE_NAME>
ORDER BY <VECTOR_COLUMN_NAME> <-> array[1,2,3...N]::float4[]
LIMIT <TOPK>;

-- Inner product distance
SELECT id, inner_product_distance(<VECTOR_COLUMN_NAME>, array[1,2,3...N]::float4[]) AS score
FROM <TABLE_NAME>
ORDER BY <VECTOR_COLUMN_NAME> <#> array[1,2,3...N]::float4[]
LIMIT <TOPK>;

-- Cosine similarity
SELECT id, cosine_similarity(<VECTOR_COLUMN_NAME>, array[1,2,3...N]::float4[]) AS score
FROM <TABLE_NAME>
ORDER BY <VECTOR_COLUMN_NAME> <=> array[1,2,3...N]::float4[]
LIMIT <TOPK>;

Usage notes:

If no HNSW index exists or no FastANN parameters are configured, the query falls back to exact search automatically.
The ORDER BY clause must sort in ascending order (ASC) or omit the direction entirely. To sort results in descending order, or to apply a secondary sort by another column, wrap the vector search as a subquery and add an outer ORDER BY.
The operator (<->, <#>, <=>) must match the distance metric used when the index was created. A mismatch prevents the index from being used. For details, see Create a vector index.

Verify index usage with EXPLAIN

Run EXPLAIN to confirm your query uses the vector index rather than a full table scan. Ann Index Scan in the output confirms the HNSW index is active.

EXPLAIN
SELECT id, chunk, intime, url
FROM chunks
ORDER BY feature <=> array[10,2.0,...,1536.0]::real[]
LIMIT 100;

Expected output when the index is active:

QUERY PLAN
-------------------------------------------------
 Limit
   ->  Gather Motion 3:1  (slice1; segments: 3)
         Merge Key: ((feature <=> $0))
          ->  Limit
               ->  Ann Index Scan using feature_idx on chunks
                Order By: (feature <=> $0)
 Optimizer: Postgres query optimizer

If you see Seq Scan instead of Ann Index Scan, the query is not using the vector index. Check that the operator matches the index's distance metric and that the index exists on the column.

Filter by metadata (hybrid search)

Combine vector similarity search with a metadata filter in the WHERE clause. The following example retrieves the top 100 most similar documents published within a specific date range:

SELECT id, chunk, intime, url
FROM chunks
WHERE intime > '2023-04-01' AND intime <= '2023-05-01'
ORDER BY feature <=> array[10,2.0,...,1536.0]::real[]
LIMIT 100;

SQL optimization

Match operators to indexes

A single vector column can have multiple indexes for different distance metrics. Each query operator works only with the index created using the corresponding distance metric. Mismatched operator-index pairs prevent index acceleration.

-- Create three indexes on the same column, one per distance metric
CREATE TABLE test_table (
    id serial primary key,
    feature real[]
) distributed by (id);

CREATE INDEX idx_test_table_feature_l2     ON test_table USING ann(feature) WITH (dim=768, distancemeasure=l2,     hnsw_m=64, pq_enable=1);
CREATE INDEX idx_test_table_feature_ip     ON test_table USING ann(feature) WITH (dim=768, distancemeasure=ip,     hnsw_m=64, pq_enable=1);
CREATE INDEX idx_test_table_feature_cosine ON test_table USING ann(feature) WITH (dim=768, distancemeasure=cosine, hnsw_m=64, pq_enable=1);

Return results without a score

Use the <->, <#>, or <=> operators directly in the ORDER BY clause when you only need ranked IDs:

-- Euclidean distance
SELECT id
FROM test_table
ORDER BY feature <-> array[1,2,3...768]::real[]
LIMIT topk;

-- Inner product distance
SELECT id
FROM test_table
ORDER BY feature <#> array[1,2,3...768]::real[]
LIMIT topk;

-- Cosine similarity
SELECT id
FROM test_table
ORDER BY feature <=> array[1,2,3...768]::real[]
LIMIT topk;

Return results with a score

The index sort produces a raw distance value that differs from the actual metric value. Use a subquery to retrieve the raw score, then apply the conversion formula. This avoids a full distance recalculation.

-- Euclidean distance: raw score is squared L2; apply sqrt() to get L2 distance
SELECT t.id, sqrt(t.score) AS score
FROM (
    SELECT id, feature <-> array[1,2,3...768]::real[] AS score
    FROM test_table
    ORDER BY score LIMIT topk
) t;

-- Inner product distance: raw score is negative; multiply by -1 to get inner product
SELECT t.id, (-1 * t.score) AS score
FROM (
    SELECT id, feature <#> array[1,2,3...768]::real[] AS score
    FROM test_table
    ORDER BY score LIMIT topk
) t;

-- Cosine similarity: raw score is cosine distance; apply 1.0 - score to get similarity
SELECT t.id, (1.0 - t.score) AS score
FROM (
    SELECT id, feature <=> array[1,2,3...768]::real[] AS score
    FROM test_table
    ORDER BY score LIMIT topk
) t;

Filter by score range

Apply a WHERE filter on the converted score after the subquery sort and LIMIT. This uses the index for initial retrieval and computes the final score only for the top candidates.

-- Euclidean distance: return results with L2 distance below 100
SELECT t.id, sqrt(t.score) AS score
FROM (
    SELECT id, feature <-> array[1,2,3...768]::real[] AS score
    FROM test_table
    ORDER BY score LIMIT topk
) t
WHERE score < 100;

-- Inner product distance: return results with inner product above 10
SELECT t.id, (-1 * t.score) AS score
FROM (
    SELECT id, feature <#> array[1,2,3...768]::real[] AS score
    FROM test_table
    ORDER BY score LIMIT topk
) t
WHERE score > 10;

-- Cosine similarity: return results with similarity above 0.5
SELECT t.id, (1.0 - t.score) AS score
FROM (
    SELECT id, feature <=> array[1,2,3...768]::real[] AS score
    FROM test_table
    ORDER BY score LIMIT topk
) t
WHERE score > 0.5;

Engine parameters

Configure these session-level parameters to tune the trade-off between search speed and recall rate.

Parameter	Default	Valid range	Description
`fastann.build_parallel_processes`	4	[1, 64]	Number of parallel processes for building HNSW indexes. The effective range depends on your instance specifications.
`fastann.pq_amp`	10	[1, 1,000]	Amplification factor for result sets when product quantization (PQ) is used to reduce high-dimensional vector dimensions. Increase this value to improve recall rate at the cost of speed.
`fastann.hnsw_max_scan_points`	6,000	[1, 6,000,000]	Maximum number of scan points during an HNSW search. Lower values end the search earlier, trading recall for speed.
`fastann.hnsw_ef_search`	400	[10, 10,000]	Size of the candidate set during HNSW search. Higher values increase recall but reduce search speed.

All parameters are session-scoped. Set them before running a query:

SET fastann.hnsw_ef_search = 800;
SET fastann.hnsw_max_scan_points = 12000;

Tuning guidance:

To improve recall rate at the cost of speed, increase fastann.hnsw_ef_search and fastann.hnsw_max_scan_points.
To reduce latency at the cost of recall, decrease these values.
Start with the defaults, measure your recall rate using a test query set, then adjust incrementally.