PASE (PostgreSQL Approximate Nearest Neighbor Search Extension) adds high-performance approximate nearest neighbor (ANN) vector search to PolarDB for PostgreSQL (Compatible with Oracle). It supports two index types — IVFFlat and Hierarchical Navigable Small World (HNSW) — so you can balance build time, memory, and query speed based on your workload.
PASE does not extract or generate feature vectors. You must supply the feature vectors yourself. PASE handles the similarity search.
Prerequisites
Before you begin, make sure you have:
A PolarDB for PostgreSQL (Compatible with Oracle) cluster
A privileged database account to run the SQL statements in this topic
A table with a
float4[]column that stores your vectors
Quick start
Install the extension, create a table, build an index, and run your first similarity query in four steps.
-- Step 1: Install PASE
CREATE EXTENSION pase;
-- Step 2: Create a table with a vector column
CREATE TABLE items (id integer, vector float4[]);
INSERT INTO items VALUES
(1, ARRAY[1.0, 2.0, 3.0]::float4[]),
(2, ARRAY[4.0, 5.0, 6.0]::float4[]),
(3, ARRAY[7.0, 8.0, 9.0]::float4[]);
-- Step 3: Build an HNSW index (recommended for most workloads)
CREATE INDEX items_hnsw_idx ON items
USING pase_hnsw(vector)
WITH (dim = 3, base_nb_num = 16, ef_build = 40, ef_search = 200, base64_encoded = 0);
-- Step 4: Query the top 3 nearest neighbors
-- ORDER BY is required for the index to take effect
SELECT id, vector <?> '1,2,3:100:0'::pase AS distance
FROM items
ORDER BY vector <?> '1,2,3:100:0'::pase
ASC LIMIT 3;Choose an index type
| IVFFlat | HNSW | |
|---|---|---|
| Build time | Fast | Slow |
| Memory | Low | High (stores proximity graph) |
| Query speed vs. recall | Lower speed-recall ratio | Higher speed-recall ratio |
| Precision ceiling | Tunable — increase the number of probed clusters to raise recall | Fixed — cannot be raised by reconfiguring parameters |
| Best for | High-precision image comparison; datasets where 100% recall is achievable | Large datasets (tens of millions of vectors); sub-10ms latency requirements |
When to use IVFFlat: Your dataset fits in memory, you can tolerate up to 100 ms per query, and you need 100% recall (possible when the query vector is in the candidate set).
When to use HNSW: Your dataset contains tens of millions of vectors, you need responses under 10 ms, and a fixed precision ceiling is acceptable.
How it works
IVFFlat
IVFFlat is a simplified version of the IVFADC algorithm. It partitions vectors into clusters using k-means, then searches only the clusters nearest to the query vector.

k-means divides all vectors into clusters. Each cluster has a centroid.
IVFFlat finds the *n* centroids nearest to the query vector.
It searches all vectors in those *n* clusters and returns the top *k* nearest results.
The parameter *n* controls the speed-recall trade-off: a larger *n* searches more clusters and improves recall, but increases compute time. Clusters far from the query vector are skipped entirely.
Unlike IVFADC, which uses product quantization to compress the second-phase search, IVFFlat uses brute-force search within selected clusters — preserving precision at the cost of slightly more compute.
HNSW
HNSW is a graph-based ANN algorithm built on the Navigable Small World (NSW) algorithm. It constructs a multi-layer graph where each layer is a coarser view of the layer below.

HNSW builds a hierarchical graph structure across multiple layers.
A search starts from a random entry point in the top layer.
At each layer, HNSW finds the neighbors of the current element, adds them to a fixed-length dynamic list sorted by distance, and keeps only the top *k* candidates. It repeats until the list stabilizes, then descends to the next layer using the best candidate as the entry point.
The process repeats until the bottom layer is fully searched.
Build an index
IVFFlat index
CREATE INDEX ivfflat_idx ON vectors_table
USING pase_ivfflat(vector)
WITH (
clustering_type = 1,
distance_type = 0,
dimension = 256,
base64_encoded = 0,
clustering_params = "10,100"
);Parameters
| Parameter | Required | Default | Description |
|---|---|---|---|
clustering_type | Yes | — | Clustering method. 0 = external (load a centroid file specified by clustering_params). 1 = internal k-means. Start with 1 if this is your first time using PASE. |
distance_type | No | 0 | Distance method. 0 = Euclidean distance. 1 = dot product (requires vector normalization; note that dot product order is opposite to Euclidean order). PolarDB supports only Euclidean distance natively; see Calculate dot products for the dot product approach. |
dimension | Yes | — | Number of vector dimensions. Maximum: 512. |
base64_encoded | No | 0 | Vector encoding. 0 = float4[]. 1 = Base64-encoded float[]. |
clustering_params | Yes | — | For internal clustering (clustering_type = 1), format: clustering_sample_ratio,k. clustering_sample_ratio is an integer in (0, 1000] representing the sampling fraction with 1,000 as the denominator. k is the number of centroids, with a recommended range of [100, 1000]. Keep the total number of sampled records under 100,000. For external clustering (clustering_type = 0), specify the path to the centroid file. |
To use internal clustering (clustering_type = 1), insert data into the table before creating the index. The k-means step requires existing rows.
HNSW index
CREATE INDEX hnsw_idx ON vectors_table
USING pase_hnsw(vector)
WITH (
dim = 256,
base_nb_num = 16,
ef_build = 40,
ef_search = 200,
base64_encoded = 0
);Parameters
| Parameter | Required | Default | Description |
|---|---|---|---|
dim | Yes | — | Number of vector dimensions. Maximum: 512. |
base_nb_num | Yes | — | Number of neighbors per element. Use a value in [16, 128]. Higher values improve recall but increase build time and index size. |
ef_build | Yes | — | Heap size during index construction. Use a value in [40, 400]. Higher values improve recall but slow down index creation. |
ef_search | Yes | 200 | Heap size during query. Higher values improve recall at the cost of query speed. Override this per query by changing the value in the query string. |
base64_encoded | No | 0 | Vector encoding. 0 = float4[]. 1 = Base64-encoded float[]. |
Query vectors
All query examples use ORDER BY with ASC and LIMIT. Both are required — omitting ORDER BY bypasses the index entirely.
IVFFlat query
The <#> operator activates the IVFFlat index.
SELECT id, vector <#> '1,1,1'::pase AS distance
FROM vectors_ivfflat
ORDER BY vector <#> '1,1,1:10:0'::pase
ASC LIMIT 10;The query string format is vector_values:n:distance_method:
| Field | Description |
|---|---|
vector_values | Comma-separated query vector |
n | Number of clusters to search. Range: (0, 1000]. Higher values improve recall at the cost of speed. |
distance_method | 0 = Euclidean distance. 1 = dot product (requires normalized vectors). |
HNSW query
The <?> operator activates the HNSW index.
SELECT id, vector <?> '1,1,1'::pase AS distance
FROM vectors_ivfflat
ORDER BY vector <?> '1,1,1:100:0'::pase
ASC LIMIT 10;The query string format is vector_values:ef:distance_method:
| Field | Description |
|---|---|
vector_values | Comma-separated query vector |
ef | Search heap size. Range: (0, ∞). Start at 40 and increase in small steps until recall meets your requirements. |
distance_method | 0 = Euclidean distance. 1 = dot product (requires normalized vectors). |
Calculate vector similarity
Before creating an index, use the <?> operator and the pase() constructor to calculate the similarity between two vectors directly.
PASE-type construction — pass the right vector as a pase() function call:
-- Euclidean distance (default)
SELECT ARRAY[2, 1, 1]::float4[] <?> pase(ARRAY[3, 1, 1]::float4[]) AS distance;
-- Explicitly set the distance method to Euclidean (0)
SELECT ARRAY[2, 1, 1]::float4[] <?> pase(ARRAY[3, 1, 1]::float4[], 0) AS distance;
-- Use dot product (1) -- normalize vectors first
SELECT ARRAY[2, 1, 1]::float4[] <?> pase(ARRAY[3, 1, 1]::float4[], 0, 1) AS distance;String construction — pass the right vector as a cast string literal, with parameters separated by colons:
-- Euclidean distance (default)
SELECT ARRAY[2, 1, 1]::float4[] <?> '3,1,1'::pase AS distance;
-- Explicitly set distance method to Euclidean (0)
SELECT ARRAY[2, 1, 1]::float4[] <?> '3,1,1:0'::pase AS distance;
-- Use dot product (1) -- normalize vectors first
SELECT ARRAY[2, 1, 1]::float4[] <?> '3,1,1:0:1'::pase AS distance;Constructor parameters (both methods):
| Position | Description |
|---|---|
| 1st | The query vector |
| 2nd | Reserved — set to 0 |
| 3rd | Distance method: 0 = Euclidean, 1 = dot product |
The left vector must usefloat4[]. The right vector must use thepasetype. Both vectors must have the same number of dimensions.
For Euclidean distance, use vectors as-is. For dot product or cosine similarity, normalize each vector so that it satisfiesand
. When the vector is normalized, the dot product equals the cosine similarity.
Operational considerations
Index bloat: Index files can grow bloated after many writes. Check index size with SELECT pg_relation_size('index_name'). If the index is larger than the underlying table data and queries have slowed, rebuild the index.
Accuracy after updates: Frequent inserts and deletes can reduce recall. For workloads that require 100% accuracy, rebuild the index periodically.
Parallel query limit: Only sequential search is supported for high-dimensional vectors in multi-node elastic parallel queries.
Calculate dot products
PolarDB supports Euclidean distance natively in its index implementation. To use dot product similarity, normalize your vectors before indexing and use the following function, which wraps an HNSW query with distance_method = 1:
CREATE OR REPLACE FUNCTION inner_product_search(
query_vector text,
ef integer,
k integer,
table_name text
)
RETURNS TABLE (id integer, uid text, distance float4) AS $$
BEGIN
RETURN QUERY EXECUTE format('
SELECT a.id, a.vector <?> pase(ARRAY[%s], %s, 1) AS distance
FROM (
SELECT id, vector
FROM %s
ORDER BY vector <?> pase(ARRAY[%s], %s, 0) ASC
LIMIT %s
) a
ORDER BY distance DESC;',
query_vector, ef, table_name, query_vector, ef, k
);
END
$$ LANGUAGE plpgsql;The dot product of a normalized vector equals its cosine similarity, so this function also serves as a cosine similarity search.
Create an IVFFlat index from an external centroid file
This is an advanced option. Upload a centroid file to the server and reference it in clustering_params when clustering_type = 0.
The centroid file format:
{dimensions}|{number_of_centroids}|{centroid_vector_values}Example — 2 centroids, 3 dimensions each:
3|2|1,1,1,2,2,2For parameter details, see the IVFFlat index parameters table.
References
Herve Jégou, Matthijs Douze, Cordelia Schmid. Product quantization for nearest neighbor search.
Yu. A. Malkov, D. A. Yashunin. Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs.