All Products
Search
Document Center

PolarDB:PASE

Last Updated:Mar 28, 2026

PASE (PostgreSQL Approximate Nearest Neighbor Search Extension) adds high-performance approximate nearest neighbor (ANN) vector search to PolarDB for PostgreSQL (Compatible with Oracle). It supports two index types — IVFFlat and Hierarchical Navigable Small World (HNSW) — so you can balance build time, memory, and query speed based on your workload.

PASE does not extract or generate feature vectors. You must supply the feature vectors yourself. PASE handles the similarity search.

Prerequisites

Before you begin, make sure you have:

  • A PolarDB for PostgreSQL (Compatible with Oracle) cluster

  • A privileged database account to run the SQL statements in this topic

  • A table with a float4[] column that stores your vectors

Quick start

Install the extension, create a table, build an index, and run your first similarity query in four steps.

-- Step 1: Install PASE
CREATE EXTENSION pase;

-- Step 2: Create a table with a vector column
CREATE TABLE items (id integer, vector float4[]);
INSERT INTO items VALUES
  (1, ARRAY[1.0, 2.0, 3.0]::float4[]),
  (2, ARRAY[4.0, 5.0, 6.0]::float4[]),
  (3, ARRAY[7.0, 8.0, 9.0]::float4[]);

-- Step 3: Build an HNSW index (recommended for most workloads)
CREATE INDEX items_hnsw_idx ON items
USING pase_hnsw(vector)
WITH (dim = 3, base_nb_num = 16, ef_build = 40, ef_search = 200, base64_encoded = 0);

-- Step 4: Query the top 3 nearest neighbors
-- ORDER BY is required for the index to take effect
SELECT id, vector <?> '1,2,3:100:0'::pase AS distance
FROM items
ORDER BY vector <?> '1,2,3:100:0'::pase
ASC LIMIT 3;

Choose an index type

IVFFlatHNSW
Build timeFastSlow
MemoryLowHigh (stores proximity graph)
Query speed vs. recallLower speed-recall ratioHigher speed-recall ratio
Precision ceilingTunable — increase the number of probed clusters to raise recallFixed — cannot be raised by reconfiguring parameters
Best forHigh-precision image comparison; datasets where 100% recall is achievableLarge datasets (tens of millions of vectors); sub-10ms latency requirements

When to use IVFFlat: Your dataset fits in memory, you can tolerate up to 100 ms per query, and you need 100% recall (possible when the query vector is in the candidate set).

When to use HNSW: Your dataset contains tens of millions of vectors, you need responses under 10 ms, and a fixed precision ceiling is acceptable.

How it works

IVFFlat

IVFFlat is a simplified version of the IVFADC algorithm. It partitions vectors into clusters using k-means, then searches only the clusters nearest to the query vector.

How IVFFlat works
  1. k-means divides all vectors into clusters. Each cluster has a centroid.

  2. IVFFlat finds the *n* centroids nearest to the query vector.

  3. It searches all vectors in those *n* clusters and returns the top *k* nearest results.

The parameter *n* controls the speed-recall trade-off: a larger *n* searches more clusters and improves recall, but increases compute time. Clusters far from the query vector are skipped entirely.

Unlike IVFADC, which uses product quantization to compress the second-phase search, IVFFlat uses brute-force search within selected clusters — preserving precision at the cost of slightly more compute.

HNSW

HNSW is a graph-based ANN algorithm built on the Navigable Small World (NSW) algorithm. It constructs a multi-layer graph where each layer is a coarser view of the layer below.

How HNSW works
  1. HNSW builds a hierarchical graph structure across multiple layers.

  2. A search starts from a random entry point in the top layer.

  3. At each layer, HNSW finds the neighbors of the current element, adds them to a fixed-length dynamic list sorted by distance, and keeps only the top *k* candidates. It repeats until the list stabilizes, then descends to the next layer using the best candidate as the entry point.

  4. The process repeats until the bottom layer is fully searched.

Build an index

IVFFlat index

CREATE INDEX ivfflat_idx ON vectors_table
USING pase_ivfflat(vector)
WITH (
  clustering_type = 1,
  distance_type = 0,
  dimension = 256,
  base64_encoded = 0,
  clustering_params = "10,100"
);

Parameters

ParameterRequiredDefaultDescription
clustering_typeYesClustering method. 0 = external (load a centroid file specified by clustering_params). 1 = internal k-means. Start with 1 if this is your first time using PASE.
distance_typeNo0Distance method. 0 = Euclidean distance. 1 = dot product (requires vector normalization; note that dot product order is opposite to Euclidean order). PolarDB supports only Euclidean distance natively; see Calculate dot products for the dot product approach.
dimensionYesNumber of vector dimensions. Maximum: 512.
base64_encodedNo0Vector encoding. 0 = float4[]. 1 = Base64-encoded float[].
clustering_paramsYesFor internal clustering (clustering_type = 1), format: clustering_sample_ratio,k. clustering_sample_ratio is an integer in (0, 1000] representing the sampling fraction with 1,000 as the denominator. k is the number of centroids, with a recommended range of [100, 1000]. Keep the total number of sampled records under 100,000. For external clustering (clustering_type = 0), specify the path to the centroid file.
Important

To use internal clustering (clustering_type = 1), insert data into the table before creating the index. The k-means step requires existing rows.

HNSW index

CREATE INDEX hnsw_idx ON vectors_table
USING pase_hnsw(vector)
WITH (
  dim = 256,
  base_nb_num = 16,
  ef_build = 40,
  ef_search = 200,
  base64_encoded = 0
);

Parameters

ParameterRequiredDefaultDescription
dimYesNumber of vector dimensions. Maximum: 512.
base_nb_numYesNumber of neighbors per element. Use a value in [16, 128]. Higher values improve recall but increase build time and index size.
ef_buildYesHeap size during index construction. Use a value in [40, 400]. Higher values improve recall but slow down index creation.
ef_searchYes200Heap size during query. Higher values improve recall at the cost of query speed. Override this per query by changing the value in the query string.
base64_encodedNo0Vector encoding. 0 = float4[]. 1 = Base64-encoded float[].

Query vectors

All query examples use ORDER BY with ASC and LIMIT. Both are required — omitting ORDER BY bypasses the index entirely.

IVFFlat query

The <#> operator activates the IVFFlat index.

SELECT id, vector <#> '1,1,1'::pase AS distance
FROM vectors_ivfflat
ORDER BY vector <#> '1,1,1:10:0'::pase
ASC LIMIT 10;

The query string format is vector_values:n:distance_method:

FieldDescription
vector_valuesComma-separated query vector
nNumber of clusters to search. Range: (0, 1000]. Higher values improve recall at the cost of speed.
distance_method0 = Euclidean distance. 1 = dot product (requires normalized vectors).

HNSW query

The <?> operator activates the HNSW index.

SELECT id, vector <?> '1,1,1'::pase AS distance
FROM vectors_ivfflat
ORDER BY vector <?> '1,1,1:100:0'::pase
ASC LIMIT 10;

The query string format is vector_values:ef:distance_method:

FieldDescription
vector_valuesComma-separated query vector
efSearch heap size. Range: (0, ∞). Start at 40 and increase in small steps until recall meets your requirements.
distance_method0 = Euclidean distance. 1 = dot product (requires normalized vectors).

Calculate vector similarity

Before creating an index, use the <?> operator and the pase() constructor to calculate the similarity between two vectors directly.

PASE-type construction — pass the right vector as a pase() function call:

-- Euclidean distance (default)
SELECT ARRAY[2, 1, 1]::float4[] <?> pase(ARRAY[3, 1, 1]::float4[]) AS distance;

-- Explicitly set the distance method to Euclidean (0)
SELECT ARRAY[2, 1, 1]::float4[] <?> pase(ARRAY[3, 1, 1]::float4[], 0) AS distance;

-- Use dot product (1) -- normalize vectors first
SELECT ARRAY[2, 1, 1]::float4[] <?> pase(ARRAY[3, 1, 1]::float4[], 0, 1) AS distance;

String construction — pass the right vector as a cast string literal, with parameters separated by colons:

-- Euclidean distance (default)
SELECT ARRAY[2, 1, 1]::float4[] <?> '3,1,1'::pase AS distance;

-- Explicitly set distance method to Euclidean (0)
SELECT ARRAY[2, 1, 1]::float4[] <?> '3,1,1:0'::pase AS distance;

-- Use dot product (1) -- normalize vectors first
SELECT ARRAY[2, 1, 1]::float4[] <?> '3,1,1:0:1'::pase AS distance;

Constructor parameters (both methods):

PositionDescription
1stThe query vector
2ndReserved — set to 0
3rdDistance method: 0 = Euclidean, 1 = dot product
The left vector must use float4[]. The right vector must use the pase type. Both vectors must have the same number of dimensions.
For Euclidean distance, use vectors as-is. For dot product or cosine similarity, normalize each vector so that it satisfies and . When the vector is normalized, the dot product equals the cosine similarity.

Operational considerations

Index bloat: Index files can grow bloated after many writes. Check index size with SELECT pg_relation_size('index_name'). If the index is larger than the underlying table data and queries have slowed, rebuild the index.

Accuracy after updates: Frequent inserts and deletes can reduce recall. For workloads that require 100% accuracy, rebuild the index periodically.

Parallel query limit: Only sequential search is supported for high-dimensional vectors in multi-node elastic parallel queries.

Calculate dot products

PolarDB supports Euclidean distance natively in its index implementation. To use dot product similarity, normalize your vectors before indexing and use the following function, which wraps an HNSW query with distance_method = 1:

CREATE OR REPLACE FUNCTION inner_product_search(
  query_vector text,
  ef integer,
  k integer,
  table_name text
)
RETURNS TABLE (id integer, uid text, distance float4) AS $$
BEGIN
  RETURN QUERY EXECUTE format('
    SELECT a.id, a.vector <?> pase(ARRAY[%s], %s, 1) AS distance
    FROM (
      SELECT id, vector
      FROM %s
      ORDER BY vector <?> pase(ARRAY[%s], %s, 0) ASC
      LIMIT %s
    ) a
    ORDER BY distance DESC;',
    query_vector, ef, table_name, query_vector, ef, k
  );
END
$$ LANGUAGE plpgsql;

The dot product of a normalized vector equals its cosine similarity, so this function also serves as a cosine similarity search.

Create an IVFFlat index from an external centroid file

This is an advanced option. Upload a centroid file to the server and reference it in clustering_params when clustering_type = 0.

The centroid file format:

{dimensions}|{number_of_centroids}|{centroid_vector_values}

Example — 2 centroids, 3 dimensions each:

3|2|1,1,1,2,2,2

For parameter details, see the IVFFlat index parameters table.

References