
PolarDB: High-dimensional vector retrieval (PASE)

Last Updated: Mar 30, 2026

PASE (PostgreSQL ANN search extension) is a high-performance vector similarity search plug-in for PolarDB for PostgreSQL. It implements two approximate nearest neighbor (ANN) algorithms — IVFFlat and Hierarchical Navigable Small World (HNSW) — so you can query high-dimensional vectors at speed directly from your database.

PASE does not extract or output feature vectors. Retrieve the feature vectors for your target entities first, then use PASE to run similarity searches across large vector datasets.

Prerequisites

Before you begin, ensure that you have:

  • A PolarDB for PostgreSQL cluster

  • A privileged account (required to run the SQL statements in this topic)

  • Basic familiarity with machine learning concepts and vector search

Usage notes

  • Index bloat: Run SELECT pg_relation_size('index_name'); and compare the result with the table size. If the index is larger than the table and queries have slowed down, rebuild the index.

  • Index drift after updates: Frequent data updates can reduce index accuracy. Rebuild the index regularly if you require 100% recall.

  • IVFFlat internal clustering: To create an IVFFlat index with internal centroids, set clustering_type = 1 and insert data into the table before creating the index.

  • Multi-node elastic parallel queries: Only sequential search is supported for high-dimensional vectors in multi-node elastic parallel queries.
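As a concrete illustration of the first two notes, the following sketch checks for index bloat and rebuilds the index. The index and table names are placeholders; REINDEX is standard PostgreSQL and works for any index access method.

```sql
-- Compare the index size with the table size (names are placeholders).
SELECT pg_relation_size('ivfflat_idx') AS index_bytes,
       pg_relation_size('vectors_table') AS table_bytes;

-- If the index has grown larger than the table and queries have slowed,
-- rebuild it.
REINDEX INDEX ivfflat_idx;
```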

Choose an algorithm

PASE supports two ANN algorithms. The right choice depends on your dataset size, latency target, and recall requirements.

| | IVFFlat | HNSW |
| --- | --- | --- |
| Best for | High-precision use cases (e.g., image comparison) | Large datasets with low-latency requirements |
| Query latency target | Up to 100 ms | Up to 10 ms |
| Dataset size | Any size | Tens of millions of vectors or more |
| 100% recall | Yes, when the query vector is in the candidate dataset | No; precision plateaus and cannot be increased further by tuning |
| Build time | Fast | Slower |
| Storage overhead | Low | Higher (proximity graph neighbors stored) |
| Precision control | Fully controllable via parameters | Limited after a threshold |

IVFFlat

IVFFlat is a simplified version of the IVFADC algorithm. It clusters vectors using k-means, then searches only the clusters nearest to the query vector — skipping distant clusters to speed up the search. Precision scales with the number of clusters searched, so you can tune it directly.

How IVFFlat works:

  1. IVFFlat applies k-means clustering to group vectors into clusters, each with a centroid.

  2. It identifies the n centroids nearest to the query vector.

  3. It traverses and ranks all vectors in those n clusters and returns the nearest k vectors.

A larger value of n increases precision but also increases compute. Unlike IVFADC, which uses product quantization in phase 2 to reduce traversal cost (at the expense of precision), IVFFlat uses brute-force search — giving you direct control over the accuracy/performance tradeoff.

HNSW

HNSW builds a hierarchical multi-layer graph using the Navigable Small World (NSW) algorithm. Each layer acts as a panoramic skip list over the layer below, enabling fast traversal across large datasets.

How HNSW works:

  1. HNSW builds a structure with multiple layers (graphs). Each layer is a panorama and skip list of its lower layer.

  2. Starting from a random element in the top layer, HNSW identifies neighbors and adds them to a fixed-length dynamic list ranked by distance.

  3. It continues expanding neighbors, re-sorting the list after each addition and retaining only the top k. Once the list stabilizes, HNSW descends to the next layer using the top element as the starting point.

  4. It repeats until it completes a search in the bottom layer.

After precision reaches a certain level, you cannot increase it further by reconfiguring parameters. HNSW also requires additional storage to persist proximity graph neighbors.

Enable PASE

Run the following statement to install the PASE extension:

CREATE EXTENSION pase;
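To confirm that the extension is installed, you can query the standard PostgreSQL catalog:

```sql
-- Returns one row with the installed version if PASE is enabled.
SELECT extname, extversion FROM pg_extension WHERE extname = 'pase';
```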

Calculate vector similarity

Before creating an index, you can calculate vector similarity directly using the <?> operator. PASE supports two construction methods.

The left vector must use the float4[] type and the right vector must use the PASE type. Both vectors must have the same number of dimensions, or the operation returns a similarity calculation error.

PASE-type construction

The <?> operator computes the similarity between vectors. The PASE data type accepts up to three parameters:

  • Parameter 1: The right-side vector (float4[])

  • Parameter 2: Reserved; set to 0

  • Parameter 3: Similarity method — 0 for Euclidean distance, 1 for dot product (inner product)

SELECT ARRAY[2, 1, 1]::float4[] <?> pase(ARRAY[3, 1, 1]::float4[]) AS distance;
SELECT ARRAY[2, 1, 1]::float4[] <?> pase(ARRAY[3, 1, 1]::float4[], 0) AS distance;
SELECT ARRAY[2, 1, 1]::float4[] <?> pase(ARRAY[3, 1, 1]::float4[], 0, 1) AS distance;

String-based construction

String-based construction uses colons (:) to separate parameters instead of function arguments:

SELECT ARRAY[2, 1, 1]::float4[] <?> '3,1,1'::pase AS distance;
SELECT ARRAY[2, 1, 1]::float4[] <?> '3,1,1:0'::pase AS distance;
SELECT ARRAY[2, 1, 1]::float4[] <?> '3,1,1:0:1'::pase AS distance;

In '3,1,1:0:1', the first segment is the vector, the second is the reserved parameter (0), and the third is the similarity method (0 = Euclidean, 1 = dot product).

Normalization requirement

  • Euclidean distance: No normalization needed.

  • Dot product or cosine: Normalize the original vector first. For a normalized vector, the dot product equals the cosine value. For example, a vector (x1, x2, …, xn) is normalized when x1² + x2² + … + xn² = 1.
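If your vectors are not pre-normalized, you can normalize them in SQL before storing or querying them. A minimal sketch using only standard PostgreSQL; the literal vector is an example:

```sql
-- Divide each component by the vector's Euclidean norm.
WITH v AS (SELECT ARRAY[3, 1, 1]::float4[] AS vec),
     n AS (SELECT sqrt(sum(x * x)) AS norm FROM v, unnest(v.vec) AS x)
SELECT ARRAY(SELECT (x / n.norm)::float4
             FROM v, n, unnest(v.vec) AS x) AS normalized;
```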

Create an index
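The index examples below assume a table along the following lines. The table and column names are placeholders; only the float4[] vector column is essential.

```sql
CREATE TABLE vectors_table (
    id     serial PRIMARY KEY,
    vector float4[]  -- the feature vector, e.g., 256 dimensions
);
```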

IVFFlat index

CREATE INDEX ivfflat_idx ON vectors_table
USING
  pase_ivfflat(vector)
WITH
  (clustering_type = 1, distance_type = 0, dimension = 256, base64_encoded = 0, clustering_params = '10,100');

Parameters

| Parameter | Required | Default | Description |
| --- | --- | --- | --- |
| clustering_type | Yes | — | Clustering mode. 0 = external clustering (load a centroid file specified by clustering_params). 1 = internal k-means clustering. For first-time use, start with internal clustering (1). |
| distance_type | No | 0 | Similarity method. 0 = Euclidean distance. 1 = dot product (requires normalization; dot product order is opposite to Euclidean distance). PolarDB natively supports Euclidean distance; for dot products, normalize vectors first. |
| dimension | Yes | — | Number of dimensions. Maximum: 512. |
| base64_encoded | No | 0 | Vector encoding. 0 = float4[]. 1 = Base64-encoded float[]. |
| clustering_params | Yes | — | For external clustering: the directory of the centroid file. For internal clustering: clustering_sample_ratio,k. See below. |

`clustering_params` for internal clustering (`clustering_type = 1`)

Format: clustering_sample_ratio,k

  • clustering_sample_ratio: Sampling fraction with 1,000 as the denominator. Range: (0, 1000]. For example, 1 means a 1/1000 sampling ratio. A higher value increases accuracy but slows index creation. Keep total sampled records under 100,000.

  • k: Number of centroids. A higher value increases accuracy but slows index creation. The recommended range is [100, 1000].
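As a worked example, for a table of roughly 5,000,000 rows (an assumed row count for illustration), the largest sampling ratio that keeps the sampled records under 100,000 can be derived as follows:

```sql
-- 100,000 / 5,000,000 = 1/50 = 20/1000, so clustering_sample_ratio = 20.
SELECT floor(100000.0 * 1000 / 5000000) AS clustering_sample_ratio;
```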

HNSW index

CREATE INDEX hnsw_idx ON vectors_table
USING
  pase_hnsw(vector)
WITH
  (dim = 256, base_nb_num = 16, ef_build = 40, ef_search = 200, base64_encoded = 0);

Parameters

| Parameter | Required | Default | Description |
| --- | --- | --- | --- |
| dim | Yes | — | Number of dimensions. Maximum: 512. |
| base_nb_num | Yes | — | Number of neighbors per element. A higher value increases accuracy but slows index creation and uses more storage. Recommended range: [16, 128]. |
| ef_build | Yes | — | Heap length during index creation. A longer heap increases accuracy but slows creation. Recommended range: [40, 400]. |
| ef_search | Yes | 200 | Heap length during queries. A larger value increases accuracy but reduces query performance. Can be overridden at query time. Start at 40 and increase incrementally until the desired accuracy is reached. |
| base64_encoded | No | 0 | Vector encoding. 0 = float4[]. 1 = Base64-encoded float[]. |

Query vectors

Query with an IVFFlat index

The <#> operator is used for IVFFlat index queries. An ORDER BY clause is required for the index to take effect. Results are sorted in ascending order by distance.

SELECT id, vector <#> '1,1,1'::pase AS distance
FROM vectors_ivfflat
ORDER BY vector <#> '1,1,1:10:0'::pase ASC
LIMIT 10;

The query string '1,1,1:10:0' contains three parameters separated by colons:

| Position | Value | Description |
| --- | --- | --- |
| 1 | 1,1,1 | Query vector |
| 2 | 10 | Query accuracy — range: (0, 1000]. Higher = more accurate, lower query performance. |
| 3 | 0 | Similarity method: 0 = Euclidean distance, 1 = dot product (requires normalization; dot product order is opposite to Euclidean). |

Query with an HNSW index

The <?> operator is used for HNSW index queries. An ORDER BY clause is required for the index to take effect. Results are sorted in ascending order by distance.

SELECT id, vector <?> '1,1,1'::pase AS distance
FROM vectors_hnsw
ORDER BY vector <?> '1,1,1:100:0'::pase ASC
LIMIT 10;

The query string '1,1,1:100:0' contains three parameters separated by colons:

| Position | Value | Description |
| --- | --- | --- |
| 1 | 1,1,1 | Query vector |
| 2 | 100 | Query accuracy — range: (0, ∞). Higher = more accurate, lower query performance. Start at 40 and increase incrementally. |
| 3 | 0 | Similarity method: 0 = Euclidean distance, 1 = dot product (requires normalization; dot product order is opposite to Euclidean). |

Appendix

Calculate dot product (inner product)

Because the dot product of a normalized vector equals its cosine value, the following function works for both dot product and cosine similarity searches. It uses an HNSW index.

CREATE OR REPLACE FUNCTION inner_product_search(query_vector text, ef integer, k integer, table_name text)
RETURNS TABLE (id integer, distance float4) AS $$
BEGIN
    -- Fetch the ef nearest candidates via the HNSW index (Euclidean order),
    -- then re-rank them by dot product and return the top results.
    RETURN QUERY EXECUTE format('
        SELECT a.id, a.vector <?> pase(ARRAY[%s], %s, 1) AS distance
        FROM (SELECT id, vector FROM %s
              ORDER BY vector <?> pase(ARRAY[%s], %s, 0) ASC LIMIT %s) a
        ORDER BY distance DESC;', query_vector, ef, table_name, query_vector, ef, k);
END
$$ LANGUAGE plpgsql;
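A hypothetical call, retrieving the 10 most similar vectors by dot product from a table named vectors_hnsw (the table name and arguments are placeholders):

```sql
SELECT * FROM inner_product_search('1,1,1', 200, 10, 'vectors_hnsw');
```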

Create an IVFFlat index from an external centroid file

This is an advanced feature. Upload an external centroid file to the specified server directory and reference it in clustering_params. The file format is:

Number of dimensions|Number of centroids|Centroid vector dataset

Example:

3|2|1,1,1,2,2,2
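With such a file uploaded to the server, the index statement might look like the following sketch. The file path is a placeholder; clustering_type = 0 selects external clustering, and the dimension matches the example file above.

```sql
CREATE INDEX ivfflat_ext_idx ON vectors_table
USING pase_ivfflat(vector)
WITH (clustering_type = 0, distance_type = 0, dimension = 3,
      clustering_params = '/path/to/centroid_file');
```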
