All Products
Search
Document Center

AnalyticDB:Create an HNSW vector index

Last Updated:Mar 28, 2026

Vector indexes accelerate approximate nearest neighbor (ANN) searches on large datasets. AnalyticDB for PostgreSQL uses the FastANN vector search engine, which implements the Hierarchical Navigable Small World (HNSW) algorithm on top of PostgreSQL's segmented page storage. FastANN stores only pointers to vector columns in the index, keeping storage overhead low. It also supports product quantization (PQ) to compress high-dimensional vectors and minimize table lookups during inserts and queries.

Prerequisites

Before you begin, ensure that you have:

  • An AnalyticDB for PostgreSQL instance with Vector Search Engine Optimization enabled. When this feature is enabled, the FastANN plugin is automatically installed.

  • A table with a vector column of type real[], float4[], float2[], or smallint[]

If you get the error You must specify an operator class or define a default operator class for the data type, Vector Search Engine Optimization is not enabled. Enable it and retry. For more information, see Enable or disable vector search engine optimization.

Syntax

CREATE INDEX [INDEX_NAME]
ON [SCHEMA_NAME].[TABLE_NAME]
USING ANN(COLUMN_NAME)
WITH (DIM=<DIMENSION>,
      DISTANCEMEASURE=<MEASURE>,
      HNSW_M=<M>,
      HNSW_EF_CONSTRUCTION=<EF_CONSTRUCTION>,
      PQ_ENABLE=<PQ_ENABLE>,
      PQ_SEGMENTS=<PQ_SEGMENTS>,
      PQ_CENTERS=<PQ_CENTERS>,
      EXTERNAL_STORAGE=<EXTERNAL_STORAGE>);

Where:

  • INDEX_NAME: The index name.

  • SCHEMA_NAME: The schema (namespace) name.

  • TABLE_NAME: The table name.

  • COLUMN_NAME: The vector column to index.

The following table describes the index parameters.

ParameterRequiredDefaultRangeDescription
DIMYes0[1, 8192]Vector dimensionality. Inserts that don't match this dimension return an error.
DISTANCEMEASURENoL2L2, IP, COSINEDistance measure for the index. See Choose a distance measure.
HNSW_MNo32[10, 1000]Maximum number of neighbors per node. A higher value improves recall at the cost of more memory and slower index builds.
HNSW_EF_CONSTRUCTIONNo600[40, 4000]Candidate set size during index construction. A higher value improves index quality at the cost of build time.
PQ_ENABLENo00, 1Enables product quantization (PQ) for dimension reduction. Don't enable PQ if the table has fewer than 500,000 rows.
PQ_SEGMENTSNo0[1, 256]Number of k-means segments for PQ. Required only when PQ_ENABLE=1 and DIM is not divisible by 8. DIM must be divisible by PQ_SEGMENTS.
PQ_CENTERSNo2048[256, 8192]Number of k-means centroids for PQ. Takes effect only when PQ_ENABLE=1.
EXTERNAL_STORAGENo00, 1Storage mode for the index: 0 uses segmented page storage (supports PostgreSQL shared_buffers caching, and full delete/update support); 1 uses mmap (from v6.6.2.3, supports mark-for-delete for limited deletes and updates). Supported only in version 6.0; not supported in version 7.0.

Choose a distance measure

Pick the distance measure that matches your data and use case. The query operator you use must match the index's distance measure — mismatches cause the query to fall back to an exact (brute-force) search.

Distance measureParameter valueQuery operatorUse case
Euclidean (L2)L2<->Image similarity search
Inner productIP<#>Substitute for cosine similarity after vector normalization
CosineCOSINE<=>Text similarity search
IP and COSINE require engine version 6.3.10.18 or later. For version requirements, see Upgrade versions.

Best performance tip: Normalize your vectors before import, then use IP as the distance measure. An IP index lets queries also return Euclidean distance and cosine similarity directly. For details, see Vector search.

Create a vector index

The following example uses a text knowledge base: articles are split into chunks, converted to embedding vectors, and stored in a chunks table.

Step 1: Create the table

CREATE TABLE chunks (
    id SERIAL PRIMARY KEY,
    chunk VARCHAR(1024),
    intime TIMESTAMP,
    url VARCHAR(1024),
    feature REAL[]
) DISTRIBUTED BY (id);
ColumnTypeDescription
idserialRow ID
chunkvarchar(1024)Text chunk
intimetimestampImport timestamp
urlvarchar(1024)Source article URL
featurereal[]Embedding vector

Step 2: Set the vector column storage to PLAIN

ALTER TABLE chunks ALTER COLUMN feature SET STORAGE PLAIN;

Setting the storage mode to PLAIN avoids column decompression on each row scan, which reduces overhead and improves query performance.

Step 3: Create the vector index

Create an index that matches the distance measure you plan to use in queries. The index type must match the query operator — a mismatch causes the query to fall back to an exact (brute-force) search.

L2 (Euclidean) distance — use <-> in queries:

CREATE INDEX idx_chunks_feature_l2
ON chunks USING ann(feature)
WITH (dim=1536, distancemeasure=l2, hnsw_m=64, pq_enable=1);

Inner product distance — use <#> in queries:

CREATE INDEX idx_chunks_feature_ip
ON chunks USING ann(feature)
WITH (dim=1536, distancemeasure=ip, hnsw_m=64, pq_enable=1);

Cosine distance — use <=> in queries:

CREATE INDEX idx_chunks_feature_cosine
ON chunks USING ann(feature)
WITH (dim=1536, distancemeasure=cosine, hnsw_m=64, pq_enable=1);
Create vector indexes only as needed. Multiple vector columns per table and multiple indexes per column are supported, but unused indexes add overhead.

Step 4: Index structured columns for hybrid queries

CREATE INDEX ON chunks(intime);

Indexing structured columns like intime improves performance when combining vector similarity with filters.

Index building methods

Two approaches are available for building vector indexes. For large-scale imports, asynchronous building is faster: import all data without a vector index first, then build the index in one pass on the complete dataset.

MethodWhen to useTrade-off
Stream buildingCreate the index before importing dataIndex is built in real time as data arrives; slows import throughput
Asynchronous building (recommended)Import data first, then create the indexFaster overall for large datasets; no index overhead during import

Asynchronous building with parallel processing

AnalyticDB for PostgreSQL supports concurrent index building via the fastann.build_parallel_processes Grand Unified Configuration (GUC) parameter. Set it to match the number of cores on your compute nodes.

Instance typeRecommended value
4-core 16 GB4
8-core 32 GB8
16-core 64 GB16
Important

If the instance is serving live traffic, don't set fastann.build_parallel_processes to the maximum. A lower value leaves capacity for ongoing workloads.

Build with DMS:

On DMS, all SET statements and the CREATE INDEX statement must be on the same line to take effect together.

SET statement_timeout = 0;SET idle_in_transaction_session_timeout = 0;SET fastann.build_parallel_processes = 8;CREATE INDEX ON chunks USING ann(feature) WITH (dim=1536, distancemeasure=cosine, hnsw_m=64, pq_enable=1);

Build with psql:

  1. Save the following to create_index.sql:

    SET statement_timeout = 0;
    SET idle_in_transaction_session_timeout = 0;
    SET fastann.build_parallel_processes = 8;
    CREATE INDEX ON chunks USING ann(feature)
    WITH (dim=1536, distancemeasure=cosine, hnsw_m=64, pq_enable=1);
  2. Run the file:

    psql -h gp-xxxx-master.gpdb.rds.aliyuncs.com -p 5432 -d dbname -U username -f create_index.sql

Vector functions reference

Supported functions

CategoryFunctionReturn typeDescriptionSupported data types
Distancel2_distancedouble precisionEuclidean distance (square root).
Note

In v6.3.10.17 and earlier, returns squared Euclidean distance. In v6.3.10.18 and later, returns the square root value.

smallint[], float2[], float4[], real[]
Distanceinner_product_distancedouble precisionInner product distance. Equals cosine similarity after normalization. Formula: image.pngsmallint[], float2[], float4[], real[]
Distancecosine_similaritydouble precisionCosine similarity. Range: [-1, 1]. Measures directional similarity regardless of vector length. Formula: image.pngsmallint[], float2[], float4[], real[]
Distancedp_distancedouble precisionDot product distance. Identical to inner product distance. Formula: image.pngsmallint[], float2[], float4[], real[]
Distancehm_distanceintegerHamming distance (bitwise).int[]
Sortingl2_squared_distancedouble precisionSquared Euclidean distance. Skips the square root calculation for faster distance-based sorting. Formula: image.pngsmallint[], float2[], float4[], real[]
Sortingnegative_inner_product_distancedouble precisionNegated inner product distance. Used for descending sort by inner product. Formula: image.pngsmallint[], float2[], float4[], real[]
Sortingcosine_distancedouble precisionCosine distance. Range: [0, 2]. Used for descending sort by cosine similarity. Formula: image.pngsmallint[], float2[], float4[], real[]
Arithmeticvector_addsmallint[], float2[], or float4[]Sum of two vector arrays.smallint[], float2[], float4[], real[]
Arithmeticvector_subsmallint[], float2[], or float4[]Difference between two vector arrays.smallint[], float2[], float4[], real[]
Arithmeticvector_mulsmallint[], float2[], or float4[]Element-wise product of two vector arrays.smallint[], float2[], float4[], real[]
Arithmeticvector_divsmallint[], float2[], or float4[]Element-wise quotient of two vector arrays.smallint[], float2[], float4[], real[]
Aggregationvector_sumint or double precisionSum of all elements in a vector.smallint[], int[], float2[], float4[], real[], float8[]
Aggregationvector_minint or double precisionMinimum element value.smallint[], int[], float2[], float4[], real[], float8[]
Aggregationvector_maxint or double precisionMaximum element value.smallint[], int[], float2[], float4[], real[], float8[]
Aggregationvector_avgint or double precisionAverage element value.smallint[], int[], float2[], float4[], real[], float8[]
Aggregationvector_normdouble precisionNorm (length) of a vector.smallint[], int[], float2[], float4[], real[], float8[]
Aggregationvector_angledouble precisionAngle between two vectors.smallint[], float2[], float4[], real[]
Important

vector_add, vector_sub, vector_mul, vector_div, vector_sum, vector_min, vector_max, vector_avg, vector_norm, and vector_angle are supported only in v6.6.1.0 and later. To check your version, see View the minor engine version.

Examples:

-- Euclidean distance
SELECT l2_distance(array[1,1,1,1]::real[], array[2,2,2,2]::real[]);
-- Inner product distance
SELECT inner_product_distance(array[1,1,1,1]::real[], array[2,2,2,2]::real[]);
-- Cosine similarity
SELECT cosine_similarity(array[1,1,1,1]::real[], array[2,2,2,2]::real[]);
-- Dot product distance
SELECT dp_distance(array[1,1,1,1]::real[], array[2,2,2,2]::real[]);
-- Hamming distance
SELECT hm_distance(array[1,0,1,0]::int[], array[0,1,0,1]::int[]);
-- Vector addition
SELECT vector_add(array[1,1,1,1]::real[], array[2,2,2,2]::real[]);
-- Vector subtraction
SELECT vector_sub(array[1,1,1,1]::real[], array[2,2,2,2]::real[]);
-- Vector multiplication
SELECT vector_mul(array[1,1,1,1]::real[], array[2,2,2,2]::real[]);
-- Vector division
SELECT vector_div(array[1,1,1,1]::real[], array[2,2,2,2]::real[]);
-- Element sum
SELECT vector_sum(array[1,1,1,1]::real[]);
-- Minimum element
SELECT vector_min(array[1,1,1,1]::real[]);
-- Maximum element
SELECT vector_max(array[1,1,1,1]::real[]);
-- Average element
SELECT vector_avg(array[1,1,1,1]::real[]);
-- Vector norm
SELECT vector_norm(array[1,1,1,1]::real[]);
-- Angle between vectors
SELECT vector_angle(array[1,1,1,1]::real[], array[2,2,2,2]::real[]);

Supported operators

All distance operators use the ANN access method and trigger index-accelerated search when a matching vector index exists. If no matching index exists, the query falls back to an exact search.

OperatorCalculationSorting behaviorEquivalent functionSupported data types
<->Squared Euclidean distanceAscendingl2_squared_distancesmallint[], float2[], float4[], real[]
<#>Negative inner productDescending by dot productnegative_inner_product_distancesmallint[], float2[], float4[], real[]
<=>Cosine distanceDescending by cosine similaritycosine_distancesmallint[], float2[], float4[], real[]
+Vector sumsmallint[], float2[], float4[], real[]
-Vector differencesmallint[], float2[], float4[], real[]
*Element-wise productsmallint[], float2[], float4[], real[]
/Element-wise quotientsmallint[], float2[], float4[], real[]
Important

The +, -, *, and / operators are supported only in v6.6.1.0 and later. To check your version, see View the minor engine version.

Examples:

-- Euclidean distance
SELECT array[1,1,1,1]::real[] <-> array[2,2,2,2]::real[] AS score;
-- Inner product distance
SELECT array[1,1,1,1]::real[] <#> array[2,2,2,2]::real[] AS score;
-- Cosine distance
SELECT array[1,1,1,1]::real[] <=> array[2,2,2,2]::real[] AS score;

-- Addition
SELECT array[1,1,1,1]::real[] + array[2,2,2,2]::real[] AS value;
-- Subtraction
SELECT array[1,1,1,1]::real[] - array[2,2,2,2]::real[] AS value;
-- Multiplication
SELECT array[1,1,1,1]::real[] * array[2,2,2,2]::real[] AS value;
-- Division
SELECT array[1,1,1,1]::real[] / array[2,2,2,2]::real[] AS value;

What's next