If your application already uses pgvector, migrating to AnalyticDB for PostgreSQL requires minimal code changes. AnalyticDB for PostgreSQL is compatible with pgvector SQL syntax and client libraries — the only difference is how you create vector indexes.
How it works
AnalyticDB for PostgreSQL uses its in-house FastANN vector search engine instead of the pgvector extension. When you enable the vector search engine optimization feature, FastANN is installed automatically. You do not need to install pgvector.
The key difference from standard pgvector is index creation syntax. Table structure, data insertion, search operators, vector functions, and client libraries work the same way.
Advantages
| Advantage | Details |
|---|---|
| No extension install | FastANN is pre-installed. No CREATE EXTENSION vector needed. |
| Full SQL compatibility | Reuse pgvector operators (<->, <#>, <=>), functions, and client libraries. |
| Distributed architecture | Scales beyond standalone PostgreSQL for large vector datasets. |
| Better performance | FastANN delivers higher query throughput and lower latency than pgvector. |
| Hybrid search | Query optimizers generate combined vector + scalar filter execution plans. |
| Lower storage costs | FLOAT2 type compression and product quantization (PQ) index compression reduce storage usage. |
Prerequisites
Before you begin, ensure that you have:
An AnalyticDB for PostgreSQL V6.0 instance with minor version 6.6.1.0 or later. For instructions, see View the minor version of an instance.
The vector search engine optimization feature enabled on the instance. For instructions, see Enable or disable vector search engine optimization.
Create a vector table
Use the VECTOR type instead of array types (SMALLINT[], FLOAT2[], FLOAT4[]) for vector columns. Each table supports multiple vector columns.
CREATE TABLE [TABLE_NAME]
(
C1 DATATYPE,
C2 DATATYPE,
......,
CN VECTOR(DIM),
PRIMARY KEY(C1 [,C2,...CN])
) DISTRIBUTED BY (C1);VECTOR(DIM) specifies the vector column and its dimension.
Example: Create a table named FACE_TABLE with two vector columns of different dimensions.
CREATE TABLE FACE_TABLE (
C1 INT,
C2 VECTOR(512) NOT NULL,
C3 VECTOR(1536) NOT NULL,
C4 TIMESTAMP NOT NULL,
C5 VARCHAR(20) NOT NULL,
PRIMARY KEY (C1)
) DISTRIBUTED BY (C1);Create a vector index
AnalyticDB for PostgreSQL uses FastANN's USING ANN syntax to create vector indexes, not pgvector's USING hnsw or USING ivfflat syntax.
CREATE INDEX [INDEX_NAME]
ON [SCHEMA_NAME].[TABLE_NAME]
USING ANN(COLUMN_NAME)
WITH (DISTANCEMEASURE=<MEASURE>,
HNSW_M=<M>,
HNSW_EF_CONSTRUCTION=<EF_CONSTRUCTION>,
PQ_ENABLE=<PQ_ENABLE>,
PQ_SEGMENTS=<PQ_SEGMENTS>,
PQ_CENTERS=<PQ_CENTERS>,
EXTERNAL_STORAGE=<EXTERNAL_STORAGE>);Because the VECTOR type already encodes the dimension, the DIM parameter is not required. For the full parameter reference, see Create a vector index.
Supported distance measures and their search operators:
| Distance measure | DISTANCEMEASURE value | Search operator | Sorts by |
|---|---|---|---|
| Euclidean distance | l2 | <-> | Ascending (smaller = closer) |
| Inner product | ip | <#> | Descending (more negative = higher similarity) |
| Cosine similarity | cosine | <=> | Ascending cosine distance (smaller = more similar) |
Each operator uses the index only when the index was created with the matching distance measure. A mismatch causes the query to fall back to an exact sequential scan.
Example: Create L2, inner product, and cosine indexes on both vector columns of FACE_TABLE.
-- Indexes on C2 (512 dimensions)
CREATE INDEX idx_c2_l2 ON FACE_TABLE USING ann(C2) WITH (distancemeasure=l2, hnsw_m=64, pq_enable=1);
CREATE INDEX idx_c2_ip ON FACE_TABLE USING ann(C2) WITH (distancemeasure=ip, hnsw_m=64, pq_enable=1);
CREATE INDEX idx_c2_cosine ON FACE_TABLE USING ann(C2) WITH (distancemeasure=cosine, hnsw_m=64, pq_enable=1);
-- Indexes on C3 (1536 dimensions)
CREATE INDEX idx_c3_l2 ON FACE_TABLE USING ann(C3) WITH (distancemeasure=l2, hnsw_m=64, pq_enable=1);
CREATE INDEX idx_c3_ip ON FACE_TABLE USING ann(C3) WITH (distancemeasure=ip, hnsw_m=64, pq_enable=1);
CREATE INDEX idx_c3_cosine ON FACE_TABLE USING ann(C3) WITH (distancemeasure=cosine, hnsw_m=64, pq_enable=1);Create indexes based on the distance measures your queries actually use. Unused indexes consume storage without benefit.
Insert vector data
Use the same INSERT syntax as pgvector:
INSERT INTO FACE_TABLE
VALUES (1, '[1,2,3 ... 512]', '[1,2,3 ... 1536]', '2023-12-29 00:00:00', 'aaa.bbb.ccc/face1.jpg');Search vectors
Choose between exact search and approximate search
Choose the search method based on your requirements:
| Factor | Use exact search | Use approximate index search |
|---|---|---|
| Dataset size | Small | Large |
| Recall requirement | 100% recall needed | Small recall trade-off acceptable |
| Query volume | Low queries per second | High queries per second |
Exact search (no index): scans all vectors and returns precise results, but does not scale to large datasets or high query volumes.
Approximate index search: uses the ANN index to retrieve results faster, trading a small reduction in recall for significantly better performance.
Exact search uses vector functions:
-- Euclidean distance
SELECT C1 FROM FACE_TABLE ORDER BY vector_l2_squared_distance(C2, '[1,2,3 ... 512]') LIMIT 10;
-- Inner product
SELECT C1 FROM FACE_TABLE ORDER BY vector_negative_inner_product(C2, '[1,2,3 ... 512]') LIMIT 10;
-- Cosine similarity
SELECT C1 FROM FACE_TABLE ORDER BY cosine_distance(C2, '[1,2,3 ... 512]') LIMIT 10;Approximate index search uses operators:
-- Euclidean distance (<-> uses the l2 index)
SELECT C1 FROM FACE_TABLE ORDER BY C2 <-> '[1,2,3 ... 512]' LIMIT 10;
-- Inner product (<#> uses the ip index)
SELECT C1 FROM FACE_TABLE ORDER BY C2 <#> '[1,2,3 ... 512]' LIMIT 10;
-- Cosine similarity (<=> uses the cosine index)
SELECT C1 FROM FACE_TABLE ORDER BY C2 <=> '[1,2,3 ... 512]' LIMIT 10;For recall rate tuning parameters, see Perform vector search.
Verify that the index is used
To confirm your query uses the vector index rather than falling back to a sequential scan, run EXPLAIN ANALYZE:
EXPLAIN ANALYZE SELECT C1 FROM FACE_TABLE ORDER BY C2 <-> '[1,2,3 ... 512]' LIMIT 10;If the output shows Seq Scan instead of an index scan, the query is not using the vector index. Check that the operator matches the index's distance measure.
Return distance scores
Each operator returns a raw score with a specific mathematical meaning. Use a subquery to sort by the index result first, then convert the score in the outer query:
| Operator | Raw score meaning | Conversion to actual distance |
|---|---|---|
<-> | Squared Euclidean distance | sqrt(score) |
<#> | Negative inner product (returns negative values) | -1 * score |
<=> | Cosine distance (0–2 range) | 1.0 - score gives cosine similarity |
-- Euclidean distance: <-> returns the squared distance, so take the square root
SELECT t.C1, sqrt(t.score) AS score
FROM (
SELECT C1, C2 <-> '[1,2,3 ... 512]' AS score
FROM FACE_TABLE
ORDER BY score LIMIT topk
) t;
-- Inner product: <#> returns the negative inner product, so multiply by -1
SELECT t.C1, (-1 * t.score) AS score
FROM (
SELECT C1, C2 <#> '[1,2,3 ... 512]' AS score
FROM FACE_TABLE
ORDER BY score LIMIT topk
) t;
-- Cosine similarity: <=> returns the cosine distance, so subtract from 1.0
SELECT t.C1, (1.0 - t.score) AS score
FROM (
SELECT C1, C2 <=> '[1,2,3 ... 512]' AS score
FROM FACE_TABLE
ORDER BY score LIMIT topk
) t;Filter by score range
Apply a score threshold in the outer query after sorting by the index result:
-- Euclidean: keep results with distance < 100
SELECT t.C1, sqrt(t.score) AS score
FROM (
SELECT C1, C2 <-> '[1,2,3 ... 512]' AS score
FROM FACE_TABLE
ORDER BY score LIMIT topk
) t
WHERE score < 100;
-- Inner product: keep results with similarity > 10
SELECT t.C1, (-1 * t.score) AS score
FROM (
SELECT C1, C2 <#> '[1,2,3 ... 512]' AS score
FROM FACE_TABLE
ORDER BY score LIMIT topk
) t
WHERE score > 10;
-- Cosine: keep results with similarity > 0.5
SELECT t.C1, (1.0 - t.score) AS score
FROM (
SELECT C1, C2 <=> '[1,2,3 ... 512]' AS score
FROM FACE_TABLE
ORDER BY score LIMIT topk
) t
WHERE score > 0.5;Hybrid search
Combine vector search with scalar filters using the same syntax as standard AnalyticDB for PostgreSQL hybrid search. For details, see Use hybrid search.
Example: Find the 10 most similar faces captured between October 2023 and January 2024:
SELECT C1 FROM FACE_TABLE
WHERE C4 > '2023-10-01 00:00:00'
AND C4 < '2024-01-01 00:00:00'
ORDER BY C2 <-> '[1,2,3 ... 512]'
LIMIT 10;Migrate pgvector application code
This section shows how to adapt an existing pgvector application to work with AnalyticDB for PostgreSQL, using Dify as an example.
All changes are in api/core/rag/datasource/vdb/pgvector/pgvector.py. Three modifications are required:
1. Remove the pgvector extension install
FastANN is pre-installed. Comment out the CREATE EXTENSION call:
with self._get_cursor() as cur:
# Comment out — AnalyticDB for PostgreSQL does not use the pgvector extension
# cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
cur.execute(SQL_CREATE_TABLE.format(table_name=self.table_name, dimension=dimension))2. Replace `USING heap` with `DISTRIBUTED BY`
AnalyticDB for PostgreSQL is built on PostgreSQL 9.4 and does not support the USING heap syntax. It also requires a distribution key.
# Before
SQL_CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS {table_name} (
id uuid PRIMARY KEY,
text text NOT NULL,
meta jsonb NOT NULL,
embedding vector({dimension}) NOT NULL
) USING heap;
"""
# After
SQL_CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS {table_name} (
id uuid PRIMARY KEY,
text text NOT NULL,
meta jsonb NOT NULL,
embedding vector({dimension}) NOT NULL
) DISTRIBUTED BY (id);
"""3. Add FastANN index creation
Add a SQL_CREATE_INDEX statement using USING ann syntax and call it after table creation:
SQL_CREATE_INDEX = """
CREATE INDEX ON {table_name} USING ann(embedding)
WITH (HNSW_M=16, HNSW_EF_CONSTRUCTION=500, PQ_ENABLE=1);
"""
def _create_collection(self, dimension: int):
cache_key = f"vector_indexing_{self._collection_name}"
lock_name = f"{cache_key}_lock"
with redis_client.lock(lock_name, timeout=20):
collection_exist_cache_key = f"vector_indexing_{self._collection_name}"
if redis_client.get(collection_exist_cache_key):
return
with self._get_cursor() as cur:
# Comment out — AnalyticDB for PostgreSQL does not use the pgvector extension
# cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
cur.execute(SQL_CREATE_TABLE.format(table_name=self.table_name, dimension=dimension))
# Create a FastANN vector index
cur.execute(SQL_CREATE_INDEX.format(table_name=self.table_name))
redis_client.set(collection_exist_cache_key, 1, ex=3600)References
Supported client libraries
All pgvector client libraries work with AnalyticDB for PostgreSQL. The only code change required is the index creation SQL.
| Language | Client library |
|---|---|
| C | pgvector-c |
| C++ | pgvector-cpp |
| Go | pgvector-go |
| Java, Kotlin, Groovy, Scala | pgvector-java |
| PHP | pgvector-php |
| Python | pgvector-python |
| Rust | pgvector-rust |
Supported vector functions
| Category | Function | Return type | Description |
|---|---|---|---|
| Calculation | l2_distance | DOUBLE PRECISION | Euclidean distance between two vectors |
inner_product | DOUBLE PRECISION | Inner product distance; equivalent to cosine similarity after normalization | |
cosine_similarity | DOUBLE PRECISION | Cosine similarity (-1 to 1) | |
vector_dims | INTEGER | Number of dimensions | |
vector_norm | DOUBLE PRECISION | Vector norm | |
vector_add | VECTOR | Element-wise sum | |
vector_sub | VECTOR | Element-wise difference | |
vector_mul | VECTOR | Element-wise product | |
vector_angle | DOUBLE PRECISION | Angle between two vectors | |
| Sorting | vector_l2_squared_distance | DOUBLE PRECISION | Squared Euclidean distance (faster than l2_distance for sorting) |
vector_negative_inner_product | DOUBLE PRECISION | Negative inner product (used to sort by inner product in descending order) | |
cosine_distance | DOUBLE PRECISION | Cosine distance (0 to 2); used to sort by cosine similarity in descending order |
For distance formula details, see Create a vector index.
Examples:
SELECT l2_distance('[1,1,1,1]'::vector, '[2,2,2,2]'::vector);
SELECT inner_product('[1,1,1,1]'::vector, '[2,2,2,2]'::vector);
SELECT cosine_similarity('[1,1,1,1]'::vector, '[2,2,2,2]'::vector);
SELECT vector_dims('[1,1,1,1]'::vector);
SELECT vector_norm('[1,1,1,1]'::vector);
SELECT vector_add('[1,1,1,1]'::vector, '[2,2,2,2]'::vector);
SELECT vector_sub('[1,1,1,1]'::vector, '[2,2,2,2]'::vector);
SELECT vector_mul('[1,1,1,1]'::vector, '[2,2,2,2]'::vector);
SELECT vector_angle('[1,1,1,1]'::vector, '[2,2,2,2]'::vector);
SELECT vector_l2_squared_distance('[1,1,1,1]'::vector, '[2,2,2,2]'::vector);
SELECT vector_negative_inner_product('[1,1,1,1]'::vector, '[2,2,2,2]'::vector);
SELECT cosine_distance('[1,1,1,1]'::vector, '[2,2,2,2]'::vector);Supported vector operators
| Operator | Calculation | Sorting | Data type |
|---|---|---|---|
<-> | Squared Euclidean distance | Ascending | VECTOR |
<#> | Negative inner product | Descending by dot product | VECTOR |
<=> | Cosine distance | Ascending (lower = more similar) | VECTOR |
+ | Element-wise sum | — | VECTOR |
- | Element-wise difference | — | VECTOR |
* | Element-wise product | — | VECTOR |
Examples:
SELECT '[1,1,1,1]'::vector <-> '[2,2,2,2]'::vector AS score;
SELECT '[1,1,1,1]'::vector <#> '[2,2,2,2]'::vector AS score;
SELECT '[1,1,1,1]'::vector <=> '[2,2,2,2]'::vector AS score;
SELECT '[1,1,1,1]'::vector + '[2,2,2,2]'::vector AS value;
SELECT '[1,1,1,1]'::vector - '[2,2,2,2]'::vector AS value;
SELECT '[1,1,1,1]'::vector * '[2,2,2,2]'::vector AS value;What's next
Perform vector search — recall rate tuning parameters
Use hybrid search — advanced scalar + vector filtering
Create a vector index — full index parameter reference