Vector search captures semantic meaning but misses exact keyword matches. Full-text search handles precise keyword matching but misses semantic relationships such as synonyms and paraphrases. Two-way retrieval combines both: AnalyticDB for PostgreSQL runs vector search and full-text search in parallel, then merges and reranks the results to maximize recall rate.
For V6.0 instances, see Two-way retrieval based on vector search and full-text search.
How it works
Vector search — Uses approximate nearest neighbor search (ANNS) on dense vector representations to retrieve the top-K semantically similar items.
Full-text search — Uses pgsearch-based BM25 indexes to match items based on term frequency and inverse document frequency, supplementing results with strong keyword relevance. In V6.0, full-text search relies on GIN indexes. In V7.0, the pgsearch-based BM25 indexes improve retrieval efficiency and relevance.
Reranking and post-processing — Merges the two result sets and reranks them into a single list using a hybrid score.
Supported versions
AnalyticDB for PostgreSQL V7.0 instances of V7.2.1.0 or later.
You can view the minor version on the Basic Information page of an instance in the AnalyticDB for PostgreSQL console. If your instance does not meet the required versions, update the minor version of the instance.
Prerequisites
Before you begin, ensure that you have:
The vector search engine optimization feature enabled on your AnalyticDB for PostgreSQL instance
The pgsearch extension installed (a
pgsearchschema appears in your schema list if it is installed)
If the pgsearch extension is not installed, submit a ticket for installation assistance. After installation, restart your instance.
Set up two-way retrieval
Step 1: Create a sample table
Create a table named documents and insert sample data.
-- Create the table. The vector column stores vector data.
CREATE TABLE IF NOT EXISTS documents(
id TEXT,
docname TEXT,
title TEXT,
vector real[],
text TEXT);
-- Set the storage mode of the vector column to PLAIN.
ALTER TABLE documents ALTER COLUMN vector SET STORAGE PLAIN;
-- Insert sample data.
INSERT INTO documents (id, docname, title, vector, text) VALUES
('1', 'doc_1', 'Exploring the Universe',
'{0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0}',
'The universe is vast, filled with mysteries and astronomical wonders waiting to be discovered.'),
('2', 'doc_2', 'The Art of Cooking',
'{0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1}',
'Cooking combines ingredients artfully, creating flavors that nourish and bring people together.'),
('3', 'doc_3', 'Technology and Society',
'{0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2}',
'Technology transforms society, reshaping communication, work, and our daily interactions significantly.'),
('4', 'doc_4', 'Psychology of Happiness',
'{0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3}',
'Happiness is complex, influenced by relationships, gratitude, and the pursuit of meaningful experiences.'),
('5', 'doc_5', 'Sustainable Living Practices',
'{0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4}',
'Sustainable living involves eco-friendly choices, reducing waste, and promoting environmental awareness.');Step 2: Create indexes
Create a vector index on the vector column and a BM25 full-text index on the text column.
-- Vector index
CREATE INDEX documents_idx ON documents USING ann(vector) WITH (dim = 10, algorithm = hnswflat, distancemeasure = L2, vector_include = 0);
-- BM25 full-text index
CALL pgsearch.create_bm25(
index_name => 'documents_bm25_idx',
table_name => 'documents',
text_fields => '{text: {}}'
);Step 3: Run two-way retrieval
Two-way retrieval queries both indexes in parallel and combines the results using a hybrid score. Two scoring methods are available.
Choose a scoring method
| Method | How it works | When to use |
|---|---|---|
| Weighted score | Combines raw BM25 and vector similarity scores using configurable weights | When you want fine-grained control over how much each search type contributes to the final score |
| Reciprocal rank fusion (RRF) | Combines result rankings (not raw scores) using the formula 1 / (k + rank) | When raw scores from the two search types are on different scales and hard to normalize |
Option 1: Weighted score
The query uses two common table expressions (CTEs): t1 runs full-text search via the @@@ operator, and t2 runs vector search via the <-> operator. A FULL OUTER JOIN merges the results, and the final hybrid score is a weighted sum of the BM25 and vector similarity scores.
WITH t1 AS (
SELECT
id,
docname,
title,
text,
text @@@ pgsearch.config('text:astronomical') AS score,
2 AS source
FROM
documents
ORDER BY score
LIMIT 10
),
t2 AS (
SELECT
id,
docname,
title,
text,
cosine_similarity(vector, ARRAY[0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1]::real[]) AS score,
1 AS source
FROM
documents
ORDER BY vector <-> ARRAY[0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1]
LIMIT 10
)
SELECT t2.*, COALESCE(ABS(t1.score), 0.0) * 0.2 + COALESCE(t2.score, 0.0) * 0.8 AS hybrid_score
-- Weight distribution is for demonstration only. Adjust weights based on your use case.
FROM t1
FULL OUTER JOIN t2 ON t1.id = t2.id
ORDER BY hybrid_score DESC;The default weights in this example are 0.2 for BM25 and 0.8 for vector similarity. Adjust the weights based on your retrieval requirements.
Option 2: Reciprocal rank fusion (RRF)
RRF scores each result based on its rank in each search, then sums the rank-based scores. Items that rank highly in both searches receive a higher combined score. The smoothing parameter k (default: 60) controls how much high rankings are rewarded — a larger k reduces score differences between ranks.
Use RRF when the raw scores from vector search and full-text search are on different scales and direct normalization is impractical.
WITH bm25 AS (
SELECT
id,
docname,
title,
text,
text @@@ pgsearch.config('text:astronomical') AS score,
2 AS source,
ROW_NUMBER() OVER () AS rank_bm25
FROM
documents
ORDER BY score
LIMIT 10
), hnsw AS (
SELECT
id,
docname,
title,
text,
vector <-> ARRAY[0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1] AS score,
1 AS source,
ROW_NUMBER() OVER (ORDER BY vector <-> ARRAY[0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1]) AS rank_hnsw
FROM
documents
ORDER BY vector <-> ARRAY[0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1]
LIMIT 10
)
SELECT
COALESCE(bm25.id, hnsw.id) AS id,
COALESCE(bm25.docname, hnsw.docname) AS docname,
COALESCE(bm25.title, hnsw.title) AS title,
COALESCE(bm25.text, hnsw.text) AS text,
CASE
WHEN bm25.rank_bm25 > 0 AND hnsw.rank_hnsw > 0 THEN
COALESCE(1.0 / (60 + bm25.rank_bm25), 0) + COALESCE(1.0 / (60 + hnsw.rank_hnsw), 0)
WHEN bm25.rank_bm25 > 0 THEN
COALESCE(1.0 / (60 + bm25.rank_bm25), 0)
WHEN hnsw.rank_hnsw > 0 THEN
COALESCE(1.0 / (60 + hnsw.rank_hnsw), 0)
ELSE 0
END AS hybrid_score
FROM
bm25
FULL OUTER JOIN hnsw ON bm25.id = hnsw.id
ORDER BY hybrid_score DESC;Step 4 (Optional): Encapsulate the query as a function
Wrap the weighted-score query in a reusable function to simplify repeated calls.
Create the function.
CREATE OR REPLACE FUNCTION search_documents( table_name TEXT, vector_column TEXT, text_column TEXT, search_keyword TEXT, search_vector REAL[], limit_size INT, hnsw_weight FLOAT8 DEFAULT 0.8 -- Default weight for vector search ) RETURNS TABLE ( id TEXT, docname TEXT, title TEXT, text TEXT, hybrid_score FLOAT8 ) AS $$ DECLARE query_string TEXT; bm25_weight FLOAT8; BEGIN bm25_weight := 1.0 - hnsw_weight; query_string := 'WITH t1 AS ( SELECT id, docname, title, ' || text_column || ', ' || text_column || ' @@@ pgsearch.config(''' || search_keyword || ''') AS score, 2 AS source FROM ' || table_name || ' ORDER BY score LIMIT ' || limit_size || ' ), t2 AS ( SELECT id, docname, title, ' || text_column || ', cosine_similarity(' || vector_column || ', $1) AS score, 1 AS source FROM ' || table_name || ' ORDER BY ' || vector_column || ' <-> $1 LIMIT ' || limit_size || ' ) SELECT t2.id, t2.docname, t2.title, t2.' || text_column || ', COALESCE(ABS(t1.score), 0.0) * ' || bm25_weight || ' + COALESCE(t2.score, 0.0) * ' || hnsw_weight || ' AS hybrid_score FROM t1 FULL OUTER JOIN t2 ON t1.id = t2.id ORDER BY hybrid_score DESC;'; RETURN QUERY EXECUTE query_string USING search_vector; END; $$ LANGUAGE plpgsql;Call the function.
SELECT * FROM search_documents( 'documents', -- table name 'vector', -- vector column 'text', -- text column 'astronomical', -- search keyword ARRAY[0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1], -- search vector 10, -- result limit 0.8 -- vector search weight (BM25 weight = 0.2) );