AnalyticDB for PostgreSQL vector databases support hybrid search, which combines structured field filtering, semi-structured field filtering, and two-way retrieval based on vector search and full-text search. This topic applies to V6.0 and V7.0. For V7.0-specific guidance, see Two-way retrieval based on vector search and full-text search in V7.0.
When to use hybrid search
Choose the retrieval method based on your query characteristics:
| Method | Best for | Example scenario |
|---|---|---|
| Vector search only | Semantic similarity — intent matters more than exact keywords | Finding conceptually related documents |
| Full-text search only | Exact keyword matching — specific terms must appear | Searching for a specific error message or product code |
| Vector search + structured filtering | Semantic similarity within a filtered dataset | Finding similar products within a price range |
| Two-way retrieval (vector + full-text) | Queries where both meaning and exact keywords matter, or when embedding quality is inconsistent | RAG pipelines, e-commerce search, mixed-query workloads |
Use two-way retrieval when vector search alone does not achieve the recall rate you need — for example, when embedding models produce poor representations for certain query types, or when your corpus contains keyword-critical terms that must appear verbatim.
How hybrid search works
Approximate nearest neighbor search (ANNS) vector indexes handle only unstructured data. In production environments, you often need to search structured and unstructured data together — for example, finding whether a person appears in a specific location within a time window, where the time range and location are structured data and face images are unstructured.
Most hybrid search implementations store structured data in a database and unstructured data in a vector search system, then intersect results from both. To prevent vector search from returning too few rows after structured filtering, they over-fetch by an amplification factor before applying filters — but high-selectivity filters can still eliminate enough results to fall short of the required top-K count.
AnalyticDB for PostgreSQL takes a different approach: it integrates the FastANN vector search engine directly as an index extension and uses the ranked-biased overlap (RBO) cost estimation rules of the optimizer to generate execution plans that search structured and unstructured data together.
Execution plan categories
Based on whether indexed structured condition columns exist and how selective those conditions are, AnalyticDB for PostgreSQL selects one of three execution plans.
The following example uses a product table to demonstrate all three plans.
Create the table and indexes:
CREATE TABLE products (
id serial primary key,
name varchar(256),
price real,
inTime timestamp,
url varchar(256),
feature real[]
);
-- Set the storage mode of the vector column to PLAIN.
ALTER TABLE products ALTER COLUMN feature SET STORAGE PLAIN;
-- Create a B-tree index for structured columns.
CREATE INDEX ON products(price, intime);
-- Create a vector index for the vector column.
CREATE INDEX ON products USING ann(feature) WITH (dim=512);
-- Collect statistics to generate an optimal execution plan for hybrid search.
ANALYZE products;Query: find the top 100 items most similar to a given image, priced between USD 100 and USD 200, added in March 2019:
SELECT id, price FROM products WHERE
price > 100 AND price <=200
AND inTime > '2019-03-01 00:00:00' AND inTime <= '2019-03-31 00:00:00'
ORDER BY
feature <-> array[10,2.0,..., 512.0]
LIMIT 100;Category 1: brute-force query
The optimizer fetches all rows matching the structured conditions, sorts them by vector distance, and returns the closest 100. This guarantees 100% recall but is slow when the structured conditions match a large number of rows.
QUERY PLAN
-----------------------------------------------------------------------------------
LIMIT
-> Gather Motion 3:1 (slice1; segments: 3)
Merge Key: (l2_squared_distance($0, feature))
-> LIMIT
-> Sort
Sort Key: (l2_squared_distance($0, feature))
-> Index Scan using products_price_idx on products
Index Cond: the filter condition for the price column.
Filter: the filter condition for the time column.
Optimizer: Postgres query optimizerCategory 2: pure vector search and structured filtering
The optimizer queries the vector index for the N most similar rows, then filters by the structured conditions. This is fast, but when structured conditions are highly selective, the final result may contain fewer rows than requested.
QUERY PLAN
-----------------------------------------------------------------------------------
LIMIT
-> Gather Motion 3:1 (slice1; segments: 3)
Merge Key: ((feature <-> $0))
-> LIMIT
-> Ann Index Scan using products_feature_idx on products
ORDER BY: (feature <-> $0)
Filter: the filter condition for the price and time columns.
Optimizer: Postgres query optimizerCategory 3: hybrid search (vector + structured indexes)
This plan combines both approaches. When structured condition columns are indexed and their index type supports bitmaps, the optimizer generates a bitmap from the structured index and pushes it into the vector index scan. This accelerates filtering while preserving recall.
QUERY PLAN
-----------------------------------------------------------------------------------
LIMIT
-> Gather Motion 3:1 (slice1; segments: 3)
Merge Key: ((feature <-> $0))
-> LIMIT
-> Fusion Ann Scan
-> Bitmap Index Scan on products_price_idx
Index Cond: the filter condition for the price column.
-> Ann Index Scan with filter using products_feature_idx on products
ORDER BY: (feature <-> $0)
Filter: the filter condition for the time column.
Optimizer: Postgres query optimizerTwo execution plan nodes indicate that hybrid search is in use:
Ann Index Scan with filter: pushes filter conditions down to the vector index, combining filtering with the index scan.
Fusion Ann Scan: used when structured condition columns have a supported index. It generates a bitmap from the structured index (left subtree) and pushes the bitmap into the vector index scan (right subtree). The right subtree is either
Ann Index Scan(bitmap only) orAnn Index Scan with filter(bitmap plus additional filter conditions).
Hybrid search methods
AnalyticDB for PostgreSQL supports three hybrid search methods:
| Method | Index types used | Supported data types |
|---|---|---|
| Vector search + structured field filtering | B-tree index | BIGINT, BOOLEAN, BYTEA, CHAR, VARCHAR, INTEGER, FLOAT, DOUBLE, DATE, SMALLINT, TIMESTAMP, SERIAL |
| Vector search + semi-structured field filtering | GIN index | JSON, JSONB, ARRAY |
| Two-way retrieval: vector + full-text search | Vector index + GIN index | — |
For more information on supported structured data types, see Data types.
The following sections show how to set up semi-structured field filtering and two-way retrieval. Structured field filtering with B-tree indexes is demonstrated in the Execution plan categories section above.
Use semi-structured field filtering
This example uses a stock analysis database where each record contains an article chunk, its release time, the stock IDs it mentions, and an embedding vector.
Table schema
| Field | Data type | Description |
|---|---|---|
| id | serial | Serial number |
| chunk | varchar(1024) | A text chunk from a stock analysis article |
| release_time | timestamp | The article's release time |
| stock_id_list | char(10)[] | The IDs of stocks mentioned in the article |
| url | varchar(1024) | The URL of the source article |
| feature | real[] | The embedding vector of the text chunk |
Prerequisites
Before you begin, ensure that you have:
An AnalyticDB for PostgreSQL V6.0 or V7.0 instance
The permission to create tables and indexes
Set up semi-structured field filtering
Create the table.
CREATE TABLE stock_analysis_chunks ( id serial primary key, chunk varchar(1024), release_time timestamp, stock_id_list integer[], url varchar(1024), feature real[] ) distributed by (id);Set the storage mode of the vector column to PLAIN.
ALTER TABLE stock_analysis_chunks ALTER COLUMN feature SET STORAGE PLAIN;Create a vector index.
CREATE INDEX ON stock_analysis_chunks USING ann(feature) WITH (dim=1536, distancemeasure=cosine, hnsw_m=64, pq_enable=1, hnsw_ef_construction=128);Create indexes for the structured and semi-structured columns.
-- Create a B-tree index for the structured column. CREATE INDEX ON stock_analysis_chunks(release_time); -- Create a GIN index for the semi-structured column. CREATE INDEX ON stock_analysis_chunks USING gin(stock_id_list);Collect statistics.
ANALYZE stock_analysis_chunks;Run a hybrid search query. The following query returns stock analysis articles published between July 18 and August 18, 2023, that mention all four specified stocks, ranked by semantic similarity to a query vector:
SELECT id, url, cosine_similarity(feature, array[1.0, 2.0, ..., 1536.0]::real[]) as score FROM stock_analysis_chunks WHERE release_time >= '2023-07-18 00:00:00' AND release_time <= '2023-08-18 00:00:00' AND stock_id_list @> ARRAY['BABA', 'AAPL', 'MSFT', 'AMZN'] ORDER BY feature <=> array[1.0, 2.0, ..., 1536.0]::real[] LIMIT 10;
Use two-way retrieval (vector + full-text search)
Two-way retrieval runs vector search and full-text search in parallel, then merges the results. Use this method when vector search alone does not achieve sufficient recall — for example, when embedding models underperform on certain query types or when the corpus contains terms that require exact keyword matching.
Prerequisites
Before you begin, ensure that you have:
An AnalyticDB for PostgreSQL V6.0 or V7.0 instance
The permission to create tables, indexes, and functions
Set up two-way retrieval
Create the table. The
to_tsvectorcolumn stores precomputed full-text search tokens.Populate the
to_tsvectorcolumn when inserting rows using PostgreSQL'sto_tsvector()function, for example:to_tsvector('english', text). Keep this column in sync with thetextcolumn when updating rows.-- vector: the vector column. to_tsvector: the full-text tokenizer column. CREATE TABLE IF NOT EXISTS documents( id TEXT, docname TEXT, title TEXT, vector real[], text TEXT, to_tsvector TSVECTOR);Set the storage mode of the vector column to PLAIN.
ALTER TABLE documents ALTER COLUMN vector SET STORAGE PLAIN;Create indexes.
Create a vector index for the vector column.
CREATE INDEX ON documents USING ann(vector) WITH (dim=1024, distancemeasure=cosine, hnsw_m=64, pq_enable=1, hnsw_ef_construction=128);Create a GIN index for the full-text tokenizer column.
CREATE INDEX ON documents USING gin(to_tsvector);
Create a helper function to convert query text into a
tsquery.CREATE OR REPLACE FUNCTION public.to_tsquery_from_text(txt text, lang regconfig DEFAULT 'english'::regconfig) RETURNS tsquery LANGUAGE sql IMMUTABLE STRICT AS $function$ SELECT to_tsquery(lang, COALESCE(string_agg(split_part(word, ':', 1), ' | '), '')) FROM (SELECT UNNEST(STRING_TO_ARRAY(TO_TSVECTOR(lang, txt)::text, ' ')) AS TEXT) AS words_only;$function$Run a two-way retrieval query. The query uses a common table expression (CTE) to run both searches and merge results. The first branch retrieves the top 10 rows by vector similarity; the second retrieves the top 10 rows by full-text relevance.
UNION ALLcombines both sets, and the outerSELECTdeduplicates byid, taking the higher similarity score for any row found by both methods.Result columns
Column Description similarityThe higher score from either the vector or full-text search for this row sourceA bitmask indicating which method(s) found the row: 1= vector only,2= full-text only,3= bothWITH combined AS ( ( SELECT id, docname, text, cosine_similarity(densevec,ARRAY{embedding}::real[]) AS similarity, 1 AS source FROM documents ORDER BY vector <=> ARRAY[10,2.0,…, 1024.0] LIMIT 10 ) UNION ALL ( SELECT id, docname, text, ts_rank(to_tsvector, to_tsquery_from_text('{query.query}', 'zh_cn'), 32) AS similarity, 2 AS source FROM documents WHERE to_tsvector@@to_tsquery_from_text('{query.query}', 'zh_cn') ORDER BY similarity DESC LIMIT 10 ) ) SELECT id, docname, title, text, MAX(similarity) AS similarity, BIT_OR(source) AS source FROM combined GROUP BY id ORDER BY similarity DESC;The query returns at most 20 rows (10 from vector search + 10 from full-text search, deduplicated by
id).Tuning the balance between vector and full-text search
By default, the query gives equal weight to vector similarity scores and full-text relevance scores. To favor one method over the other, multiply the similarity value in either branch before the
UNION ALL. For example, to double the weight of full-text results, replacets_rank(...) AS similaritywithts_rank(...) * 2 AS similarityin the second branch.