PolarDB for MySQL supports vector similarity search through SQL, using a built-in vector data type, distance functions, and vector indexes. The vector computation runs inside the database kernel, supports ACID transactions, and uses the In-Memory Column Index (IMCI) with HNSW-based algorithms to deliver both exact k-nearest neighbor (KNN) search with 100% recall and high-performance approximate nearest neighbor (ANN) search.
Common use cases include recommendation systems, chatbots, and image retrieval.
Prerequisites
Before you begin, ensure that you have:
A PolarDB for MySQL cluster with kernel version 8.0.2, revision 8.0.2.2.30 or later
An IMCI read-only node added to the cluster (required for vector indexes)
Version history
Features vary by kernel version. Upgrade your cluster to the latest version to access all features.
| Feature | Minimum version | Description |
|---|---|---|
| Basic vector features and the HNSW index algorithm | 8.0.2.2.30 | Core vector retrieval capabilities |
FAISS_HNSW_FLAT and FAISS_HNSW_PQ index algorithms | 8.0.2.2.31 | Additional index algorithms based on the FAISS library |
| Modify or delete vector indexes | 8.0.2.2.32 | Dynamically manage vector indexes using ALTER TABLE |
Quick start
The following steps walk through creating a table with a vector index, inserting data, and running an ANN search.
Step 1: Set up the table.
-- Enable columnstore for the entire table and define a vector index on the v1 column
CREATE TABLE t1 (
id INT PRIMARY KEY,
v1 VECTOR(4) COMMENT 'imci_vector_index=HNSW(metric=COSINE, max_degree=16)'
) COMMENT 'COLUMNAR=1';Step 2: Insert vectors.
INSERT INTO t1 (id, v1) VALUES
(1, STRING_TO_VECTOR('[1.0, 2.0, 3.0, 4.0]')),
(2, STRING_TO_VECTOR('[2.0, 2.2, 2.4, 2.6]')),
(3, STRING_TO_VECTOR('[8.8, 8.88, 8.888, 8.8888]'));Step 3: Enable ANN search and run a query.
-- Enable vector index acceleration (session-level)
SET imci_enable_vector_search = ON;
SET cost_threshold_for_imci = 0;
-- Find the 2 most similar vectors to the query
SELECT id, VECTOR_TO_STRING(v1)
FROM t1
ORDER BY DISTANCE(v1, STRING_TO_VECTOR('[1.2, 2.3, 3.4, 4.5]'), 'COSINE') ASC
LIMIT 2;Step 4: Confirm index usage.
EXPLAIN SELECT id, VECTOR_TO_STRING(v1)
FROM t1
ORDER BY DISTANCE(v1, STRING_TO_VECTOR('[1.2, 2.3, 3.4, 4.5]'), 'COSINE') ASC
LIMIT 2;If the execution plan contains Vector Search, the vector index is active.
+----+---------------------------+------+--------+--------+---------------------------------------------------------------------------------------+
| ID | Operator | Name | E-Rows | E-Cost | Extra Info |
+----+---------------------------+------+--------+--------+---------------------------------------------------------------------------------------+
| 1 | Select Statement | | | | IMCI Execution Plan (max_dop = 2, max_query_mem = 429496729) |
| 2 | └─Compute Scalar | | 2 | 0.00 | |
| 3 | └─Limit | | 2 | 0.00 | Offset=0 Limit=2 |
| 4 | └─Sort | | 2 | 0.00 | Sort Key: VECTOR_DISTANCE(t1.v1,"[1.200000,2.300000,3.400000,4.500000]","COSINE") ASC |
| 5 | └─Vector Search | t1 | 2 | 0.00 | |
+----+---------------------------+------+--------+--------+---------------------------------------------------------------------------------------+Vector type
Define a vector column
Use VECTOR(N) in a CREATE TABLE or ALTER TABLE statement to define a vector column, where N is the number of dimensions (1–16,383). Each dimension is a single-precision floating-point number (4 bytes).
-- Define a 4-dimensional vector column
CREATE TABLE t1 (id INT PRIMARY KEY, v1 VECTOR(4));Constraints:
A vector column supports equality comparison only with another vector column. It cannot be compared with other data types.
A vector column cannot be a primary key, foreign key, unique key, or partition key.
If
Nexceeds 16,383, the following error is returned:Data size (xxx Bytes, xxx dimensions) exceeds VECTOR max (65532 Bytes, 16383 dimensions) for column: 'xxx'
Convert between string and binary formats
PolarDB provides two functions for converting vector data between text and binary formats.
`STRING_TO_VECTOR` — converts a string to the internal binary format.
Input format: comma-separated floating-point numbers enclosed in square brackets, for example
'[1.2, 3.4, 5.6]'.The output uses little-endian byte order. For example,
1.0(hex0x3F800000) is stored as0000803F.If the input format is invalid, the error
Data cannot be converted to a valid vector: 'xxx'is returned.
SELECT HEX(STRING_TO_VECTOR('[1,2,3,4]'));
-- Output: 0000803F000000400000404000008040`VECTOR_TO_STRING` — converts binary vector data back to a readable string.
Input: binary data where every 4 bytes represent one dimension in little-endian byte order.
If the input is invalid, the error
Data cannot be converted to a valid vector: ''is returned.
SELECT VECTOR_TO_STRING(0x0000803F000000400000404000008040);
-- Output: [1.00000e+00,2.00000e+00,3.00000e+00,4.00000e+00]Work with vector data
Add a vector column to an existing table:
ALTER TABLE t1 ADD COLUMN v1 VECTOR(4);Insert a vector:
-- As a string using STRING_TO_VECTOR
INSERT INTO t1 (id, v1) VALUES (1, STRING_TO_VECTOR('[1.0, 2.0, 3.0, 4.0]'));
-- In binary format directly
INSERT INTO t1 VALUES (2, 0x0000803F000000400000404000008040);Upsert a vector:
INSERT INTO t1 VALUES (1, STRING_TO_VECTOR('[1.0, 2.0, 3.0, 4.0]'))
ON DUPLICATE KEY UPDATE v1 = STRING_TO_VECTOR('[1.0, 2.0, 3.0, 4.0]');Update a vector:
UPDATE t1 SET v1 = STRING_TO_VECTOR('[1.0, 2.0, 3.0, 4.0]') WHERE id = 1;Delete a row by vector value:
DELETE FROM t1 WHERE v1 = STRING_TO_VECTOR('[1.0, 2.0, 3.0, 4.0]');View vector values as strings:
SELECT VECTOR_TO_STRING(v1) FROM t1;Vector indexes
Vector indexes are defined in a vector column's COMMENT. Only IMCI read-only nodes build and serve vector indexes.
Create a vector index
Define a vector index by setting the column COMMENT in a CREATE TABLE or ALTER TABLE statement.
Syntax:
COMMENT 'imci_vector_index=<algorithm>(<parameters>) [other_comments]'imci_vector_index=is a fixed prefix that declares the vector index.Always include parentheses after the algorithm name, even when using all defaults:
HNSW().Separate multiple parameters with commas.
Each parameter uses the format
name=valueorname:value.
Index algorithms:
| Algorithm | Implementation | Characteristics |
|---|---|---|
HNSW | VSAG | High precision; higher memory usage |
FAISS_HNSW_FLAT | FAISS | High precision; higher memory usage |
FAISS_HNSW_PQ | FAISS | Uses product quantization (PQ) to compress vectors — lower memory usage, some precision loss |
Index build parameters:
All three algorithms share the following build-time parameters:
| Parameter | Alias | Description | Range | Default |
|---|---|---|---|---|
metric | distance | Vector similarity measure. Only uppercase values are accepted. | COSINE, INNER_PRODUCT, EUCLIDEAN | COSINE |
max_degree | M | Maximum connections per node in the graph. A larger value creates a denser graph — slower to build but may improve precision. | Positive integer; recommended 16–64 | 16 |
ef_construction | — | Candidate neighbors to evaluate during index building. A larger value improves index quality but increases build time. | Positive integer; recommended 100–500 | 200 |
FAISS_HNSW_PQ has two additional parameters for controlling product quantization:
| Parameter | Description | Range | Default |
|---|---|---|---|
pq_m | Number of PQ subspaces. Must be a divisor of the vector dimension. A larger value improves precision but increases memory usage. | Positive integer that divides the vector dimension | 1 |
pq_nbits | Bits used for the quantized representation of each subspace. Typically set to 8, which gives 256 centroids per subspace. | Positive integer ≤ 24 | 8 |
Example 1: Define a vector index when creating a table.
-- Method 1: Enable columnstore for the entire table
CREATE TABLE t1 (
id INT PRIMARY KEY,
v1 VECTOR(4) COMMENT 'imci_vector_index=HNSW(metric=COSINE, max_degree=16)'
) COMMENT 'COLUMNAR=1';
-- Method 2: Enable columnstore only for the vector column
CREATE TABLE t1 (
id INT PRIMARY KEY,
description TEXT,
v1 VECTOR(4) COMMENT 'COLUMNAR=1, imci_vector_index=HNSW(metric=COSINE)'
);Example 2: Add a vector index to an existing table.
-- Method 1: Enable table-level columnstore and define the vector index in the same statement
ALTER TABLE t1
COMMENT "COLUMNAR=1",
MODIFY COLUMN v1 VECTOR(4) COMMENT "imci_vector_index=HNSW(metric=COSINE,max_degree=16,ef_construction=300)";
-- Method 2: Enable columnstore only for the vector column
-- The vector index is created automatically based on the column COMMENT
ALTER TABLE t1 MODIFY COLUMN v1 VECTOR(4)
COMMENT "COLUMNAR=1 imci_vector_index=HNSW(metric=COSINE,max_degree=16,ef_construction=300)";Modify a vector index
Requires version 8.0.2.2.32 or later.
Change the index algorithm or parameters by modifying the column COMMENT.
-- Change the algorithm from HNSW to FAISS_HNSW_FLAT
ALTER TABLE t1 MODIFY COLUMN v1 VECTOR(4)
COMMENT "imci_vector_index=FAISS_HNSW_FLAT(metric=COSINE,max_degree=16,ef_construction=300)";
-- Change max_degree from 16 to 32
ALTER TABLE t1 MODIFY COLUMN v1 VECTOR(4)
COMMENT "imci_vector_index=HNSW(metric=COSINE,max_degree=32,ef_construction=300)";Delete a vector index
Delete the vector index along with the columnstore index:
A vector index cannot exist independently of a columnstore index. Disabling columnstore deletes the vector index by cascade.
ALTER TABLE t1 COMMENT 'COLUMNAR=0';Delete only the vector index (requires version 8.0.2.2.32 or later):
Clear the column COMMENT to remove the vector index while keeping the columnstore index.
ALTER TABLE t1 MODIFY COLUMN v1 VECTOR(4) COMMENT "";Vector index parameters
| Parameter | Description |
|---|---|
imci_enable_inline_vector_index | Controls whether IMCI creates vector indexes during initialization. ON (default): the IMCI read-only node builds a vector index based on the vector column's COMMENT. OFF: no new vector indexes are built. Existing vector indexes continue to be built, but are not loaded or built after a restart. |
imci_vector_index_dump_rows_threshold | Controls when the background task writes incremental data to the vector index. When the number of new rows since the last snapshot exceeds this threshold, the background task flushes the data. Valid values: 1–4,294,967,295. Default: 1,000. |
Vector queries
Distance calculation
Use the DISTANCE function to calculate the similarity between two vectors.
DISTANCE(vector1, vector2, '<metric>')| Metric | Description |
|---|---|
COSINE | Cosine distance. Measures directional similarity between two vectors. A smaller value means greater similarity. |
EUCLIDEAN | Euclidean distance. Measures the straight-line distance between two points. A smaller value means greater similarity. |
DOT | Dot product. Multiplies corresponding components and sums the results. A smaller value means greater similarity. |
SELECT DISTANCE(v1, STRING_TO_VECTOR('[1.2,2.3,3.4,4.5]'), 'COSINE') FROM t1;ANN search
ANN search uses a vector index to retrieve approximate nearest neighbors efficiently. All of the following conditions must be met for the optimizer to use the vector index:
The query includes both
ORDER BYandLIMITclauses.The first expression in
ORDER BYisDISTANCE(...).The sort direction is
ASC.The vector column in
DISTANCE(...)has an active vector index.The
metricinDISTANCE(...)matches themetricused to build the index.If the query includes a
JOIN, the execution plan must use a Right Deep Join Tree, and theDISTANCE(...)field must come from the driving table.The query cannot contain
GROUP BY. To aggregate results, wrap the vector search in a subquery.
Enable ANN search:
Run these two session-level commands before executing ANN queries.
-- Allow the optimizer to use vector indexes
SET imci_enable_vector_search = ON;
-- Force queries to use the column store execution path
SET cost_threshold_for_imci = 0;SET cost_threshold_for_imci = 0 forces all queries in the session to use the columnstore index. This can degrade point queries that run efficiently on the row store. Use this setting only in sessions dedicated to vector retrieval — do not set it globally.
Run an ANN query:
-- Find the top 5 most similar records
SELECT id, v1
FROM t1
WHERE category_id = 123
ORDER BY DISTANCE(v1, STRING_TO_VECTOR('[...]'), 'COSINE') ASC
LIMIT 5;Using `GROUP BY` with vector search:
GROUP BY cannot appear in the same query as ORDER BY DISTANCE(...). Run the vector search in a subquery first, then aggregate.
-- Incorrect: GROUP BY conflicts with ORDER BY DISTANCE()
-- SELECT category_id, COUNT(*) FROM t1
-- GROUP BY category_id
-- ORDER BY DISTANCE(v1, STRING_TO_VECTOR('[...]'), 'COSINE') ASC
-- LIMIT 5;
-- Correct: perform ANN search in a subquery, then aggregate
SELECT category_id, COUNT(*)
FROM (
SELECT id, category_id
FROM t1
ORDER BY DISTANCE(v1, STRING_TO_VECTOR('[...]'), 'COSINE') ASC
LIMIT 100
) AS nearest_neighbors
GROUP BY category_id;ANN search parameters:
| Parameter | Description |
|---|---|
imci_enable_vector_search | Controls whether vector indexes accelerate ANN search. ON (default): enabled. OFF: disabled. |
imci_hnswpq_k_factor | Recall expansion factor for FAISS_HNSW_PQ. When retrieving K nearest neighbors, the executor fetches K × imci_hnswpq_k_factor candidates from the index, sorts them exactly using the original vectors, and returns the top K. Valid values: 1–UINT32_MAX. Default: 1. |
KNN (exact) search
KNN search traverses all data to calculate distances, guaranteeing 100% recall. It does not require a vector index and is suitable for small datasets.
-- Find the 10 most similar records
SELECT id, VECTOR_TO_STRING(v1)
FROM t1
ORDER BY DISTANCE(v1, STRING_TO_VECTOR('[1.2,2.3,3.4,4.5]'), 'COSINE')
LIMIT 10;Vector queries with predicates
Add WHERE conditions to filter results alongside vector similarity.
SELECT * FROM t1
WHERE id < 10
ORDER BY DISTANCE(v1, STRING_TO_VECTOR('[1.2, 2.3, 3.4, 4.5]'), 'COSINE')
LIMIT 2;The optimizer selects one of three execution strategies based on the estimated selectivity of the filter:
| Strategy | When it applies |
|---|---|
| Prefilter | Estimated matching rows < imci_vector_search_prefilter_rows (default: 10,000). Applies the predicate to the full table first, then sorts the matching rows. |
| Postfilter | Estimated match rate >= imci_vector_search_filter_pct (default: 20%). Uses the vector index to retrieve candidates first, then applies the predicate. Repeats if the filtered results fall below the LIMIT. |
| Inline filter | Neither prefilter nor postfilter applies. Builds a bitmap from rows matching the predicate, then uses the bitmap during vector index traversal to ensure results satisfy the condition. |
Adaptive execution: If the optimizer's row count estimate is inaccurate, the system can switch strategies at runtime. For example, when using the inline filter strategy, if the actual number of matching rows turns out to be less than imci_vector_search_prefilter_rows, the system switches to prefilter dynamically.
Partition optimization: For predicates based on tenant IDs or tags, use LIST DEFAULT HASH partitioning. The optimizer uses partition pruning to select the best strategy per partition — postfilter for large partitions, prefilter or inline filter for smaller ones.
Predicate parameters:
| Parameter | Description |
|---|---|
imci_vector_search_filter_pct | If the estimated selectivity is at or above this value, the system prioritizes vector index retrieval. Valid values: 0–100. Default: 20 (%). |
imci_enable_vector_search_inline_filter | Controls whether the inline filter strategy is available. OFF (default): disabled. ON: enabled. |
imci_vector_search_prefilter_rows | If the estimated number of matching rows is below this value, the system uses the prefilter strategy. Valid values: 0–UINT64_MAX. Default: 10,000. |
Monitor vector index status
Vector index building and columnstore snapshot advancement run as separate background tasks. With continuous writes, the vector index snapshot may lag behind the columnstore data. Check the following system views to confirm the index is ready before running vector searches.
Index snapshot offset
Query INFORMATION_SCHEMA.IMCI_INDEX_STATS to see how many vectors the columnstore index has loaded into its snapshot.
SELECT schema_name, table_name, vector_rows
FROM information_schema.imci_index_stats;A VECTOR_ROWS value of 0 means no vector index snapshot is available — either the table has no vector index, or a newly created index has not yet generated a valid snapshot.
Compare VECTOR_ROWS with ROW_ID in INFORMATION_SCHEMA.IMCI_INDEXES to estimate build latency. The vector index still works with some latency — any vectors not yet written to the index are handled automatically during a search.
Physical vector index status
Query INFORMATION_SCHEMA.IMCI_VECTOR_INDEX_STATS to inspect the physical state of each vector index.
SELECT schema_name, table_name, column_name, vectors, memory_usage, storage_usage
FROM information_schema.imci_vector_index_stats;| Column | Description |
|---|---|
SCHEMA_NAME | Database name |
TABLE_NAME | Table name |
COLUMN_NAME | Vector column name |
VECTORS | Number of vectors in the index. Excludes NULL values and rows deleted during build. |
MEMORY_USAGE | Memory used by the index |
STORAGE_USAGE | Size of the persisted index file on disk |
INDEX_TYPE | Index type. Currently only HNSW is supported. |
INDEX_PARAMETERS | Index build parameters in JSON format |
ANN search is unavailable for any column not returned by this view. AVECTORSvalue of 0 after a cluster restart does not mean the index is lost. The index is persisted to disk. The first query after a restart triggers the data to load into memory. For accurate snapshot information, useVECTOR_ROWSinINFORMATION_SCHEMA.IMCI_INDEX_STATS.
Example: verify an index end to end
Set
imci_vector_index_dump_rows_thresholdto1in the PolarDB console to allow the index to build with a small number of rows. For instructions, see Specify cluster and node parameters.Create a test table and insert data:
CREATE TABLE t1 ( id INT AUTO_INCREMENT PRIMARY KEY, v1 VECTOR(4) COMMENT "imci_vector_index=HNSW(metric=COSINE,max_degree=16,ef_construction=300)" ) COMMENT "COLUMNAR=1"; INSERT INTO t1 VALUES (1, STRING_TO_VECTOR('[1.1,1.2,1.3,1.4]')), (2, STRING_TO_VECTOR('[2,2.2,2.4,2.6]'));Check the index snapshot status:
SELECT schema_name, table_name, row_id FROM information_schema.imci_indexes; -- Expected: row_id = 2 SELECT schema_name, table_name, vector_rows FROM information_schema.imci_index_stats; -- Expected: vector_rows = 2 SELECT schema_name, table_name, column_name, vectors FROM information_schema.imci_vector_index_stats; -- Expected: vectors = 2Insert a third row:
INSERT INTO t1 VALUES (3, STRING_TO_VECTOR('[8.8,8.88,8.888,8.8888]'));Query
IMCI_VECTOR_INDEX_STATSagain —VECTORSshould now be 3, confirming the background task flushed the new row to the index.Restart the IMCI read-only node. After the restart,
VECTORSinIMCI_VECTOR_INDEX_STATSreturns 0, whileVECTOR_ROWSinIMCI_INDEX_STATSstill shows 3. This is expected — the index data is on disk but not yet loaded into memory.ImportantRestarting the IMCI read-only node causes a brief service interruption.
Run an approximate query to trigger the index load:
SET cost_threshold_for_imci = 0; EXPLAIN SELECT * FROM t1 ORDER BY DISTANCE(v1, STRING_TO_VECTOR('[1.2, 2.3, 3.4, 4.5]'), 'COSINE') LIMIT 1; -- The execution plan should contain Vector Search SELECT id, VECTOR_TO_STRING(v1) FROM t1 ORDER BY DISTANCE(v1, STRING_TO_VECTOR('[1.2, 2.3, 3.4, 4.5]'), 'COSINE') LIMIT 1;Query
IMCI_VECTOR_INDEX_STATSagain —VECTORSreturns to 3, confirming the index is loaded and fully operational.