AnalyticDB for MySQL lets you run similarity searches on unstructured data using feature vectors encoded by AI models. Store the vectors alongside your structured data, then query both together in standard SQL.
Common use cases include image search, facial recognition, voiceprint matching, and semantic text search.
Prerequisites
Before you begin, ensure that you have:
An AnalyticDB for MySQL cluster running version 3.1.4.0 or later
Use minor versions 3.1.5.16, 3.1.6.8, 3.1.8.6, or later for the best experience. On other minor versions, set the CSTORE_PROJECT_PUSH_DOWN and CSTORE_PPD_TOP_N_ENABLE parameters to false before using vector search. To check your cluster's minor version or request an upgrade, see How do I query the version of an AnalyticDB for MySQL cluster?
How it works
Use an AI model to extract features from unstructured data (images, audio, text) and encode them as fixed-length arrays (feature vectors).
Store the vectors in an AnalyticDB for MySQL table alongside your structured data.
At query time, pass a query vector to
l2_distance()in aSELECTstatement. AnalyticDB for MySQL uses the vector index to find the most similar entries and returns results in milliseconds.
AnalyticDB for MySQL uses a massively parallel processing (MPP) architecture to run vector queries, enabling high-concurrency writes and immediate query availability after data is written.
Performance benchmark (512-dimensional face vectors):
| Scenario | Dataset size | QPS | Response time | Recall |
|---|---|---|---|---|
| High throughput | 10 billion entries | 100 | 50 ms | 99% |
| High concurrency | 200 million entries | 1,000 | 1 s | 99% |
Key concepts
| Term | Description |
|---|---|
| Feature vector | A fixed-length array of numbers that represents an entity in vector space. The distance between two vectors measures their similarity. |
| Vector search | A method for fast search and matching in a feature vector dataset. |
| Vector index | An approximate nearest neighbor (ANN) index built on a vector column, using the HNSW_PQ algorithm. |
| KNN | k-nearest neighbor — finds the k entries closest to a query vector. |
| RNN | radius nearest neighbor — finds all entries within a specified distance of a query vector. |
l2_distance | The Euclidean distance function used for vector queries. Calculated as: *(x₁ − y₁)² + (x₂ − y₂)² + … + (xₙ − yₙ)²* |
Limitations
| Constraint | Detail |
|---|---|
| Minimum cluster version | V3.1.4.0 |
| Supported algorithm | HNSW_PQ only |
| Supported distance measure | SquaredL2 only |
| Supported column data types | ARRAY<FLOAT>, ARRAY<BYTE>, ARRAY<SMALLINT> |
| Vector format | Fixed-length arrays only |
典型场景
AnalyticDB MySQL的向量分析功能主要应用于以下场景:
图片检索:通过图片检索图片。
声纹匹配:通过音频检索音频。
文本检索:通过文本检索近似文本。
商品图片检索:在大量图片中检索包含同一商品的图片。
向量分析模式
AnalyticDB for MySQL向量分析支持两种模式:无损模式和微损模式。不同场景所选的向量分析模式不同。无损模式下,保证100%的数据召回率。微损模式下,AnalyticDB for MySQL会为向量构建索引,索引会引起召回率的轻微下降,保证99%的数据召回率。您可以参考下表,根据业务场景选择适合的向量分析模式。
场景 | 数据量 | QPS | 向量分析模式 | 数据召回率 |
| 百万级别 | 100 QPS | 无损模式 | 100% |
| 亿级别 | 1000 QPS | 微损模式 | 99% |
历史库 | 百万级别 | 100 QPS | 微损模式 | 99% |
Create a vector index
Syntax
To create a vector index when creating a table:
ANN INDEX [index_name] (column_name) [algorithm=HNSW_PQ] [distancemeasure=SquaredL2]To add a vector index to an existing table:
ALTER TABLE table_name ADD ANN INDEX [index_name] (column_name) [algorithm=HNSW_PQ] [distancemeasure=SquaredL2]Parameters
| Parameter | Description |
|---|---|
ANN INDEX | Keyword that identifies this as a vector index. |
index_name | Name of the index. For naming conventions, see the Naming limits section. |
column_name | Name of the vector column. Supported types: ARRAY<FLOAT>, ARRAY<BYTE>, ARRAY<SMALLINT>. For naming conventions, see the Naming limits section. |
algorithm | Index algorithm. Set to HNSW_PQ. |
distancemeasure | Distance formula. Set to SquaredL2 (squared Euclidean distance). |
Example: create a table with vector indexes
The following example creates a table with two vector columns — float_feature (ARRAY<FLOAT>, 4-dimensional) and short_feature (ARRAY<SMALLINT>, 4-dimensional) — and builds an ANN INDEX on each:
CREATE TABLE vector (
xid bigint not null,
cid bigint not null,
uid varchar not null,
vid varchar not null,
wid varchar not null,
float_feature array<float>(4),
short_feature array<smallint>(4),
ANN INDEX idx_short_feature(short_feature),
ANN INDEX idx_float_feature(float_feature),
PRIMARY KEY (xid, cid, vid)
) DISTRIBUTED BY HASH(xid);Example: add vector indexes to an existing table
If the table already exists without vector indexes, add them using ALTER TABLE:
-- Create the table first (without indexes)
CREATE TABLE vector (
xid BIGINT not null,
cid BIGINT not null,
uid VARCHAR not null,
vid VARCHAR not null,
wid VARCHAR not null,
float_feature array<FLOAT>(4),
short_feature array<SMALLINT>(4),
PRIMARY KEY (xid, cid, vid)
) DISTRIBUTED BY HASH(xid);
-- Add vector indexes
ALTER TABLE vector ADD ANN INDEX idx_float_feature(float_feature);
ALTER TABLE vector ADD ANN INDEX idx_short_feature(short_feature);Query vector data
Use l2_distance(column, query_vector) in a SELECT statement to find similar entries. Order by the distance value and use LIMIT to return the top k results.
All examples below use the l2_distance function to query the short_feature column.
Insert sample data
INSERT INTO vector (xid, cid, uid, vid, wid, short_feature, float_feature) VALUES (1, 2, 'A', 'B', 'C', '[1,1,1,1]', '[1.2,1.5,2,3.0]');
INSERT INTO vector (xid, cid, uid, vid, wid, short_feature, float_feature) VALUES (2, 1, 'e', 'v', 'f', '[2,2,2,2]', '[1.5,1.15,2.2,2.7]');
INSERT INTO vector (xid, cid, uid, vid, wid, short_feature, float_feature) VALUES (0, 6, 'd', 'f', 'g', '[3,3,3,3]', '[0.2,1.6,5,3.7]');
INSERT INTO vector (xid, cid, uid, vid, wid, short_feature, float_feature) VALUES (5, 4, 'j', 'b', 'h', '[4,4,4,4]', '[1.0,4.15,6,2.9]');
INSERT INTO vector (xid, cid, uid, vid, wid, short_feature, float_feature) VALUES (8, 5, 'Sj', 'Hb', 'Dh', '[5,5,5,5]', '[1.3,4.5,6.9,5.2]');
INSERT INTO vector (xid, cid, uid, vid, wid, short_feature, float_feature) VALUES (5, 4, 'x', 'g', 'h', '[3,4,4,4]', '[1.0,4.15,6,2.9]');
INSERT INTO vector (xid, cid, uid, vid, wid, short_feature, float_feature) VALUES (5, 4, 'j', 'r', 'k', '[6,6,4,4]', '[1.0,4.15,6,2.9]');
INSERT INTO vector (xid, cid, uid, vid, wid, short_feature, float_feature) VALUES (5, 4, 's', 'i', 'q', '[2,2,4,4]', '[1.0,4.15,6,2.9]');KNN query: top k nearest neighbors
Return the 3 entries closest to [1,1,1,1]:
SELECT xid, l2_distance(short_feature, '[1,1,1,1]') AS dis
FROM vector
ORDER BY dis
LIMIT 3;Result:
+-------+------+
| xid | dis |
+-------+------+
| 1 | 0.0 |
+-------+------+
| 2 | 4.0 |
+-------+------+
| 0 | 16.0 |
+-------+------+KNN query with a filter
Return the 4 entries closest to [1,1,1,1] where xid = 5 and cid = 4:
SELECT uid, l2_distance(short_feature, '[1,1,1,1]') AS dis
FROM vector
WHERE xid = 5 AND cid = 4
ORDER BY dis
LIMIT 4;Result:
+-------+------+
| uid | dis |
+-------+------+
| s | 20.0 |
+-------+------+
| x | 31.0 |
+-------+------+
| j | 36.0 |
+-------+------+
| j | 68.0 |
+-------+------+RNN query: entries within a distance threshold
Return entries closest to [1,1,1,1] where the distance is less than 50 and xid = 5, limited to 3 results:
SELECT uid, l2_distance(short_feature, '[1,1,1,1]') AS dis
FROM vector
WHERE l2_distance(short_feature, '[1,1,1,1]') < 50.0 AND xid = 5
ORDER BY dis
LIMIT 3;Result:
+-------+------+
| uid | dis |
+-------+------+
| s | 20.0 |
+-------+------+
| x | 31.0 |
+-------+------+
| j | 36.0 |
+-------+------+删除向量索引
语法
仅3.2.1.0及以上版本支持删除向量索引。若表中存在向量索引,您可以使用以下语句删除。定义为:
ALTER TABLE table_name DROP [ANN] INDEX index_name;示例
删除vector表中的向量索引。
ALTER TABLE vector DROP ann INDEX idx_short_feature;
ALTER TABLE vector DROP ann INDEX idx_float_feature;