All Products
Search
Document Center

AnalyticDB:Vector search

Last Updated:Mar 28, 2026

AnalyticDB for MySQL lets you run similarity searches on unstructured data using feature vectors encoded by AI models. Store the vectors alongside your structured data, then query both together in standard SQL.

Common use cases include image search, facial recognition, voiceprint matching, and semantic text search.

Prerequisites

Before you begin, ensure that you have:

  • An AnalyticDB for MySQL cluster running version 3.1.4.0 or later

Use minor versions 3.1.5.16, 3.1.6.8, 3.1.8.6, or later for the best experience. On other minor versions, set the CSTORE_PROJECT_PUSH_DOWN and CSTORE_PPD_TOP_N_ENABLE parameters to false before using vector search. To check your cluster's minor version or request an upgrade, see How do I query the version of an AnalyticDB for MySQL cluster?

How it works

  1. Use an AI model to extract features from unstructured data (images, audio, text) and encode them as fixed-length arrays (feature vectors).

  2. Store the vectors in an AnalyticDB for MySQL table alongside your structured data.

  3. At query time, pass a query vector to l2_distance() in a SELECT statement. AnalyticDB for MySQL uses the vector index to find the most similar entries and returns results in milliseconds.

AnalyticDB for MySQL uses a massively parallel processing (MPP) architecture to run vector queries, enabling high-concurrency writes and immediate query availability after data is written.

Performance benchmark (512-dimensional face vectors):

ScenarioDataset sizeQPSResponse timeRecall
High throughput10 billion entries10050 ms99%
High concurrency200 million entries1,0001 s99%

Key concepts

TermDescription
Feature vectorA fixed-length array of numbers that represents an entity in vector space. The distance between two vectors measures their similarity.
Vector searchA method for fast search and matching in a feature vector dataset.
Vector indexAn approximate nearest neighbor (ANN) index built on a vector column, using the HNSW_PQ algorithm.
KNNk-nearest neighbor — finds the k entries closest to a query vector.
RNNradius nearest neighbor — finds all entries within a specified distance of a query vector.
l2_distanceThe Euclidean distance function used for vector queries. Calculated as: *(x₁ − y₁)² + (x₂ − y₂)² + … + (xₙ − yₙ)²*

Limitations

ConstraintDetail
Minimum cluster versionV3.1.4.0
Supported algorithmHNSW_PQ only
Supported distance measureSquaredL2 only
Supported column data typesARRAY<FLOAT>, ARRAY<BYTE>, ARRAY<SMALLINT>
Vector formatFixed-length arrays only

典型场景

AnalyticDB MySQL的向量分析功能主要应用于以下场景:

  • 图片检索:通过图片检索图片。

  • 声纹匹配:通过音频检索音频。

  • 文本检索:通过文本检索近似文本。

  • 商品图片检索:在大量图片中检索包含同一商品的图片。

向量分析模式

AnalyticDB for MySQL向量分析支持两种模式:无损模式和微损模式。不同场景所选的向量分析模式不同。无损模式下,保证100%的数据召回率。微损模式下,AnalyticDB for MySQL会为向量构建索引,索引会引起召回率的轻微下降,保证99%的数据召回率。您可以参考下表,根据业务场景选择适合的向量分析模式。

场景

数据量

QPS

向量分析模式

数据召回率

  • 人脸门禁考勤

  • 违禁图片库

  • 黑名单库

百万级别

100 QPS

无损模式

100%

  • 实时库

  • 全国人口库

亿级别

1000 QPS

微损模式

99%

历史库

百万级别

100 QPS

微损模式

99%

Create a vector index

Syntax

To create a vector index when creating a table:

ANN INDEX [index_name] (column_name) [algorithm=HNSW_PQ] [distancemeasure=SquaredL2]

To add a vector index to an existing table:

ALTER TABLE table_name ADD ANN INDEX [index_name] (column_name) [algorithm=HNSW_PQ] [distancemeasure=SquaredL2]

Parameters

ParameterDescription
ANN INDEXKeyword that identifies this as a vector index.
index_nameName of the index. For naming conventions, see the Naming limits section.
column_nameName of the vector column. Supported types: ARRAY<FLOAT>, ARRAY<BYTE>, ARRAY<SMALLINT>. For naming conventions, see the Naming limits section.
algorithmIndex algorithm. Set to HNSW_PQ.
distancemeasureDistance formula. Set to SquaredL2 (squared Euclidean distance).

Example: create a table with vector indexes

The following example creates a table with two vector columns — float_feature (ARRAY<FLOAT>, 4-dimensional) and short_feature (ARRAY<SMALLINT>, 4-dimensional) — and builds an ANN INDEX on each:

CREATE TABLE vector (
  xid bigint not null,
  cid bigint not null,
  uid varchar not null,
  vid varchar not null,
  wid varchar not null,
  float_feature array<float>(4),
  short_feature array<smallint>(4),
  ANN INDEX idx_short_feature(short_feature),
  ANN INDEX idx_float_feature(float_feature),
  PRIMARY KEY (xid, cid, vid)
) DISTRIBUTED BY HASH(xid);

Example: add vector indexes to an existing table

If the table already exists without vector indexes, add them using ALTER TABLE:

-- Create the table first (without indexes)
CREATE TABLE vector (
  xid BIGINT not null,
  cid BIGINT not null,
  uid VARCHAR not null,
  vid VARCHAR not null,
  wid VARCHAR not null,
  float_feature array<FLOAT>(4),
  short_feature array<SMALLINT>(4),
  PRIMARY KEY (xid, cid, vid)
) DISTRIBUTED BY HASH(xid);

-- Add vector indexes
ALTER TABLE vector ADD ANN INDEX idx_float_feature(float_feature);
ALTER TABLE vector ADD ANN INDEX idx_short_feature(short_feature);

Query vector data

Use l2_distance(column, query_vector) in a SELECT statement to find similar entries. Order by the distance value and use LIMIT to return the top k results.

All examples below use the l2_distance function to query the short_feature column.

Insert sample data

INSERT INTO vector (xid, cid, uid, vid, wid, short_feature, float_feature) VALUES (1, 2, 'A', 'B', 'C', '[1,1,1,1]', '[1.2,1.5,2,3.0]');
INSERT INTO vector (xid, cid, uid, vid, wid, short_feature, float_feature) VALUES (2, 1, 'e', 'v', 'f', '[2,2,2,2]', '[1.5,1.15,2.2,2.7]');
INSERT INTO vector (xid, cid, uid, vid, wid, short_feature, float_feature) VALUES (0, 6, 'd', 'f', 'g', '[3,3,3,3]', '[0.2,1.6,5,3.7]');
INSERT INTO vector (xid, cid, uid, vid, wid, short_feature, float_feature) VALUES (5, 4, 'j', 'b', 'h', '[4,4,4,4]', '[1.0,4.15,6,2.9]');
INSERT INTO vector (xid, cid, uid, vid, wid, short_feature, float_feature) VALUES (8, 5, 'Sj', 'Hb', 'Dh', '[5,5,5,5]', '[1.3,4.5,6.9,5.2]');
INSERT INTO vector (xid, cid, uid, vid, wid, short_feature, float_feature) VALUES (5, 4, 'x', 'g', 'h', '[3,4,4,4]', '[1.0,4.15,6,2.9]');
INSERT INTO vector (xid, cid, uid, vid, wid, short_feature, float_feature) VALUES (5, 4, 'j', 'r', 'k', '[6,6,4,4]', '[1.0,4.15,6,2.9]');
INSERT INTO vector (xid, cid, uid, vid, wid, short_feature, float_feature) VALUES (5, 4, 's', 'i', 'q', '[2,2,4,4]', '[1.0,4.15,6,2.9]');

KNN query: top k nearest neighbors

Return the 3 entries closest to [1,1,1,1]:

SELECT xid, l2_distance(short_feature, '[1,1,1,1]') AS dis
FROM vector
ORDER BY dis
LIMIT 3;

Result:

+-------+------+
| xid   | dis  |
+-------+------+
| 1     | 0.0  |
+-------+------+
| 2     | 4.0  |
+-------+------+
| 0     | 16.0 |
+-------+------+

KNN query with a filter

Return the 4 entries closest to [1,1,1,1] where xid = 5 and cid = 4:

SELECT uid, l2_distance(short_feature, '[1,1,1,1]') AS dis
FROM vector
WHERE xid = 5 AND cid = 4
ORDER BY dis
LIMIT 4;

Result:

+-------+------+
| uid   | dis  |
+-------+------+
| s     | 20.0 |
+-------+------+
| x     | 31.0 |
+-------+------+
| j     | 36.0 |
+-------+------+
| j     | 68.0 |
+-------+------+

RNN query: entries within a distance threshold

Return entries closest to [1,1,1,1] where the distance is less than 50 and xid = 5, limited to 3 results:

SELECT uid, l2_distance(short_feature, '[1,1,1,1]') AS dis
FROM vector
WHERE l2_distance(short_feature, '[1,1,1,1]') < 50.0 AND xid = 5
ORDER BY dis
LIMIT 3;

Result:

+-------+------+
| uid   | dis  |
+-------+------+
| s     | 20.0 |
+-------+------+
| x     | 31.0 |
+-------+------+
| j     | 36.0 |
+-------+------+

删除向量索引

语法

仅3.2.1.0及以上版本支持删除向量索引。若表中存在向量索引,您可以使用以下语句删除。定义为:

ALTER TABLE table_name DROP [ANN] INDEX index_name;

示例

删除vector表中的向量索引。

ALTER TABLE vector DROP ann INDEX idx_short_feature;
ALTER TABLE vector DROP ann INDEX idx_float_feature;