RDS MySQL vector storage - ApsaraDB RDS - Alibaba Cloud Documentation Center

RDS for MySQL natively stores and computes vector data of up to 16,383 dimensions. It provides the mainstream vector operation functions and uses an optimized Hierarchical Navigable Small World (HNSW) algorithm for efficient approximate nearest neighbor (ANN) search. Vector indexes can be created on vector columns across the full supported dimension range.

Feature description

RDS for MySQL natively supports vector data processing, including vector storage, similarity calculations, and high-performance indexing. It provides out-of-the-box vector solutions for scenarios such as large-scale semantic retrieval, AI-powered recommendations, and multi-modal analysis. You can use standard SQL interfaces to seamlessly combine high-precision vector matching with complex business logic. This allows businesses to quickly implement innovative AI applications using a low-cost, highly compatible architecture.

Efficient storage, access, and computation of high-dimensional vectors: You can store floating-point vector data of up to 16,383 dimensions using the new VECTOR data type. This feature is compatible with standard SQL interfaces for writing, updating, and managing vector data in batches. It supports the following mainstream vector processing functions:

Function name	Description
`VECTOR_DIM`	Gets the vector dimensions.
`VEC_FROMTEXT`, `TO_VECTOR`, `STRING_TO_VECTOR`	Converts a string to a vector.
`VEC_TOTEXT`, `FROM_VECTOR`, `VECTOR_TO_STRING`	Converts a vector to a string.
`VEC_DISTANCE`, `VEC_DISTANCE_EUCLIDEAN`, `VEC_DISTANCE_COSINE`	Calculates the distance between two vectors. For `VEC_DISTANCE`, if one of the parameters is an indexed column, the distance type of the index is automatically detected.

High-performance vector index: Vector indexes use a highly optimized HNSW algorithm. Technologies such as single instruction multiple data (SIMD) hardware acceleration, Bloom filter search pruning, and LIMIT condition pushdown significantly improve the retrieval efficiency of large-scale vector data. This feature also supports mixed storage and joint queries of vector and scalar data.
Open source ecosystem, ready to use: This feature is fully compatible with the MySQL protocol. It supports Java Database Connectivity (JDBC), Object-Relational Mapping (ORM) tools, and mainstream developer frameworks. It integrates with Alibaba Cloud services such as DTS and DMS to provide full lifecycle capabilities, including data synchronization, management, backup, and recovery. You can upgrade existing instances with a single click without creating new clusters.
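
The functions in the table above can be tried on literal values before any table exists. The following is a minimal sketch, assuming each function accepts the bracketed string format used in the examples later in this topic and that the distance functions take two vectors of equal dimension. The values are made up for illustration.

```sql
-- Convert a string to a vector and read back its dimension count
SELECT VECTOR_DIM(VEC_FROMTEXT('[3, 4]')) AS dims;

-- Round-trip a string through the VECTOR type
SELECT VEC_TOTEXT(VEC_FROMTEXT('[0.1, 0.2, 0.3]')) AS as_text;

-- Call the explicit distance functions on two literal vectors
SELECT
  VEC_DISTANCE_EUCLIDEAN(VEC_FROMTEXT('[3, 4]'), VEC_FROMTEXT('[0, 0]')) AS euclidean,
  VEC_DISTANCE_COSINE(VEC_FROMTEXT('[3, 4]'), VEC_FROMTEXT('[6, 8]')) AS cosine;
```

Mathematically, the Euclidean distance between (3, 4) and the origin is 5. The vectors (3, 4) and (6, 8) point in the same direction, so their cosine distance should be at or near its minimum. The exact output formatting is not specified in this topic.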

Applicability

Database version: MySQL 8.0.
Minor engine version: 20251031 or later. If your current version does not meet this requirement, you must upgrade the minor engine version or the database major version.
The following limitations apply when you use this feature:
- Vector indexes are supported only on tables that use the InnoDB engine.
- The primary key of the table cannot exceed 256 bytes in length.
- The INPLACE DDL algorithm (`ALGORITHM=INPLACE`) cannot be used to create, modify, or delete vector indexes.
- Vector indexes cannot be set to INVISIBLE.
- Tables with vector indexes do not support the Recycle Bin feature.
- Data modification and queries on vector indexes only support the Read Committed (RC) isolation level.
- Because the HNSW algorithm involves randomness, such as random levels and heuristic algorithms, the graph structures of vector indexes on the primary and standby instances are not guaranteed to be identical.
- If you use the vector data type in stored procedures or functions in the source database, synchronization or migration to a destination database that does not support vectors will fail.
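
Several of the limitations above can be checked from a SQL session before you create a vector index. The following sketch uses standard MySQL 8.0 statements. The table name `my_table` is hypothetical, and the `rds_release_date` variable is an assumption about how the minor engine version is exposed on your instance; the console remains the authoritative place to check it.

```sql
-- Vector indexes require InnoDB: check the engine of an existing table
SELECT TABLE_NAME, ENGINE
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'my_table';  -- hypothetical table name

-- Vector index reads and writes require Read Committed
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
SELECT @@transaction_isolation;

-- Major version (must be 8.0)
SELECT VERSION();

-- Minor engine version (assumption: exposed through this variable)
SHOW VARIABLES LIKE 'rds_release_date';
```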

Parameter management

Parameter descriptions

Parameter name	Description
`vidx_default_distance`	• Description: The default vector distance type. • Scope: Session. • Data type: `String`. • Default value: `EUCLIDEAN`. • Valid values: `EUCLIDEAN`: Euclidean distance, the straight-line (geometric) distance between two vectors in a multi-dimensional space. `COSINE`: Cosine distance, derived from the cosine of the angle between two vectors. It measures directional similarity and ignores vector length.
`vidx_hnsw_default_m`	• Description: The default M value for an HNSW index (the maximum number of neighbor connections that each node in the graph can have). • Scope: Session. • Data type: `Integer`. • Default value: `6`. • Valid range: `[3, 200]`.
`vidx_hnsw_ef_search`	• Description: The default ef_search value (search scope) for HNSW index queries. • Scope: Session. • Data type: `Integer`. • Default value: `20`. • Valid range: `[1, 10000]`.
`vidx_hnsw_cache_size`	• Description: The maximum memory that the HNSW index cache can use. Unit: bytes. • Scope: Global. • Data type: `BigInt`. • Default value: `1048576`. • Valid range: `[1048576, 18446744073709551615]`.
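
This topic does not state the formulas behind the two distance types. The usual textbook definitions are given below for reference. Whether the engine uses exactly these forms, in particular cosine distance as one minus cosine similarity, is an assumption.

$$
d_{\text{EUCLIDEAN}}(u, v) = \sqrt{\sum_{i=1}^{n} (u_i - v_i)^2},
\qquad
d_{\text{COSINE}}(u, v) = 1 - \frac{\sum_{i=1}^{n} u_i v_i}{\sqrt{\sum_{i=1}^{n} u_i^2}\,\sqrt{\sum_{i=1}^{n} v_i^2}}
$$

For example, with $u = (3, 4)$ and $v = (6, 8)$:

$$
d_{\text{EUCLIDEAN}}(u, v) = \sqrt{(3-6)^2 + (4-8)^2} = \sqrt{9 + 16} = 5,
\qquad
d_{\text{COSINE}}(u, v) = 1 - \frac{18 + 32}{5 \cdot 10} = 0
$$

The two vectors differ in length but not in direction, so they are far apart under Euclidean distance and identical under cosine distance.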

Modify parameters

Go to the Instances page. In the top navigation bar, select the region in which the RDS instance resides. Then, find the RDS instance and click the ID of the instance.
In the left navigation pane, click Parameter Settings.
On the Editable Parameters tab, search for the parameter to modify and set its value.
Click OK, and then click Submit Parameters. In the dialog box that appears, select when to apply the changes.

Note

All vector-related parameters are dynamic. Modifications take effect immediately without restarting the instance.
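
Because `vidx_default_distance`, `vidx_hnsw_default_m`, and `vidx_hnsw_ef_search` have session scope, they can presumably also be changed for a single connection with standard `SET SESSION` statements. The following is a sketch, assuming the parameters are exposed as ordinary MySQL system variables under the names in the table above and that your account is permitted to set them.

```sql
-- List the current values of the vector-related parameters
SHOW VARIABLES LIKE 'vidx%';

-- Widen the HNSW search scope for this connection only (valid range: 1 to 10000)
SET SESSION vidx_hnsw_ef_search = 100;

-- Use cosine distance by default for this connection
SET SESSION vidx_default_distance = 'COSINE';

-- Confirm the session values
SELECT @@SESSION.vidx_hnsw_ef_search, @@SESSION.vidx_default_distance;
```

In HNSW indexes generally, a larger search scope tends to improve recall at the cost of query latency. `vidx_hnsw_cache_size` has global scope, so change it in the console as described above.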

Enable and use the feature

Note

Enabling or disabling the vector feature does not require an instance restart.

Step 1: Enable vector support

Go to the RDS console, select the region in which the RDS instance resides, and click the instance ID.
On the Basic Information page, in the Running Status section, for Vector Storage, click Enable on the right.
When the status is Enabled, the feature takes effect immediately.

Step 2: Create a table and a vector index

-- Create a table that contains a 5-dimensional vector column and an HNSW index
CREATE TABLE product_embeddings (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  product_name VARCHAR(255),
  embedding VECTOR(5) NOT NULL,
  -- Create a vector index, and specify the M value and the distance calculation method
  VECTOR INDEX idx_embedding(embedding) M=16 DISTANCE=COSINE
);

Step 3: Insert data

-- Use the VEC_FROMTEXT function to insert vector data
INSERT INTO product_embeddings (product_name, embedding) VALUES
('product_A', VEC_FROMTEXT('[0.1, 0.2, 0.3, 0.4, 0.5]')),
('product_B', VEC_FROMTEXT('[0.6, 0.7, 0.8, 0.9, 1.0]')),
('product_C', VEC_FROMTEXT('[0.11, 0.22, 0.33, 0.44, 0.55]'));
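
To confirm what was stored, the vectors can be read back with the conversion functions listed earlier. The following sketch assumes the `product_embeddings` table from Step 2. The `UPDATE` is optional; it illustrates that an embedding can be replaced with an ordinary statement, and its replacement values are made up.

```sql
-- Read the stored vectors back as text and confirm their dimension
SELECT
  id,
  product_name,
  VECTOR_DIM(embedding) AS dims,
  VEC_TOTEXT(embedding) AS embedding_text
FROM product_embeddings;

-- Optional: replace one embedding with a regular UPDATE
UPDATE product_embeddings
SET embedding = VEC_FROMTEXT('[0.61, 0.71, 0.81, 0.91, 1.0]')
WHERE product_name = 'product_B';
```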

Step 4: Perform a vector similarity query

-- Find the 2 products most similar to the given vector '[0.1, 0.2, 0.3, 0.4, 0.51]'
SELECT
  id,
  product_name,
  VEC_DISTANCE(embedding, VEC_FROMTEXT('[0.1, 0.2, 0.3, 0.4, 0.51]')) AS distance -- The index on embedding uses COSINE, so this is a cosine distance
FROM
  product_embeddings
ORDER BY
  distance ASC -- The smaller the COSINE distance, the more similar the vectors are
LIMIT 2;
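
The feature description states that joint queries of vector and scalar data are supported, but it does not show the syntax. The following is a sketch, assuming that a regular `WHERE` clause can be combined with the vector `ORDER BY ... LIMIT` pattern. Whether the vector index is still used when a filter is present is not stated in this topic.

```sql
-- Find the nearest product among a filtered subset of rows
SELECT
  id,
  product_name,
  VEC_DISTANCE(embedding, VEC_FROMTEXT('[0.1, 0.2, 0.3, 0.4, 0.51]')) AS distance
FROM product_embeddings
WHERE product_name IN ('product_B', 'product_C')  -- scalar filter
ORDER BY distance ASC
LIMIT 1;
```

In the sample data, the `product_C` vector is the `product_A` vector scaled by 1.1, so both point in the same direction. Under cosine distance they are therefore expected to sit equally close to the query vector and ahead of `product_B`. Because HNSW search is approximate, results on large tables are not guaranteed to be exact.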