AliSQL Vector Technology Analysis (2): Read/Write Cache and Transaction Concurrency

By Shaomeng

Introduction

The previous article "AliSQL Vector Technology Analysis (1)" introduced the storage format of vector indexes, the implementation of the Hierarchical Navigable Small World (HNSW) algorithm, and the data dictionary adaptation scheme, giving readers an overview of the core implementation of vector indexes.

Based on AliSQL 8.0 (version 20251031), this article introduces a series of optimizations. AliSQL adds a memory-resident Nodes Cache to accelerate vector search and, on top of this cache structure, implements read/write concurrency control and Read Committed (RC) transaction isolation. Together, these features make vector operations reliable and fast enough to meet production-level requirements.

Nodes Cache

As shown in the following figure, AliSQL introduces a shared cache (MHNSW Share) and a transaction cache (MHNSW Trx) for vector data. These caches accelerate vector queries and keep vector updates transactionally safe, striking a balance between resource isolation and performance.

The shared cache and the transaction cache are accessed by different operations and have different design goals:

• [Shared Cache] MHNSW Share is accessed by read-only transactions and is attached to the TABLE_SHARE of the auxiliary table. Its main goal is to avoid repeatedly loading the same vector nodes, thereby improving query efficiency.

• [Transaction Cache] MHNSW Trx inherits from MHNSW Share and is used by read/write transactions. It is attached to the session through thd_set_ha_data, and each read/write transaction creates its own MHNSW Trx instance. This instance caches the nodes the transaction accesses, including the nodes it modifies, so uncommitted changes never pollute the shared cache. The shared cache is updated only when the transaction commits.
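To make the ownership model concrete, here is a minimal Python sketch. The classes `SharedCache` and `TrxCache`, the helper `cache_for`, and the `session` dictionary (standing in for the per-session data that AliSQL attaches through thd_set_ha_data) are hypothetical names; the sketch illustrates only who owns which cache, not AliSQL's actual C++ classes.

```python
class SharedCache:
    """Hypothetical stand-in for MHNSW Share (one per auxiliary table)."""

    def __init__(self):
        self.nodes = {}   # node_id -> node data, visible to read-only transactions
        self.version = 0  # incremented by every commit that modifies the index


class TrxCache(SharedCache):
    """Hypothetical stand-in for MHNSW Trx (one per read/write transaction)."""

    def __init__(self, share):
        super().__init__()
        self.share = share     # shared cache to update when this transaction commits
        self.modified = set()  # ids of nodes changed by this transaction

    def put(self, node_id, data):
        # Changes stay in the private cache; the shared cache is untouched.
        self.nodes[node_id] = data
        self.modified.add(node_id)


def cache_for(session, share, read_only):
    """Choose the cache for a statement. The session dict plays the role of
    the per-session handler data attached via thd_set_ha_data."""
    if read_only:
        return share
    if "mhnsw_trx" not in session:
        session["mhnsw_trx"] = TrxCache(share)
    return session["mhnsw_trx"]


share = SharedCache()
session = {}
trx = cache_for(session, share, read_only=False)
trx.put(7, [0.1, 0.2])
print(7 in trx.nodes, 7 in share.nodes)  # True False
```

Because `TrxCache` subclasses `SharedCache`, code that walks the graph can accept either cache through the same interface, while only the transaction's private instance ever sees uncommitted nodes.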

Transaction Isolation

AliSQL currently supports the Read Committed (RC) isolation level for vector reads and writes. It achieves this by giving read/write transactions and read-only transactions different caches and different commit flows:

[Read-only Transaction] Runs the HNSW search algorithm and accesses the shared cache MHNSW Share first. Only when an accessed node is missing from the cache does the system load it from the InnoDB engine, reading the version that is visible under RC. When multiple read-only transactions access the same vector node repeatedly, the node is loaded from InnoDB only once, which improves vector query performance.
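The read-only path can be modelled as below, assuming a toy `VersionedStore` in place of InnoDB in which each node keeps a list of versions flagged as committed or not. The store and the `read_node` helper are hypothetical; the sketch shows only that an RC read returns the newest committed version and that repeated reads are served by the shared cache instead of the storage engine.

```python
class VersionedStore:
    """Toy stand-in for InnoDB rows of the auxiliary table (hypothetical).
    Each node keeps a list of (data, committed) versions, newest last."""

    def __init__(self):
        self.rows = {}
        self.loads = 0  # how many times the storage engine was asked for a node

    def write(self, node_id, data, committed):
        self.rows.setdefault(node_id, []).append((data, committed))

    def read_committed(self, node_id):
        # RC visibility: return the newest version that has been committed.
        self.loads += 1
        for data, committed in reversed(self.rows[node_id]):
            if committed:
                return data
        raise KeyError(node_id)


def read_node(shared_nodes, store, node_id):
    """Read-only path: shared cache first, storage engine only on a miss."""
    if node_id not in shared_nodes:
        shared_nodes[node_id] = store.read_committed(node_id)
    return shared_nodes[node_id]


store = VersionedStore()
store.write(1, "v1", committed=True)
store.write(1, "v2", committed=False)  # a writer's change that is not yet committed
shared = {}
results = [read_node(shared, store, 1) for _ in range(3)]  # three read-only transactions
print(results, store.loads)  # ['v1', 'v1', 'v1'] 1
```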

[Read/Write Transaction] An insertion builds a session-level transaction cache MHNSW Trx and proceeds in three stages:

1. Read: Based on transaction visibility, the required node information is loaded from InnoDB. The HNSW insertion algorithm runs in the transaction cache to determine the neighbors of the new node at each layer, as well as the updated neighbor lists of those neighbors.

2. Write: The new node and every node whose neighbor information changed are saved to InnoDB.

3. Commit or rollback

• Commit: Increments the version number of the shared cache and evicts from it every node this write modified (the expired nodes). When read-only transactions later access these nodes, they reload the latest committed version from InnoDB.

• Rollback: Discards the transaction cache and relies on InnoDB's rollback mechanism to restore the data.
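The three stages of an insertion can be sketched as follows. `Share`, `Trx`, and the dictionary `engine` (standing in for InnoDB) are hypothetical; the HNSW neighbor search is replaced by a caller-supplied `neighbor_ids` list, and neither neighbor-list pruning nor InnoDB's undo log is modelled.

```python
class Share:
    """Hypothetical shared cache: node_id -> node, plus a version counter."""

    def __init__(self, nodes):
        self.nodes = dict(nodes)  # nodes currently visible to read-only transactions
        self.version = 0


class Trx:
    """Hypothetical transaction cache used by one insertion."""

    def __init__(self, share, engine):
        self.share = share
        self.engine = engine  # dict standing in for InnoDB rows
        self.nodes = {}       # private copies of nodes this transaction touched
        self.modified = set()

    def insert(self, new_id, vector, neighbor_ids):
        # 1. Read: copy each neighbor into the private cache and link it to the
        #    new node (neighbor selection and list pruning are omitted).
        for nid in neighbor_ids:
            node = self.nodes.setdefault(nid, dict(self.engine[nid]))
            node["neighbors"] = node["neighbors"] + [new_id]
            self.modified.add(nid)
        self.nodes[new_id] = {"vec": vector, "neighbors": list(neighbor_ids)}
        self.modified.add(new_id)
        # 2. Write: persist the new node and every neighbor whose list changed.
        for nid in self.modified:
            self.engine[nid] = self.nodes[nid]

    def commit(self):
        # 3a. Commit: bump the shared version and evict expired nodes so that
        #     readers reload them from the engine on their next access.
        self.share.version += 1
        for nid in self.modified:
            self.share.nodes.pop(nid, None)
        self.nodes.clear()
        self.modified.clear()

    def rollback(self):
        # 3b. Rollback: discard the private cache. The shared cache was never
        #     touched; InnoDB's undo log restores the rows (not modelled here).
        self.nodes.clear()
        self.modified.clear()


engine = {1: {"vec": [0.0, 1.0], "neighbors": []}}
share = Share(engine)                 # node 1 is already in the shared cache
trx = Trx(share, engine)
trx.insert(2, [1.0, 0.0], [1])
before_commit = 1 in share.nodes      # True: readers still see the old node 1
trx.commit()
print(before_commit, share.version, 1 in share.nodes, engine[1]["neighbors"])
# True 1 False [2]
```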

Concurrency Control

AliSQL applies a lock mechanism both within each cache and between caches. It currently supports read-read and read-write concurrency, but not write-write concurrency on the same vector table. These locks keep the cache state atomic and visible across threads, ensuring data consistency under high concurrency.

Read-Read Concurrency

The cache mutex (cache_lock) and the node lock (lock_node) together make concurrent reads safe. As shown in the following figure, a read-only request accesses a node as follows:

First, the system looks up the node by its ID in the shared cache, which is a hash table whose reads and writes are protected by cache_lock. If the node does not exist in the shared cache, an empty node is created and added to it.

Then, if the thread that obtained the node finds it empty, that thread loads the node from InnoDB. The node lock (lock_node) ensures that only one thread requests the node information from InnoDB.
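A minimal sketch of this two-level locking, assuming Python's `threading.Lock` in place of cache_lock and lock_node; `ReadCache`, `CachedNode`, and the loader callback are hypothetical names. Eight threads request the same missing node, and the loader runs only once.

```python
import threading
import time


class CachedNode:
    """Hypothetical cache entry: empty until one thread loads it."""

    def __init__(self):
        self.lock = threading.Lock()  # plays the role of lock_node
        self.data = None              # None means "empty, not yet loaded"


class ReadCache:
    def __init__(self, loader):
        self.cache_lock = threading.Lock()  # guards the hash table only
        self.table = {}
        self.loader = loader

    def get(self, node_id):
        # Step 1: find or create the entry while holding cache_lock.
        with self.cache_lock:
            node = self.table.get(node_id)
            if node is None:
                node = self.table[node_id] = CachedNode()
        # Step 2: if the entry is empty, load it under the node lock.
        if node.data is None:
            with node.lock:
                if node.data is None:  # re-check: another thread may have loaded it
                    node.data = self.loader(node_id)
        return node.data


calls = []

def slow_load(node_id):
    calls.append(node_id)
    time.sleep(0.01)  # simulate an InnoDB read
    return f"node-{node_id}"

cache = ReadCache(slow_load)
threads = [threading.Thread(target=cache.get, args=(42,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(calls)  # [42]
```

Keeping the slow load outside cache_lock means a miss on one node does not stall lookups of other nodes. In C++ the first, unlocked check of `data` would need an atomic read with acquire semantics; this sketch relies on CPython's atomic attribute access.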

Read-Write Concurrency

The commit read/write lock (commit rwlock) makes it safe for read requests and write requests to run concurrently. As shown in the following figure, a read request holds the commit read lock (commit rdlock) for its entire duration. While executing the insertion algorithm, a write request operates only on the transaction cache created by its own thread. Only when the transaction commits does it request the commit write lock (commit wrlock) on the shared cache to evict expired nodes. In short, the commit rwlock keeps read requests and write commits from interfering with each other.
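Python's standard library has no readers-writer lock, so the sketch below builds a minimal one on `threading.Condition` to model the commit rwlock. `RWLock`, `commit_lock`, `search`, and `commit` are hypothetical names. The demo holds a read lock as a running search would, and shows that a commit cannot evict nodes until that reader finishes.

```python
import threading


class RWLock:
    """Minimal readers-writer lock on threading.Condition (readers-preferring;
    a production lock would also guard against writer starvation)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_read(self):
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_write(self):
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()


commit_lock = RWLock()  # hypothetical name for the shared cache's commit rwlock
shared_nodes = {1: "old", 2: "old"}

def search(node_ids):
    # A read request holds the commit read lock for its whole search,
    # so no commit can evict nodes it is traversing.
    commit_lock.acquire_read()
    try:
        return [shared_nodes.get(n) for n in node_ids]
    finally:
        commit_lock.release_read()

def commit(modified):
    # The insertion itself ran in a private cache without this lock;
    # only evicting expired nodes needs the exclusive commit write lock.
    commit_lock.acquire_write()
    try:
        for n in modified:
            shared_nodes.pop(n, None)
    finally:
        commit_lock.release_write()


commit_lock.acquire_read()              # a long-running search is in progress
writer = threading.Thread(target=commit, args=([1],))
writer.start()
writer.join(timeout=0.1)
blocked = writer.is_alive()             # True: the commit waits for the reader
snapshot = dict(shared_nodes)           # the reader still sees node 1
commit_lock.release_read()
writer.join()
print(blocked, snapshot, shared_nodes)  # True {1: 'old', 2: 'old'} {2: 'old'}
```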

Vector Compute Optimization

In high-dimensional vector retrieval, the efficiency of distance calculation directly determines query performance. To address this bottleneck, AliSQL combines a pre-calculation policy with single instruction, multiple data (SIMD) instruction set acceleration, balancing calculation efficiency against cache consistency.

Pre-calculation Policy

During the node cache load phase, the system pre-calculates vector distances and caches the results so that frequently accessed nodes do not trigger repeated calculations. The cached results are versioned through the version field of the FVectorNode structure: while a node's data is unchanged, the system reuses the pre-calculated results, and when an update changes the version, the system recalculates them. This mechanism eliminates redundant calculations and reduces the query latency of frequently accessed nodes by more than 40%.
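The article does not specify exactly which values AliSQL pre-calculates, so the sketch below assumes one common choice: caching each vector's squared norm, which reduces a squared Euclidean distance to a single dot product via ||a - b||² = ||a||² + ||b||² - 2a·b. The cached value is tagged with the node's version and recomputed only when the version changes. `FVectorNodeSketch` and `l2_squared` are hypothetical names, not the real FVectorNode.

```python
from dataclasses import dataclass, field


@dataclass
class FVectorNodeSketch:
    """Hypothetical cached node with a version-tagged pre-computed value."""
    vec: list
    version: int = 0
    _norm2: float = field(default=0.0, repr=False)
    _norm2_version: int = field(default=-1, repr=False)
    recomputes: int = field(default=0, repr=False)

    def update(self, vec):
        # Any data change bumps the version, invalidating cached results.
        self.vec = vec
        self.version += 1

    def norm2(self):
        # Reuse the pre-computed squared norm unless the version changed.
        if self._norm2_version != self.version:
            self._norm2 = sum(x * x for x in self.vec)
            self._norm2_version = self.version
            self.recomputes += 1
        return self._norm2


def l2_squared(a, b):
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b: only the dot product is per-pair work.
    dot = sum(x * y for x, y in zip(a.vec, b.vec))
    return a.norm2() + b.norm2() - 2 * dot


a = FVectorNodeSketch([1.0, 2.0])
b = FVectorNodeSketch([3.0, 4.0])
d1 = l2_squared(a, b)  # (1-3)^2 + (2-4)^2 = 8.0
d2 = l2_squared(a, b)  # reuses both cached norms
a.update([0.0, 0.0])
d3 = l2_squared(a, b)  # a's norm is recomputed; b's is reused
print(d1, d2, d3, a.recomputes, b.recomputes)  # 8.0 8.0 25.0 2 1
```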

SIMD Instruction Set Acceleration

For computation, AliSQL uses the SIMD instruction sets of modern CPUs (such as AVX-512) to accelerate vector distance calculations. With Bloom filters, the system can process multiple vectors in a batch, turning scalar operations that would otherwise run repeatedly into parallel vector operations. This significantly reduces the number of CPU instruction cycles consumed.
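Python cannot issue AVX-512 instructions, so the sketch below only models how a SIMD kernel restructures a dot product: 16 independent accumulators (the number of float32 lanes in a 512-bit register), one multiply-add across all lanes per chunk, a horizontal sum, and a scalar tail for leftover elements. `dot_simd_style` and `LANES` are hypothetical names; a real kernel would use compiler intrinsics, and the Bloom-filter batching described above is not modelled.

```python
LANES = 16  # float32 lanes in one 512-bit AVX-512 register


def dot_simd_style(a, b, lanes=LANES):
    """Pure-Python model of a SIMD dot product. Real speedups come only
    from compiled code that maps each chunk to one vector instruction."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    acc = [0.0] * lanes                 # one accumulator per register lane
    full = len(a) - len(a) % lanes      # elements covered by whole chunks
    for base in range(0, full, lanes):
        # One fused multiply-add across all lanes (a single AVX-512 instruction).
        for lane in range(lanes):
            acc[lane] += a[base + lane] * b[base + lane]
    total = sum(acc)                    # horizontal reduction of the register
    for i in range(full, len(a)):       # scalar tail for the remaining elements
        total += a[i] * b[i]
    return total


a = [float(i) for i in range(40)]
b = [1.0] * 40
print(dot_simd_style(a, b))  # 780.0
```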

Actual testing shows that the performance of vector distance calculation for a single node improved by more than 75%. On a dataset of 10 million vectors, SIMD optimization increased query throughput more than threefold, demonstrating the value of hardware acceleration for large-scale data processing.

The pre-calculation policy and SIMD acceleration are complementary rather than isolated: pre-calculation lowers the latency of frequent queries through caching, while SIMD acceleration speeds up each individual calculation. Together, they improve the overall efficiency of vector operations.

Summary

Through the collaborative design of the shared cache and the transaction cache, AliSQL achieves efficient caching and transaction isolation for vector indexes, ensuring data consistency and query performance under high concurrency. The system currently supports read-read and read-write concurrency on vector data, covering mainstream vector workloads, and its locking policy keeps concurrent access safe. Finally, the pre-calculation policy and SIMD acceleration increase the parallelism and speed of vector calculations, further improving the overall efficiency of vector operations.

This article described how AliSQL vector indexes support transactions and concurrency control. However, because write-write concurrency is not yet supported, vector indexes cannot be built concurrently; in addition, the table-level Nodes Cache cannot manage memory across all vector indexes of an instance. The next article, "AliSQL Vector Technology Analysis (3): Index Concurrent Build and Global Cache Management," will introduce AliSQL's optimizations for index building and its global cache management scheme.
