
OpenSearch: Vector indexes

Last Updated: Mar 21, 2025

This topic describes the vector retrieval algorithms and distance metrics that are supported by OpenSearch Vector Search Edition.

Vector-based retrieval

In the information age, information may come in multiple modalities, such as text, images, video, and audio. Multimodal information can convey more than text alone, such as color, shape, motion, sound, and spatial relationships.


The modalities of information vary significantly across fields.


Multimodal information is classified into two categories: structured data and unstructured data.

Unstructured data is difficult for computers to understand, and traditional term-based text retrieval cannot meet the search requirements of many fields. To resolve this issue, OpenSearch Vector Search Edition provides the vector search feature.

Unstructured data generated in the physical world can be transformed into structured, multidimensional vectors that capture the relationships between entities.

Then, the distances between the vectors are calculated. In most cases, the shorter the distance, the higher the vector similarity. The top N results with the highest similarity are returned. This process implements vector-based retrieval.


Vector retrieval algorithms

Linear algorithm

The Linear algorithm performs a brute-force scan that compares the query vector against every stored vector.

Scenario: You can use this algorithm if you require a recall rate of 100%.

Disadvantage: This algorithm is inefficient and consumes large amounts of CPU and memory when it processes large volumes of data.
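The brute-force scan can be sketched in a few lines of Python. This is a minimal illustration of the idea, not OpenSearch's implementation; the function and document names are hypothetical.

```python
def squared_euclidean(x, y):
    # Sum of squared coordinate differences; smaller = more similar.
    return sum((a - b) ** 2 for a, b in zip(x, y))

def linear_search(query, docs, k):
    # Exhaustively score every document vector against the query and
    # return the IDs of the k closest ones. Recall is 100%, but the
    # cost grows linearly with the number of documents.
    ranked = sorted(docs, key=lambda doc_id: squared_euclidean(query, docs[doc_id]))
    return ranked[:k]

docs = {
    "doc_a": [1.0, 0.0],
    "doc_b": [0.8, 0.2],
    "doc_c": [0.0, 1.0],
}
print(linear_search([1.0, 0.0], docs, 2))  # prints ['doc_a', 'doc_b']
```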

Clustering algorithms

Quantized clustering

Overview


Quantized clustering is a vector retrieval algorithm developed by Alibaba Cloud based on K-means clustering. This algorithm computes n centroids from the document vectors and assigns each document to its closest centroid. This way, n posting lists are built. The n centroids and the corresponding posting lists are the main content of a quantized clustering index. During retrieval, OpenSearch calculates the distance between the request vector and the centroids and selects the centroids closest to the request vector. Then, OpenSearch calculates the exact vector distance for the documents in the posting lists that correspond to the selected centroids and returns the top K documents closest to the request vector.

Quantized clustering supports FP16 quantization and INT8 quantization, which reduce the index size and improve retrieval performance but may reduce retrieval accuracy. You can enable or disable these features based on your business requirements.

The quantized clustering algorithm is suitable for real-time data scenarios or scenarios in which you use GPU resources and require low latency.
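The indexing and retrieval steps above can be sketched as follows. This is a simplified illustration that assumes the centroids are already computed (a real implementation would obtain them by running K-means); all names are hypothetical, and n_probe stands for the number of selected centroids.

```python
def sq_dist(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def build_index(docs, centroids):
    # Assign each document to its closest centroid, producing one
    # posting list per centroid (the core of the clustering index).
    posting_lists = {c: [] for c in range(len(centroids))}
    for doc_id, vec in docs.items():
        nearest = min(range(len(centroids)), key=lambda c: sq_dist(vec, centroids[c]))
        posting_lists[nearest].append(doc_id)
    return posting_lists

def search(query, docs, centroids, posting_lists, n_probe, k):
    # Probe only the n_probe centroids closest to the query, then rank
    # the documents in their posting lists by exact distance.
    probed = sorted(range(len(centroids)),
                    key=lambda c: sq_dist(query, centroids[c]))[:n_probe]
    candidates = [d for c in probed for d in posting_lists[c]]
    return sorted(candidates, key=lambda d: sq_dist(query, docs[d]))[:k]

centroids = [[0.0, 0.0], [10.0, 10.0]]
docs = {"a": [0.6, 0.2], "b": [9.5, 10.1], "c": [0.1, 0.4]}
lists = build_index(docs, centroids)
print(search([0.3, 0.3], docs, centroids, lists, n_probe=1, k=2))  # prints ['c', 'a']
```

Documents in unprobed posting lists (here, "b") are never scored, which is why recall can fall below 100% while query cost drops sharply.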

Parameter tuning:

For more information, see Quantized clustering configurations.

The following figure shows the performance of different vector algorithms when they process different volumes of data.


HNSW

Overview


Hierarchical Navigable Small World (HNSW) is a vector retrieval algorithm based on proximity graphs. This algorithm creates a proximity graph by building edges between vectors that are close to each other. The vectors and the edges are the main content of an HNSW index. During retrieval, OpenSearch starts traversal from the entry vector, calculates the distance between the request vector and each vector adjacent to the current vector, and selects the nearest adjacent vector as the next vector for traversal. OpenSearch repeats this rule until convergence. Convergence occurs when no vector adjacent to the current vector is closer to the request vector than the current vector itself.

To accelerate convergence, HNSW builds a multi-layer proximity graph based on skip-list logic. Retrieval proceeds from top to bottom, and the vector retrieval logic of each layer is similar except for Layer 0. If the traversal converges at a vector in Layer K, that vector becomes the entry vector of Layer K-1, where the traversal continues. Compared with Layer K-1, Layer K contains sparser nodes separated by longer distances. As a result, OpenSearch traverses Layer K with a larger step size and a higher iteration speed than Layer K-1.
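The convergence rule within a single layer can be sketched as a greedy graph walk. This is a minimal illustration, not OpenSearch's implementation: it handles only one layer and omits the multi-layer entry logic and the candidate queues that a full HNSW implementation maintains.

```python
def sq_dist(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def greedy_search(query, vectors, neighbors, entry):
    # Walk the proximity graph: repeatedly move to the adjacent vector
    # closest to the query. Convergence: no neighbor of the current
    # vector is closer to the query than the current vector itself.
    current = entry
    current_dist = sq_dist(query, vectors[current])
    while True:
        best = min(neighbors[current], key=lambda n: sq_dist(query, vectors[n]))
        best_dist = sq_dist(query, vectors[best])
        if best_dist >= current_dist:
            return current  # converged
        current, current_dist = best, best_dist

# Toy 1-D graph: four vectors chained by edges between close neighbors.
vectors = {0: [0.0], 1: [1.0], 2: [2.0], 3: [3.0]}
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(greedy_search([2.9], vectors, neighbors, entry=0))  # prints 3
```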

Parameter tuning:

For more information, see HNSW configuration.

QGraph

Overview

Quantized Graph (QGraph) is an improved algorithm developed by the OpenSearch team based on HNSW. QGraph automatically quantizes raw vector data and builds a graph index. Compared with HNSW, QGraph can effectively reduce index sizes and memory consumption: it can reduce the size of an index to as little as 1/8 of the original size. In addition, by optimizing CPU instructions for integer computing, QGraph delivers performance several times higher than HNSW. However, quantization reduces the precision of vectors, so the recall rate of QGraph is lower than that of HNSW. In real scenarios, retrieving more candidates can mitigate this impact.

QGraph supports INT4 quantization, INT8 quantization, and INT16 quantization. Different quantization levels have different impacts on memory consumption, query performance, and recall rate. In most cases, the fewer the bits in an integer, the less memory the vector index occupies, the higher the query performance, and the lower the recall rate. Due to the underlying CPU instruction set, INT16 quantization delivers negligible improvements in query performance and recall rate.
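The idea behind INT8 quantization can be sketched with a simple scalar quantizer. This is an illustrative example, not QGraph's actual quantization scheme; the scale value is a hypothetical choice that assumes vector components lie roughly in [-1, 1].

```python
def quantize_int8(vector, scale):
    # Map each float to a signed 8-bit integer: round(v / scale),
    # clamped to [-128, 127]. Smaller integers shrink the index and
    # speed up integer arithmetic, at the cost of precision.
    return [max(-128, min(127, round(v / scale))) for v in vector]

def dequantize(quantized, scale):
    # Approximate reconstruction; the rounding error is what lowers
    # the recall rate after quantization.
    return [q * scale for q in quantized]

vec = [0.12, -0.53, 0.99]
scale = 1.0 / 127  # assumed value range of roughly [-1, 1]
q = quantize_int8(vec, scale)
print(q)                    # prints [15, -67, 126]
print(dequantize(q, scale)) # close to vec, but not exact
```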

Parameter tuning:

For more information, see QGraph configuration.

CAGRA

Overview

CAGRA is a vector retrieval algorithm based on nearest neighbor graphs. CAGRA is optimized for NVIDIA GPUs and is suitable for scenarios with high concurrency and batch requests.

Unlike HNSW, CAGRA builds a single-layer neighbor graph that is suitable for parallel computing. During retrieval, the algorithm maintains an ordered set of candidate vectors across iterations. In each iteration, the algorithm selects the top K vectors in the set that have not yet been used as starting points, expands their neighbor vectors, and attempts to update the set with them. This process repeats until all top K vectors in the set have been used as starting points, at which point the algorithm is considered converged.

The CAGRA algorithm is implemented based on two policies: single-CTA and multi-CTA. Single-CTA uses a single thread block to implement the retrieval of a single vector, which is more efficient when the batch size is large. Multi-CTA uses multiple thread blocks to implement the retrieval of a single vector, which can utilize the parallel computing capability of GPUs and achieve a higher recall rate when the batch size is small. However, the throughput of multi-CTA is lower than that of single-CTA when the batch size is large. During actual calls, the system automatically selects the optimal policy based on the workload.
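The iterative candidate-set procedure can be sketched on the CPU as follows. This is a simplified sequential illustration with hypothetical names; the real algorithm evaluates the frontier in parallel with GPU thread blocks (the single-CTA and multi-CTA policies described above).

```python
def sq_dist(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def cagra_style_search(query, vectors, neighbors, k, start_ids):
    # Candidate set: vector ID -> distance to the query.
    cand = {i: sq_dist(query, vectors[i]) for i in start_ids}
    used = set()  # IDs already expanded as starting points
    while True:
        top_k = sorted(cand, key=cand.get)[:k]
        frontier = [i for i in top_k if i not in used]
        if not frontier:
            # Every top-k candidate has served as a starting point: converged.
            return top_k
        for i in frontier:  # a GPU would expand these in parallel
            used.add(i)
            for j in neighbors[i]:
                cand.setdefault(j, sq_dist(query, vectors[j]))

# Toy 1-D dataset with a single-layer neighbor graph.
vectors = {i: [float(i)] for i in range(6)}
neighbors = {0: [1, 2], 1: [0, 3], 2: [0, 4], 3: [1, 5], 4: [2, 5], 5: [3, 4]}
print(cagra_style_search([4.9], vectors, neighbors, k=2, start_ids=[0]))  # prints [5, 4]
```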

Parameter tuning:

For more information, see CAGRA configuration.

Vector distance types

Vector-based retrieval calculates the similarity between vectors and returns the top K vectors based on the similarity. The similarity between vectors is measured by the distance between them. In most cases, the smaller the distance between two vectors, the higher their similarity.


In different vector spaces, different distance metrics are defined to calculate the distance between vectors. OpenSearch supports the following distance metrics:

Squared Euclidean distance


Squared Euclidean distance is the square of the straight-line (Euclidean) distance between two vectors. It is obtained by summing the squares of the coordinate differences between the two vectors. Unlike the standard Euclidean distance, no square root is taken, which saves computation without changing the ranking of results. The smaller the squared Euclidean distance, the higher the vector similarity. The squared Euclidean distance is calculated based on the following formula:

d(x, y) = (x₁ − y₁)² + (x₂ − y₂)² + … + (xₙ − yₙ)²

Inner product

The inner product is the dot product or scalar product of two vectors. The larger the inner product, the higher the vector similarity. The inner product is calculated by multiplying the corresponding elements of the two vectors and summing the products. This distance metric is commonly used in search and recommendation scenarios. In most cases, the inner product is appropriate only if the embedding model was trained to measure similarity by inner product. The inner product is calculated based on the following formula:

x · y = x₁y₁ + x₂y₂ + … + xₙyₙ

Cosine distance

The cosine distance is the cosine of the angle between two vectors and ranges from -1 to 1. A value of -1 indicates that the two vectors point in opposite directions and have the lowest similarity. A value of 1 indicates that the two vectors point in the same direction and have the highest similarity. A value of 0 indicates that the two vectors are perpendicular. The closer the value is to 1, the more similar the two vectors. This distance metric is commonly used in similarity-based retrieval scenarios. The cosine distance is calculated by using the following formula:

cos(x, y) = (x · y) / (‖x‖ ‖y‖)
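All three metrics can be computed directly from their definitions. A minimal sketch:

```python
import math

def squared_euclidean(x, y):
    # Sum of squared coordinate differences; no square root is taken.
    return sum((a - b) ** 2 for a, b in zip(x, y))

def inner_product(x, y):
    # Larger value = more similar.
    return sum(a * b for a, b in zip(x, y))

def cosine(x, y):
    # Cosine of the angle between x and y, in [-1, 1].
    norm = math.sqrt(inner_product(x, x)) * math.sqrt(inner_product(y, y))
    return inner_product(x, y) / norm

x, y = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]  # y points in the same direction as x
print(squared_euclidean(x, y))  # prints 14.0
print(inner_product(x, y))      # prints 28.0
print(cosine(x, y))             # ≈ 1.0 (same direction)
```

Note that the metrics rank differently: here the cosine distance treats x and y as maximally similar, while the squared Euclidean distance does not, because y is twice as long as x.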

Comparison among vector retrieval algorithms

| Vector retrieval algorithm | Advantage | Disadvantage | Scenario |
| --- | --- | --- | --- |
| Quantized clustering | Low CPU utilization and memory usage. | Lower recall rate and lower query speed than HNSW. | Hundreds of millions of data records need to be processed, and high accuracy and low query latency are not required. |
| HNSW | High recall rate and high query speed. | High CPU utilization and memory usage. | Tens of millions of data records need to be processed, and high accuracy and low query latency are required. |
| Linear | Recall rate of 100%. | Low query speed; higher CPU utilization and memory usage than the quantized clustering and HNSW algorithms. | Tens of thousands of data records need to be processed. |
| QGraph | Low CPU utilization, low memory usage, and high query speed and performance. | Lower recall rate than HNSW. | Billions of data records need to be processed, high query speed and performance are required, and high accuracy is not required. |
| CAGRA | GPU-based, with query performance several times or even dozens of times higher than CPU-based algorithms. | High costs; low cost-effectiveness in scenarios with low queries per second (QPS). | High QPS and low latency are required. |