
OpenSearch: Vector indexes

Last Updated: Sep 18, 2025

This topic describes the vector index types supported by OpenSearch Vector Search Edition.

Vector retrieval overview

In the information age, data exists in multiple modalities, such as text, images, videos, and audio. This multimodal data can convey details that text cannot, such as color, shape, motion, sound, and spatial relationships.


The modalities of information are also diversifying across various fields.


In multimodal scenarios, information is classified into two categories: structured and unstructured.

Unstructured data is often difficult for computers to understand. Traditional retrieval methods that rely on text tokenization cannot meet the search demands in various fields. Vector search is an ideal solution to this problem.

So, what are vectors, and how does vector retrieval work?

Unstructured data from the physical world is transformed into multi-dimensional vectors. These vectors represent entities and the relationships between them.

The distance between vectors is then calculated. Typically, a smaller distance indicates higher similarity. The system retrieves the most similar results to complete the search.


Algorithms

Linear

The linear algorithm performs a brute-force scan: it computes the distance between the query and every vector in the dataset.

Scenario: Use this algorithm if you require a 100% recall rate.

Disadvantages: This algorithm is inefficient and consumes significant resources, such as CPU and memory, when processing large amounts of data.
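The brute-force scan described above can be sketched in a few lines of Python. This is an illustrative toy, not the OpenSearch implementation; the function name, data, and use of squared Euclidean distance are all assumptions for the example:

```python
def linear_search(query, docs, k):
    """Brute-force k-NN: compute the distance from the query to every
    document vector and return the indices of the k closest."""
    def sq_euclidean(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    # Rank every document by its distance to the query, keep the top k.
    return sorted(range(len(docs)), key=lambda i: sq_euclidean(query, docs[i]))[:k]

docs = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.9, 1.1]]
print(linear_search([1.0, 1.0], docs, 2))  # → [1, 3]
```

Because every document is scanned, recall is exactly 100%, but cost grows linearly with dataset size, which is the trade-off noted above.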

Clustering

Quantized Clustering

Overview:


Quantized Clustering is a vector retrieval algorithm developed by Alibaba based on k-means clustering. The algorithm first clusters vector documents into n centroids. It then assigns each document to its nearest centroid, building n inverted lists. These n centroids and their corresponding inverted lists constitute the main content of the Quantized Clustering (QC) index. During retrieval, a query first calculates its distance to a subset of the centroids and selects the closest ones. Then, the query calculates its distance to the documents in the inverted lists of the selected centroids. Finally, it returns the top-k documents closest to the query.

Notably, QC supports fp16 and int8 numerical quantization, which can reduce the index size and improve retrieval performance. However, this may slightly reduce the recall rate. You can configure this feature based on your requirements.

The QC algorithm is suitable for real-time data scenarios or scenarios that have GPU resources and require low latency.
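The centroid-plus-inverted-list structure described above can be sketched as follows. This is an illustrative toy, not the QC implementation: the centroids are hand-picked instead of learned by k-means, there is no quantization, and all names are hypothetical:

```python
def sq_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_qc_index(docs, centroids):
    """Assign each document to its nearest centroid, producing one
    inverted list per centroid (the core of a clustering-style index)."""
    lists = [[] for _ in centroids]
    for doc_id, vec in enumerate(docs):
        nearest = min(range(len(centroids)),
                      key=lambda c: sq_euclidean(vec, centroids[c]))
        lists[nearest].append(doc_id)
    return lists

def qc_search(query, docs, centroids, lists, n_probe, k):
    """Probe the n_probe closest centroids, then rank only the
    documents in their inverted lists."""
    probed = sorted(range(len(centroids)),
                    key=lambda c: sq_euclidean(query, centroids[c]))[:n_probe]
    cand = [doc_id for c in probed for doc_id in lists[c]]
    return sorted(cand, key=lambda i: sq_euclidean(query, docs[i]))[:k]

docs = [[0.1, 0.0], [0.2, 0.1], [5.0, 5.1], [4.9, 5.0]]
centroids = [[0.0, 0.0], [5.0, 5.0]]
lists = build_qc_index(docs, centroids)   # → [[0, 1], [2, 3]]
print(qc_search([5.1, 5.0], docs, centroids, lists, n_probe=1, k=1))  # → [2]
```

Probing only a few inverted lists is what makes the search sublinear; raising `n_probe` trades speed for recall.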

Parameter tuning:

Quantized clustering configuration

Performance comparison: (figure omitted)

HNSW

Overview:


Hierarchical Navigable Small World (HNSW) is a vector retrieval algorithm based on proximity graphs. It first builds a proximity graph where edges are created between vectors that are close to each other. The vectors and their proximity information constitute the main content of the HNSW index. During retrieval, the search starts from an entry node. It calculates the distances between the query and all neighbors of the entry node, then selects the nearest neighbor as the next node to traverse. This process iterates until the search converges. Convergence occurs when none of the neighbors of the current node are closer to the query than the nearest node already found.

To speed up convergence, HNSW builds a multilayer proximity graph structure, which is similar to the logic of skip list queries. The retrieval process starts from the top layer and moves down. The retrieval logic is similar for each layer, with minor differences in layer 0. After the traversal converges to a node in layer k, that node becomes the entry point for layer k-1, and the traversal continues. Layer k has sparser nodes with greater distances between them than layer k-1. This allows for a larger step size and faster iteration during traversal in layer k.
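A simplified, single-layer version of the greedy traversal described above can be written as follows. This is illustrative only: real HNSW maintains a multilayer graph and a candidate heap, and the graph layout here is hand-built:

```python
def greedy_search(graph, vectors, query, entry):
    """Greedy graph traversal in the style of HNSW: from the current
    node, move to whichever neighbor is closer to the query; stop when
    no neighbor improves on the current node (convergence)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    current = entry
    while True:
        best = min(graph[current], key=lambda n: dist(vectors[n], query))
        if dist(vectors[best], query) >= dist(vectors[current], query):
            return current  # converged: no neighbor is closer
        current = best

# A tiny proximity graph along a line (hypothetical layout).
vectors = {0: [0.0, 0.0], 1: [1.0, 0.0], 2: [2.0, 0.0], 3: [3.0, 0.0]}
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(greedy_search(graph, vectors, [2.9, 0.0], entry=0))  # → 3
```

In the multilayer structure, the node this traversal converges to in layer k becomes the entry point for layer k-1.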

Parameter tuning:

HNSW configuration

QGraph

Overview:

Quantized Graph (QGraph) is an improved algorithm developed by OpenSearch based on HNSW. During index building, it automatically quantizes the raw data before building the graph index. Compared to HNSW, QGraph can significantly reduce the index size and memory overhead, reducing the index to as small as one-eighth of its original size. Additionally, using CPU instruction optimizations for integer calculations, the performance of QGraph is several times better than that of HNSW. After quantization, the distinctiveness of vectors is reduced, which causes the recall rate of QGraph to be lower than that of HNSW. In real-world scenarios, this effect can be mitigated by retrieving more results.

QGraph supports int4, int8, and int16 quantization for data. Different quantization levels affect vector index memory usage, query performance, and recall rate differently. Generally, a smaller integer bit width results in a smaller memory footprint for the vector index, higher performance, and a lower recall rate. Due to the nature of the underlying CPU instruction set, int16 quantization has nearly the same performance and recall rate as non-quantized data.
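A toy sketch of symmetric int8 quantization, assuming one shared scale per vector; the actual QGraph quantization scheme is not documented here, so treat the details as illustrative:

```python
def quantize_int8(vec):
    """Map floats in [-max_abs, max_abs] onto integers in [-127, 127]
    using a single per-vector scale (symmetric quantization)."""
    max_abs = max(abs(x) for x in vec) or 1.0
    scale = max_abs / 127.0
    return [round(x / scale) for x in vec], scale

def dequantize(q, scale):
    return [x * scale for x in q]

vec = [0.5, -1.0, 0.25]
q, scale = quantize_int8(vec)
approx = dequantize(q, scale)
# approx differs slightly from vec: this quantization error is why
# recall can dip, and why retrieving more results mitigates it.
print(q)  # → [64, -127, 32]
```

Each int8 component occupies 1 byte instead of 4 for fp32, which is where the memory savings come from; int4 halves this again at a larger accuracy cost.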

Parameter tuning:

QGraph configuration

CAGRA

Overview:

CUDA ANN GRAph-based (CAGRA) is a vector retrieval algorithm based on proximity graphs. It is optimized for NVIDIA GPUs and is suitable for scenarios with high concurrency and batch requests.

Unlike HNSW, CAGRA builds only a single-layer proximity graph to facilitate parallel computing. During a query, the algorithm iteratively maintains an ordered candidate set. In each iteration, it selects the top-k nodes from the candidate set as search starting points and attempts to update the set with their neighbors. This process continues until all top-k nodes in the candidate set have been used as a search starting point, at which point the algorithm converges.

The CAGRA algorithm is implemented with two policies: single-CTA and multi-CTA. The single-CTA policy maps a single vector query to a single thread block and is efficient for large batch sizes. The multi-CTA policy uses multiple thread blocks to execute a single vector query. This policy better utilizes the GPU's parallel computing power and achieves a higher recall rate when the batch size is small. However, its throughput is lower than that of the single-CTA policy when the batch size is large. The system automatically selects the optimal policy based on the workload during runtime.
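The iterative candidate-set search described above can be sketched sequentially on the CPU. This is illustrative only: the real algorithm executes these steps in parallel across GPU thread blocks, and all names and data here are hypothetical:

```python
def cagra_style_search(graph, vectors, query, seeds, k):
    """Keep an ordered candidate set; repeatedly expand the unexpanded
    top-k candidates with their graph neighbors. Converge when every
    top-k entry has already been used as a starting point."""
    def dist(v):
        return sum((a - b) ** 2 for a, b in zip(vectors[v], query))
    candidates = sorted(set(seeds), key=dist)
    expanded = set()
    while True:
        frontier = [c for c in candidates[:k] if c not in expanded]
        if not frontier:
            return candidates[:k]  # converged
        for node in frontier:
            expanded.add(node)
            for nb in graph[node]:
                if nb not in candidates:
                    candidates.append(nb)
        candidates.sort(key=dist)

# Single-layer proximity graph along a line (hypothetical layout).
vectors = {0: [0.0], 1: [1.0], 2: [2.0], 3: [3.0], 4: [4.0]}
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(cagra_style_search(graph, vectors, [3.8], seeds=[0], k=2))  # → [4, 3]
```

On a GPU, the distance computations within each expansion step are independent, which is why the single-layer design parallelizes well.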

Parameter tuning:

CAGRA configuration

DiskANN

Overview:

DiskANN is a disk-based approximate nearest neighbor (ANN) search technique designed for handling extremely large datasets. It uses the Vamana graph algorithm to efficiently use disk storage for datasets that exceed available memory, while maintaining high-performance vector indexing and retrieval.

Note

The DiskANN algorithm is supported only when the data node family is SSD.

Parameter tuning:

DiskANN configuration

CagraHnsw

Overview:

CagraHnsw is a vector retrieval algorithm developed by OpenSearch. It combines the advantages of the CAGRA and HNSW algorithms and is suitable for scenarios that require rapid index building for massive datasets.

The CagraHnsw algorithm is similar to CAGRA during the index building phase. It uses GPUs to build indexes and converts the index files into a format compatible with the HNSW algorithm. During the retrieval phase, the HNSW algorithm on the CPU performs the search. This strategy combines the efficient GPU-based index building of CAGRA with cost-effective CPU-based retrieval. This approach avoids the need for expensive GPU instances during retrieval. As a result, CagraHnsw is more cost-effective than CAGRA in scenarios that do not require sustained high-concurrency queries.

Vector distance types

Vector retrieval works by calculating the similarity between vectors and returning the top-k most similar vectors. In this process, similarity is determined by calculating distance. Typically, a smaller distance score indicates that the vectors are closer, and a larger score indicates that they are farther apart.


Different vector spaces use different distance metrics to calculate the distance between vectors. OpenSearch Vector Search Edition supports the following metrics: squared Euclidean distance, inner product, and cosine distance.

Squared Euclidean distance (SquareEuclidean)


Squared Euclidean distance is the sum of the squared differences between the coordinates of two vectors. It omits the final square root used by standard Euclidean distance, which makes it cheaper to compute while preserving the same nearest-neighbor ordering. A smaller value indicates that the two vectors are more similar. The formula is as follows:

d(x, y) = Σᵢ (xᵢ − yᵢ)²

Inner product distance (InnerProduct)

The inner product is the dot product or scalar product of two vectors. A larger inner product result indicates greater similarity between the vectors. It is calculated by multiplying the corresponding elements of the two vectors and then summing the products. The inner product metric is common in search and recommendation scenarios. The choice to use the inner product metric typically depends on whether the algorithm uses an inner product model. The formula for the inner product is as follows:

⟨x, y⟩ = Σᵢ xᵢ · yᵢ

Cosine distance

Cosine distance measures the similarity between two vectors based on the cosine of the angle between them. The resulting value ranges from -1 to 1. A value of 1 indicates that the vectors point in the same direction, representing the highest similarity. A value of -1 indicates that they point in opposite directions, representing the lowest similarity. A value of 0 indicates that the vectors are perpendicular. The closer the value is to 1, the more similar the vectors are. This metric is often used in text similarity retrieval scenarios. The formula is as follows:

cos(θ) = (x · y) / (‖x‖ · ‖y‖)
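For reference, the three metrics above can be computed in a few lines of Python. This is a plain sketch of the formulas, not the OpenSearch internals:

```python
import math

def sq_euclidean(a, b):
    """Sum of squared coordinate differences (no square root)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def inner_product(a, b):
    """Dot product: multiply corresponding elements and sum."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors, in [-1, 1]."""
    norm = math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    return inner_product(a, b) / norm

a, b = [1.0, 0.0], [0.0, 1.0]
print(sq_euclidean(a, b))       # → 2.0
print(inner_product(a, b))      # → 0.0
print(cosine_similarity(a, b))  # → 0.0 (perpendicular vectors)
```

Note the differing conventions: for squared Euclidean distance, smaller means more similar, while for inner product and cosine, larger means more similar.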