The vector search feature in PolarSearch uses a REST API for efficient similarity searches on unstructured data, such as text and images. This allows you to quickly and accurately find the closest matches in massive datasets, making your application more intelligent.
Overview
Vector retrieval, also known as similarity search, is a technique for finding the most similar data by comparing vector distances. This method is fundamentally different from traditional search, which relies on exact keyword matching.
The core idea is to convert unstructured data, such as text, images, and audio, into numerical representations called vector embeddings using deep learning models like Large Language Models (LLMs). These multi-dimensional vectors capture the data's deep semantic information.
When you submit a query, PolarSearch converts the query content into a vector and performs a k-nearest neighbor (k-NN) search. This core algorithm finds the k vectors within a massive dataset that are closest to your query vector. Here, k is a number you specify (for example, setting k=5 means finding the five most similar results). PolarSearch returns these k most similar results.
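The k-NN idea can be illustrated with a minimal brute-force sketch in Java. This is not how PolarSearch executes a query (it uses a vector index to avoid the full scan); the class name `BruteForceKnn` and the sample vectors are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class BruteForceKnn {
    /** Squared Euclidean distance; ordering by it matches ordering by L2 distance. */
    static double squaredL2(float[] a, float[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Returns the positions of the k vectors closest to the query, nearest first. */
    static List<Integer> nearest(float[][] data, float[] query, int k) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < data.length; i++) {
            ids.add(i);
        }
        // Exhaustive scan: every vector is compared with the query.
        ids.sort(Comparator.comparingDouble(i -> squaredL2(data[i], query)));
        return new ArrayList<>(ids.subList(0, Math.min(k, ids.size())));
    }

    public static void main(String[] args) {
        float[][] data = {
            {1f, 2f, 3f, 4f},
            {8f, 7f, 6f, 5f},
            {3f, 4f, 5f, 6f}
        };
        float[] query = {3.1f, 4.1f, 5.1f, 6.1f};
        System.out.println(nearest(data, query, 2)); // prints [2, 0]
    }
}
```

The cost of this scan grows linearly with the number of vectors, which is the motivation for the index structures described next.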
To achieve efficient retrieval, PolarSearch relies on two core components: vector index and vector storage optimization.
Vector index: To avoid an exhaustive scan and computation across the entire dataset, you must build a vector index in advance. An index organizes vector data into a query-optimized data structure. During a query, this structure significantly narrows the search space, dramatically improving retrieval performance. PolarSearch supports multiple types of vector indexes. This guide focuses on two industry-leading options: HNSW and IVF.
HNSW (Hierarchical Navigable Small World): A graph-based index known for its high performance and high recall, but with a large memory overhead. It is ideal for scenarios that require extremely low query latency and high precision, where the entire dataset can fit into memory.
IVF (Inverted File): A cluster-based inverted index with low memory consumption. It is ideal for processing large-scale datasets in memory-constrained environments, although its search precision is typically slightly lower than HNSW.
Vector storage optimization: High-dimensional vector data consumes significant memory and storage space. PolarSearch provides multiple optimization techniques to reduce resource consumption.
Vector quantization: A technique that compresses data by reducing the numerical precision of vectors, significantly decreasing storage space. It strikes a balance between the compression ratio and search precision. PolarSearch supports product quantization (PQ), scalar quantization (SQ), and binary quantization (BQ).
Disk-based storage: For low-memory environments, parts of the vector index can be stored on disk. This lets the vector retrieval service use less memory at the cost of slightly higher query latency.
Notes
Keep the following in mind when using the PolarSearch vector search feature:
Index training requirements: IVF indexes and the PQ (product quantization) technique require a separate training step before use. You must provide a set of representative vector data to train the model. Otherwise, the index will not work properly.
Memory overhead: HNSW indexes offer excellent performance, but their graph structure must be fully loaded into memory, which causes high memory overhead. Evaluate the memory resources of your cluster before you select this index type.
Performance and cost trade-off: Disk-based vector search can increase query latency. Evaluate whether this trade-off is acceptable for your application.
Automatic training: Binary quantization (BQ) trains automatically during index creation. No user action is required.
Procedure
Before you begin
To use the REST API for vector search, you must enable PolarSearch. See PolarSearch User Guide for instructions on enabling PolarSearch for new or existing clusters.
Step 1: Create a vector index
To store and search vectors, you must create an index with a specific configuration. This involves two key steps:
Enable k-NN and define a vector field: In the index settings (settings), set the knn parameter to true. This is a master switch that informs PolarDB that the index will be used for vector search.
Key parameters
engine: Set to faiss.
Note: Faiss (Facebook AI Similarity Search) is a high-performance, open-source library developed by Meta AI for efficient similarity search and clustering of massive vector datasets. PolarSearch uses Faiss as its core vector search engine.
dimension: Specifies the dimension of the vector. This value must be identical to the dimension of the vectors that are output by your model.
data_type: Specifies the data type of the vector. The default value is float. You can also select byte or binary to optimize storage.
space_type: Specifies the distance metric used to calculate vector similarity. The supported values are as follows:
space_type | Distance metric | Description |
l2 | L2 (Euclidean distance) | Calculates the square root of the sum of squared differences. Sensitive to value magnitude. |
l1 | L1 (Manhattan distance) | Calculates the sum of the absolute differences between vector dimensions. |
cosinesimil | Cosine similarity | Measures the angle between vectors, focusing on direction rather than magnitude. |
innerproduct | Inner product | Calculates the dot product of vectors. Often used for ranking. |
hamming | Hamming distance | Calculates the number of differing bits in binary vectors. |
chebyshev | L∞ (Chebyshev distance) | Considers only the maximum absolute difference between vector dimensions. |
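The distance metrics listed for space_type can be written out as a short Java sketch. The class and method names are hypothetical, and the sketch computes only the raw mathematical values; how the engine converts a distance into a relevance score is not covered here.

```java
final class DistanceMetrics {
    /** l2: square root of the sum of squared differences. */
    static double l2(float[] a, float[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** l1: sum of the absolute differences. */
    static double l1(float[] a, float[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }

    /** chebyshev: largest absolute difference in any single dimension. */
    static double chebyshev(float[] a, float[] b) {
        double max = 0.0;
        for (int i = 0; i < a.length; i++) {
            max = Math.max(max, Math.abs(a[i] - b[i]));
        }
        return max;
    }

    /** innerproduct: dot product of the two vectors. */
    static double innerProduct(float[] a, float[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += (double) a[i] * b[i];
        }
        return sum;
    }

    /** cosinesimil: dot product divided by the product of the vector lengths. */
    static double cosineSimilarity(float[] a, float[] b) {
        return innerProduct(a, b) / (Math.sqrt(innerProduct(a, a)) * Math.sqrt(innerProduct(b, b)));
    }

    /** hamming: number of differing bits between two packed binary vectors. */
    static int hamming(byte[] a, byte[] b) {
        int bits = 0;
        for (int i = 0; i < a.length; i++) {
            bits += Integer.bitCount((a[i] ^ b[i]) & 0xFF);
        }
        return bits;
    }
}
```

For example, for the vectors (1, 2, 3) and (4, 6, 3), the per-dimension differences are 3, 4, and 0, so the L2 distance is 5, the L1 distance is 7, and the Chebyshev distance is 4.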
Define a vector field (HNSW or IVF): In the index's mappings (mappings), you must define a field of the knn_vector type. This field is used specifically to store vector data, and is where you configure the vector dimension, similarity calculation method, and the core indexing method.
Selection guidance
HNSW and IVF offer different trade-offs in performance, memory consumption, and recall, making them suitable for different use cases. Use the following table to choose an indexing method.
Comparison dimension | HNSW | IVF |
Query latency | Extremely low. The hierarchical graph structure enables fast candidate location with short search paths. | Low. It first locates a cluster and then searches within it, which creates a longer search path. |
Recall | High. Better graph connectivity reduces the chance of missing nearest neighbors. | Medium to high. An edge effect (when a query point is on the boundary of a cluster) may cause a loss in precision. This can be mitigated by adjusting the nprobes parameter. |
Memory consumption | High. Requires loading the entire graph structure into memory. | Low. Mainly stores cluster centers and inverted lists, resulting in much lower memory overhead than HNSW. |
Build time | Long. Building a high-quality graph structure requires complex computations. | Fast. However, it requires an additional training step to generate cluster centers. |
Use cases | Scenarios that require the highest query performance and recall, and have sufficient memory resources. Examples: real-time semantic search, face recognition. | Cost-sensitive scenarios with massive datasets, limited memory, and where a minor loss in recall is acceptable. Examples: large-scale product recommendation, image library retrieval. |
Examples
HNSW
HNSW is implemented through IndexHNSWFlat and is suitable for scenarios that require high performance and recall.
Key parameters
Parameter | Value | Description |
m | Positive integer. | The maximum number of neighbors (out-degree) for each node in the graph. This value determines the graph's density and is the most critical parameter affecting index quality and memory consumption. |
ef_construction | Must be a positive integer and should usually be greater than | The size of the dynamic candidate list during index construction. It controls the search depth and breadth when building the graph. This value primarily affects the build time and the final quality of the index. |
 | Positive integer. | The size of the dynamic candidate list at query time. It controls the search depth during a query. Note: This parameter is not specified when an index is created, but is set globally at query time or in the index's |
When you create an HNSW index, replace <my-index> with your index name and <my_vector_field> with your field name. In addition, configure other core parameters such as dimension, data_type, space_type, m, and ef_construction based on your actual business requirements.
REST API
// HNSW index creation example. Replace <my-index> and <my_vector_field> with your actual index and field names.
PUT /<my-index>
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "<my_vector_field>": {
        "type": "knn_vector",
        "dimension": 4,
        "data_type": "float",
        "method": {
          "name": "hnsw",
          "engine": "faiss",
          "space_type": "l2",
          "parameters": {
            "m": 16,
            "ef_construction": 256
          }
        }
      }
    }
  }
}
Java client
private static void createVectorIndex(OpenSearchClient client) throws IOException {
    // Vector field: HNSW on the Faiss engine with L2 distance.
    // The dimension must match the vectors that you index (the sample data in Step 2 is 4-dimensional).
    Property vectorProperty = Property.of(p -> p.knnVector(
        KnnVectorProperty.of(kvp -> kvp
            .dimension(4)
            .dataType("float")
            .method(new KnnVectorMethod.Builder()
                .name("hnsw")
                .engine("faiss")
                .spaceType("l2")
                .parameters(Map.of(
                    "m", JsonData.of(16),
                    "ef_construction", JsonData.of(256)
                ))
                .build()
            )
        )
    ));
    // Mapping: the vector field plus two metadata fields used by the filter examples.
    TypeMapping mapping = TypeMapping.of(m -> m
        .properties("<my_vector_field>", vectorProperty)
        .properties("text", Property.of(p -> p.text(TextProperty.of(t -> t))))
        .properties("category", Property.of(p -> p.keyword(k -> k)))
    );
    // Replace <my-index> with your actual index name. Setting knn to true enables vector search on the index.
    CreateIndexRequest request = new CreateIndexRequest.Builder()
        .index("<my-index>")
        .settings(s -> s.knn(true))
        .mappings(mapping)
        .build();
    client.indices().create(request);
}
IVF
IVF is implemented by IndexIVFFlat and is suitable for very large-scale datasets in memory-constrained scenarios.
Key parameters
Parameter | Value | Description |
nlist | Positive integer. | The number of cluster centers. The index divides the entire vector space into |
nprobes | A positive integer, which should typically be less than | The number of cluster centers (clusters) to search at query time. This is the most direct parameter for trading query speed for recall. |
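To make the roles of the cluster centers and nprobes concrete, the following Java sketch implements a toy inverted-file search. It is an illustration under stated assumptions, not the Faiss implementation: the class `IvfSketch` is hypothetical, and the centroids are passed in directly instead of being learned in a training step.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class IvfSketch {
    final float[][] centroids;        // cluster centers, assumed to be already trained
    final List<List<float[]>> lists;  // one inverted list of vectors per centroid

    IvfSketch(float[][] centroids) {
        this.centroids = centroids;
        this.lists = new ArrayList<>();
        for (int i = 0; i < centroids.length; i++) {
            lists.add(new ArrayList<>());
        }
    }

    static double dist2(float[] a, float[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Centroid positions ordered from nearest to farthest from v. */
    List<Integer> rankCentroids(float[] v) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < centroids.length; i++) {
            order.add(i);
        }
        order.sort(Comparator.comparingDouble(i -> dist2(centroids[i], v)));
        return order;
    }

    /** Stores the vector in the inverted list of its nearest centroid. */
    void add(float[] v) {
        lists.get(rankCentroids(v).get(0)).add(v);
    }

    /** Scans only the nprobes nearest clusters; returns the closest vector found, or null. */
    float[] search(float[] query, int nprobes) {
        float[] best = null;
        double bestDist = Double.POSITIVE_INFINITY;
        List<Integer> order = rankCentroids(query);
        for (int p = 0; p < Math.min(nprobes, order.size()); p++) {
            for (float[] v : lists.get(order.get(p))) {
                double d = dist2(v, query);
                if (d < bestDist) {
                    bestDist = d;
                    best = v;
                }
            }
        }
        return best;
    }
}
```

With centroids (0, 0) and (10, 0), stored vectors (4.9, 0) and (9, 0), and the query (5.2, 0), a search with nprobes set to 1 scans only the second cluster and returns (9, 0), while nprobes set to 2 also scans the first cluster and finds the true nearest neighbor (4.9, 0). This is the boundary effect that a larger nprobes value mitigates at the cost of more distance computations.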
When you create an IVF index, replace <my-index> with your index name and <my_vector_field> with your field name. Also, configure other core parameters such as dimension, data_type, space_type, nlist, and nprobes based on your business requirements.
// IVF index creation example. Replace <my-index> and <my_vector_field> with your actual index and field names.
// nprobes is typically specified at query time. It is set here only as an example.
PUT /<my-index>
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "<my_vector_field>": {
        "type": "knn_vector",
        "dimension": 4,
        "data_type": "float",
        "method": {
          "name": "ivf",
          "engine": "faiss",
          "space_type": "l2",
          "parameters": {
            "nlist": 1024,
            "nprobes": 10
          }
        }
      }
    }
  }
}
Step 2: Index vector data
Prepare your documents, including vector data and other metadata, and add them to the index you just created.
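When documents are generated by a program, the request body for a bulk request can be assembled as newline-delimited JSON: one action line followed by one document line per vector. The Java sketch below builds such a body as a string. The class `BulkBodyBuilder` and the `doc_N` ID scheme are hypothetical, and the sketch assumes that the index and field names contain no characters that need JSON escaping.

```java
import java.util.List;

final class BulkBodyBuilder {
    /** Builds a newline-delimited body: an action line and a document line per vector. */
    static String build(String index, String field, List<float[]> vectors) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < vectors.size(); i++) {
            // Action line: target index and document ID.
            sb.append("{ \"index\": { \"_index\": \"").append(index)
              .append("\", \"_id\": \"doc_").append(i + 1).append("\" } }\n");
            // Document line: the vector field.
            sb.append("{ \"").append(field).append("\": [");
            float[] v = vectors.get(i);
            for (int j = 0; j < v.length; j++) {
                if (j > 0) {
                    sb.append(", ");
                }
                sb.append(v[j]);
            }
            sb.append("] }\n"); // every line, including the last, ends with a newline
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(build("my-index", "my_vector_field",
            List.of(new float[]{5.2f, 4.4f, 0.0f, 0.0f}, new float[]{5.2f, 3.9f, 0.0f, 0.0f})));
    }
}
```

The resulting string can be sent as the body of a bulk request with any HTTP client; the length of each vector must match the dimension configured for the field.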
REST API
POST /_bulk
{ "index": { "_index": "my-index", "_id": "doc_1" } }
{ "my_vector_field": [5.2, 4.4, 0.0, 0.0] }
{ "index": { "_index": "my-index", "_id": "doc_2" } }
{ "my_vector_field": [5.2, 3.9, 0.0, 0.0] }
{ "index": { "_index": "my-index", "_id": "doc_3" } }
{ "my_vector_field": [4.9, 3.4, 0.0, 0.0] }
{ "index": { "_index": "my-index", "_id": "doc_4" } }
{ "my_vector_field": [4.2, 4.6, 0.0, 0.0] }
{ "index": { "_index": "my-index", "_id": "doc_5" } }
{ "my_vector_field": [3.3, 4.5, 0.0, 0.0] }
Java client
private static void indexSampleData(OpenSearchClient client) throws IOException {
    // Each document carries metadata (text, category) and a 4-dimensional vector.
    List<Map<String, Object>> documents = new ArrayList<>();
    documents.add(Map.of("text", "a book about data science", "category", "books", "<my_vector_field>", List.of(1.0f, 2.0f, 3.0f, 4.0f)));
    documents.add(Map.of("text", "an intelligent smartphone with a great camera", "category", "electronics", "<my_vector_field>", List.of(8.0f, 7.0f, 6.0f, 5.0f)));
    documents.add(Map.of("text", "a technical manual for a smart device", "category", "electronics", "<my_vector_field>", List.of(3.0f, 4.0f, 5.0f, 6.0f)));
    // Index the documents one by one. Replace <my-index> with your actual index name.
    for (int i = 0; i < documents.size(); i++) {
        IndexRequest<Map<String, Object>> request = new IndexRequest.Builder<Map<String, Object>>()
            .index("<my-index>")
            .id("doc_" + i)
            .document(documents.get(i))
            .build();
        client.index(request);
    }
}
Step 3: Perform vector search
You can send vector search requests to find the most similar results to your query vector within a massive dataset.
Basic k-NN search
This is the most basic vector search. It finds the k results with the smallest distance to the query vector across the entire index.
REST API
POST /<my-index>/_search
{
  "size": 3,
  "query": {
    "knn": {
      "<my_vector_field>": {
        "vector": [3.1, 4.1, 5.1, 6.1],
        "k": 3
      }
    }
  }
}
Java client
// Prepare your query vector.
List<Float> queryVector = List.of(3.1f, 4.1f, 5.1f, 6.1f);

private static void performBasicKnnSearch(OpenSearchClient client, List<Float> queryVector) throws IOException {
    System.out.println("\n--- 1. Performing Basic k-NN Search ---");
    System.out.println("Querying for vectors most similar to: " + queryVector);
    // Find the 3 most similar results.
    int k = 3;
    KnnQuery knnQuery = new KnnQuery.Builder()
        .field("<my_vector_field>")
        .vector(queryVector)
        .k(k)
        .build();
    // Replace <my-index> with your actual index name.
    SearchRequest searchRequest = new SearchRequest.Builder()
        .index("<my-index>")
        .query(new Query.Builder().knn(knnQuery).build())
        .size(k)
        .build();
    SearchResponse<Map> response = client.search(searchRequest, Map.class);
    printResults(response);
}
Hybrid search
In many scenarios, you need to narrow the search scope with one or more conditions before you perform a vector search. This is the core idea of hybrid search. You can use the KnnQuery in conjunction with the filter clause to achieve this. The filter itself can be any standard OpenSearch query, such as term (exact value matching) or match (full-text search).
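Conceptually, a filtered vector search keeps only the documents that satisfy the filter and then ranks the survivors by vector distance. The Java sketch below shows that logic in memory. It is an illustration only: the class `FilteredKnnSketch` and its `Doc` type are hypothetical, and the search engine may apply the filter at a different stage internally.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class FilteredKnnSketch {
    static final class Doc {
        final String id;
        final String category;
        final float[] vector;

        Doc(String id, String category, float[] vector) {
            this.id = id;
            this.category = category;
            this.vector = vector;
        }
    }

    static double squaredL2(float[] a, float[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Keeps documents in the given category, then returns up to k IDs, nearest first. */
    static List<String> search(List<Doc> docs, String category, float[] query, int k) {
        List<Doc> candidates = new ArrayList<>();
        for (Doc doc : docs) {
            if (doc.category.equals(category)) {
                candidates.add(doc);
            }
        }
        candidates.sort(Comparator.comparingDouble(c -> squaredL2(c.vector, query)));
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < Math.min(k, candidates.size()); i++) {
            ids.add(candidates.get(i).id);
        }
        return ids;
    }
}
```

Because the filter removes candidates before ranking, the result can contain fewer than k documents when few documents match the filter.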
Text match filter
This is useful for classic "keyword + vector" hybrid search scenarios. For example, you can first search for all documents that contain "new smartphone" in their description and then rank them by vector similarity.
REST API
POST /<my-index>/_search
{
  "size": 3,
  "query": {
    "knn": {
      "<my_vector_field>": {
        "vector": [3.1, 4.1, 5.1, 6.1],
        "k": 3,
        "filter": {
          "match": {
            "text": "book"
          }
        }
      }
    }
  }
}
Java client
// Prepare your query vector.
List<Float> queryVector = List.of(3.1f, 4.1f, 5.1f, 6.1f);

private static void performHybridSearchWithText(OpenSearchClient client, List<Float> queryVector) throws IOException {
    System.out.println("\n--- 2. Performing Hybrid Search (k-NN + Text Match) ---");
    // Prepare your query keyword.
    String textQuery = "book";
    System.out.println("Filtering for documents containing '" + textQuery + "', then finding most similar vectors.");
    // Find the 3 most similar results.
    int k = 3;
    // Full-text filter on the text field.
    MatchQuery matchQuery = new MatchQuery.Builder().field("text").query(q -> q.stringValue(textQuery)).build();
    KnnQuery hybridKnnQuery = new KnnQuery.Builder()
        .field("<my_vector_field>")
        .vector(queryVector)
        .k(k)
        .filter(new Query.Builder().match(matchQuery).build())
        .build();
    // Replace <my-index> with your actual index name.
    SearchRequest searchRequest = new SearchRequest.Builder()
        .index("<my-index>")
        .query(new Query.Builder().knn(hybridKnnQuery).build())
        .size(k)
        .build();
    SearchResponse<Map> response = client.search(searchRequest, Map.class);
    printResults(response);
}
Term filter
This is useful for scenarios where you filter by a specific tag, category, or ID. For example, searching for the most similar products only within the "electronics" category.
REST API
POST /<my-index>/_search
{
  "size": 3,
  "query": {
    "knn": {
      "<my_vector_field>": {
        "vector": [5.0, 4.0, 1.0, 2.0],
        "k": 3,
        "filter": {
          "term": {
            "category": "electronics"
          }
        }
      }
    }
  }
}
Java client
// Prepare your query vector.
List<Float> queryVector = List.of(3.1f, 4.1f, 5.1f, 6.1f);

private static void performFilteredSearchWithTerm(OpenSearchClient client, List<Float> queryVector) throws IOException {
    System.out.println("\n--- 3. Performing Filtered Search (k-NN + Term Filter) ---");
    // Prepare your query category.
    String categoryFilter = "electronics";
    System.out.println("Filtering for documents in category '" + categoryFilter + "', then finding most similar vectors.");
    // Find the 3 most similar results.
    int k = 3;
    // Exact-value filter on the category field.
    TermQuery termQuery = new TermQuery.Builder().field("category").value(v -> v.stringValue(categoryFilter)).build();
    KnnQuery filteredKnnQuery = new KnnQuery.Builder()
        .field("<my_vector_field>")
        .vector(queryVector)
        .k(k)
        .filter(new Query.Builder().term(termQuery).build())
        .build();
    // Replace <my-index> with your actual index name.
    SearchRequest searchRequest = new SearchRequest.Builder()
        .index("<my-index>")
        .query(new Query.Builder().knn(filteredKnnQuery).build())
        .size(k)
        .build();
    SearchResponse<Map> response = client.search(searchRequest, Map.class);
    printResults(response);
}
Configure storage optimization
Vector data, especially high-dimensional floating-point vectors, consumes significant memory. PolarSearch offers storage optimization techniques that balance memory cost, query performance, and search accuracy by compressing (quantizing) vectors or changing the storage medium.
Recommendations
Before you choose an optimization strategy, refer to the following table to find the best option for your use case.
Optimization strategy | Compression ratio | Accuracy impact | Training required | CPU overhead | Use cases |
Scalar Quantization (SQ) | Low (fixed 2x) | Minimal | No | Low | Scenarios requiring extremely high search accuracy and moderate memory optimization with minimal accuracy loss. |
Binary Quantization (BQ) | High (8x–32x) | Significant | No | Medium | Extremely memory-sensitive scenarios that can tolerate significant accuracy loss to achieve maximum memory savings. |
Product Quantization (PQ) | Highest | Medium | Yes | Medium | Scenarios with massive datasets requiring the highest compression ratio, where investing time in model training is acceptable to balance accuracy and memory. |
Disk-based vector storage | - | Significant | No | High | Cost-sensitive scenarios with extremely limited memory resources where you can trade higher query latency (due to disk I/O) for the lowest possible memory consumption. |
Configuration
Scalar quantization (SQ)
How it works: This method converts standard 32-bit floating-point (float) vectors into 16-bit floating-point (fp16) vectors for storage, reducing the memory footprint by half. During distance calculations, vectors are decoded back to 32-bit, minimizing the impact on accuracy.
Memory estimation:
Formula: Memory (GB) ≈ 1.1 * (2 * dimension + 8 * m) * num_vectors / 1024^3
Parameters:
dimension: The dimension of the vector.
m: The m parameter in the HNSW index, which is the maximum number of neighbors for each node.
num_vectors: The total number of vectors.
1.1: An approximate 10% overhead factor for system usage.
Example: Assume you have 1 million vectors, each with a dimension of 256, and the HNSW m parameter is 16. The estimated memory requirement is:
1.1 * (2 * 256 + 8 * 16) * 1,000,000 / 1024^3 ≈ 0.656 GB
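The estimate above is easy to recompute for other dataset sizes with a small helper. The Java sketch below applies the same formula; the class and method names are hypothetical.

```java
final class SqMemoryEstimate {
    /** Estimated memory in GB for HNSW with fp16 scalar quantization. */
    static double hnswFp16Gb(int dimension, int m, long numVectors) {
        // 2 bytes per dimension (fp16) plus 8 bytes per graph neighbor, with a 10% overhead factor.
        double bytes = 1.1 * (2.0 * dimension + 8.0 * m) * numVectors;
        return bytes / Math.pow(1024, 3);
    }

    public static void main(String[] args) {
        // 1 million 256-dimensional vectors with m = 16.
        System.out.printf("%.3f GB%n", hnswFp16Gb(256, 16, 1_000_000L));
    }
}
```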
Usage example:
// HNSW with Scalar Quantization (SQ) example. The encoder block enables SQ.
PUT /<my-sq-index>
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "<my_vector_field>": {
        "type": "knn_vector",
        "dimension": 128,
        "method": {
          "name": "hnsw",
          "engine": "faiss",
          "parameters": {
            "m": 16,
            "ef_construction": 256,
            "encoder": {
              "name": "sq",
              "parameters": {
                "type": "fp16"
              }
            }
          }
        }
      }
    }
  }
}
Binary quantization (BQ)
How it works: This method compresses each dimension of a floating-point vector into a binary bit (0 or 1) for storage, achieving a very high compression ratio. The training process is handled automatically during index creation.
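The 1-bit case can be illustrated with a small Java sketch that packs one bit per dimension and compares the packed vectors with the Hamming distance. The fixed threshold passed to `quantize` is an assumption made for illustration; the thresholds that PolarSearch actually uses are determined by its automatic training and are not described here.

```java
final class BinaryQuantizationSketch {
    /** Packs one bit per dimension: 1 if the value is above the threshold, otherwise 0. */
    static byte[] quantize(float[] v, float threshold) {
        byte[] out = new byte[(v.length + 7) / 8];
        for (int i = 0; i < v.length; i++) {
            if (v[i] > threshold) {
                out[i / 8] |= (byte) (1 << (7 - (i % 8)));
            }
        }
        return out;
    }

    /** Number of differing bits between two packed vectors of equal length. */
    static int hamming(byte[] a, byte[] b) {
        int bits = 0;
        for (int i = 0; i < a.length; i++) {
            bits += Integer.bitCount((a[i] ^ b[i]) & 0xFF);
        }
        return bits;
    }
}
```

An 8-dimensional float vector occupies 32 bytes, while its 1-bit code occupies a single byte, which is where the 32x compression ratio comes from. The price is that all values on the same side of the threshold become indistinguishable.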
Memory estimation:
Formula: Memory (GB) ≈ 1.1 * ((dimension * bits / 8) + 8 * m) * num_vectors / 1024^3
Parameters:
dimension: The dimension of the vector.
bits: The number of binary bits used to represent each dimension. Valid values are 1, 2, and 4. A smaller value for bits results in a higher compression ratio but greater accuracy loss.
m: The m parameter in the HNSW index.
num_vectors: The total number of vectors.
Example: Assume you have 1 million vectors, each with a dimension of 256, and the HNSW m parameter is 16. The estimated memory requirements for different compression levels are as follows:
1-bit quantization (32x compression): Each dimension is represented by 1 bit. The estimated memory requirement is:
1.1 * ((256 * 1 / 8) + 8 * 16) * 1,000,000 / 1024^3 ≈ 0.164 GB
2-bit quantization (16x compression): Each dimension is represented by 2 bits. The estimated memory requirement is:
1.1 * ((256 * 2 / 8) + 8 * 16) * 1,000,000 / 1024^3 ≈ 0.197 GB
Usage example:
// HNSW with Binary Quantization (BQ) example. The encoder block enables BQ with 1-bit quantization.
PUT /<my-bq-index>
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "<my_vector_field>": {
        "type": "knn_vector",
        "dimension": 128,
        "method": {
          "name": "hnsw",
          "engine": "faiss",
          "parameters": {
            "m": 16,
            "ef_construction": 512,
            "encoder": {
              "name": "binary",
              "parameters": {
                "bits": 1
              }
            }
          }
        }
      }
    }
  }
}
Product quantization (PQ)
Product quantization is an advanced vector compression technique that achieves a higher compression ratio than SQ or BQ. However, it requires a separate training step to build the compression model.
How it works:
Vector splitting: An original high-dimensional vector (for example, 256 dimensions) is split into m low-dimensional sub-vectors of equal length. For example, splitting a 256-dimensional vector with m=32 results in 32 sub-vectors of 8 dimensions each.
Codebook training: For each sub-vector space, the system learns a separate codebook. This codebook contains 2^code_size centers, also known as centroids. The K-means clustering algorithm typically performs this training.
Quantization and encoding: After training, when a new vector is encoded, the system replaces each of its sub-vectors with the ID of the nearest centroid in the corresponding sub-vector space's codebook. If code_size is 8, the ID ranges from 0 to 255, which can be stored in a single byte.
Final result: This process transforms an original vector into a sequence of centroid IDs, achieving very high compression.
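The quantization and encoding step can be sketched in Java as follows. The sketch assumes that the codebooks have already been trained (the K-means step is not shown) and that code_size is at most 8, so that each centroid ID fits in one byte. The class `PqEncodeSketch` and the array layout are hypothetical.

```java
final class PqEncodeSketch {
    /**
     * Encodes a vector as one centroid ID per sub-vector.
     * codebooks[s][c] is centroid c of sub-space s and has v.length / m values,
     * where m = codebooks.length.
     */
    static byte[] encode(float[] v, float[][][] codebooks) {
        int m = codebooks.length;
        int subDim = v.length / m; // the dimension must be divisible by m
        byte[] codes = new byte[m];
        for (int s = 0; s < m; s++) {
            int best = 0;
            double bestDist = Double.POSITIVE_INFINITY;
            // Find the nearest centroid for sub-vector s.
            for (int c = 0; c < codebooks[s].length; c++) {
                double dist = 0.0;
                for (int j = 0; j < subDim; j++) {
                    double d = v[s * subDim + j] - codebooks[s][c][j];
                    dist += d * d;
                }
                if (dist < bestDist) {
                    bestDist = dist;
                    best = c;
                }
            }
            codes[s] = (byte) best; // fits in one byte when code_size <= 8
        }
        return codes;
    }
}
```

For example, with m = 2 and a two-entry codebook {(0, 0), (10, 10)} in each sub-space, the vector (0.1, 0.2, 9.8, 10.1) is encoded as the two IDs 0 and 1: four floats (16 bytes) are reduced to 2 bytes.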
Training requirements: The performance of PQ heavily depends on the quality of the training data. You must provide a set of vectors with a distribution similar to your search data.
Training data source: You can use a subset of the vectors you plan to index.
Recommended training data size:
When used with HNSW: The recommended number of training vectors is 2^code_size * 1000.
When used with IVF: The recommended number of training vectors is max(1000 * nlist, 2^code_size * 1000).
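As a quick check of these recommendations, the two formulas can be evaluated with a small Java helper. The class and method names are hypothetical.

```java
final class PqTrainingSize {
    /** Recommended number of training vectors for PQ with HNSW: 2^code_size * 1000. */
    static long forHnsw(int codeSize) {
        return (1L << codeSize) * 1000L;
    }

    /** Recommended number of training vectors for PQ with IVF: max(1000 * nlist, 2^code_size * 1000). */
    static long forIvf(int nlist, int codeSize) {
        return Math.max(1000L * nlist, forHnsw(codeSize));
    }
}
```

With code_size = 8, the HNSW recommendation is 256,000 training vectors. For IVF with nlist = 1024, the nlist term dominates and the recommendation rises to 1,024,000.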
Memory estimation: The memory calculation for HNSW with PQ is complex, as it includes the cost of the compressed vectors, the HNSW graph structure, and the PQ codebook.
Formula:
Memory (bytes) ≈ 1.1 * ( (per_vector_cost) * num_vectors + (codebook_cost) )
per_vector_cost = (pq_code_size / 8 * pq_m) + 24 + (8 * hnsw_m)
codebook_cost = num_segments * (2^pq_code_size) * 4 * dimension
Parameters:
num_vectors: The total number of vectors.
dimension: The dimension of the original vector.
pq_m: The number of segments the vector is split into. The dimension must be divisible by pq_m.
pq_code_size: The size of the codebook for each sub-vector, in bits. This is typically 8.
hnsw_m: The m parameter in the HNSW index, which is the maximum number of neighbors for each node.
num_segments: An underlying technical parameter that represents the number of segments the index is divided into. For estimation, you can use the shard count of your cluster or a conservative value like 100.
1.1: An overhead factor of approximately 10% for system usage.
24 and 8: The fixed and pointer overheads, respectively, for each node in the HNSW graph structure.
4: The size in bytes of a 32-bit floating-point number, used to store the centroid coordinates in the codebook.
Example: Assume you have 1 million vectors (num_vectors), each with a dimension of 256. The vector is split into 32 segments (pq_m), and each sub-vector codebook size (pq_code_size) is 8. The HNSW m parameter is 16, and num_segments is 100.
Calculate the cost per vector (per_vector_cost):
Compressed vector size = pq_code_size / 8 * pq_m = 8 / 8 * 32 = 32 bytes.
HNSW graph overhead = 24 + 8 * hnsw_m = 24 + 8 * 16 = 152 bytes.
per_vector_cost = 32 + 152 = 184 bytes
Calculate the total codebook cost (codebook_cost):
codebook_cost = num_segments * (2^pq_code_size) * 4 * dimension
codebook_cost = 100 * (2^8) * 4 * 256 = 100 * 256 * 4 * 256 = 26,214,400 bytes
Calculate the total memory:
Total memory ≈ 1.1 * (per_vector_cost * num_vectors + codebook_cost)
Total memory ≈ 1.1 * (184 * 1,000,000 + 26,214,400) ≈ 231,235,840 bytes ≈ 0.215 GB
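The same calculation can be expressed as a small Java helper, which makes it easier to try other parameter combinations. The class and method names are hypothetical; the formula is the one given in this section.

```java
final class PqMemoryEstimate {
    /** Estimated memory in bytes for HNSW with product quantization. */
    static double estimateBytes(long numVectors, int dimension, int pqM,
                                int pqCodeSize, int hnswM, int numSegments) {
        // Compressed codes plus the per-node HNSW overhead (24 fixed bytes, 8 bytes per neighbor).
        double perVectorCost = (pqCodeSize / 8.0 * pqM) + 24 + (8.0 * hnswM);
        // One codebook per segment: 2^code_size centroids stored as 4-byte floats.
        double codebookCost = (double) numSegments * Math.pow(2, pqCodeSize) * 4 * dimension;
        return 1.1 * (perVectorCost * numVectors + codebookCost);
    }

    public static void main(String[] args) {
        double bytes = estimateBytes(1_000_000L, 256, 32, 8, 16, 100);
        System.out.printf("%.0f bytes, %.3f GB%n", bytes, bytes / Math.pow(1024, 3));
    }
}
```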
Usage example:
// HNSW with Product Quantization (PQ) example.
// parameters.m (16) is the m parameter for HNSW.
// encoder.parameters.m (4) is the m parameter for PQ: it splits the 128-dimensional vector into 4 segments of 32 dimensions each.
// The dimension must be divisible by the PQ m parameter.
PUT /<my-hnswpq-index>
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "<my_vector_field>": {
        "type": "knn_vector",
        "dimension": 128,
        "method": {
          "name": "hnsw",
          "engine": "faiss",
          "parameters": {
            "m": 16,
            "ef_construction": 512,
            "encoder": {
              "name": "pq",
              "parameters": {
                "m": 4,
                "code_size": 8
              }
            }
          }
        }
      }
    }
  }
}
Disk storage
How it works: Disk-based vector search uses internal quantization techniques to compress vectors and stores the main graph structure on disk instead of in heap memory. This optimization significantly reduces memory consumption but slightly increases query latency while maintaining high recall.
Memory estimation: There is no fixed formula. The actual physical memory usage is dynamically managed by the operating system based on access patterns.
Usage example:
// Disk-based storage example. Setting mode to on_disk enables the disk-based mode.
PUT /<my-ondisk-index>
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "<my_vector_field>": {
        "type": "knn_vector",
        "dimension": 128,
        "mode": "on_disk"
      }
    }
  }
}
Appendix: Complete code example
This complete code example for the OpenSearch Java client demonstrates how to create a vector index and perform a vector search.
How to run
mvn clean package
mvn exec:java -Dexec.mainClass="com.example.VectorSearchDemo"