The vector search feature of PolarSearch lets you use a REST API to perform efficient similarity searches on unstructured data, such as text and images. This feature can quickly and accurately find the most similar results in massive datasets, which effectively improves the intelligence of your applications.
Features
Vector search, also known as similarity search, is a technique that finds the most similar data by comparing the distance between vectors. It is fundamentally different from traditional search, which relies on exact keyword matches.
The core idea is to convert unstructured data, such as text, images, and audio, into numerical representations called vector embeddings using deep learning models, such as Large Language Models (LLMs). These multi-dimensional vectors capture the deep semantic information of the data.
When you initiate a query, PolarSearch converts your query content into a vector and then performs a k-Nearest Neighbor (k-NN) search. This core algorithm finds the k vectors in the dataset that are closest in distance to your query vector. The value of k is user-defined. For example, if k=5, PolarSearch finds the five most similar results. PolarSearch then returns these k most similar results.
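The following standalone Java sketch is a minimal illustration of this idea (it is not part of the PolarSearch API, and the class and method names are hypothetical): it computes the distance from a query vector to every stored vector and keeps the k closest. Vector indexes such as HNSW and IVF exist precisely to avoid this kind of full scan on massive datasets.
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class BruteForceKnn {
    // L2 (Euclidean) distance between two vectors of the same dimension.
    static double l2(float[] a, float[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Return the positions of the k vectors in `data` that are closest to `query`.
    static List<Integer> knn(float[][] data, float[] query, int k) {
        return IntStream.range(0, data.length).boxed()
                .sorted(Comparator.comparingDouble(i -> l2(data[i], query)))
                .limit(k)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        float[][] data = {{5.2f, 4.4f}, {5.2f, 3.9f}, {4.9f, 3.4f}, {4.2f, 4.6f}, {3.3f, 4.5f}};
        float[] query = {5.0f, 4.0f};
        System.out.println("Top-3 nearest vectors: " + knn(data, query, 3)); // k = 3
    }
}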
To achieve efficient retrieval, PolarSearch relies on two core components: vector indexes and vector storage optimization.
Vector index: To avoid full-scale computation on massive datasets, you need to build a vector index in advance. An index builds a data structure that is optimized for queries based on the features of the vector data. During a query, the index can significantly narrow the search scope, which greatly improves retrieval performance. PolarSearch supports multiple types of vector indexes. The following section introduces the industry-mainstream HNSW and IVF indexes:
Hierarchical Navigable Small World (HNSW): A graph-based index that provides high performance and a high recall rate, but has a correspondingly high memory overhead. It is suitable for scenarios that require extremely low query latency and high accuracy, and where the dataset size is within memory capacity.
Inverted File (IVF): A clustering-based inverted index that has low memory usage. It is more suitable for scenarios that need to handle ultra-large datasets with limited memory, but its search accuracy is typically slightly lower than that of HNSW.
Vector storage optimization: Vector data, especially high-dimensional vectors, consumes a large amount of memory and storage space. PolarSearch provides multiple optimization techniques to reduce resource consumption.
Vector quantization: This technique reduces data precision to compress data and significantly reduce space usage. It strikes a balance between compression ratio and accuracy. PolarSearch supports product quantization (PQ), scalar quantization (SQ), and binary quantization (BQ).
Disk-based storage: For low-memory environments, this allows some index data to be stored on disk. This runs the vector search service at a lower memory cost, but with a slight increase in query latency.
Notes
Note the following when you use the PolarSearch vector search feature:
Index training requirements: The IVF index and the PQ (product quantization) technique require a separate training step before use. You must provide a representative sample of vector data to train the model. Otherwise, the index cannot work correctly.
Memory overhead: Although the HNSW index offers excellent performance, its graph structure must be fully loaded into memory, which results in high memory overhead. You must evaluate your cluster's memory resources before you choose this option.
Performance and cost trade-off: Disk-based vector search slightly increases query latency. You must evaluate this trade-off based on your business scenario.
Automatic training: The training process for binary quantization (BQ) is handled automatically during index building. You do not need to perform any extra training operations.
User guide
Preparations
To use the REST API for vector search, you must first enable the intelligent search (PolarSearch) feature. For more information about how to enable the PolarSearch feature for a new or existing cluster, see PolarSearch User Guide.
Step 1: Create a vector index
To store and search vectors, you must first create an index with a specific configuration. This involves two key actions:
Enable k-NN and define a vector field: In the index settings, set the knn parameter to true. This is the main switch that tells PolarDB that the index will be used for vector search.
Core parameters
engine: This parameter must be set to `faiss`.
Note: Faiss (Facebook AI Similarity Search) is a high-performance open-source library developed by Meta AI. It is designed for efficient similarity search and clustering of massive vector data. PolarSearch uses Faiss as its core vector search engine.
dimension: Specifies the dimension of the vector. This value must exactly match the vector dimension produced by your model.
data_type: Defines the data type of the vector. The default value is float. You can also choose byte or binary to optimize storage.
space_type: Defines the method for calculating vector similarity, which is also known as the distance measure. The supported options are as follows (a short sketch after this list illustrates how several of these measures are computed):
space_type | Distance measure | Description |
l2 | L2 (Euclidean distance) | Calculates the square root of the sum of squared differences. It is sensitive to the magnitude of values. |
l1 | L1 (Manhattan distance) | Calculates the sum of the absolute values of the differences between vector dimensions. |
cosinesimil | Cosine similarity | Measures the angle between vectors, focusing on direction rather than magnitude. |
innerproduct | Inner product | Calculates the dot product of vectors. It is often used for sorting. |
hamming | Hamming distance | Calculates the number of differing elements in binary vectors. |
chebyshev | L∞ (Chebyshev distance) | Considers only the maximum absolute value of the differences between vector dimensions. |
Define a vector field (HNSW or IVF): In the index mappings, define a field of the knn_vector type. This field is specifically for storing vector data. In this field, you can configure the vector's dimension, similarity calculation method, and the core index method.
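As a quick illustration of the distance measures listed above, the following standalone Java sketch computes the L2 distance, inner product, and cosine similarity of two small vectors. It is purely illustrative and independent of the PolarSearch API.
public class DistanceMeasures {
    // L2 (Euclidean) distance: square root of the sum of squared differences.
    static double l2(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(s);
    }

    // Inner product: the dot product of the two vectors.
    static double innerProduct(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += a[i] * b[i];
        return s;
    }

    // Cosine similarity: dot product normalized by magnitudes, so only direction matters.
    static double cosine(double[] a, double[] b) {
        return innerProduct(a, b) / (Math.sqrt(innerProduct(a, a)) * Math.sqrt(innerProduct(b, b)));
    }

    public static void main(String[] args) {
        double[] a = {1.0, 2.0, 3.0, 4.0};
        double[] b = {2.0, 4.0, 6.0, 8.0}; // Same direction as a, twice the magnitude.
        System.out.println("l2 = " + l2(a, b));                     // Sensitive to magnitude.
        System.out.println("innerproduct = " + innerProduct(a, b));
        System.out.println("cosinesimil = " + cosine(a, b));        // Approximately 1.0: direction only.
    }
}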
Selection guide
HNSW and IVF have different strengths in performance, resource consumption, and accuracy, which makes them suitable for different business scenarios. You can refer to the following table for a quick selection:
Comparison | HNSW | IVF |
Query latency | Extremely low. HNSW quickly locates results through a hierarchical graph structure with short search paths. | Low. IVF needs to first locate a cluster and then search within it, which results in a relatively longer path. |
Recall rate (accuracy) | High. The graph has better connectivity, which makes it less likely to miss nearest neighbors. | Medium to high. The edge effect, where a query point is on the border of a cluster, may cause some loss of accuracy. This can be mitigated by adjusting the nprobes parameter. |
Memory usage | High. The complete graph structure must be loaded into memory. | Low. IVF mainly stores centroids and posting lists. The memory overhead is much lower than that of HNSW. |
Build time | Longer. Building a high-quality graph structure requires complex calculations. | Faster. However, IVF requires an extra training step to generate centroids. |
Scenarios | Scenarios that have extreme requirements for query performance and accuracy, and have sufficient memory resources. Examples include real-time semantic search and facial recognition. | Cost-sensitive scenarios that have massive datasets, limited memory resources, and tolerance for a minor loss in accuracy. Examples include large-scale product recommendation and massive image gallery retrieval. |
Usage examples
HNSW
HNSW is implemented through IndexHNSWFlat and is suitable for scenarios that have high requirements for performance and recall rate.
Core parameters
Parameter | Value range | Description |
m | Positive integer. | The maximum number of neighbors (outdegree) for each node in the graph. This value determines the density of the graph and is the most critical parameter affecting index quality and memory usage. |
ef_construction | A positive integer, which should typically be greater than m. | The size of the dynamic neighbor list during index construction. It controls the search depth and breadth during graph construction. This value primarily affects the index build time and final quality. |
ef_search | Positive integer. | The size of the dynamic neighbor list during a query. It controls the search depth during a query. Note: This parameter is not specified when creating the index but is set globally in the index settings. |
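The ef_search value in the table above is applied at the index level rather than in the field mapping. The following request is a sketch that assumes PolarSearch follows the common OpenSearch convention for this setting; the setting name index.knn.algo_param.ef_search is an assumption, so verify it against your cluster version before use.
// Sketch: set the query-time ef_search value on an existing index (setting name assumed from the OpenSearch convention).
PUT /<my-index>/_settings
{
  "index": {
    "knn.algo_param.ef_search": 100
  }
}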
When you create an HNSW index, replace <my-index> with the name of your index and <my_vector_field> with the name of your field. You must also configure other core parameters, such as dimension, data_type, space_type, m, and ef_construction as needed.
REST API
// HNSW index creation example. Replace <my-index> with the name of your index.
PUT /<my-index>
{
"settings": {
"index": {
"knn": true
}
},
"mappings": {
"properties": {
"<my_vector_field>": {// Replace <my_vector_field> with the name of your field.
"type": "knn_vector",
"dimension": 128,
"data_type": "float",
"method": {
"name": "hnsw",
"engine": "faiss",
"space_type": "l2",
"parameters": {
"m": 16,
"ef_construction": 256
}
}
}
}
}
}
Java client
private static void createVectorIndex(OpenSearchClient client) throws IOException {
Property vectorProperty = Property.of(p -> p.knnVector(
KnnVectorProperty.of(kvp -> kvp
.dimension(128)
.dataType("float")
.method(new KnnVectorMethod.Builder()
.name("hnsw")
.engine("faiss")
.spaceType("l2")
.parameters(Map.of(
"m", JsonData.of(16),
"ef_construction", JsonData.of(256)
))
.build()
)
)
));
TypeMapping mapping = TypeMapping.of(m -> m
.properties("<my_vector_field>", vectorProperty)
.properties("text", Property.of(p -> p.text(TextProperty.of(t -> t))))
.properties("category", Property.of(p -> p.keyword(k -> k)))
);
CreateIndexRequest request = new CreateIndexRequest.Builder()
.index("<my-index>") // Replace <my-index> with the name of your index.
.settings(s -> s.knn(true))
.mappings(mapping)
.build();
client.indices().create(request);
}
IVF
IVF is implemented through IndexIVFFlat and is suitable for scenarios that have ultra-large datasets and limited memory.
Core parameters
Parameter | Value range | Description |
nlist | Positive integer. | The number of centroids. The index divides the entire vector space into nlist clusters. |
nprobes | A positive integer, which should typically be smaller than nlist. | The number of centroids (clusters) to search during a query. This is the most direct parameter for trading off between query speed and recall rate. |
When you create an IVF index, replace <my-index> with the name of your index and <my_vector_field> with the name of your field. You must also configure other core parameters, such as dimension, data_type, space_type, nlist, and nprobes as needed.
// IVF index creation example. Replace <my-index> with the name of your index.
PUT /<my-index>
{
"settings": {
"index": {
"knn": true
}
},
"mappings": {
"properties": {
"<my_vector_field>": {// Replace <my_vector_field> with the name of your field.
"type": "knn_vector",
"dimension": 4,
"data_type": "byte",
"method": {
"name": "ivf",
"engine": "faiss",
"space_type": "l2",
"parameters": {
"nlist": 1024,
"nprobes": 10 // nprobes is usually specified at query time. This is just an example.
}
}
}
}
}
}
Step 2: Index vector data
Prepare your documents, including vector data and other metadata, and then index them into the index that you just created.
REST API
POST /_bulk
{ "index": { "_index": "my-index", "_id": "doc_1" } }
{ "my_vector_field": [5.2, 4.4] }
{ "index": { "_index": "my-index", "_id": "doc_2" } }
{ "my_vector_field": [5.2, 3.9] }
{ "index": { "_index": "my-index", "_id": "doc_3" } }
{ "my_vector_field": [4.9, 3.4] }
{ "index": { "_index": "my-index", "_id": "doc_4" } }
{ "my_vector_field": [4.2, 4.6] }
{ "index": { "_index": "my-index", "_id": "doc_5" } }
{ "my_vector_field": [3.3, 4.5] }Java client
private static void indexSampleData(OpenSearchClient client) throws IOException {
List<Map<String, Object>> documents = new ArrayList<>();
documents.add(Map.of("text", "a book about data science", "category", "books", "<my_vector_field>", List.of(1.0f, 2.0f, 3.0f, 4.0f)));
documents.add(Map.of("text", "an intelligent smartphone with a great camera", "category", "electronics", "<my_vector_field>", List.of(8.0f, 7.0f, 6.0f, 5.0f)));
documents.add(Map.of("text", "a technical manual for a smart device", "category", "electronics", "<my_vector_field>", List.of(3.0f, 4.0f, 5.0f, 6.0f)));
for (int i = 0; i < documents.size(); i++) {
IndexRequest<Map<String, Object>> request = new IndexRequest.Builder<Map<String, Object>>()
.index("<my-index>") // Replace <my-index> with the name of your index.
.id("doc_" + i)
.document(documents.get(i))
.build();
client.index(request);
}
}
Step 3: Perform a vector search
Now, you can send a vector search request to find the results that are most similar to your query vector from the massive dataset.
Basic k-NN search
This is the most basic vector search. It finds the k results in the entire index that are closest to the query vector.
REST API
POST /<my-index>/_search
{
"size": 3,
"query": {
"knn": {
"<my_vector_field>": {
"vector": [3.1, 4.1, 5.1, 6.1],
"k": 3
}
}
}
}
Java client
// Prepare your query vector.
List<Float> queryVector = List.of(3.1f, 4.1f, 5.1f, 6.1f);
private static void performBasicKnnSearch(OpenSearchClient client, List<Float> queryVector) throws IOException {
System.out.println("\n--- 1. Performing Basic k-NN Search ---");
System.out.println("Querying for vectors most similar to: " + queryVector);
// Find the 3 most similar results.
int k = 3;
KnnQuery knnQuery = new KnnQuery.Builder()
.field("<my_vector_field>")
.vector(queryVector)
.k(k)
.build();
SearchRequest searchRequest = new SearchRequest.Builder().index("<my-index>").query(new Query.Builder().knn(knnQuery).build()).size(k).build(); // Replace <my-index> with the name of your index.
SearchResponse<Map> response = client.search(searchRequest, Map.class);
printResults(response);
}
k-NN search with a filter condition (hybrid search)
In many scenarios, you may need to narrow the search scope with one or more conditions before you perform a vector search. This is the core idea of hybrid search. You can use the filter clause in a KnnQuery to do this. The filter itself can be any standard OpenSearch query, such as term for an exact value match or match for a full-text query.
Filter using a text match
This is suitable for classic "keyword + vector" hybrid search scenarios. For example, you can first search for all documents whose descriptions contain "new smartphone", and then sort them by vector similarity.
REST API
POST /<my-index>/_search
{
"size": 3,
"query": {
"knn": {
"<my_vector_field>": {
"vector": [3.1, 4.1, 5.1, 6.1],
"k": 3,
"filter": {
"match": {
"text": "book"
}
}
}
}
}
}
Java client
// Prepare your query vector.
List<Float> queryVector = List.of(3.1f, 4.1f, 5.1f, 6.1f);
private static void performHybridSearchWithText(OpenSearchClient client, List<Float> queryVector) throws IOException {
System.out.println("\n--- 2. Performing Hybrid Search (k-NN + Text Match) ---");
// Prepare your query keyword.
String textQuery = "book";
System.out.println("Filtering for documents containing '" + textQuery + "', then finding most similar vectors.");
// Find the 3 most similar results.
int k = 3;
MatchQuery matchQuery = new MatchQuery.Builder().field("text").query(q -> q.stringValue(textQuery)).build();
KnnQuery hybridKnnQuery = new KnnQuery.Builder()
.field("<my_vector_field>")
.vector(queryVector)
.k(k)
.filter(new Query.Builder().match(matchQuery).build())
.build();
SearchRequest searchRequest = new SearchRequest.Builder().index("<my-index>").query(new Query.Builder().knn(hybridKnnQuery).build()).size(k).build(); // Replace <my-index> with the name of your index.
SearchResponse<Map> response = client.search(searchRequest, Map.class);
printResults(response);
}
Filter using an exact value (Term Filter)
This is suitable for scenarios where you filter by a specific label, category, or ID. For example, you can search for the most similar products only within the "electronics" category.
REST API
POST /<my-index>/_search
{
"size": 3,
"query": {
"knn": {
"<my_vector_field>": {
"vector": [5, 4],
"k": 3,
"filter": {
"term": {
"category": "electronics"
}
}
}
}
}
}
Java client
// Prepare your query vector.
List<Float> queryVector = List.of(3.1f, 4.1f, 5.1f, 6.1f);
private static void performFilteredSearchWithTerm(OpenSearchClient client, List<Float> queryVector) throws IOException {
System.out.println("\n--- 3. Performing Filtered Search (k-NN + Term Filter) ---");
// Prepare your query category.
String categoryFilter = "electronics";
System.out.println("Filtering for documents in category '" + categoryFilter + "', then finding most similar vectors.");
// Find the 3 most similar results.
int k = 3;
TermQuery termQuery = new TermQuery.Builder().field("category").value(v -> v.stringValue(categoryFilter)).build();
KnnQuery filteredKnnQuery = new KnnQuery.Builder()
.field("<my_vector_field>")
.vector(queryVector)
.k(k)
.filter(new Query.Builder().term(termQuery).build())
.build();
SearchRequest searchRequest = new SearchRequest.Builder().index("<my-index>").query(new Query.Builder().knn(filteredKnnQuery).build()).size(k).build(); // Replace <my-index> with the name of your index.
SearchResponse<Map> response = client.search(searchRequest, Map.class);
printResults(response);
}
Configure storage optimization
Vector data, especially high-dimensional floating-point vectors, consumes a large amount of memory. PolarSearch provides multiple storage optimization techniques that compress vectors through quantization or change the storage medium, so that you can balance memory cost, query performance, and search accuracy.
Recommendations
You can refer to the following table to quickly find the best optimization policy for your business scenario.
Optimization policy | Compression ratio | Impact on accuracy | Training requirements | CPU overhead | Scenarios |
Scalar quantization (SQ) | Low (fixed at 2×) | Minimal | No training required | Low | Use in scenarios that require extremely high search accuracy and moderate memory optimization with minimal accuracy loss. |
Binary quantization (BQ) | High (8× to 32×) | Significant | No training required | Medium | Use in memory-sensitive scenarios where you can tolerate some accuracy loss to achieve maximum memory savings. |
Product quantization (PQ) | Highest | Medium | Training required | Medium | Use for massive datasets that require an extreme compression ratio. This is for scenarios where you are willing to spend time on model training to balance accuracy and memory. |
Disk-based vector storage | - | Significant | No training required | High | Use in cost-sensitive scenarios with extremely limited memory. In these cases, you prioritize minimizing memory usage over query latency, which is affected by disk I/O. |
Instructions
Scalar quantization (SQ)
How it works: This method converts standard 32-bit floating-point (float) vectors into 16-bit floating-point (fp16) vectors for storage, which halves the memory usage. During distance calculation, the vectors are decoded back to 32-bit, so the impact on accuracy is minimal.
Memory estimation:
Formula:
Memory (GB) ≈ 1.1 × (2 × dimension + 8 × m) × num_vectors / 1024³
Parameter description:
dimension: The dimension of the vector.
m: The m parameter in the Hierarchical Navigable Small World (HNSW) index. It specifies the maximum number of neighbors for each node.
num_vectors: The total number of vectors.
1.1: A coefficient for system overhead, which is about 10%.
Example: Assume that you have 1 million vectors, the dimension of each vector is 256, and the m parameter is 16. The memory requirement is estimated as follows:
1.1 × (2 × 256 + 8 × 16) × 1,000,000 / 1024³ ≈ 0.656 GB
Example:
// HNSW + Scalar Quantization (SQ) example
PUT /<my-sq-index>
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "<my_vector_field>": {
        "type": "knn_vector",
        "dimension": 128,
        "method": {
          "name": "hnsw",
          "engine": "faiss",
          "parameters": {
            "m": 16,
            "ef_construction": 256,
            "encoder": { // Enable SQ
              "name": "fp16"
            }
          }
        }
      }
    }
  }
}
Binary quantization (BQ)
How it works: This method compresses each dimension of a floating-point vector into binary bits (0 and 1) for storage, which achieves a very high compression ratio. The training process completes automatically when the index is built.
Memory estimation:
Formula:
Memory (GB) ≈ 1.1 × ((dimension × bits / 8) + 8 × m) × num_vectors / 1024³
Parameter description:
dimension: The dimension of the vector.
bits: The number of binary bits used to represent each dimension. Valid values are 1, 2, and 4. A smaller bits value results in a higher compression ratio but a greater loss in accuracy.
m: The m parameter in the HNSW index.
num_vectors: The total number of vectors.
Example: Assume that you have 1 million vectors, the dimension of each vector is 256, and the m parameter is 16. The following estimates show the memory requirement for different bits values.
1-bit quantization (32× compression): Each dimension is represented by 1 bit, which is equivalent to a 32× compression ratio. The memory requirement is estimated as follows:
1.1 × ((256 × 1 / 8) + 8 × 16) × 1,000,000 / 1024³ ≈ 0.164 GB
2-bit quantization (16× compression): Each dimension is represented by 2 bits, which is equivalent to a 16× compression ratio. The memory requirement is estimated as follows:
1.1 × ((256 × 2 / 8) + 8 × 16) × 1,000,000 / 1024³ ≈ 0.197 GB
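As a sanity check on these formulas, the following small Java helper computes both the SQ and BQ estimates above. The class and method names are illustrative only.
public class QuantizationMemoryEstimator {
    // SQ (fp16) estimate in GB: 1.1 × (2 × dimension + 8 × m) × num_vectors / 1024³.
    static double sqGb(long numVectors, int dimension, int m) {
        return 1.1 * (2.0 * dimension + 8.0 * m) * numVectors / Math.pow(1024, 3);
    }

    // BQ estimate in GB: 1.1 × ((dimension × bits / 8) + 8 × m) × num_vectors / 1024³.
    static double bqGb(long numVectors, int dimension, int bits, int m) {
        return 1.1 * ((dimension * bits / 8.0) + 8.0 * m) * numVectors / Math.pow(1024, 3);
    }

    public static void main(String[] args) {
        // 1 million 256-dimension vectors with HNSW m = 16, as in the examples above.
        System.out.printf("SQ:       %.3f GB%n", sqGb(1_000_000L, 256, 16));    // ≈ 0.656 GB
        System.out.printf("BQ 1-bit: %.3f GB%n", bqGb(1_000_000L, 256, 1, 16)); // ≈ 0.164 GB
        System.out.printf("BQ 2-bit: %.3f GB%n", bqGb(1_000_000L, 256, 2, 16)); // ≈ 0.197 GB
    }
}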
Example:
// HNSW + Binary Quantization (BQ) example
PUT /<my-bq-index>
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "<my_vector_field>": {
        "type": "knn_vector",
        "dimension": 128,
        "method": {
          "name": "hnsw",
          "engine": "faiss",
          "parameters": {
            "m": 16,
            "ef_construction": 512,
            "encoder": {
              "name": "binary",
              "parameters": { // Enable BQ, use 1-bit quantization
                "bits": 1
              }
            }
          }
        }
      }
    }
  }
}
Product quantization (PQ)
Product quantization (PQ) is an advanced vector compression technique. It achieves higher compression ratios than SQ or BQ, but requires a separate training step to build the compression model.
How it works:
Vector chunking: First, an original high-dimensional vector, such as a 256-dimension vector, is chunked into m low-dimensional sub-vectors of equal length. For example, chunking a 256-dimension vector with m=32 results in 32 8-dimension sub-vectors.
Codebook training: Next, the system learns a separate codebook for each sub-vector space. This codebook contains 2^code_size centroids. This training process is usually performed using the k-means clustering algorithm.
Quantization encoding: After training, when a new vector is encoded, each of its sub-vectors is replaced. Instead of storing the original floating-point value, the system stores the ID of the nearest centroid in the sub-vector's codebook. If code_size is 8, the ID range is 0 to 255, which can be stored in exactly 1 byte.
Final result: An original vector is transformed into a sequence of centroid IDs. This method achieves very high compression.
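The following Java sketch illustrates only the quantization encoding step described above, assuming the codebooks have already been trained. PolarSearch performs chunking, training, and encoding internally; the class and method names here are hypothetical.
public class PqEncoder {
    /**
     * Encode one vector with product quantization.
     * codebooks[s][c] is the c-th centroid (a sub-vector) of segment s, learned
     * beforehand, typically with k-means. With code_size = 8 there are 256
     * centroids per segment, so each sub-vector compresses to a single byte.
     */
    static int[] encode(float[] vector, float[][][] codebooks) {
        int m = codebooks.length;          // Number of segments (the PQ m parameter).
        int subDim = vector.length / m;    // Dimension of each sub-vector.
        int[] codes = new int[m];
        for (int s = 0; s < m; s++) {
            double best = Double.MAX_VALUE;
            for (int c = 0; c < codebooks[s].length; c++) {
                double dist = 0;
                for (int d = 0; d < subDim; d++) {
                    double diff = vector[s * subDim + d] - codebooks[s][c][d];
                    dist += diff * diff;
                }
                if (dist < best) {          // Keep the ID of the nearest centroid.
                    best = dist;
                    codes[s] = c;
                }
            }
        }
        return codes;                       // The original vector becomes m centroid IDs.
    }
}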
Training requirements: The performance of PQ heavily depends on the quality of the training data. You must provide a set of vectors for training that has a data distribution similar to the data that you will eventually retrieve.
Training data source: The training data can be a subset of the vector data that you plan to index.
Recommended amount of training data:
When used with HNSW: The recommended number of training vectors is 2^code_size × 1,000.
When used with Inverted File (IVF): The recommended number of training vectors is max(1,000 × nlist, 2^code_size × 1,000). For example, with code_size = 8 and nlist = 1024, this evaluates to max(1,024,000, 256,000) = 1,024,000 training vectors.
Memory estimation: Take HNSW+PQ as an example. When HNSW is combined with PQ, the memory calculation formula is complex because it includes the overhead of the compressed vectors, the HNSW graph structure, and the PQ codebook.
Formula:
Memory (bytes) ≈ 1.1 × ((per_vector_cost) × num_vectors + (codebook_cost))
per_vector_cost = (pq_code_size / 8 × pq_m) + 24 + (8 × hnsw_m)
codebook_cost = num_segments × (2^pq_code_size) × 4 × dimension
Parameter description:
num_vectors: The total number of vectors.
dimension: The dimension of the original vector.
pq_m: The number of segments into which the vector is chunked. dimension must be divisible by pq_m.
pq_code_size: The size of each sub-vector codebook, in bits. A typical value is 8.
hnsw_m: The m parameter in the HNSW index, which is the maximum number of neighbors for each node.
num_segments: A low-level technical parameter that represents the number of segments into which the index is divided. For estimation, you can use the number of cluster shards or a conservative value, such as 100.
1.1: A coefficient for system overhead, which is about 10%.
24 and 8: The fixed overhead and pointer overhead for each node in the HNSW graph structure.
4: Represents that the centroid coordinates in the codebook are stored as 32-bit floating-point numbers, which are 4 bytes.
Example: Assume that you have 1 million vectors (num_vectors). The dimension of each vector (dimension) is 256. The number of chunks for each vector (pq_m) is 32. The size of each sub-vector codebook (pq_code_size) is 8. The m parameter for the HNSW index is 16, and num_segments is 100.
Calculate the overhead for a single vector (per_vector_cost):
Compressed vector size = pq_code_size / 8 × pq_m = 8 / 8 × 32 = 32 bytes.
HNSW graph overhead = 24 + 8 × hnsw_m = 24 + 8 × 16 = 152 bytes.
per_vector_cost = 32 + 152 = 184 bytes.
Calculate the total codebook overhead (codebook_cost):
codebook_cost = num_segments × (2^pq_code_size) × 4 × dimension.
codebook_cost = 100 × (2^8) × 4 × 256 = 100 × 256 × 4 × 256 = 26,214,400 bytes.
Calculate the total memory:
Total memory ≈ 1.1 × (per_vector_cost × num_vectors + codebook_cost)
Total memory ≈ 1.1 × (184 × 1,000,000 + 26,214,400) ≈ 231,235,840 bytes ≈ 0.215 GB
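The following small Java helper reproduces this estimate from the formula above; the class and method names are illustrative only.
public class PqMemoryEstimator {
    // Estimate HNSW + PQ memory in bytes: 1.1 × (per_vector_cost × num_vectors + codebook_cost).
    static double estimateBytes(long numVectors, int dimension, int pqM,
                                int pqCodeSize, int hnswM, int numSegments) {
        double perVectorCost = (pqCodeSize / 8.0 * pqM) + 24 + (8.0 * hnswM);
        double codebookCost = (double) numSegments * Math.pow(2, pqCodeSize) * 4 * dimension;
        return 1.1 * (perVectorCost * numVectors + codebookCost);
    }

    public static void main(String[] args) {
        // 1 million 256-dimension vectors, pq_m = 32, pq_code_size = 8, hnsw_m = 16, num_segments = 100.
        double bytes = estimateBytes(1_000_000L, 256, 32, 8, 16, 100);
        // Prints roughly 231,235,840 bytes, which is about 0.215 GB.
        System.out.printf("%,.0f bytes ≈ %.3f GB%n", bytes, bytes / Math.pow(1024, 3));
    }
}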
Example:
// HNSW + Product Quantization (PQ) example
PUT /<my-hnswpq-index>
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "<my_vector_field>": {
        "type": "knn_vector",
        "dimension": 128, // Dimension must be divisible by m
        "method": {
          "name": "hnsw",
          "engine": "faiss",
          "parameters": {
            "m": 16, // m parameter for HNSW
            "ef_construction": 512,
            "encoder": {
              "name": "pq",
              "parameters": {
                "m": 4, // m parameter for PQ: chunk the 128-dimension vector into 4 segments of 32 dimensions
                "code_size": 8
              }
            }
          }
        }
      }
    }
  }
}
Disk-based vector storage
How it works: Disk-based vector search uses internal quantization techniques to compress vectors and stores the main graph structure on disk instead of in heap memory. This significantly reduces memory usage; search latency increases slightly, while a high recall rate is still maintained.
Memory estimation: There is no fixed formula. The actual physical memory usage is dynamically managed by the operating system based on the access mode.
Example:
// Disk-based storage example
PUT /<my-ondisk-index>
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "<my_vector_field>": {
        "type": "knn_vector",
        "dimension": 128,
        "mode": "on_disk" // Enable disk-based mode
      }
    }
  }
}
Appendix: Complete sample code
This section provides complete sample code for the Java client. The code demonstrates the entire procedure, from creating a vector index to performing a vector search.
Run the sample code
mvn clean package
mvn exec:java -Dexec.mainClass="com.example.VectorSearchDemo"