The vector search feature of PolarSearch lets you use a REST API to perform efficient similarity searches on unstructured data, such as text and images. This feature can quickly and accurately find the most similar results in massive datasets, which effectively improves the intelligence of your applications.
Features
Vector search, also known as similarity search, is a technique that finds the most similar data by comparing the distance between vectors. It is fundamentally different from traditional search, which relies on exact keyword matches.
The core idea is to convert unstructured data, such as text, images, and audio, into numerical representations called vector embeddings using deep learning models, such as Large Language Models (LLMs). These multi-dimensional vectors capture the deep semantic information of the data.
When you initiate a query, PolarSearch converts your query content into a vector and then performs a k-Nearest Neighbor (k-NN) search. This core algorithm finds the k vectors in the dataset that are closest in distance to your query vector. The value of k is user-defined. For example, if k=5, PolarSearch finds the five most similar results. PolarSearch then returns these k most similar results.
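The following standalone Java sketch is a minimal illustration of this idea (it is not part of the PolarSearch API, and the class and method names are hypothetical): it computes the distance from a query vector to every stored vector and keeps the k closest. Vector indexes such as HNSW and IVF exist precisely to avoid this kind of full scan on massive datasets.
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class BruteForceKnn {
    // L2 (Euclidean) distance between two vectors of the same dimension.
    static double l2(float[] a, float[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Return the positions of the k vectors in `data` that are closest to `query`.
    static List<Integer> knn(float[][] data, float[] query, int k) {
        return IntStream.range(0, data.length).boxed()
                .sorted(Comparator.comparingDouble(i -> l2(data[i], query)))
                .limit(k)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        float[][] data = {{5.2f, 4.4f}, {5.2f, 3.9f}, {4.9f, 3.4f}, {4.2f, 4.6f}, {3.3f, 4.5f}};
        float[] query = {5.0f, 4.0f};
        System.out.println("Top-3 nearest vectors: " + knn(data, query, 3)); // k = 3
    }
}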
To achieve efficient retrieval, PolarSearch relies on two core components: vector indexes and vector storage optimization.
Vector index: To avoid full-scale computation on massive datasets, you need to build a vector index in advance. An index builds a data structure that is optimized for queries based on the features of the vector data. During a query, the index can significantly narrow the search scope, which greatly improves retrieval performance. PolarSearch supports multiple types of vector indexes. The following section introduces the industry-mainstream HNSW and IVF indexes:
Hierarchical Navigable Small World (HNSW): A graph-based index that provides high performance and a high recall rate, but has a correspondingly high memory overhead. It is suitable for scenarios that require extremely low query latency and high accuracy, and where the dataset size is within memory capacity.
Inverted File (IVF): A clustering-based inverted index that has low memory usage. It is more suitable for scenarios that need to handle ultra-large datasets with limited memory, but its search accuracy is typically slightly lower than that of HNSW.
Vector storage optimization: Vector data, especially high-dimensional vectors, consumes a large amount of memory and storage space. PolarSearch provides multiple optimization techniques to reduce resource consumption.
Vector quantization: This technique reduces data precision to compress data and significantly reduce space usage. It strikes a balance between compression ratio and accuracy. PolarSearch supports product quantization (PQ), scalar quantization (SQ), and binary quantization (BQ).
Disk-based storage: For low-memory environments, this allows some index data to be stored on disk. This runs the vector search service at a lower memory cost, but with a slight increase in query latency.
Notes
Note the following when you use the PolarSearch vector search feature:
Index training requirements: The IVF index and the PQ (product quantization) technique require a separate training step before use. You must provide a representative sample of vector data to train the model. Otherwise, the index cannot work correctly.
Memory overhead: Although the HNSW index offers excellent performance, its graph structure must be fully loaded into memory, which results in high memory overhead. You must evaluate your cluster's memory resources before you choose this option.
Performance and cost trade-off: Disk-based vector search slightly increases query latency. You must evaluate this trade-off based on your business scenario.
Automatic training: The training process for binary quantization (BQ) is handled automatically during index building. You do not need to perform any extra training operations.
User guide
Preparations
To use the REST API for vector search, you must first enable the intelligent search (PolarSearch) feature. For more information about how to enable the PolarSearch feature for a new or existing cluster, see PolarSearch User Guide.
Step 1: Create a vector index
To store and search vectors, you must first create an index with a specific configuration. This involves two key actions:
Enable k-NN and define a vector field: In the index settings, set the knn parameter to true. This is the main switch that tells PolarDB that the index will be used for vector search.
Core parameters
engine: This parameter must be set to `faiss`.
Note: Faiss (Facebook AI Similarity Search) is a high-performance open-source library developed by Meta AI. It is designed for efficient similarity search and clustering of massive vector data. PolarSearch uses Faiss as its core vector search engine.
dimension: Specifies the dimension of the vector. This value must exactly match the vector dimension produced by your model.
data_type: Defines the data type of the vector. The default value is float. You can also choose byte or binary to optimize storage.
space_type: Defines the method for calculating vector similarity, which is also known as the distance measure. The supported options are as follows (a short sketch after this list illustrates how several of these measures are computed):
space_type | Distance measure | Description |
l2 | L2 (Euclidean distance) | Calculates the square root of the sum of squared differences. It is sensitive to the magnitude of values. |
l1 | L1 (Manhattan distance) | Calculates the sum of the absolute values of the differences between vector dimensions. |
cosinesimil | Cosine similarity | Measures the angle between vectors, focusing on direction rather than magnitude. |
innerproduct | Inner product | Calculates the dot product of vectors. It is often used for sorting. |
hamming | Hamming distance | Calculates the number of differing elements in binary vectors. |
chebyshev | L∞ (Chebyshev distance) | Considers only the maximum absolute value of the differences between vector dimensions. |
Define a vector field (HNSW or IVF): In the index mappings, define a field of the knn_vector type. This field is specifically for storing vector data. In this field, you can configure the vector's dimension, similarity calculation method, and the core index method.
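As a quick illustration of the distance measures listed above, the following standalone Java sketch computes the L2 distance, inner product, and cosine similarity of two small vectors. It is purely illustrative and independent of the PolarSearch API.
public class DistanceMeasures {
    // L2 (Euclidean) distance: square root of the sum of squared differences.
    static double l2(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(s);
    }

    // Inner product: the dot product of the two vectors.
    static double innerProduct(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += a[i] * b[i];
        return s;
    }

    // Cosine similarity: dot product normalized by magnitudes, so only direction matters.
    static double cosine(double[] a, double[] b) {
        return innerProduct(a, b) / (Math.sqrt(innerProduct(a, a)) * Math.sqrt(innerProduct(b, b)));
    }

    public static void main(String[] args) {
        double[] a = {1.0, 2.0, 3.0, 4.0};
        double[] b = {2.0, 4.0, 6.0, 8.0}; // Same direction as a, twice the magnitude.
        System.out.println("l2 = " + l2(a, b));                     // Sensitive to magnitude.
        System.out.println("innerproduct = " + innerProduct(a, b));
        System.out.println("cosinesimil = " + cosine(a, b));        // Approximately 1.0: direction only.
    }
}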
Selection guide
HNSW and IVF have different strengths in performance, resource consumption, and accuracy, which makes them suitable for different business scenarios. You can refer to the following table for a quick selection:
Comparison | HNSW | IVF |
Query latency | Extremely low. HNSW quickly locates results through a hierarchical graph structure with short search paths. | Low. IVF needs to first locate a cluster and then search within it, which results in a relatively longer path. |
Recall rate (accuracy) | High. The graph has better connectivity, which makes it less likely to miss nearest neighbors. | Medium to high. The edge effect, where a query point is on the border of a cluster, may cause some loss of accuracy. This can be mitigated by adjusting the nprobes parameter. |
Memory usage | High. The complete graph structure must be loaded into memory. | Low. IVF mainly stores centroids and posting lists. The memory overhead is much lower than that of HNSW. |
Build time | Longer. Building a high-quality graph structure requires complex calculations. | Faster. However, IVF requires an extra training step to generate centroids. |
Scenarios | Scenarios that have extreme requirements for query performance and accuracy, and have sufficient memory resources. Examples include real-time semantic search and facial recognition. | Cost-sensitive scenarios that have massive datasets, limited memory resources, and tolerance for a minor loss in accuracy. Examples include large-scale product recommendation and massive image gallery retrieval. |
Usage examples
HNSW
HNSW is implemented through IndexHNSWFlat and is suitable for scenarios that have high requirements for performance and recall rate.
Core parameters
Parameter | Value range | Description |
m | Positive integer. | The maximum number of neighbors (outdegree) for each node in the graph. This value determines the density of the graph and is the most critical parameter affecting index quality and memory usage. |
ef_construction | A positive integer, which should typically be greater than m. | The size of the dynamic neighbor list during index construction. It controls the search depth and breadth during graph construction. This value primarily affects the index build time and final quality. |
ef_search | Positive integer. | The size of the dynamic neighbor list during a query. It controls the search depth during a query. Note: This parameter is not specified when creating the index but is set globally in the index settings. |
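The ef_search value in the table above is applied at the index level rather than in the field mapping. The following request is a sketch that assumes PolarSearch follows the common OpenSearch convention for this setting; the setting name index.knn.algo_param.ef_search is an assumption, so verify it against your cluster version before use.
// Sketch: set the query-time ef_search value on an existing index (setting name assumed from the OpenSearch convention).
PUT /<my-index>/_settings
{
  "index": {
    "knn.algo_param.ef_search": 100
  }
}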
When you create an HNSW index, replace <my-index> with the name of your index and <my_vector_field> with the name of your field. You must also configure other core parameters, such as dimension, data_type, space_type, m, and ef_construction as needed.
REST API
// HNSW index creation example. Replace <my-index> with the name of your index.
PUT /<my-index>
{
"settings": {
"index": {
"knn": true
}
},
"mappings": {
"properties": {
"<my_vector_field>": {// Replace <my_vector_field> with the name of your field.
"type": "knn_vector",
"dimension": 128,
"data_type": "float",
"method": {
"name": "hnsw",
"engine": "faiss",
"space_type": "l2",
"parameters": {
"m": 16,
"ef_construction": 256
}
}
}
}
}
}
Java client
private static void createVectorIndex(OpenSearchClient client) throws IOException {
Property vectorProperty = Property.of(p -> p.knnVector(
KnnVectorProperty.of(kvp -> kvp
.dimension(128)
.dataType("float")
.method(new KnnVectorMethod.Builder()
.name("hnsw")
.engine("faiss")
.spaceType("l2")
.parameters(Map.of(
"m", JsonData.of(16),
"ef_construction", JsonData.of(256)
))
.build()
)
)
));
TypeMapping mapping = TypeMapping.of(m -> m
.properties("<my_vector_field>", vectorProperty)
.properties("text", Property.of(p -> p.text(TextProperty.of(t -> t))))
.properties("category", Property.of(p -> p.keyword(k -> k)))
);
CreateIndexRequest request = new CreateIndexRequest.Builder()
.index("<my-index>") // Replace <my-index> with the name of your index.
.settings(s -> s.knn(true))
.mappings(mapping)
.build();
client.indices().create(request);
}
IVF
IVF is implemented through IndexIVFFlat and is suitable for scenarios that have ultra-large datasets and limited memory.
Core parameters
Parameter | Value range | Description |
nlist | Positive integer. | The number of centroids. The index divides the entire vector space into nlist clusters. |
nprobes | A positive integer, which should typically be smaller than nlist. | The number of centroids (clusters) to search during a query. This is the most direct parameter for trading off between query speed and recall rate. |
When you create an IVF index, replace <my-index> with the name of your index and <my_vector_field> with the name of your field. You must also configure other core parameters, such as dimension, data_type, space_type, nlist, and nprobes as needed.
// IVF index creation example. Replace <my-index> with the name of your index.
PUT /<my-index>
{
"settings": {
"index": {
"knn": true
}
},
"mappings": {
"properties": {
"<my_vector_field>": {// Replace <my_vector_field> with the name of your field.
"type": "knn_vector",
"dimension": 4,
"data_type": "byte",
"method": {
"name": "ivf",
"engine": "faiss",
"space_type": "l2",
"parameters": {
"nlist": 1024,
"nprobes": 10 // nprobes is usually specified at query time. This is just an example.
}
}
}
}
}
}
Step 2: Index vector data
Prepare your documents, including vector data and other metadata, and then index them into the index that you just created.
REST API
POST /_bulk
{ "index": { "_index": "my-index", "_id": "doc_1" } }
{ "my_vector_field": [5.2, 4.4] }
{ "index": { "_index": "my-index", "_id": "doc_2" } }
{ "my_vector_field": [5.2, 3.9] }
{ "index": { "_index": "my-index", "_id": "doc_3" } }
{ "my_vector_field": [4.9, 3.4] }
{ "index": { "_index": "my-index", "_id": "doc_4" } }
{ "my_vector_field": [4.2, 4.6] }
{ "index": { "_index": "my-index", "_id": "doc_5" } }
{ "my_vector_field": [3.3, 4.5] }Java client
private static void indexSampleData(OpenSearchClient client) throws IOException {
List<Map<String, Object>> documents = new ArrayList<>();
documents.add(Map.of("text", "a book about data science", "category", "books", "<my_vector_field>", List.of(1.0f, 2.0f, 3.0f, 4.0f)));
documents.add(Map.of("text", "an intelligent smartphone with a great camera", "category", "electronics", "<my_vector_field>", List.of(8.0f, 7.0f, 6.0f, 5.0f)));
documents.add(Map.of("text", "a technical manual for a smart device", "category", "electronics", "<my_vector_field>", List.of(3.0f, 4.0f, 5.0f, 6.0f)));
for (int i = 0; i < documents.size(); i++) {
IndexRequest<Map<String, Object>> request = new IndexRequest.Builder<Map<String, Object>>()
.index("<my-index>") // Replace <my-index> with the name of your index.
.id("doc_" + i)
.document(documents.get(i))
.build();
client.index(request);
}
}
Step 3: Perform a vector search
Now, you can send a vector search request to find the results that are most similar to your query vector from the massive dataset.
Basic k-NN search
This is the most basic vector search. It finds the k results in the entire index that are closest to the query vector.
REST API
POST /<my-index>/_search
{
"size": 3,
"query": {
"knn": {
"<my_vector_field>": {
"vector": [3.1, 4.1, 5.1, 6.1],
"k": 3
}
}
}
}
Java client
// Prepare your query vector.
List<Float> queryVector = List.of(3.1f, 4.1f, 5.1f, 6.1f);
private static void performBasicKnnSearch(OpenSearchClient client, List<Float> queryVector) throws IOException {
System.out.println("\n--- 1. Performing Basic k-NN Search ---");
System.out.println("Querying for vectors most similar to: " + queryVector);
// Find the 3 most similar results.
int k = 3;
KnnQuery knnQuery = new KnnQuery.Builder()
.field("<my_vector_field>")
.vector(queryVector)
.k(k)
.build();
SearchRequest searchRequest = new SearchRequest.Builder().index("<my-index>").query(new Query.Builder().knn(knnQuery).build()).size(k).build(); // Replace <my-index> with the name of your index.
SearchResponse<Map> response = client.search(searchRequest, Map.class);
printResults(response);
}
k-NN search with a filter condition (hybrid search)
In many scenarios, you may need to narrow the search scope with one or more conditions before you perform a vector search. This is the core idea of hybrid search. You can use the filter clause in a KnnQuery to do this. The filter itself can be any standard OpenSearch query, such as term for an exact value match or match for a full-text query.
Filter using a text match
This is suitable for classic "keyword + vector" hybrid search scenarios. For example, you can first search for all documents whose descriptions contain "new smartphone", and then sort them by vector similarity.
REST API
POST /<my-index>/_search
{
"size": 3,
"query": {
"knn": {
"<my_vector_field>": {
"vector": [3.1, 4.1, 5.1, 6.1],
"k": 3,
"filter": {
"match": {
"text": "book"
}
}
}
}
}
}
Java client
// Prepare your query vector.
List<Float> queryVector = List.of(3.1f, 4.1f, 5.1f, 6.1f);
private static void performHybridSearchWithText(OpenSearchClient client, List<Float> queryVector) throws IOException {
System.out.println("\n--- 2. Performing Hybrid Search (k-NN + Text Match) ---");
// Prepare your query keyword.
String textQuery = "book";
System.out.println("Filtering for documents containing '" + textQuery + "', then finding most similar vectors.");
// Find the 3 most similar results.
int k = 3;
MatchQuery matchQuery = new MatchQuery.Builder().field("text").query(q -> q.stringValue(textQuery)).build();
KnnQuery hybridKnnQuery = new KnnQuery.Builder()
.field("<my_vector_field>")
.vector(queryVector)
.k(k)
.filter(new Query.Builder().match(matchQuery).build())
.build();
SearchRequest searchRequest = new SearchRequest.Builder().index("<my-index>").query(new Query.Builder().knn(hybridKnnQuery).build()).size(k).build(); // Replace <my-index> with the name of your index.
SearchResponse<Map> response = client.search(searchRequest, Map.class);
printResults(response);
}
Filter using an exact value (Term Filter)
This is suitable for scenarios where you filter by a specific label, category, or ID. For example, you can search for the most similar products only within the "electronics" category.
REST API
POST /<my-index>/_search
{
"size": 3,
"query": {
"knn": {
"<my_vector_field>": {
"vector": [5, 4],
"k": 3,
"filter": {
"term": {
"category": "electronics"
}
}
}
}
}
}
Java client
// Prepare your query vector.
List<Float> queryVector = List.of(3.1f, 4.1f, 5.1f, 6.1f);
private static void performFilteredSearchWithTerm(OpenSearchClient client, List<Float> queryVector) throws IOException {
System.out.println("\n--- 3. Performing Filtered Search (k-NN + Term Filter) ---");
// Prepare your query category.
String categoryFilter = "electronics";
System.out.println("Filtering for documents in category '" + categoryFilter + "', then finding most similar vectors.");
// Find the 3 most similar results.
int k = 3;
TermQuery termQuery = new TermQuery.Builder().field("category").value(v -> v.stringValue(categoryFilter)).build();
KnnQuery filteredKnnQuery = new KnnQuery.Builder()
.field("<my_vector_field>")
.vector(queryVector)
.k(k)
.filter(new Query.Builder().term(termQuery).build())
.build();
SearchRequest searchRequest = new SearchRequest.Builder().index("<my-index>").query(new Query.Builder().knn(filteredKnnQuery).build()).size(k).build(); // Replace <my-index> with the name of your index.
SearchResponse<Map> response = client.search(searchRequest, Map.class);
printResults(response);
}
Configure storage optimization
Vector data, especially high-dimensional floating-point vectors, consumes a large amount of memory. PolarSearch provides multiple storage optimization techniques that compress vectors through quantization or change the storage medium, so that you can balance memory cost, query performance, and search accuracy.
Recommendations
You can refer to the following table to quickly find the best optimization policy for your business scenario.
Optimization policy | Compression ratio | Impact on accuracy | Training requirements | CPU overhead | Scenarios |
Scalar quantization (SQ) | Low (fixed at 2×) | Minimal | No training required | Low | Use in scenarios that require extremely high search accuracy and moderate memory optimization with minimal accuracy loss. |
Binary quantization (BQ) | High (8× to 32×) | Significant | No training required | Medium | Use in memory-sensitive scenarios where you can tolerate some accuracy loss to achieve maximum memory savings. |
Product quantization (PQ) | Highest | Medium | Training required | Medium | Use for massive datasets that require an extreme compression ratio. This is for scenarios where you are willing to spend time on model training to balance accuracy and memory. |
Disk-based vector storage | - | Significant | No training required | High | Use in cost-sensitive scenarios with extremely limited memory. In these cases, you prioritize minimizing memory usage over query latency, which is affected by disk I/O. |
Instructions
Scalar quantization (SQ)
How it works: This method converts standard 32-bit floating-point (float) vectors into 16-bit floating-point (fp16) vectors for storage, which halves the memory usage. During distance calculation, the vectors are decoded back to 32-bit, so the impact on accuracy is minimal.
Memory estimation:
Formula:
Memory (GB) ≈ 1.1 × (2 × dimension + 8 × m) × num_vectors / 1024³
Parameter description:
dimension: The dimension of the vector.
m: The m parameter in the Hierarchical Navigable Small World (HNSW) index. It specifies the maximum number of neighbors for each node.
num_vectors: The total number of vectors.
1.1: A coefficient for system overhead, which is about 10%.
Example: Assume that you have 1 million vectors, the dimension of each vector is 256, and the m parameter is 16. The memory requirement is estimated as follows:
1.1 × (2 × 256 + 8 × 16) × 1,000,000 / 1024³ ≈ 0.656 GB
Example:
// HNSW + Scalar Quantization (SQ) example
PUT /<my-sq-index>
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "<my_vector_field>": {
        "type": "knn_vector",
        "dimension": 128,
        "method": {
          "name": "hnsw",
          "engine": "faiss",
          "parameters": {
            "m": 16,
            "ef_construction": 256,
            "encoder": { // Enable SQ
              "name": "fp16"
            }
          }
        }
      }
    }
  }
}
Binary quantization (BQ)
How it works: This method compresses each dimension of a floating-point vector into binary bits (0 and 1) for storage, which achieves a very high compression ratio. The training process completes automatically when the index is built.
Memory estimation:
Formula:
Memory (GB) ≈ 1.1 × ((dimension × bits / 8) + 8 × m) × num_vectors / 1024³
Parameter description:
dimension: The dimension of the vector.
bits: The number of binary bits used to represent each dimension. Valid values are 1, 2, and 4. A smaller bits value results in a higher compression ratio but a greater loss in accuracy.
m: The m parameter in the HNSW index.
num_vectors: The total number of vectors.
Example: Assume that you have 1 million vectors, the dimension of each vector is 256, and the m parameter is 16. The following estimates show the memory requirement for different bits values.
1-bit quantization (32× compression): Each dimension is represented by 1 bit, which is equivalent to a 32× compression ratio. The memory requirement is estimated as follows:
1.1 × ((256 × 1 / 8) + 8 × 16) × 1,000,000 / 1024³ ≈ 0.164 GB
2-bit quantization (16× compression): Each dimension is represented by 2 bits, which is equivalent to a 16× compression ratio. The memory requirement is estimated as follows:
1.1 × ((256 × 2 / 8) + 8 × 16) × 1,000,000 / 1024³ ≈ 0.197 GB
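As a sanity check on these formulas, the following small Java helper computes both the SQ and BQ estimates above. The class and method names are illustrative only.
public class QuantizationMemoryEstimator {
    // SQ (fp16) estimate in GB: 1.1 × (2 × dimension + 8 × m) × num_vectors / 1024³.
    static double sqGb(long numVectors, int dimension, int m) {
        return 1.1 * (2.0 * dimension + 8.0 * m) * numVectors / Math.pow(1024, 3);
    }

    // BQ estimate in GB: 1.1 × ((dimension × bits / 8) + 8 × m) × num_vectors / 1024³.
    static double bqGb(long numVectors, int dimension, int bits, int m) {
        return 1.1 * ((dimension * bits / 8.0) + 8.0 * m) * numVectors / Math.pow(1024, 3);
    }

    public static void main(String[] args) {
        // 1 million 256-dimension vectors with HNSW m = 16, as in the examples above.
        System.out.printf("SQ:       %.3f GB%n", sqGb(1_000_000L, 256, 16));    // ≈ 0.656 GB
        System.out.printf("BQ 1-bit: %.3f GB%n", bqGb(1_000_000L, 256, 1, 16)); // ≈ 0.164 GB
        System.out.printf("BQ 2-bit: %.3f GB%n", bqGb(1_000_000L, 256, 2, 16)); // ≈ 0.197 GB
    }
}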
Example:
// HNSW + Binary Quantization (BQ) example
PUT /<my-bq-index>
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "<my_vector_field>": {
        "type": "knn_vector",
        "dimension": 128,
        "method": {
          "name": "hnsw",
          "engine": "faiss",
          "parameters": {
            "m": 16,
            "ef_construction": 512,
            "encoder": {
              "name": "binary",
              "parameters": { // Enable BQ, use 1-bit quantization
                "bits": 1
              }
            }
          }
        }
      }
    }
  }
}
Product quantization (PQ)
Product quantization (PQ) is an advanced vector compression technique. It achieves higher compression ratios than SQ or BQ, but requires a separate training step to build the compression model.
How it works:
Vector chunking: First, an original high-dimensional vector, such as a 256-dimension vector, is chunked into m low-dimensional sub-vectors of equal length. For example, chunking a 256-dimension vector with m=32 results in 32 8-dimension sub-vectors.
Codebook training: Next, the system learns a separate codebook for each sub-vector space. This codebook contains 2^code_size centroids. This training process is usually performed using the k-means clustering algorithm.
Quantization encoding: After training, when a new vector is encoded, each of its sub-vectors is replaced. Instead of storing the original floating-point value, the system stores the ID of the nearest centroid in the sub-vector's codebook. If code_size is 8, the ID range is 0 to 255, which can be stored in exactly 1 byte.
Final result: An original vector is transformed into a sequence of centroid IDs. This method achieves very high compression.
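The following Java sketch illustrates only the quantization encoding step described above, assuming the codebooks have already been trained. PolarSearch performs chunking, training, and encoding internally; the class and method names here are hypothetical.
public class PqEncoder {
    /**
     * Encode one vector with product quantization.
     * codebooks[s][c] is the c-th centroid (a sub-vector) of segment s, learned
     * beforehand, typically with k-means. With code_size = 8 there are 256
     * centroids per segment, so each sub-vector compresses to a single byte.
     */
    static int[] encode(float[] vector, float[][][] codebooks) {
        int m = codebooks.length;          // Number of segments (the PQ m parameter).
        int subDim = vector.length / m;    // Dimension of each sub-vector.
        int[] codes = new int[m];
        for (int s = 0; s < m; s++) {
            double best = Double.MAX_VALUE;
            for (int c = 0; c < codebooks[s].length; c++) {
                double dist = 0;
                for (int d = 0; d < subDim; d++) {
                    double diff = vector[s * subDim + d] - codebooks[s][c][d];
                    dist += diff * diff;
                }
                if (dist < best) {          // Keep the ID of the nearest centroid.
                    best = dist;
                    codes[s] = c;
                }
            }
        }
        return codes;                       // The original vector becomes m centroid IDs.
    }
}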
Training requirements: The performance of PQ heavily depends on the quality of the training data. You must provide a set of vectors for training that has a data distribution similar to the data that you will eventually retrieve.
Training data source: The training data can be a subset of the vector data that you plan to index.
Recommended amount of training data:
When used with HNSW: The recommended number of training vectors is 2^code_size × 1,000.
When used with Inverted File (IVF): The recommended number of training vectors is max(1,000 × nlist, 2^code_size × 1,000). For example, with code_size = 8 and nlist = 1024, this evaluates to max(1,024,000, 256,000) = 1,024,000 training vectors.
Memory estimation: Take HNSW+PQ as an example. When HNSW is combined with PQ, the memory calculation formula is complex because it includes the overhead of the compressed vectors, the HNSW graph structure, and the PQ codebook.
Formula:
Memory (bytes) ≈ 1.1 × ((per_vector_cost) × num_vectors + (codebook_cost))
per_vector_cost = (pq_code_size / 8 × pq_m) + 24 + (8 × hnsw_m)
codebook_cost = num_segments × (2^pq_code_size) × 4 × dimension
Parameter description:
num_vectors: The total number of vectors.
dimension: The dimension of the original vector.
pq_m: The number of segments into which the vector is chunked. dimension must be divisible by pq_m.
pq_code_size: The size of each sub-vector codebook, in bits. A typical value is 8.
hnsw_m: The m parameter in the HNSW index, which is the maximum number of neighbors for each node.
num_segments: A low-level technical parameter that represents the number of segments into which the index is divided. For estimation, you can use the number of cluster shards or a conservative value, such as 100.
1.1: A coefficient for system overhead, which is about 10%.
24 and 8: The fixed overhead and pointer overhead for each node in the HNSW graph structure.
4: Represents that the centroid coordinates in the codebook are stored as 32-bit floating-point numbers, which are 4 bytes.
Example: Assume that you have 1 million vectors (num_vectors). The dimension of each vector (dimension) is 256. The number of chunks for each vector (pq_m) is 32. The size of each sub-vector codebook (pq_code_size) is 8. The m parameter for the HNSW index is 16, and num_segments is 100.
Calculate the overhead for a single vector (per_vector_cost):
Compressed vector size = pq_code_size / 8 × pq_m = 8 / 8 × 32 = 32 bytes.
HNSW graph overhead = 24 + 8 × hnsw_m = 24 + 8 × 16 = 152 bytes.
per_vector_cost = 32 + 152 = 184 bytes.
Calculate the total codebook overhead (codebook_cost):
codebook_cost = num_segments × (2^pq_code_size) × 4 × dimension.
codebook_cost = 100 × (2^8) × 4 × 256 = 100 × 256 × 4 × 256 = 26,214,400 bytes.
Calculate the total memory:
Total memory ≈ 1.1 × (per_vector_cost × num_vectors + codebook_cost)
Total memory ≈ 1.1 × (184 × 1,000,000 + 26,214,400) ≈ 231,235,840 bytes ≈ 0.215 GB
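The following small Java helper reproduces this estimate from the formula above; the class and method names are illustrative only.
public class PqMemoryEstimator {
    // Estimate HNSW + PQ memory in bytes: 1.1 × (per_vector_cost × num_vectors + codebook_cost).
    static double estimateBytes(long numVectors, int dimension, int pqM,
                                int pqCodeSize, int hnswM, int numSegments) {
        double perVectorCost = (pqCodeSize / 8.0 * pqM) + 24 + (8.0 * hnswM);
        double codebookCost = (double) numSegments * Math.pow(2, pqCodeSize) * 4 * dimension;
        return 1.1 * (perVectorCost * numVectors + codebookCost);
    }

    public static void main(String[] args) {
        // 1 million 256-dimension vectors, pq_m = 32, pq_code_size = 8, hnsw_m = 16, num_segments = 100.
        double bytes = estimateBytes(1_000_000L, 256, 32, 8, 16, 100);
        // Prints roughly 231,235,840 bytes, which is about 0.215 GB.
        System.out.printf("%,.0f bytes ≈ %.3f GB%n", bytes, bytes / Math.pow(1024, 3));
    }
}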
Example:
// HNSW + Product Quantization (PQ) example
PUT /<my-hnswpq-index>
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "<my_vector_field>": {
        "type": "knn_vector",
        "dimension": 128, // Dimension must be divisible by m
        "method": {
          "name": "hnsw",
          "engine": "faiss",
          "parameters": {
            "m": 16, // m parameter for HNSW
            "ef_construction": 512,
            "encoder": {
              "name": "pq",
              "parameters": {
                "m": 4, // m parameter for PQ: chunk the 128-dimension vector into 4 segments of 32 dimensions
                "code_size": 8
              }
            }
          }
        }
      }
    }
  }
}
Disk-based vector storage
How it works: Disk-based vector search uses internal quantization techniques to compress vectors and stores the main graph structure on disk instead of in heap memory. This significantly reduces memory usage; search latency increases slightly, while a high recall rate is still maintained.
Memory estimation: There is no fixed formula. The actual physical memory usage is dynamically managed by the operating system based on the access mode.
Example:
// Disk-based storage example
PUT /<my-ondisk-index>
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "<my_vector_field>": {
        "type": "knn_vector",
        "dimension": 128,
        "mode": "on_disk" // Enable disk-based mode
      }
    }
  }
}
Appendix: Complete sample code
This section provides complete sample code for the Java client. The code demonstrates the entire procedure, from creating a vector index to performing a vector search.
Run the sample code
mvn clean package
mvn exec:java -Dexec.mainClass="com.example.VectorSearchDemo"