Perform vector search using the OpenSearch protocol - PolarDB

The vector search feature in PolarSearch provides a REST API for efficient similarity search on unstructured data, such as text and images. It lets you quickly and accurately find the closest matches in massive datasets, for applications such as semantic search and recommendation.

Overview

Vector retrieval, also known as similarity search, is a technique for finding the most similar data by comparing vector distances. This method is fundamentally different from traditional search, which relies on exact keyword matching.

The core idea is to convert unstructured data, such as text, images, and audio, into numerical representations called vector embeddings by using deep learning models, such as embedding models based on Large Language Models (LLMs). These high-dimensional vectors capture the semantic meaning of the data.

When you submit a query, you provide a query vector, typically generated by the same embedding model that produced your stored vectors, and PolarSearch performs a k-nearest neighbor (k-NN) search. This core algorithm finds the k vectors in the dataset that are closest to the query vector, where k is a number you specify. For example, k=5 returns the five most similar results.
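The following Java sketch shows what an exact k-NN search computes: it measures the L2 distance from the query to every stored vector and keeps the k smallest. It is a self-contained illustration, not PolarSearch code, and the names (ExactKnn, search, l2) are hypothetical. Vector indexes such as HNSW and IVF exist to avoid this full scan, usually at the cost of returning approximate results.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Exact (brute-force) k-NN search with L2 distance. Illustrative sketch only. */
class ExactKnn {

    /** One search hit: a document ID and its distance to the query. */
    static final class Hit {
        final String id;
        final double distance;

        Hit(String id, double distance) {
            this.id = id;
            this.distance = distance;
        }

        @Override
        public String toString() {
            return id + String.format(" (%.3f)", distance);
        }
    }

    /** Euclidean (L2) distance between two vectors of equal dimension. */
    static double l2(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Scores every stored vector, then keeps the k closest (nearest first). */
    static List<Hit> search(List<String> ids, List<float[]> vectors, float[] query, int k) {
        List<Hit> hits = new ArrayList<>();
        for (int i = 0; i < vectors.size(); i++) {
            hits.add(new Hit(ids.get(i), l2(vectors.get(i), query)));
        }
        hits.sort(Comparator.comparingDouble((Hit h) -> h.distance));
        return hits.subList(0, Math.min(k, hits.size()));
    }

    public static void main(String[] args) {
        // The five 4-dimensional sample vectors indexed in Step 2.
        List<String> ids = List.of("doc_1", "doc_2", "doc_3", "doc_4", "doc_5");
        List<float[]> vectors = List.of(
                new float[] {5.2f, 4.4f, 0f, 0f},
                new float[] {5.2f, 3.9f, 0f, 0f},
                new float[] {4.9f, 3.4f, 0f, 0f},
                new float[] {4.2f, 4.6f, 0f, 0f},
                new float[] {3.3f, 4.5f, 0f, 0f});
        // k = 3: return the three nearest documents.
        System.out.println(search(ids, vectors, new float[] {5.0f, 4.0f, 0f, 0f}, 3));
    }
}
```

With the query [5.0, 4.0, 0.0, 0.0], the sketch should return doc_2, doc_1, and doc_3, nearest first.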

To achieve efficient retrieval, PolarSearch relies on two core components: vector index and vector storage optimization.

Vector index: To avoid an exhaustive scan and computation across the entire dataset, you must build a vector index in advance. An index organizes vector data into a query-optimized data structure. During a query, this structure significantly narrows the search space, dramatically improving retrieval performance. PolarSearch supports multiple types of vector indexes. This guide focuses on two widely used options: HNSW and IVF.
- HNSW (Hierarchical Navigable Small World): A graph-based index known for its high performance and high recall, but with a large memory overhead. It is ideal for scenarios that require extremely low query latency and high precision, where the entire dataset can fit into memory.
- IVF (Inverted File): A cluster-based inverted index with low memory consumption. It is ideal for processing large-scale datasets in memory-constrained environments, although its search precision is typically slightly lower than HNSW.
Vector storage optimization: High-dimensional vector data consumes significant memory and storage space. PolarSearch provides multiple optimization techniques to reduce resource consumption.
- Vector quantization: A technique that compresses data by reducing the numerical precision of vectors, significantly decreasing storage space. It strikes a balance between the compression ratio and search precision. PolarSearch supports product quantization (PQ), scalar quantization (SQ), and binary quantization (BQ).
- Disk-based storage: For low-memory environments, parts of the vector index can be stored on disk. This lets the vector retrieval service use less memory at the cost of slightly higher query latency.

Notes

Keep the following in mind when using the PolarSearch vector search feature:

Index training requirements: IVF indexes and the PQ (product quantization) technique require a separate training step before use. You must provide a set of representative vector data to train the model. Otherwise, the index will not work properly.
Memory overhead: HNSW indexes offer excellent performance, but their graph structure must be fully loaded into memory, which causes high memory overhead. Evaluate the memory resources of your cluster before you select this index type.
Performance and cost trade-off: Disk-based vector search can increase query latency. Evaluate if this trade-off is acceptable for your application.
Automatic training: Binary quantization (BQ) trains automatically during index creation. No user action is required.

Procedure

Before you begin

To use the REST API for vector search, you must first enable PolarSearch. For instructions on enabling PolarSearch for new or existing clusters, see the PolarSearch User Guide.

Step 1: Create a vector index

To store and search vectors, you must create an index with a specific configuration. This involves two key steps:

Enable k-NN: In the index settings (settings), set the knn parameter to true. This master switch tells PolarSearch that the index will be used for vector search.

Key parameters

engine: Set to faiss.
Note
Faiss (Facebook AI Similarity Search) is a high-performance, open-source library developed by Meta AI for efficient similarity search and clustering of massive vector datasets. PolarSearch uses Faiss as its core vector search engine.
dimension: Specifies the dimension of the vector. This value must be identical to the dimension of the vectors that are output by your model.
data_type: Specifies the data type of the vector. The default value is float. You can also select byte or binary to optimize storage.

space_type: Specifies the distance metric used to calculate vector similarity. The supported values are as follows:

`space_type`	Distance metric	Description
`l2`	L2 (Euclidean distance)	Calculates the square root of the sum of squared differences. Sensitive to value magnitude.
`l1`	L1 (Manhattan distance)	Calculates the sum of the absolute differences between vector dimensions.
`cosinesimil`	cosine similarity	Measures the angle between vectors, focusing on direction rather than magnitude.
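`innerproduct`	inner product	Calculates the dot product of vectors. A larger value indicates greater similarity. For unit-length (normalized) vectors, ranking by inner product is equivalent to ranking by cosine similarity.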
`hamming`	Hamming distance	Calculates the number of differing bits in binary vectors.
`chebyshev`	L∞ (Chebyshev distance)	Considers only the maximum absolute difference between vector dimensions.
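As a reference for the table above, this Java sketch computes the raw value of each metric. The class and method names are hypothetical, the Hamming variant assumes bit-packed byte arrays, and PolarSearch may convert these raw values into relevance scores before returning them, which the sketch does not model.

```java
/** Raw distance and similarity metrics behind each space_type value. Sketch only. */
class DistanceMetrics {

    /** l2: square root of the sum of squared differences. */
    static double l2(float[] a, float[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** l1: sum of the absolute differences. */
    static double l1(float[] a, float[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }

    /** chebyshev (L-infinity): the largest absolute difference in any dimension. */
    static double chebyshev(float[] a, float[] b) {
        double max = 0;
        for (int i = 0; i < a.length; i++) {
            max = Math.max(max, Math.abs(a[i] - b[i]));
        }
        return max;
    }

    /** innerproduct: dot product; a larger value means more similar. */
    static double innerProduct(float[] a, float[] b) {
        double dot = 0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
        }
        return dot;
    }

    /** cosinesimil: dot product divided by the product of the vector norms. */
    static double cosineSimilarity(float[] a, float[] b) {
        return innerProduct(a, b) / (Math.sqrt(innerProduct(a, a)) * Math.sqrt(innerProduct(b, b)));
    }

    /** hamming: number of differing bits between two bit-packed binary vectors. */
    static int hamming(byte[] a, byte[] b) {
        int bits = 0;
        for (int i = 0; i < a.length; i++) {
            bits += Integer.bitCount((a[i] ^ b[i]) & 0xFF);
        }
        return bits;
    }

    public static void main(String[] args) {
        float[] a = {0f, 0f};
        float[] b = {3f, 4f};
        System.out.println("l2=" + l2(a, b) + " l1=" + l1(a, b) + " chebyshev=" + chebyshev(a, b));
    }
}
```

For the points (0, 0) and (3, 4), the three distances are 5, 7, and 4. This shows how the choice of metric changes the notion of "closest".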

Define a vector field (HNSW or IVF): In the index's mappings (mappings), you must define a field of the knn_vector type. This field is used specifically to store vector data, and is where you configure the vector dimension, similarity calculation method, and the core indexing method.

Selection guidance

HNSW and IVF offer different trade-offs in performance, memory consumption, and recall, making them suitable for different use cases. Use the following table to choose an indexing method.

Comparison dimension	HNSW	IVF
Query latency	Extremely low. The hierarchical graph structure enables fast candidate location with short search paths.	Low. It first locates a cluster and then searches within it, which creates a longer search path.
Recall	High. Better graph connectivity reduces the chance of missing nearest neighbors.	Medium to high. An edge effect (when a query point is on the boundary of a cluster) may cause a loss in precision. This can be mitigated by adjusting the `nprobes` parameter.
Memory consumption	High. Requires loading the entire graph structure into memory.	Low. Mainly stores cluster centers and inverted lists, resulting in much lower memory overhead than HNSW.
Build time	Long. Building a high-quality graph structure requires complex computations.	Fast. However, it requires an additional training step to generate cluster centers.
Use cases	Scenarios that require the highest query performance and recall, and have sufficient memory resources. Examples: real-time semantic search, face recognition.	Cost-sensitive scenarios with massive datasets, limited memory, and where a minor loss in recall is acceptable. Examples: large-scale product recommendation, image library retrieval.

Examples

HNSW

HNSW is implemented through IndexHNSWFlat and is suitable for scenarios that require high performance and recall.

Key parameters

Parameter	Value	Description
`m`	Positive integer.	The maximum number of neighbors (out-degree) for each node in the graph. This value determines the graph's density and is the most critical parameter affecting index quality and memory consumption. Larger value: Creates a more connected graph, leading to better search paths and higher recall. However, it also increases the index build time and memory consumption. Smaller value: Results in a faster build time and lower memory consumption, but may cause the search to fall into a local optimum, reducing recall. Recommendation: Use a value between 8 and 64. Start with 16 or 32 and adjust based on your tests for recall and memory consumption.
`ef_construction`	Must be a positive integer and should usually be greater than `m`.	The size of the dynamic candidate list during index construction. It controls the search depth and breadth when building the graph. This value primarily affects the build time and the final quality of the index. Larger value: Allows the algorithm to explore more potential neighbors when inserting a new node, resulting in a higher-quality graph (which improves recall) but significantly increases the build time. Recommendation: We recommend setting this value to 2 times `m` or higher. If build time is not a concern but you want a high-quality index, you can set the value to 500 or higher.
`ef_search`	Positive integer.	The size of the dynamic candidate list at query time. It controls the search depth during a query and directly affects query latency and recall. Note: This parameter is not specified in the method definition when you create the index. Instead, set it globally at query time or in the index's `settings`. Larger value: The query explores more nodes, which improves recall but increases query latency. Recommendation: There is no fixed recommended value. Find the best balance between latency and recall through load testing: start with a small value, such as 50 or 100, and increase it gradually while observing performance.
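To make m and ef_search more concrete, the following Java sketch runs a greedy beam search over a single graph layer. This is the routine HNSW applies on each layer of its hierarchy. It is a simplified illustration under stated assumptions, not the Faiss implementation. The graph is hand-built rather than constructed with ef_construction, there is only one layer, and the names are hypothetical. The ef argument plays the role of ef_search: a larger value keeps more candidates alive, so the search explores more nodes and can improve recall at the cost of latency.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Set;

/** Beam search over one proximity-graph layer. Queue entries are {distance, nodeId}. */
class GraphBeamSearch {

    static double l2(float[] a, float[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /**
     * @param neighbors adjacency lists; in HNSW each node keeps at most m neighbors
     * @param ef        size of the dynamic candidate list (the role of ef_search); use ef >= k
     */
    static List<Integer> search(float[][] vectors, int[][] neighbors, int entry,
                                float[] query, int ef, int k) {
        Set<Integer> visited = new HashSet<>();
        // Min-heap: the closest node not yet expanded comes out first.
        PriorityQueue<double[]> candidates =
                new PriorityQueue<>(Comparator.comparingDouble((double[] e) -> e[0]));
        // Max-heap: the farthest of the best ef nodes found so far sits on top.
        PriorityQueue<double[]> results =
                new PriorityQueue<>(Comparator.comparingDouble((double[] e) -> -e[0]));

        double d0 = l2(vectors[entry], query);
        candidates.add(new double[] {d0, entry});
        results.add(new double[] {d0, entry});
        visited.add(entry);

        while (!candidates.isEmpty()) {
            double[] current = candidates.poll();
            // The closest unexpanded candidate is worse than every kept result: stop.
            if (current[0] > results.peek()[0]) {
                break;
            }
            for (int nb : neighbors[(int) current[1]]) {
                if (!visited.add(nb)) {
                    continue; // already evaluated
                }
                double d = l2(vectors[nb], query);
                if (results.size() < ef || d < results.peek()[0]) {
                    candidates.add(new double[] {d, nb});
                    results.add(new double[] {d, nb});
                    if (results.size() > ef) {
                        results.poll(); // evict the farthest result
                    }
                }
            }
        }

        List<double[]> sorted = new ArrayList<>(results);
        sorted.sort(Comparator.comparingDouble((double[] e) -> e[0]));
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < Math.min(k, sorted.size()); i++) {
            ids.add((int) sorted.get(i)[1]);
        }
        return ids;
    }

    public static void main(String[] args) {
        // Five 1-D points on a line, each linked to its immediate neighbors.
        float[][] vectors = {{0f}, {1f}, {2f}, {3f}, {4f}};
        int[][] neighbors = {{1}, {0, 2}, {1, 3}, {2, 4}, {3}};
        // Start at node 0 and look for the single nearest neighbor of 3.9.
        System.out.println(search(vectors, neighbors, 0, new float[] {3.9f}, 2, 1));
    }
}
```

On the five-node chain in main, starting from node 0 with the query 3.9, ef = 2 is enough for the search to walk along the graph to node 4.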

Note

When you create an HNSW index, replace <my-index> with your index name and <my_vector_field> with your field name. In addition, configure other core parameters such as dimension, data_type, space_type, m, and ef_construction based on your actual business requirements.

REST API

// HNSW index creation example. Replace <my-index> with your actual index name.
PUT /<my-index>
{
  "settings": {
    "index": { 
      "knn": true 
    } 
  },
  "mappings": {
    "properties": {
      "<my_vector_field>": {// Replace <my_vector_field> with your actual field name.
        "type": "knn_vector",
        "dimension": 4,
        "data_type": "float",
        "method": {
          "name": "hnsw",
          "engine": "faiss",
          "space_type": "l2",
          "parameters": {
            "m": 16,
            "ef_construction": 256
          }
        }
      }
    }
  }
}

Java client

private static void createVectorIndex(OpenSearchClient client) throws IOException {
    Property vectorProperty = Property.of(p -> p.knnVector(
        KnnVectorProperty.of(kvp -> kvp
            .dimension(4) // Must match your vectors' dimension (the sample data in Step 2 is 4-dimensional).
            .dataType("float")
            .method(new KnnVectorMethod.Builder()
                .name("hnsw")
                .engine("faiss")
                .spaceType("l2")
                .parameters(Map.of(
                    "m", JsonData.of(16),
                    "ef_construction", JsonData.of(256)
                ))
                .build()
            )
        )
    ));
    
    TypeMapping mapping = TypeMapping.of(m -> m
        .properties("<my_vector_field>", vectorProperty)
        .properties("text", Property.of(p -> p.text(TextProperty.of(t -> t))))
        .properties("category", Property.of(p -> p.keyword(k -> k)))
    );
    
    CreateIndexRequest request = new CreateIndexRequest.Builder()
        .index("my-index") // Replace with your actual index name.
        .settings(s -> s.knn(true))
        .mappings(mapping)
        .build();
        
    client.indices().create(request);
}

IVF

IVF is implemented by IndexIVFFlat and is suitable for very large-scale datasets in memory-constrained scenarios.

Key parameters

Parameter	Value	Description
`nlist`	Positive integer.	The number of cluster centers. The index divides the entire vector space into `nlist` regions (clusters). This value is fundamental to IVF performance. Larger value: Leads to finer-grained partitions with fewer vectors per cluster. This reduces the amount of data to be scanned during a query and improves speed. However, it can increase the edge effect, which reduces recall, and also increases memory consumption. Smaller value: Results in more vectors per cluster, which slows down the search but can lead to higher recall. Recommendation: A common rule of thumb is to set `nlist` to a value between 4 * sqrt(N) and 16 * sqrt(N), where N is the total number of vectors. For example, for 1 million vectors, sqrt(N) = 1,000, so `nlist` can be set to a value between 4,000 and 16,000, and 4096 is a reasonable starting point. A value of 1024 satisfies the same rule for datasets of roughly 4,000 to 65,000 vectors.
`nprobes`	Positive integer, typically less than `nlist`.	The number of clusters to search at query time. This is the most direct parameter for trading query speed against recall. Larger value: Causes the query to visit more clusters, expanding the search scope. This effectively mitigates the edge effect and improves recall, but query latency grows roughly linearly. Smaller value: Results in faster queries. However, if a query vector lies on the boundary of multiple clusters, the nearest neighbors might be missed, which reduces recall. Recommendation: Start with a small value, such as 10 or 20, and gradually increase it until you reach an acceptable balance between recall and latency.
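The following Java sketch illustrates how nlist and nprobes interact in a toy IVF index. As a simplifying assumption, the centroids are hand-picked instead of being learned in the training step, and the names (ToyIvf, nlistRange) are hypothetical. The query in main lies near the boundary between two clusters. With nprobes = 1, only the closer cluster is scanned and the true nearest vector is missed (the edge effect). With nprobes = 2, that vector is found.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Toy IVF index: nlist centroids, one inverted list per centroid, nprobes lists scanned per query. */
class ToyIvf {
    private final float[][] centroids; // nlist cluster centers (hand-picked here, not trained)
    private final float[][] vectors;
    private final List<List<Integer>> invertedLists = new ArrayList<>();

    ToyIvf(float[][] centroids, float[][] vectors) {
        this.centroids = centroids;
        this.vectors = vectors;
        for (int c = 0; c < centroids.length; c++) {
            invertedLists.add(new ArrayList<>());
        }
        // Indexing: each vector is appended to the list of its nearest centroid.
        for (int i = 0; i < vectors.length; i++) {
            invertedLists.get(nearestCentroid(vectors[i])).add(i);
        }
    }

    static double l2(float[] a, float[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    private int nearestCentroid(float[] v) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (l2(centroids[c], v) < l2(centroids[best], v)) {
                best = c;
            }
        }
        return best;
    }

    /** Scans only the nprobes clusters whose centroids are closest to the query. */
    List<Integer> search(float[] query, int nprobes, int k) {
        Integer[] order = new Integer[centroids.length];
        for (int c = 0; c < order.length; c++) {
            order[c] = c;
        }
        Arrays.sort(order, Comparator.comparingDouble((Integer c) -> l2(centroids[c], query)));
        List<Integer> candidates = new ArrayList<>();
        for (int p = 0; p < Math.min(nprobes, order.length); p++) {
            candidates.addAll(invertedLists.get(order[p]));
        }
        candidates.sort(Comparator.comparingDouble((Integer i) -> l2(vectors[i], query)));
        return new ArrayList<>(candidates.subList(0, Math.min(k, candidates.size())));
    }

    /** Rule of thumb from the nlist description: between 4 * sqrt(N) and 16 * sqrt(N). */
    static long[] nlistRange(long numVectors) {
        double root = Math.sqrt(numVectors);
        return new long[] {Math.round(4 * root), Math.round(16 * root)};
    }

    public static void main(String[] args) {
        float[][] centroids = {{0f, 0f}, {10f, 0f}};
        float[][] vectors = {{4.95f, 0f}, {5.6f, 0f}, {0f, 1f}, {10f, 1f}};
        ToyIvf ivf = new ToyIvf(centroids, vectors);
        float[] query = {5.1f, 0f};
        System.out.println("nprobes=1 -> " + ivf.search(query, 1, 1)); // misses vector 0 (edge effect)
        System.out.println("nprobes=2 -> " + ivf.search(query, 2, 1)); // finds vector 0
        System.out.println(Arrays.toString(nlistRange(1_000_000)));
    }
}
```

The nlistRange helper reproduces the rule-of-thumb arithmetic. For 1,000,000 vectors, it returns 4000 and 16000.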

Note

When you create an IVF index, replace <my-index> with your index name and <my_vector_field> with your field name. Also, configure other core parameters such as dimension, data_type, space_type, nlist, and nprobes based on your business requirements.

// IVF index creation example. Replace <my-index> with your actual index name.
PUT /<my-index>
{
  "settings": {
    "index": { 
      "knn": true 
    } 
  },
  "mappings": {
    "properties": {
      "<my_vector_field>": {// Replace <my_vector_field> with your actual field name.
        "type": "knn_vector",
        "dimension": 4,
        "data_type": "float",
        "method": {
          "name": "ivf",
          "engine": "faiss",
          "space_type": "l2",
          "parameters": {
            "nlist": 1024,
            "nprobes": 10 // nprobes is typically specified at query time. This is just an example.
          }
        }
      }
    }
  }
}

Step 2: Index vector data

Prepare your documents, including vector data and other metadata, and add them to the index you just created. In the following examples, replace the index name and field names with your own.

REST API

POST /_bulk
{ "index": { "_index": "my-index", "_id": "doc_1" } }
{ "my_vector_field": [5.2, 4.4, 0.0, 0.0] }
{ "index": { "_index": "my-index", "_id": "doc_2" } }
{ "my_vector_field": [5.2, 3.9, 0.0, 0.0] }
{ "index": { "_index": "my-index", "_id": "doc_3" } }
{ "my_vector_field": [4.9, 3.4, 0.0, 0.0] }
{ "index": { "_index": "my-index", "_id": "doc_4" } }
{ "my_vector_field": [4.2, 4.6, 0.0, 0.0] }
{ "index": { "_index": "my-index", "_id": "doc_5" } }
{ "my_vector_field": [3.3, 4.5, 0.0, 0.0] }

Java client

private static void indexSampleData(OpenSearchClient client) throws IOException {
    List<Map<String, Object>> documents = new ArrayList<>();
    documents.add(Map.of("text", "a book about data science", "category", "books", "<my_vector_field>", List.of(1.0f, 2.0f, 3.0f, 4.0f)));
    documents.add(Map.of("text", "an intelligent smartphone with a great camera", "category", "electronics", "<my_vector_field>", List.of(8.0f, 7.0f, 6.0f, 5.0f)));
    documents.add(Map.of("text", "a technical manual for a smart device", "category", "electronics", "<my_vector_field>", List.of(3.0f, 4.0f, 5.0f, 6.0f)));

    for (int i = 0; i < documents.size(); i++) {
        IndexRequest<Map<String, Object>> request = new IndexRequest.Builder<Map<String, Object>>()
            .index("my-index") // Replace with your actual index name.
            .id("doc_" + i)
            .document(documents.get(i))
            .build();
        client.index(request);
    }
}

Step 3: Perform vector search

You can send vector search requests to find the most similar results to your query vector within a massive dataset.

Basic k-NN search

This is the most basic vector search. It finds the k results with the smallest distance to the query vector across the entire index.

REST API

POST /<my-index>/_search
{
  "size": 3,
  "query": {
    "knn": {
      "<my_vector_field>": {
        "vector": [3.1, 4.1, 5.1, 6.1],
        "k": 3
      }
    }
  }
}

Java client

// Prepare your query vector.
List<Float> queryVector = List.of(3.1f, 4.1f, 5.1f, 6.1f);

private static void performBasicKnnSearch(OpenSearchClient client, List<Float> queryVector) throws IOException {
    System.out.println("\n--- 1. Performing Basic k-NN Search ---");
    System.out.println("Querying for vectors most similar to: " + queryVector);
    // Find the 3 most similar results.
    int k = 3;

    KnnQuery knnQuery = new KnnQuery.Builder()
        .field("<my_vector_field>")
        .vector(queryVector)
        .k(k)
        .build();

    SearchRequest searchRequest = new SearchRequest.Builder()
        .index("my-index") // Replace with your actual index name.
        .query(new Query.Builder().knn(knnQuery).build())
        .size(k)
        .build();
    SearchResponse<Map> response = client.search(searchRequest, Map.class);

    printResults(response);
}

Hybrid search

In many scenarios, you need to narrow the search scope with one or more conditions before you perform a vector search. This is the core idea of hybrid search. To do this, add a filter clause to the knn query (KnnQuery in the Java client). The filter itself can be any standard OpenSearch query, such as term (exact value matching) or match (full-text search).
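Conceptually, a filtered k-NN query returns the k nearest vectors among the documents that satisfy the filter. The following Java sketch shows those semantics with an exact scan over the three sample documents from the Java example in Step 2. It is an illustration only. The names are hypothetical, String.contains is a crude stand-in for an analyzed match query, and the sketch does not model how PolarSearch applies the filter inside its vector index.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;

/** Filtered k-NN sketch: keep documents that pass the filter, then rank them by L2 distance. */
class FilteredKnn {

    static final class Doc {
        final String id;
        final String text;
        final String category;
        final float[] vector;

        Doc(String id, String text, String category, float[] vector) {
            this.id = id;
            this.text = text;
            this.category = category;
            this.vector = vector;
        }
    }

    static double l2(float[] a, float[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    static List<String> search(List<Doc> docs, Predicate<Doc> filter, float[] query, int k) {
        List<Doc> matches = new ArrayList<>();
        for (Doc doc : docs) {
            if (filter.test(doc)) {
                matches.add(doc); // only filtered documents are ranked
            }
        }
        matches.sort(Comparator.comparingDouble((Doc d) -> l2(d.vector, query)));
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < Math.min(k, matches.size()); i++) {
            ids.add(matches.get(i).id);
        }
        return ids;
    }

    /** The sample documents indexed by the Java client in Step 2 (IDs doc_0 to doc_2). */
    static List<Doc> sampleDocs() {
        return List.of(
                new Doc("doc_0", "a book about data science", "books", new float[] {1f, 2f, 3f, 4f}),
                new Doc("doc_1", "an intelligent smartphone with a great camera", "electronics", new float[] {8f, 7f, 6f, 5f}),
                new Doc("doc_2", "a technical manual for a smart device", "electronics", new float[] {3f, 4f, 5f, 6f}));
    }

    public static void main(String[] args) {
        float[] query = {3.1f, 4.1f, 5.1f, 6.1f};
        // Comparable to a term filter on the category field.
        System.out.println(search(sampleDocs(), d -> d.category.equals("electronics"), query, 3));
        // Crude stand-in for a match query on the text field (no analysis or scoring).
        System.out.println(search(sampleDocs(), d -> d.text.contains("book"), query, 3));
    }
}
```

With the query [3.1, 4.1, 5.1, 6.1], the category filter should return doc_2 followed by doc_1, and the text filter should return only doc_0. Even with k = 3, a filter can yield fewer than k results.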

Text match filter

This is useful for classic "keyword + vector" hybrid search scenarios. For example, you can restrict the search to documents whose text field matches "book" and then rank them by vector similarity.

REST API

POST /<my-index>/_search
{
  "size": 3,
  "query": {
    "knn": {
      "<my_vector_field>": {
        "vector": [3.1, 4.1, 5.1, 6.1],
        "k": 3,
        "filter": {
          "match": {
            "text": "book"
          }
        }
      }
    }
  }
}

Java client

// Prepare your query vector.
List<Float> queryVector = List.of(3.1f, 4.1f, 5.1f, 6.1f);

private static void performHybridSearchWithText(OpenSearchClient client, List<Float> queryVector) throws IOException {
    System.out.println("\n--- 2. Performing Hybrid Search (k-NN + Text Match) ---");
    // Prepare your query keyword.
    String textQuery = "book";
    System.out.println("Filtering for documents containing '" + textQuery + "', then finding most similar vectors.");
    // Find the 3 most similar results.
    int k = 3;

    MatchQuery matchQuery = new MatchQuery.Builder().field("text").query(q -> q.stringValue(textQuery)).build();

    KnnQuery hybridKnnQuery = new KnnQuery.Builder()
        .field("<my_vector_field>")
        .vector(queryVector)
        .k(k)
        .filter(new Query.Builder().match(matchQuery).build())
        .build();
    
    SearchRequest searchRequest = new SearchRequest.Builder()
        .index("my-index") // Replace with your actual index name.
        .query(new Query.Builder().knn(hybridKnnQuery).build())
        .size(k)
        .build();
    SearchResponse<Map> response = client.search(searchRequest, Map.class);
    
    printResults(response);
}

Term filter

This is useful for scenarios where you filter by a specific tag, category, or ID. For example, searching for the most similar products only within the "electronics" category.

REST API

POST /<my-index>/_search
{
  "size": 3,
  "query": {
    "knn": {
      "<my_vector_field>": {
        "vector": [5.0, 4.0, 1.0, 2.0],
        "k": 3,
        "filter": {
          "term": {
            "category": "electronics"
          }
        }
      }
    }
  }
}

Java client

// Prepare your query vector.
List<Float> queryVector = List.of(3.1f, 4.1f, 5.1f, 6.1f);

private static void performFilteredSearchWithTerm(OpenSearchClient client, List<Float> queryVector) throws IOException {
    System.out.println("\n--- 3. Performing Filtered Search (k-NN + Term Filter) ---");
    // Prepare your query category.
    String categoryFilter = "electronics";
    System.out.println("Filtering for documents in category '" + categoryFilter + "', then finding most similar vectors.");
    // Find the 3 most similar results.
    int k = 3;

    TermQuery termQuery = new TermQuery.Builder().field("category").value(v -> v.stringValue(categoryFilter)).build();
    
    KnnQuery filteredKnnQuery = new KnnQuery.Builder()
        .field("<my_vector_field>")
        .vector(queryVector)
        .k(k)
        .filter(new Query.Builder().term(termQuery).build())
        .build();
    
    SearchRequest searchRequest = new SearchRequest.Builder()
        .index("my-index") // Replace with your actual index name.
        .query(new Query.Builder().knn(filteredKnnQuery).build())
        .size(k)
        .build();
    SearchResponse<Map> response = client.search(searchRequest, Map.class);

    printResults(response);
}

Configure storage optimization

Vector data, especially high-dimensional floating-point vectors, consumes significant memory. PolarSearch offers storage optimization techniques that balance memory cost, query performance, and search accuracy by compressing (quantizing) vectors or changing the storage medium.

Recommendations

Before you choose an optimization strategy, refer to the following table to find the best option for your use case.

Optimization strategy	Compression ratio	Accuracy impact	Training required	CPU overhead	Use cases
Scalar Quantization (SQ)	Low (fixed 2x)	Minimal	No	Low	Scenarios requiring extremely high search accuracy and moderate memory optimization with minimal accuracy loss.
Binary Quantization (BQ)	High (8x–32x)	Significant	No	Medium	Extremely memory-sensitive scenarios that can tolerate significant accuracy loss to achieve maximum memory savings.
Product Quantization (PQ)	Highest	Medium	Yes	Medium	Scenarios with massive datasets requiring the highest compression ratio, where investing time in model training is acceptable to balance accuracy and memory.
Disk-based vector storage	-	Significant	No	High	Cost-sensitive scenarios with extremely limited memory resources where you can trade higher query latency (due to disk I/O) for the lowest possible memory consumption.

Configuration

Scalar quantization (SQ)

How it works: This method converts standard 32-bit floating-point (float) vectors into 16-bit floating-point (fp16) vectors for storage, reducing the memory footprint by half. During distance calculations, vectors are decoded back to 32-bit, minimizing the impact on accuracy.
Memory estimation:
- Formula: Memory (GB) ≈ 1.1 * (2 * dimension + 8 * m) * num_vectors / 1024^3
- Parameters:
  - dimension: The dimension of the vector.
  - m: The m parameter in the HNSW index, which is the maximum number of neighbors for each node.
  - num_vectors: The total number of vectors.
  - 1.1: An approximate 10% overhead factor for system usage.
- Example: Assume you have 1 million vectors, each with a dimension of 256, and the HNSW m parameter is 16. The estimated memory requirement is: 1.1 * (2 * 256 + 8 * 16) * 1,000,000 / 1024^3 = 704,000,000 bytes / 1024^3 ≈ 0.656 GB
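You can script the SQ estimate to plug in your own numbers. This Java sketch applies the formula above unchanged. The class name is hypothetical, and the result is only as accurate as the formula's 10% overhead assumption.

```java
/** Memory estimate for HNSW with SQ (fp16), following the formula above. */
class SqMemoryEstimate {
    static final double BYTES_PER_GB = 1024.0 * 1024.0 * 1024.0;

    /** 1.1 * (2 * dimension + 8 * m) * numVectors / 1024^3 */
    static double estimateGb(int dimension, int m, long numVectors) {
        double bytesPerVector = 2.0 * dimension // fp16 vector data: 2 bytes per dimension
                + 8.0 * m;                       // HNSW neighbor links: 8 bytes per neighbor
        return 1.1 * bytesPerVector * numVectors / BYTES_PER_GB;
    }

    public static void main(String[] args) {
        // 1 million 256-dimensional vectors with HNSW m = 16.
        System.out.printf("%.3f GB%n", estimateGb(256, 16, 1_000_000L));
    }
}
```

For the example parameters, the sketch should print about 0.656 GB.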

Usage example:

// HNSW with Scalar Quantization (SQ) example
PUT /<my-sq-index>
{
  "settings": {
    "index": { 
      "knn": true 
    }
  },
  "mappings": {
    "properties": {
      "<my_vector_field>": {
        "type": "knn_vector",
        "dimension": 128,
        "method": {
          "name": "hnsw",
          "engine": "faiss",
          "parameters": {
            "m": 16,
            "ef_construction": 256,
            "encoder": {// Enable SQ
              "name": "sq",
              "parameters": {
                "type": "fp16"
              }
            }
          }
        }
      }
    }
  }
}

Binary quantization (BQ)

How it works: This method compresses each dimension of a floating-point vector into 1, 2, or 4 binary bits (set by the bits parameter) for storage, achieving a very high compression ratio. The training process is handled automatically during index creation.
Memory estimation:
- Formula: Memory (GB) ≈ 1.1 * ((dimension * bits / 8) + 8 * m) * num_vectors / 1024^3
- Parameters:
  - dimension: The dimension of the vector.
  - bits: The number of binary bits used to represent each dimension. Valid values are 1, 2, and 4. A smaller value for bits results in a higher compression ratio but greater accuracy loss.
  - m: The m parameter in the HNSW index.
  - num_vectors: The total number of vectors.
- Example: Assume you have 1 million vectors, each with a dimension of 256, and the HNSW m parameter is 16. The estimated memory requirements for different compression levels are as follows:
  - 1-bit quantization (32x compression): Each dimension is represented by 1 bit. The estimated memory requirement is: 1.1 * ((256 * 1 / 8) + 8 * 16) * 1,000,000 / 1024^3 ≈ 0.164 GB
  - 2-bit quantization (16x compression): Each dimension is represented by 2 bits. The estimated memory requirement is: 1.1 * ((256 * 2 / 8) + 8 * 16) * 1,000,000 / 1024^3 ≈ 0.197 GB
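The following Java sketch illustrates 1-bit binary quantization and the Hamming comparison it enables. It also includes a helper that applies the memory formula above. The threshold rule is an assumption made for illustration: each dimension is compared with its mean over the training vectors. The automatic training in PolarSearch may choose thresholds differently, and the class name is hypothetical.

```java
/** 1-bit binary quantization sketch with per-dimension mean thresholds (an assumption). */
class BinaryQuantizer {
    static final double BYTES_PER_GB = 1024.0 * 1024.0 * 1024.0;
    private final float[] thresholds;

    /** "Training": the threshold for each dimension is its mean over the training vectors. */
    BinaryQuantizer(float[][] training) {
        int dim = training[0].length;
        thresholds = new float[dim];
        for (float[] v : training) {
            for (int i = 0; i < dim; i++) {
                thresholds[i] += v[i] / training.length;
            }
        }
    }

    /** Sets bit i when dimension i exceeds its threshold, packing 8 dimensions per byte. */
    byte[] encode(float[] v) {
        byte[] out = new byte[(v.length + 7) / 8];
        for (int i = 0; i < v.length; i++) {
            if (v[i] > thresholds[i]) {
                out[i / 8] |= (byte) (1 << (i % 8));
            }
        }
        return out;
    }

    /** Hamming distance between two packed codes: the number of differing bits. */
    static int hamming(byte[] a, byte[] b) {
        int bits = 0;
        for (int i = 0; i < a.length; i++) {
            bits += Integer.bitCount((a[i] ^ b[i]) & 0xFF);
        }
        return bits;
    }

    /** 1.1 * ((dimension * bits / 8) + 8 * m) * numVectors / 1024^3 */
    static double estimateGb(int dimension, int bits, int m, long numVectors) {
        return 1.1 * (dimension * bits / 8.0 + 8.0 * m) * numVectors / BYTES_PER_GB;
    }

    public static void main(String[] args) {
        BinaryQuantizer bq = new BinaryQuantizer(new float[][] {{0f, 0f, 0f, 0f}, {2f, 2f, 2f, 2f}});
        byte[] a = bq.encode(new float[] {3f, 0f, 3f, 0f}); // bits 0 and 2 set -> 0b0101
        byte[] b = bq.encode(new float[] {0f, 3f, 3f, 0f}); // bits 1 and 2 set -> 0b0110
        System.out.println("hamming = " + hamming(a, b));     // 2
        System.out.printf("1-bit: %.3f GB, 2-bit: %.3f GB%n",
                estimateGb(256, 1, 16, 1_000_000L), estimateGb(256, 2, 16, 1_000_000L));
    }
}
```

Comparing packed bits with XOR and a bit count is much cheaper than floating-point arithmetic. The price is the accuracy loss listed in the recommendations table.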

Usage example:

// HNSW with Binary Quantization (BQ) example
PUT /<my-bq-index>
{
  "settings" : { 
    "index": { 
      "knn": true 
    } 
  },
  "mappings": {
    "properties": {
      "<my_vector_field>": {
        "type": "knn_vector",
        "dimension": 128,
        "method": {
            "name": "hnsw",
            "engine": "faiss",
            "parameters": {
              "m": 16,
              "ef_construction": 512,
              "encoder": {
                "name": "binary",
                "parameters": {// Enable BQ with 1-bit quantization
                  "bits": 1 
                }
              }
            }
        }
      }
    }
  }
}

Product quantization (PQ)

Product quantization is an advanced vector compression technique that achieves a higher compression ratio than SQ or BQ. However, it requires a separate training step to build the compression model.

How it works:
1. Vector splitting: An original high-dimensional vector (for example, 256 dimensions) is split into m low-dimensional sub-vectors of equal length. For example, splitting a 256-dimensional vector with m=32 results in 32 sub-vectors of 8 dimensions each.
2. Codebook training: For each sub-vector space, the system learns a separate codebook. This codebook contains 2^code_size centers, also known as centroids. The K-means clustering algorithm typically performs this training.
3. Quantization and encoding: After training, when a new vector is encoded, the system replaces each of its sub-vectors with the ID of the nearest centroid in the corresponding sub-vector space's codebook. If code_size is 8, the ID ranges from 0 to 255, which can be stored in a single byte.
4. Final result: This process transforms an original vector into a sequence of centroid IDs, achieving very high compression.
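You can sketch the encoding steps above in Java as follows. As an assumption, the codebooks are hand-written stand-ins for the output of k-means training. To keep the example small, they use 2 centroids per sub-space (a 1-bit code) instead of the typical 256 (code_size = 8). The names are hypothetical, and the Faiss PQ implementation is far more optimized.

```java
import java.util.Arrays;

/** PQ encode/decode sketch. codebooks[s][c] is centroid c of sub-space s. */
class ProductQuantizer {
    private final float[][][] codebooks;

    ProductQuantizer(float[][][] codebooks) {
        this.codebooks = codebooks;
    }

    /** Splits v into m sub-vectors and replaces each with the ID of its nearest centroid. */
    int[] encode(float[] v) {
        int m = codebooks.length;
        if (v.length % m != 0) {
            throw new IllegalArgumentException("dimension must be divisible by m");
        }
        int sub = v.length / m;
        int[] codes = new int[m];
        for (int s = 0; s < m; s++) {
            double bestDistance = Double.MAX_VALUE;
            for (int c = 0; c < codebooks[s].length; c++) {
                double d = 0;
                for (int j = 0; j < sub; j++) {
                    double diff = v[s * sub + j] - codebooks[s][c][j];
                    d += diff * diff;
                }
                if (d < bestDistance) {
                    bestDistance = d;
                    codes[s] = c;
                }
            }
        }
        return codes;
    }

    /** Approximate reconstruction: concatenates the centroids named by the codes. */
    float[] decode(int[] codes) {
        int sub = codebooks[0][0].length;
        float[] out = new float[codes.length * sub];
        for (int s = 0; s < codes.length; s++) {
            System.arraycopy(codebooks[s][codes[s]], 0, out, s * sub, sub);
        }
        return out;
    }

    public static void main(String[] args) {
        // A 4-dimensional vector split into m = 2 sub-vectors of 2 dimensions each.
        float[][][] codebooks = {
            {{0f, 0f}, {10f, 10f}},
            {{0f, 0f}, {5f, 5f}},
        };
        ProductQuantizer pq = new ProductQuantizer(codebooks);
        int[] codes = pq.encode(new float[] {9f, 11f, 1f, 0f});
        System.out.println(Arrays.toString(codes));            // [1, 0]
        System.out.println(Arrays.toString(pq.decode(codes))); // [10.0, 10.0, 0.0, 0.0]
    }
}
```

Decoding returns only an approximation of the original vector. This is why PQ trades some accuracy for its compression ratio, and why codebook quality, and therefore training data, matters.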
Training requirements: The performance of PQ heavily depends on the quality of the training data. You must provide a set of vectors with a distribution similar to your search data.
- Training data source: You can use a subset of the vectors you plan to index.
- Recommended training data size:
  - When used with HNSW: The recommended number of training vectors is 2^code_size * 1000.
  - When used with IVF: The recommended number of training vectors is max(1000 * nlist, 2^code_size * 1000).
Memory estimation: The memory calculation for HNSW with PQ is complex, as it includes the cost of the compressed vectors, the HNSW graph structure, and the PQ codebook.
- Formula: Memory (bytes) ≈ 1.1 * ( (per_vector_cost) * num_vectors + (codebook_cost) )
  - per_vector_cost = (pq_code_size / 8 * pq_m) + 24 + (8 * hnsw_m)
  - codebook_cost = num_segments * (2^pq_code_size) * 4 * dimension
- Parameters:
  - num_vectors: The total number of vectors.
  - dimension: The dimension of the original vector.
  - pq_m: The number of sub-vectors each vector is split into (the m parameter of the PQ encoder). The dimension must be divisible by pq_m.
  - pq_code_size: The number of bits used to encode each sub-vector. Each sub-vector codebook contains 2^pq_code_size centroids. This is typically 8.
  - hnsw_m: The m parameter in the HNSW index, which is the maximum number of neighbors for each node.
  - num_segments: An underlying technical parameter that represents the number of segments the index is divided into. For estimation, you can use the shard count of your cluster or a conservative value like 100.
  - 1.1: An overhead factor of approximately 10% for system usage.
  - 24 and 8: 24 is the fixed per-node overhead, and 8 is the per-neighbor pointer overhead (multiplied by hnsw_m) in the HNSW graph structure.
  - 4: The size in bytes of a 32-bit floating-point number, used to store the centroid coordinates in the codebook.
- Example: Assume you have 1 million vectors (num_vectors), each with a dimension of 256. Each vector is split into 32 sub-vectors (pq_m = 32), and each sub-vector is encoded with 8 bits (pq_code_size = 8). The HNSW m parameter is 16, and num_segments is 100.
  1. Calculate the cost per vector (per_vector_cost):
    1. Compressed vector size = pq_code_size / 8 * pq_m = 8 / 8 * 32 = 32 bytes.
    2. HNSW graph overhead = 24 + 8 * hnsw_m = 24 + 8 * 16 = 152 bytes.
    3. per_vector_cost = 32 + 152 = 184 bytes
  2. Calculate the total codebook cost (codebook_cost):
    1. codebook_cost = num_segments * (2^pq_code_size) * 4 * dimension
    2. codebook_cost = 100 * (2^8) * 4 * 256 = 100 * 256 * 4 * 256 = 26,214,400 bytes
  3. Calculate the total memory:
    1. Total memory ≈ 1.1 * (per_vector_cost * num_vectors + codebook_cost)
    2. Total memory ≈ 1.1 * (184 * 1,000,000 + 26,214,400) ≈ 231,235,840 bytes ≈ 0.215 GB
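You can check the worked example above with a small Java sketch that follows the formula term by term. The class and method names are hypothetical, and num_segments remains an estimate, as described above.

```java
/** HNSW + PQ memory estimate, following the formula above term by term. */
class PqMemoryEstimate {

    /** (pq_code_size / 8 * pq_m) + 24 + (8 * hnsw_m), in bytes. */
    static double perVectorCost(int pqCodeSize, int pqM, int hnswM) {
        return pqCodeSize / 8.0 * pqM // compressed vector
                + 24                   // fixed HNSW per-node overhead
                + 8.0 * hnswM;         // HNSW neighbor pointers
    }

    /** num_segments * 2^pq_code_size * 4 * dimension, in bytes. */
    static double codebookCost(int numSegments, int pqCodeSize, int dimension) {
        return (double) numSegments * (1L << pqCodeSize) * 4 * dimension;
    }

    /** 1.1 * (per_vector_cost * num_vectors + codebook_cost), in bytes. */
    static double totalBytes(long numVectors, int dimension, int pqM, int pqCodeSize,
                             int hnswM, int numSegments) {
        return 1.1 * (perVectorCost(pqCodeSize, pqM, hnswM) * numVectors
                + codebookCost(numSegments, pqCodeSize, dimension));
    }

    public static void main(String[] args) {
        double bytes = totalBytes(1_000_000L, 256, 32, 8, 16, 100);
        System.out.printf("%.0f bytes = %.3f GB%n", bytes, bytes / (1024.0 * 1024 * 1024));
    }
}
```

With the example parameters, the sketch reproduces the 184-byte per-vector cost, the 26,214,400-byte codebook cost, and a total of about 231,235,840 bytes.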

Usage example:

// HNSW with Product Quantization (PQ) example
PUT /<my-hnswpq-index>
{
  "settings" : { 
    "index": { 
      "knn": true 
    } 
  },
  "mappings": {
    "properties": {
      "<my_vector_field>": {
        "type": "knn_vector",
        "dimension": 128, // Must be divisible by the PQ m parameter (encoder m = 4)
        "method": {
            "name": "hnsw",
            "engine": "faiss",
            "parameters": {
              "m": 16, // m parameter for HNSW
              "ef_construction": 512,
              "encoder": {
                "name": "pq",
                "parameters": {
                  "m": 4, // m parameter for PQ: splits the 128-dim vector into 4 segments of 32-dim each
                  "code_size": 8
                }
              }
            }
        }
      }
    }
  }
}

Disk storage

How it works: Disk-based vector search uses internal quantization techniques to compress vectors and stores the main graph structure on disk instead of in heap memory. This optimization significantly reduces memory consumption but slightly increases query latency while maintaining high recall.
Memory estimation: There is no fixed formula. The actual physical memory usage is dynamically managed by the operating system based on access patterns.

Usage example:

// Disk-based storage example
PUT /<my-ondisk-index>
{
  "settings" : { 
    "index": {
       "knn": true 
    } 
  },
  "mappings": {
    "properties": {
      "<my_vector_field>": {
        "type": "knn_vector",
        "dimension": 128,
        "mode": "on_disk" // Enable the disk-based mode
      }
    }
  }
}

Appendix: Complete code example

This complete code example for the OpenSearch Java client demonstrates how to create a vector index and perform a vector search.

Dependency configuration (pom.xml)

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.example</groupId>
    <artifactId>vector-search-demo</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <maven.compiler.source>11</maven.compiler.source>
        <maven.compiler.target>11</maven.compiler.target>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.opensearch.client</groupId>
            <artifactId>opensearch-java</artifactId>
            <version>3.0.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.httpcomponents.client5</groupId>
            <artifactId>httpclient5</artifactId>
            <version>5.3</version>
        </dependency>
        <dependency>
            <groupId>org.apache.httpcomponents.core5</groupId>
            <artifactId>httpcore5</artifactId>
            <version>5.3</version>
        </dependency>
        <dependency>
            <groupId>org.apache.httpcomponents.core5</groupId>
            <artifactId>httpcore5-h2</artifactId>
            <version>5.3</version>
        </dependency>
        <!-- Jackson databind is needed by opensearch-java -->
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-databind</artifactId>
            <version>2.16.1</version>
        </dependency>
    </dependencies>
</project>

Example program (VectorSearchDemo.java)

package com.example;

import org.apache.hc.client5.http.auth.AuthScope;
import org.apache.hc.client5.http.auth.UsernamePasswordCredentials;
import org.apache.hc.client5.http.impl.auth.BasicCredentialsProvider;
import org.apache.hc.client5.http.impl.nio.PoolingAsyncClientConnectionManagerBuilder;
import org.apache.hc.core5.http.HttpHost;
import org.opensearch.client.opensearch.OpenSearchClient;
import org.opensearch.client.opensearch._types.mapping.KeywordProperty;
import org.opensearch.client.opensearch._types.mapping.KnnVectorMethod;
import org.opensearch.client.opensearch._types.mapping.KnnVectorProperty;
import org.opensearch.client.opensearch._types.mapping.Property;
import org.opensearch.client.opensearch._types.mapping.TextProperty;
import org.opensearch.client.opensearch._types.mapping.TypeMapping;
import org.opensearch.client.opensearch._types.query_dsl.KnnQuery;
import org.opensearch.client.opensearch._types.query_dsl.MatchQuery;
import org.opensearch.client.opensearch._types.query_dsl.Query;
import org.opensearch.client.opensearch._types.query_dsl.TermQuery;
import org.opensearch.client.opensearch.core.IndexRequest;
import org.opensearch.client.opensearch.core.SearchRequest;
import org.opensearch.client.opensearch.core.SearchResponse;
import org.opensearch.client.opensearch.core.search.Hit;
import org.opensearch.client.opensearch.indices.CreateIndexRequest;
import org.opensearch.client.opensearch.indices.DeleteIndexRequest;
import org.opensearch.client.transport.OpenSearchTransport;
import org.opensearch.client.transport.httpclient5.ApacheHttpClient5TransportBuilder;
import org.opensearch.client.json.JsonData;
import org.opensearch.client.json.jackson.JacksonJsonpMapper;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class VectorSearchDemo {

    private static final String INDEX_NAME = "test-java-demo-full-search";
    private static final String FIELD_NAME = "test-embedding";
    private static final int VECTOR_DIMENSION = 4;

    public static void main(String[] args) throws IOException {
        OpenSearchClient client = createClient("<polarsearch_host>", <polarsearch_port>, "<polarsearch_username>", "<polarsearch_password>");
        System.out.println("Client initialized.");

        try {
            deleteIndexIfExists(client);
            createVectorIndex(client);
            System.out.println("Index '" + INDEX_NAME + "' created.");

            indexSampleData(client);
            // Refresh the index so that the newly written documents are visible to search,
            // instead of sleeping and hoping the periodic refresh has run.
            client.indices().refresh(r -> r.index(INDEX_NAME));
            System.out.println("Sample data indexed.");

            List<Float> queryVector = List.of(3.1f, 4.1f, 5.1f, 6.1f);

            // --- Run three searches in sequence ---
            performBasicKnnSearch(client, queryVector);
            performHybridSearchWithText(client, queryVector);
            performFilteredSearchWithTerm(client, queryVector);
        } finally {
            // Close the transport even if a request fails, so that its I/O threads are released.
            client._transport().close();
            System.out.println("\nClient closed.");
        }
    }

    // Initialize the client.
    private static OpenSearchClient createClient(String hostName, int port, String username, String password) {
        final var host = new HttpHost("http", hostName, port);
        final var credentialsProvider = new BasicCredentialsProvider();
        credentialsProvider.setCredentials(new AuthScope(host), new UsernamePasswordCredentials(username, password.toCharArray()));
        final var connectionManager = PoolingAsyncClientConnectionManagerBuilder.create().build();
        OpenSearchTransport transport = ApacheHttpClient5TransportBuilder.builder(host)
            .setMapper(new JacksonJsonpMapper())
            .setHttpClientConfigCallback(httpClientBuilder ->
                httpClientBuilder.setDefaultCredentialsProvider(credentialsProvider).setConnectionManager(connectionManager)
            ).build();
        return new OpenSearchClient(transport);
    }

    // Delete the index if it exists to ensure a clean run.
    private static void deleteIndexIfExists(OpenSearchClient client) throws IOException {
        if (client.indices().exists(r -> r.index(INDEX_NAME)).value()) {
            client.indices().delete(new DeleteIndexRequest.Builder().index(INDEX_NAME).build());
            System.out.println("Index '" + INDEX_NAME + "' deleted.");
        }
    }

    // Create the vector index.
    private static void createVectorIndex(OpenSearchClient client) throws IOException {
        Property vectorProperty = Property.of(p -> p.knnVector(
            KnnVectorProperty.of(kvp -> kvp
                .dimension(VECTOR_DIMENSION)
                .dataType("float")
                .method(new KnnVectorMethod.Builder()
                    .name("hnsw")
                    .engine("faiss")
                    .spaceType("l2")
                    .parameters(Map.of(
                        "m", JsonData.of(16),
                        "ef_construction", JsonData.of(256)
                    ))
                    .build()
                )
            )
        ));
        
        TypeMapping mapping = TypeMapping.of(m -> m
            .properties(FIELD_NAME, vectorProperty)
            .properties("text", Property.of(p -> p.text(TextProperty.of(t -> t))))
            .properties("category", Property.of(p -> p.keyword(k -> k))) // Add a category field.
        );
        
        CreateIndexRequest request = new CreateIndexRequest.Builder()
            .index(INDEX_NAME)
            .settings(s -> s.knn(true))
            .mappings(mapping)
            .build();
            
        client.indices().create(request);
    }

    // Index vector data.
    private static void indexSampleData(OpenSearchClient client) throws IOException {
        List<Map<String, Object>> documents = new ArrayList<>();
        documents.add(Map.of("text", "a book about data science", "category", "books", FIELD_NAME, List.of(1.0f, 2.0f, 3.0f, 4.0f)));
        documents.add(Map.of("text", "an intelligent smartphone with a great camera", "category", "electronics", FIELD_NAME, List.of(8.0f, 7.0f, 6.0f, 5.0f)));
        documents.add(Map.of("text", "a technical manual for a smart device", "category", "electronics", FIELD_NAME, List.of(3.0f, 4.0f, 5.0f, 6.0f)));

        for (int i = 0; i < documents.size(); i++) {
            IndexRequest<Map<String, Object>> request = new IndexRequest.Builder<Map<String, Object>>()
                .index(INDEX_NAME)
                .id("doc_" + i)
                .document(documents.get(i))
                .build();
            client.index(request);
        }
    }

    /**
     * Example 1: basic k-NN search
     */
    private static void performBasicKnnSearch(OpenSearchClient client, List<Float> queryVector) throws IOException {
        System.out.println("\n--- 1. Performing Basic k-NN Search ---");
        System.out.println("Querying for vectors most similar to: " + queryVector);
        int k = 3;

        KnnQuery knnQuery = new KnnQuery.Builder()
            .field(FIELD_NAME)
            .vector(queryVector)
            .k(k)
            .build();

        SearchRequest searchRequest = new SearchRequest.Builder().index(INDEX_NAME).query(new Query.Builder().knn(knnQuery).build()).size(k).build();
        SearchResponse<Map> response = client.search(searchRequest, Map.class);

        printResults(response);
    }

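    /**
     * Example 2: hybrid search (k-NN + text match). The match query is passed as the
     * k-NN filter, so only documents whose text matches are ranked by vector similarity.
     */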
    private static void performHybridSearchWithText(OpenSearchClient client, List<Float> queryVector) throws IOException {
        System.out.println("\n--- 2. Performing Hybrid Search (k-NN + Text Match) ---");
        String textQuery = "book";
        System.out.println("Filtering for documents containing '" + textQuery + "', then finding most similar vectors.");
        int k = 3;

        MatchQuery matchQuery = new MatchQuery.Builder().field("text").query(q -> q.stringValue(textQuery)).build();

        KnnQuery hybridKnnQuery = new KnnQuery.Builder()
            .field(FIELD_NAME)
            .vector(queryVector)
            .k(k)
            .filter(new Query.Builder().match(matchQuery).build())
            .build();
        
        SearchRequest searchRequest = new SearchRequest.Builder().index(INDEX_NAME).query(new Query.Builder().knn(hybridKnnQuery).build()).size(k).build();
        SearchResponse<Map> response = client.search(searchRequest, Map.class);
        
        printResults(response);
    }

    /**
     * Example 3: filtered search (k-NN + term filter)
     */
    private static void performFilteredSearchWithTerm(OpenSearchClient client, List<Float> queryVector) throws IOException {
        System.out.println("\n--- 3. Performing Filtered Search (k-NN + Term Filter) ---");
        String categoryFilter = "electronics";
        System.out.println("Filtering for documents in category '" + categoryFilter + "', then finding most similar vectors.");
        int k = 3;

        TermQuery termQuery = new TermQuery.Builder().field("category").value(v -> v.stringValue(categoryFilter)).build();
        
        KnnQuery filteredKnnQuery = new KnnQuery.Builder()
            .field(FIELD_NAME)
            .vector(queryVector)
            .k(k)
            .filter(new Query.Builder().term(termQuery).build())
            .build();
        
        SearchRequest searchRequest = new SearchRequest.Builder().index(INDEX_NAME).query(new Query.Builder().knn(filteredKnnQuery).build()).size(k).build();
        SearchResponse<Map> response = client.search(searchRequest, Map.class);

        printResults(response);
    }

    private static void printResults(SearchResponse<Map> response) {
        System.out.println("Search Results:");
        for (Hit<Map> hit : response.hits().hits()) {
            System.out.printf(" - ID: %s, Score: %.4f, Source: %s%n", hit.id(), hit.score(), hit.source());
        }
        if (response.hits().hits().isEmpty()) {
            System.out.println(" - No results found.");
        }
    }
}

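How to run

Replace the <polarsearch_host>, <polarsearch_port>, <polarsearch_username>, and <polarsearch_password> placeholders in VectorSearchDemo.java with your connection details (the port must be an integer literal, not a quoted string). Then build and run the program: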

mvn clean package
mvn exec:java -Dexec.mainClass="com.example.VectorSearchDemo"
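To sanity-check the scores the demo prints, the following sketch runs an exact (brute-force) k-NN search over the same three sample documents and query vector. It assumes the l2 score conversion documented for the OpenSearch k-NN plugin, score = 1 / (1 + d), where d is the squared Euclidean distance, and assumes PolarSearch follows it. The whole-word text check is a crude, hypothetical stand-in for the match query, not the real analyzer. HNSW is approximate, but on three documents its results should agree with an exact search. The class ExactKnnCheck and its members are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;

public class ExactKnnCheck {

    // One sample document from VectorSearchDemo: id, text, category, and vector.
    static final class Doc {
        final String id, text, category;
        final float[] vector;
        Doc(String id, String text, String category, float[] vector) {
            this.id = id; this.text = text; this.category = category; this.vector = vector;
        }
    }

    // Squared Euclidean distance, the distance behind the "l2" space type.
    static double squaredL2(float[] a, float[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    // Assumed l2 score conversion: score = 1 / (1 + squared distance).
    static double l2Score(float[] a, float[] b) {
        return 1.0 / (1.0 + squaredL2(a, b));
    }

    // Exact filtered k-NN: keep matching docs, sort by score descending, take the top k.
    static List<Doc> exactKnn(List<Doc> docs, float[] query, int k, Predicate<Doc> filter) {
        List<Doc> candidates = new ArrayList<>();
        for (Doc d : docs) {
            if (filter.test(d)) candidates.add(d);
        }
        candidates.sort(Comparator.comparingDouble((Doc d) -> l2Score(d.vector, query)).reversed());
        return candidates.subList(0, Math.min(k, candidates.size()));
    }

    static List<Doc> sampleDocs() {
        return List.of(
            new Doc("doc_0", "a book about data science", "books", new float[]{1f, 2f, 3f, 4f}),
            new Doc("doc_1", "an intelligent smartphone with a great camera", "electronics", new float[]{8f, 7f, 6f, 5f}),
            new Doc("doc_2", "a technical manual for a smart device", "electronics", new float[]{3f, 4f, 5f, 6f}));
    }

    static void print(String label, List<Doc> hits, float[] query) {
        System.out.println(label);
        for (Doc d : hits) {
            System.out.printf(" - ID: %s, Score: %.4f%n", d.id, l2Score(d.vector, query));
        }
    }

    public static void main(String[] args) {
        float[] query = {3.1f, 4.1f, 5.1f, 6.1f};
        List<Doc> docs = sampleDocs();
        print("1. basic", exactKnn(docs, query, 3, d -> true), query);
        // Whole-word check on the text, approximating the "book" match filter.
        print("2. text contains 'book'", exactKnn(docs, query, 3, d -> List.of(d.text.split(" ")).contains("book")), query);
        print("3. category = electronics", exactKnn(docs, query, 3, d -> d.category.equals("electronics")), query);
    }
}
```

With these inputs, the exact ranking for the basic search is doc_2 (score about 0.9615), doc_0 (about 0.0536), and doc_1 (about 0.0282). The text filter leaves only doc_0, and the category filter leaves doc_2 followed by doc_1. If the demo's output differs substantially, check the space type and whether the documents were refreshed before searching.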