PolarSearch exposes a REST API for k-nearest neighbor (k-NN) vector search, letting you find semantically similar results across massive datasets of text, images, or any unstructured data.
How it works
Vector search — also called similarity search — finds results based on meaning rather than exact keywords. The process works as follows:
A deep learning model (such as a Large Language Model, or LLM) converts unstructured data into numerical arrays called vector embeddings. Each embedding captures the semantic content of the source data.
PolarSearch stores those embeddings and builds a vector index over them.
When you send a query, PolarSearch converts the query into a vector and runs a k-NN search to find the k closest vectors in the index. For example, if k=5, it returns the five most similar results.
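To make the last step concrete, the following is a minimal Java sketch of an exact (brute-force) k-NN search using L2 distance. It only illustrates what "the k closest vectors" means. PolarSearch uses a vector index instead of scanning every vector, and the class and method names here (`BruteForceKnn`, `nearest`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative only: exact k-NN by scanning every stored vector.
class BruteForceKnn {

    // Squared Euclidean (L2) distance; it orders vectors the same way as true L2 distance.
    static double squaredL2(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Returns the positions of the k stored vectors closest to the query, nearest first.
    static List<Integer> nearest(float[][] vectors, float[] query, int k) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < vectors.length; i++) {
            ids.add(i);
        }
        ids.sort(Comparator.comparingDouble(i -> squaredL2(vectors[i], query)));
        return new ArrayList<>(ids.subList(0, Math.min(k, ids.size())));
    }

    public static void main(String[] args) {
        float[][] vectors = {
            {1f, 2f, 3f, 4f},
            {8f, 7f, 6f, 5f},
            {3f, 4f, 5f, 6f}
        };
        float[] query = {3.1f, 4.1f, 5.1f, 6.1f};
        System.out.println(nearest(vectors, query, 2)); // prints [2, 0]
    }
}
```

A full scan like this grows linearly with the number of vectors, which is the cost that a vector index is meant to avoid.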
Two core components make this efficient at scale:
Vector indexes narrow the search scope so PolarSearch doesn't scan every vector on every query. PolarSearch supports two index types:
HNSW (Hierarchical Navigable Small World): A graph-based index with low query latency and high recall rate. The entire graph must fit in memory, so it suits datasets where memory is not a constraint.
IVF (Inverted File): A clustering-based index with low memory usage. Better for large datasets with limited memory, at the cost of slightly lower recall rate.
Vector storage optimization reduces memory and storage consumption through quantization or disk offloading — covered in Configure storage optimization.
Usage notes
IVF and PQ require training: Before using an IVF index or product quantization (PQ), you must provide a representative sample of vector data to train the model. Without training, the index cannot function correctly.
HNSW memory: The HNSW graph structure must be fully loaded into memory. Evaluate your cluster's available memory before choosing HNSW.
Disk-based storage increases latency: Storing index data on disk saves memory but adds query latency from disk I/O. Evaluate this trade-off against your latency requirements.
BQ training is automatic: Binary quantization (BQ) trains automatically during index build. No separate training step is needed.
Prerequisites
Before you begin, make sure you have:
Enabled the PolarSearch (intelligent search) feature on your PolarDB cluster. For setup instructions, see PolarSearch user guide.
Step 1: Create a vector index
Creating a vector index involves two parts: enabling k-NN in the index settings and defining a vector field in the mappings.
Choose an index type
HNSW and IVF have different performance and resource profiles. Use this table to pick the right one for your workload:
| | HNSW | IVF |
|---|---|---|
| Query latency | Extremely low — short search paths through a hierarchical graph | Low — must locate a cluster before searching within it |
| Recall rate | High — dense graph connectivity reduces missed neighbors | Medium to high — edge effects at cluster boundaries can reduce accuracy; tune with nprobes |
| Memory usage | High — full graph loaded into memory | Low — stores only centroids and posting lists |
| Build time | Longer — building a high-quality graph is computationally expensive | Faster — but requires an extra training step to generate centroids |
| Best for | Real-time semantic search, image recognition — where query speed and accuracy are critical and memory is sufficient | Large-scale recommendation, massive image retrieval — where memory is limited and a small accuracy trade-off is acceptable |
Configure index settings and mappings
All vector indexes share these common parameters in their mappings:
| Parameter | Description |
|---|---|
| engine | Must be faiss. PolarSearch uses Faiss (Facebook AI Similarity Search) as its vector search engine. |
| dimension | The vector dimension. Must exactly match the output dimension of your embedding model. |
| data_type | The data type of the vector. Default: float. Also supports byte and binary. |
| space_type | The distance measure used to calculate vector similarity. |
Supported space_type values:
| Value | Distance measure | Notes |
|---|---|---|
| l2 | L2 (Euclidean distance) | Sensitive to vector magnitude |
| l1 | L1 (Manhattan distance) | Sum of absolute differences per dimension |
| cosinesimil | Cosine similarity | Measures angle between vectors; ignores magnitude |
| innerproduct | Inner product | Dot product; commonly used for ranking |
| hamming | Hamming distance | For binary vectors only |
| chebyshev | L∞ (Chebyshev distance) | Maximum absolute difference across dimensions |
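For reference, the measures in the table can be written out as plain Java. This is a sketch of the textbook definitions only: the class name `SpaceTypes` is hypothetical, both inputs are assumed to have the same length, and how PolarSearch turns a distance into a relevance score is not covered here.

```java
// Illustrative reference implementations of the distance measures listed above.
class SpaceTypes {

    // l2: Euclidean distance.
    static double l2(float[] a, float[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // l1: Manhattan distance (sum of absolute differences).
    static double l1(float[] a, float[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }

    // chebyshev: largest absolute difference across dimensions.
    static double chebyshev(float[] a, float[] b) {
        double max = 0.0;
        for (int i = 0; i < a.length; i++) {
            max = Math.max(max, Math.abs(a[i] - b[i]));
        }
        return max;
    }

    // innerproduct: dot product.
    static double innerProduct(float[] a, float[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += (double) a[i] * b[i];
        }
        return sum;
    }

    // cosinesimil: dot product divided by both magnitudes (NaN for a zero vector).
    static double cosineSimilarity(float[] a, float[] b) {
        return innerProduct(a, b)
                / (Math.sqrt(innerProduct(a, a)) * Math.sqrt(innerProduct(b, b)));
    }

    // hamming: number of differing bits between two bit-packed binary vectors.
    static int hamming(byte[] a, byte[] b) {
        int bits = 0;
        for (int i = 0; i < a.length; i++) {
            bits += Integer.bitCount((a[i] ^ b[i]) & 0xFF);
        }
        return bits;
    }

    public static void main(String[] args) {
        float[] a = {0f, 0f};
        float[] b = {3f, 4f};
        System.out.println(l2(a, b));        // prints 5.0
        System.out.println(l1(a, b));        // prints 7.0
        System.out.println(chebyshev(a, b)); // prints 4.0
    }
}
```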
Create an HNSW index
HNSW is implemented via IndexHNSWFlat. Use it when query speed and recall rate are the top priorities.
HNSW parameters
| Parameter | Set at | Mutable after creation | Description |
|---|---|---|---|
| m | Index creation | No | Maximum number of neighbors per node. Higher values improve recall rate and connectivity but increase memory usage and build time. Start with 16 or 32 and adjust based on your recall rate and memory tests. Typical range: 8–64. |
| ef_construction | Index creation | No | Size of the candidate neighbor list during index build. Higher values produce a higher-quality graph (better recall rate) at the cost of build time. Set to at least 2 × m. For high-quality indexes where build time is not a concern, use 500 or higher. |
| ef_search | Index settings or query time | Yes | Size of the candidate list during a query. Higher values increase recall rate but also increase query latency. Start with 50 or 100 and tune based on your latency and recall rate targets. |
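The guidance in this table can be turned into a small pre-flight check before you create an index. The sketch below is a hypothetical helper (`HnswParameterReview` is not part of any PolarSearch client). It treats the typical range for m and the 2 × m rule for ef_construction as advisory notes, not hard limits.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative helper: flags HNSW settings that fall outside the guidance in the table.
class HnswParameterReview {

    // Returns human-readable notes; an empty list means nothing was flagged.
    static List<String> review(int m, int efConstruction) {
        List<String> notes = new ArrayList<>();
        if (m < 8 || m > 64) {
            notes.add("m=" + m + " is outside the typical range 8-64");
        }
        if (efConstruction < 2 * m) {
            notes.add("ef_construction=" + efConstruction + " is below 2 x m (" + (2 * m) + ")");
        }
        return notes;
    }

    public static void main(String[] args) {
        System.out.println(review(16, 256)); // prints []
        System.out.println(review(128, 100));
    }
}
```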
When you create an HNSW index, replace <my-index> with your index name and <my_vector_field> with your field name. Configure dimension, data_type, space_type, m, and ef_construction to match your use case.
REST API
// Create an HNSW index
PUT /<my-index>
{
"settings": {
"index": {
"knn": true
}
},
"mappings": {
"properties": {
"<my_vector_field>": {
"type": "knn_vector",
"dimension": 128,
"data_type": "float",
"method": {
"name": "hnsw",
"engine": "faiss",
"space_type": "l2",
"parameters": {
"m": 16,
"ef_construction": 256
}
}
}
}
}
}
Java client
private static void createVectorIndex(OpenSearchClient client) throws IOException {
Property vectorProperty = Property.of(p -> p.knnVector(
KnnVectorProperty.of(kvp -> kvp
.dimension(128)
.dataType("float")
.method(new KnnVectorMethod.Builder()
.name("hnsw")
.engine("faiss")
.spaceType("l2")
.parameters(Map.of(
"m", JsonData.of(16),
"ef_construction", JsonData.of(256)
))
.build()
)
)
));
TypeMapping mapping = TypeMapping.of(m -> m
.properties("<my_vector_field>", vectorProperty)
.properties("text", Property.of(p -> p.text(TextProperty.of(t -> t))))
.properties("category", Property.of(p -> p.keyword(k -> k)))
);
CreateIndexRequest request = new CreateIndexRequest.Builder()
.index("<my-index>")
.settings(s -> s.knn(true))
.mappings(mapping)
.build();
client.indices().create(request);
}
Create an IVF index
IVF is implemented via IndexIVFFlat. Use it when your dataset is too large to fit in memory with HNSW.
IVF parameters
| Parameter | Set at | Mutable after creation | Description |
|---|---|---|---|
| nlist | Index creation | No | Number of clusters (centroids) to divide the vector space into. More clusters means faster queries but potentially lower recall rate and higher memory for centroids. A common starting point: nlist between 4 × sqrt(N) and 16 × sqrt(N), where N is the number of vectors. For 1 million vectors, sqrt(N) = 1,000, so this gives roughly 4,000–16,000. |
| nprobes | Query time (recommended) or index creation | Yes | Number of clusters to search at query time. Higher values improve recall rate at the cost of query speed. Start with 10–20 and increase until recall rate meets your requirements. |
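As a rough sizing aid, the nlist rule of thumb and the share of clusters visited per query can be computed directly. This is a sketch under the assumptions stated in the table; `IvfSizing` and its methods are hypothetical names, and the results are starting points to tune, not recommended final values.

```java
// Illustrative helper for the IVF starting points described in the table.
class IvfSizing {

    // Returns {4 * sqrt(N), 16 * sqrt(N)}, each rounded to the nearest integer.
    static long[] nlistRange(long numVectors) {
        double root = Math.sqrt((double) numVectors);
        return new long[] {Math.round(4 * root), Math.round(16 * root)};
    }

    // Fraction of clusters searched per query for a given nprobes and nlist.
    static double probedFraction(int nprobes, int nlist) {
        return (double) nprobes / nlist;
    }

    public static void main(String[] args) {
        long[] range = nlistRange(250_000L);
        System.out.println(range[0] + " to " + range[1]); // prints 2000 to 8000
        System.out.println(probedFraction(10, 1024));     // prints 0.009765625
    }
}
```

With nlist = 1024 and nprobes = 10, a query visits just under 1% of the clusters, which is why raising nprobes improves recall rate and lowers query speed.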
When you create an IVF index, replace <my-index> with your index name and <my_vector_field> with your field name. Configure dimension, data_type, space_type, nlist, and nprobes to match your use case.
REST API
// Create an IVF index
PUT /<my-index>
{
"settings": {
"index": {
"knn": true
}
},
"mappings": {
"properties": {
"<my_vector_field>": {
"type": "knn_vector",
"dimension": 4,
"data_type": "byte",
"method": {
"name": "ivf",
"engine": "faiss",
"space_type": "l2",
"parameters": {
"nlist": 1024,
"nprobes": 10
}
}
}
}
}
}
Step 2: Index vector data
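Index your documents, each containing a vector field and any metadata, into the index you created. Every vector must have exactly the number of dimensions defined in the index mapping; the short vectors in the examples below are for illustration only.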
REST API
POST /_bulk
{ "index": { "_index": "my-index", "_id": "doc_1" } }
{ "my_vector_field": [5.2, 4.4] }
{ "index": { "_index": "my-index", "_id": "doc_2" } }
{ "my_vector_field": [5.2, 3.9] }
{ "index": { "_index": "my-index", "_id": "doc_3" } }
{ "my_vector_field": [4.9, 3.4] }
{ "index": { "_index": "my-index", "_id": "doc_4" } }
{ "my_vector_field": [4.2, 4.6] }
{ "index": { "_index": "my-index", "_id": "doc_5" } }
{ "my_vector_field": [3.3, 4.5] }
Java client
private static void indexSampleData(OpenSearchClient client) throws IOException {
List<Map<String, Object>> documents = new ArrayList<>();
documents.add(Map.of("text", "a book about data science", "category", "books", "<my_vector_field>", List.of(1.0f, 2.0f, 3.0f, 4.0f)));
documents.add(Map.of("text", "an intelligent smartphone with a great camera", "category", "electronics", "<my_vector_field>", List.of(8.0f, 7.0f, 6.0f, 5.0f)));
documents.add(Map.of("text", "a technical manual for a smart device", "category", "electronics", "<my_vector_field>", List.of(3.0f, 4.0f, 5.0f, 6.0f)));
for (int i = 0; i < documents.size(); i++) {
IndexRequest<Map<String, Object>> request = new IndexRequest.Builder<Map<String, Object>>()
.index("<my-index>")
.id("doc_" + i)
.document(documents.get(i))
.build();
client.index(request);
}
}
Step 3: Perform a vector search
PolarSearch supports three query patterns. Choose based on what you want to filter:
| Query pattern | When to use | API mechanism |
|---|---|---|
| Basic k-NN search | Find the most similar vectors with no pre-filtering | knn query with vector and k |
| Hybrid search (text match + k-NN) | Filter by keyword first, then rank by vector similarity | knn query with filter: match |
| Filtered search (exact value + k-NN) | Filter by category, label, or ID first, then rank by vector similarity | knn query with filter: term |
Basic k-NN search
This returns the k documents whose vectors are closest to the query vector across the entire index.
REST API
POST /<my-index>/_search
{
"size": 3,
"query": {
"knn": {
"<my_vector_field>": {
"vector": [3.1, 4.1, 5.1, 6.1],
"k": 3
}
}
}
}
Java client
// Prepare your query vector
List<Float> queryVector = List.of(3.1f, 4.1f, 5.1f, 6.1f);
private static void performBasicKnnSearch(OpenSearchClient client, List<Float> queryVector) throws IOException {
int k = 3;
KnnQuery knnQuery = new KnnQuery.Builder()
.field("<my_vector_field>")
.vector(queryVector)
.k(k)
.build();
SearchRequest searchRequest = new SearchRequest.Builder()
.index("<my-index>")
.query(new Query.Builder().knn(knnQuery).build())
.size(k)
.build();
SearchResponse<Map> response = client.search(searchRequest, Map.class);
printResults(response);
}
Hybrid search (text match + k-NN)
Use a match filter inside the knn query to first narrow results to documents whose text contains a keyword, then rank those results by vector similarity.
REST API
POST /<my-index>/_search
{
"size": 3,
"query": {
"knn": {
"<my_vector_field>": {
"vector": [3.1, 4.1, 5.1, 6.1],
"k": 3,
"filter": {
"match": {
"text": "book"
}
}
}
}
}
}
Java client
// Prepare your query vector
List<Float> queryVector = List.of(3.1f, 4.1f, 5.1f, 6.1f);
private static void performHybridSearchWithText(OpenSearchClient client, List<Float> queryVector) throws IOException {
String textQuery = "book";
int k = 3;
MatchQuery matchQuery = new MatchQuery.Builder()
.field("text")
.query(q -> q.stringValue(textQuery))
.build();
KnnQuery hybridKnnQuery = new KnnQuery.Builder()
.field("<my_vector_field>")
.vector(queryVector)
.k(k)
.filter(new Query.Builder().match(matchQuery).build())
.build();
SearchRequest searchRequest = new SearchRequest.Builder()
.index("<my-index>")
.query(new Query.Builder().knn(hybridKnnQuery).build())
.size(k)
.build();
SearchResponse<Map> response = client.search(searchRequest, Map.class);
printResults(response);
}
Filtered search (exact value + k-NN)
Use a term filter to restrict the search to documents with a specific field value — such as a category or ID — then rank by vector similarity.
REST API
POST /<my-index>/_search
{
"size": 3,
"query": {
"knn": {
"<my_vector_field>": {
"vector": [5, 4],
"k": 3,
"filter": {
"term": {
"category": "electronics"
}
}
}
}
}
}
Java client
// Prepare your query vector
List<Float> queryVector = List.of(3.1f, 4.1f, 5.1f, 6.1f);
private static void performFilteredSearchWithTerm(OpenSearchClient client, List<Float> queryVector) throws IOException {
String categoryFilter = "electronics";
int k = 3;
TermQuery termQuery = new TermQuery.Builder()
.field("category")
.value(v -> v.stringValue(categoryFilter))
.build();
KnnQuery filteredKnnQuery = new KnnQuery.Builder()
.field("<my_vector_field>")
.vector(queryVector)
.k(k)
.filter(new Query.Builder().term(termQuery).build())
.build();
SearchRequest searchRequest = new SearchRequest.Builder()
.index("<my-index>")
.query(new Query.Builder().knn(filteredKnnQuery).build())
.size(k)
.build();
SearchResponse<Map> response = client.search(searchRequest, Map.class);
printResults(response);
}
Configure storage optimization
High-dimensional float vectors consume substantial memory. PolarSearch provides four optimization strategies. Choose based on your memory constraints and accuracy requirements:
| Strategy | Compression ratio | Accuracy impact | Training required | CPU overhead | When to use |
|---|---|---|---|---|---|
| Scalar quantization (SQ) | 2× (fixed) | Minimal | No | Low | When you need high accuracy with moderate memory reduction |
| Binary quantization (BQ) | 8×–32× | Significant | No (automatic) | Medium | When memory is the top constraint and some accuracy loss is acceptable |
| Product quantization (PQ) | Highest | Medium | Yes | Medium | For massive datasets where you need extreme compression and can afford training time |
| Disk-based storage | — | Significant | No | High | When memory is extremely limited and increased query latency is acceptable |
Scalar quantization (SQ)
SQ converts 32-bit float vectors to 16-bit float (fp16) for storage, halving memory usage. Vectors are decoded back to 32-bit during distance calculation, so accuracy loss is minimal.
Memory estimate
Memory (GB) ≈ 1.1 × (2 × dimension + 8 × m) × num_vectors / 1024³
Example: 1 million vectors, dimension = 256, m = 16:
1.1 × (2 × 256 + 8 × 16) × 1,000,000 / 1024³ ≈ 0.656 GB
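The estimate is easy to script when you want to compare several configurations. The following sketch applies the formula above as written; the class and method names (`SqMemoryEstimate`, `estimateGb`) are hypothetical, and the result is an approximation, not a measured value.

```java
// Illustrative calculator for the HNSW + SQ (fp16) memory estimate.
class SqMemoryEstimate {

    static final double BYTES_PER_GB = 1024.0 * 1024.0 * 1024.0;

    // 1.1 x (2 bytes per dimension + 8 bytes per graph link x m) x num_vectors, in GB.
    static double estimateGb(int dimension, int m, long numVectors) {
        return 1.1 * (2.0 * dimension + 8.0 * m) * numVectors / BYTES_PER_GB;
    }

    public static void main(String[] args) {
        // 1 million vectors, dimension = 256, m = 16
        System.out.printf("%.3f GB%n", estimateGb(256, 16, 1_000_000L));
    }
}
```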
Example
// HNSW + scalar quantization (SQ)
PUT /<my-sq-index>
{
"settings": {
"index": {
"knn": true
}
},
"mappings": {
"properties": {
"<my_vector_field>": {
"type": "knn_vector",
"dimension": 128,
"method": {
"name": "hnsw",
"engine": "faiss",
"parameters": {
"m": 16,
"ef_construction": 256,
"encoder": {
"name": "fp16"
}
}
}
}
}
}
}
Binary quantization (BQ)
BQ compresses each vector dimension to 1, 2, or 4 binary bits, achieving 8×–32× compression. Training runs automatically when the index is built.
Memory estimate
Memory (GB) ≈ 1.1 × ((dimension × bits / 8) + 8 × m) × num_vectors / 1024³
Example: 1 million vectors, dimension = 256, m = 16:
1-bit quantization (32× compression):
1.1 × ((256 × 1 / 8) + 8 × 16) × 1,000,000 / 1024³ ≈ 0.164 GB
2-bit quantization (16× compression):
1.1 × ((256 × 2 / 8) + 8 × 16) × 1,000,000 / 1024³ ≈ 0.197 GB
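To compare the 1-, 2-, and 4-bit options side by side, the formula can be evaluated in a loop. This sketch applies the formula above exactly as written, including the division by 1024³. The names `BqMemoryEstimate`, `estimateGb`, and `compressionRatio` are hypothetical, and the compression ratio is relative to 32-bit float vectors.

```java
// Illustrative calculator for the HNSW + BQ memory estimate.
class BqMemoryEstimate {

    static final double BYTES_PER_GB = 1024.0 * 1024.0 * 1024.0;

    // Compression relative to a 32-bit float per dimension.
    static double compressionRatio(int bits) {
        return 32.0 / bits;
    }

    // 1.1 x ((dimension x bits / 8) + 8 x m) x num_vectors, in GB.
    static double estimateGb(int dimension, int bits, int m, long numVectors) {
        return 1.1 * ((dimension * bits / 8.0) + 8.0 * m) * numVectors / BYTES_PER_GB;
    }

    public static void main(String[] args) {
        for (int bits : new int[] {1, 2, 4}) {
            System.out.printf("%d-bit: %.0fx compression, %.3f GB%n",
                    bits, compressionRatio(bits), estimateGb(256, bits, 16, 1_000_000L));
        }
    }
}
```

Because the 8 × m graph term does not shrink with quantization, total memory falls much less than the per-vector compression ratio suggests.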
Example
// HNSW + binary quantization (BQ) with 1-bit quantization
PUT /<my-bq-index>
{
"settings": {
"index": {
"knn": true
}
},
"mappings": {
"properties": {
"<my_vector_field>": {
"type": "knn_vector",
"dimension": 128,
"method": {
"name": "hnsw",
"engine": "faiss",
"parameters": {
"m": 16,
"ef_construction": 512,
"encoder": {
"name": "binary",
"parameters": {
"bits": 1
}
}
}
}
}
}
}
}
Product quantization (PQ)
PQ achieves higher compression ratios than SQ or BQ but requires a separate training step. It works by splitting each vector into sub-vectors and replacing each sub-vector with a centroid ID from a learned codebook.
How it works
Split a high-dimensional vector into m equal-length sub-vectors. For example, a 256-dimension vector with m=32 becomes 32 sub-vectors of 8 dimensions each.
Train a codebook for each sub-vector space using k-means clustering. Each codebook holds 2^code_size centroids.
Encode each vector by replacing every sub-vector with the ID of the nearest centroid. With code_size=8, each ID fits in 1 byte.
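The encoding step can be sketched in a few lines of Java. This example is an illustration under simplifying assumptions, not PolarSearch's or Faiss's implementation: the codebooks are hand-written instead of trained with k-means, the vector has only 4 dimensions with m = 2, and `PqEncodeSketch` is a hypothetical name.

```java
import java.util.Arrays;

// Illustrative PQ encoder: maps each sub-vector to the ID of its nearest centroid.
class PqEncodeSketch {

    // codebooks[s][c] is centroid c for sub-vector s; each centroid has (dimension / m) values.
    static int[] encode(float[] vector, float[][][] codebooks) {
        int m = codebooks.length;
        if (vector.length % m != 0) {
            throw new IllegalArgumentException("dimension must be divisible by m");
        }
        int subDim = vector.length / m;
        int[] codes = new int[m];
        for (int s = 0; s < m; s++) {
            int best = 0;
            double bestDist = Double.MAX_VALUE;
            for (int c = 0; c < codebooks[s].length; c++) {
                // Squared L2 distance between sub-vector s and centroid c.
                double dist = 0.0;
                for (int d = 0; d < subDim; d++) {
                    double diff = vector[s * subDim + d] - codebooks[s][c][d];
                    dist += diff * diff;
                }
                if (dist < bestDist) {
                    bestDist = dist;
                    best = c;
                }
            }
            codes[s] = best;
        }
        return codes;
    }

    public static void main(String[] args) {
        // Hand-written codebooks standing in for trained ones: 2 sub-vectors, 2 centroids each.
        float[][][] codebooks = {
            {{0f, 0f}, {1f, 1f}},
            {{5f, 5f}, {9f, 9f}}
        };
        float[] vector = {0.9f, 1.1f, 4.8f, 5.2f};
        System.out.println(Arrays.toString(encode(vector, codebooks))); // prints [1, 0]
    }
}
```

The stored form of the vector is just the list of centroid IDs, which is where the compression comes from.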
Training requirements
PQ performance depends on the quality of training data. Use vectors with a distribution similar to what you plan to index:
| Used with | Minimum training vectors |
|---|---|
| HNSW | 2^code_size × 1,000 |
| IVF | max(1,000 × nlist, 2^code_size × 1,000) |
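The minimums in this table can be computed before you collect training data. The following sketch applies the two rules as stated; `PqTraining` and its method names are hypothetical.

```java
// Illustrative calculator for the minimum number of PQ training vectors.
class PqTraining {

    // HNSW + PQ: 2^code_size x 1,000.
    static long minForHnsw(int codeSize) {
        return (1L << codeSize) * 1000L;
    }

    // IVF + PQ: max(1,000 x nlist, 2^code_size x 1,000).
    static long minForIvf(int nlist, int codeSize) {
        return Math.max(1000L * nlist, minForHnsw(codeSize));
    }

    public static void main(String[] args) {
        System.out.println(minForHnsw(8));      // prints 256000
        System.out.println(minForIvf(1024, 8)); // prints 1024000
    }
}
```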
Memory estimate (HNSW + PQ)
Memory (bytes) ≈ 1.1 × (per_vector_cost × num_vectors + codebook_cost)
per_vector_cost = (pq_code_size / 8 × pq_m) + 24 + (8 × hnsw_m)
codebook_cost = num_segments × (2^pq_code_size) × 4 × dimension
Example: 1 million vectors, dimension = 256, pq_m = 32, pq_code_size = 8, hnsw_m = 16, num_segments = 100:
per_vector_cost = (8/8 × 32) + 24 + (8 × 16) = 32 + 24 + 128 = 184 bytes
codebook_cost = 100 × 2^8 × 4 × 256 = 26,214,400 bytes
Total ≈ 1.1 × (184 × 1,000,000 + 26,214,400) = 231,235,840 bytes ≈ 0.215 GB
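The same calculation can be scripted to try other parameter combinations. This sketch applies the HNSW + PQ formulas above as written and converts bytes to GB by dividing by 1024³; `HnswPqMemoryEstimate` and its method names are hypothetical.

```java
// Illustrative calculator for the HNSW + PQ memory estimate.
class HnswPqMemoryEstimate {

    static final double BYTES_PER_GB = 1024.0 * 1024.0 * 1024.0;

    // (pq_code_size / 8 x pq_m) + 24 + (8 x hnsw_m), in bytes.
    static double perVectorBytes(int pqM, int pqCodeSize, int hnswM) {
        return pqCodeSize / 8.0 * pqM + 24.0 + 8.0 * hnswM;
    }

    // num_segments x 2^pq_code_size x 4 x dimension, in bytes.
    static double codebookBytes(int numSegments, int pqCodeSize, int dimension) {
        return (double) numSegments * Math.pow(2, pqCodeSize) * 4.0 * dimension;
    }

    // 1.1 x (per_vector_cost x num_vectors + codebook_cost), in bytes.
    static double estimateBytes(int dimension, int pqM, int pqCodeSize, int hnswM,
                                int numSegments, long numVectors) {
        return 1.1 * (perVectorBytes(pqM, pqCodeSize, hnswM) * numVectors
                + codebookBytes(numSegments, pqCodeSize, dimension));
    }

    public static void main(String[] args) {
        double bytes = estimateBytes(256, 32, 8, 16, 100, 1_000_000L);
        System.out.printf("%.3f GB%n", bytes / BYTES_PER_GB);
    }
}
```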
Example
// HNSW + product quantization (PQ)
// dimension must be divisible by pq_m
PUT /<my-hnswpq-index>
{
"settings": {
"index": {
"knn": true
}
},
"mappings": {
"properties": {
"<my_vector_field>": {
"type": "knn_vector",
"dimension": 128,
"method": {
"name": "hnsw",
"engine": "faiss",
"parameters": {
"m": 16,
"ef_construction": 512,
"encoder": {
"name": "pq",
"parameters": {
"m": 4,
"code_size": 8
}
}
}
}
}
}
}
}
Disk-based vector storage
Disk-based storage uses internal quantization to compress vectors and moves the main graph structure from heap memory to disk. This significantly reduces memory usage at the cost of higher query latency from disk I/O. The system still maintains a high recall rate.
There is no fixed memory formula — the OS dynamically manages physical memory based on access patterns.
Example
// Disk-based storage
PUT /<my-ondisk-index>
{
"settings": {
"index": {
"knn": true
}
},
"mappings": {
"properties": {
"<my_vector_field>": {
"type": "knn_vector",
"dimension": 128,
"mode": "on_disk"
}
}
}
}
Appendix: Complete sample code
The following Java program demonstrates the full workflow: create an index, index sample data, and run all three search patterns.
Run the demo
mvn clean package
mvn exec:java -Dexec.mainClass="com.example.VectorSearchDemo"