The Alibaba Cloud Elasticsearch vector engine lets you run large-scale vector similarity search alongside full-text search in a single cluster. Use it to build recommendation systems, image retrieval pipelines, and natural language processing applications.
The vector engine stores HNSW indexes in off-heap memory. Before selecting a data node specification and count, estimate your off-heap memory usage using the memory calculation guidance later in this topic.
The vector engine is updated regularly. Use the latest version of Alibaba Cloud Elasticsearch 8.x for the best performance and cost efficiency.
Prerequisites
Before you begin, ensure that you have:
- A running Alibaba Cloud Elasticsearch 8.x instance. If you haven't created one, follow Quick start: From creating an instance to retrieving data to create one.
Choose a search method
Before creating an index, decide which kNN search method fits your use case:
| Method | How it works | Best for |
|---|---|---|
| Approximate kNN | Uses the `knn` clause with HNSW indexing for fast, scalable search | Most production workloads |
| Exact kNN | Uses a `script_score` query for brute-force scoring of every document | Small datasets or precise scoring |
Approximate kNN offers low latency and good accuracy. Exact kNN guarantees accurate results but does not scale to large datasets.
Step 1: Create an index
Create an index with a dense_vector field to store vector data:
PUT /my_vector_index
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 1
},
"mappings": {
"properties": {
"my_vector": {
"type": "dense_vector",
"dims": 3
},
"my_text": {
"type": "keyword"
}
}
}
}
Key points:
- Set `dims` to the output dimension of your embedding model. All documents in the index must use the same dimension.
- Decide `number_of_shards` and `number_of_replicas` based on your data volume and performance requirements.
- `dense_vector` supports additional parameters such as similarity metrics and index options. See Dense vector field type for the full list.
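As a sketch of those additional parameters, the mapping below sets a similarity metric and HNSW index options. The `cosine` metric and the `m` and `ef_construction` values are illustrative choices, not recommendations for your workload:

```
PUT /my_vector_index_tuned
{
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "dense_vector",
        "dims": 3,
        "similarity": "cosine",
        "index_options": {
          "type": "hnsw",
          "m": 16,
          "ef_construction": 100
        }
      }
    }
  }
}
```

Higher `m` and `ef_construction` values generally improve recall at the cost of index size and indexing speed.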
Step 2: Index documents
Add documents using the Bulk API or individual index requests. Each document's vector must match the `dims` value in the mapping:
PUT my_vector_index/_doc/1
{
"my_text": "text1",
"my_vector": [0.5, 10, 6]
}
PUT my_vector_index/_doc/2
{
"my_text": "text2",
"my_vector": [-0.5, 10, 10]
}
If a document's vector dimension does not match `dims` in the mapping, Elasticsearch rejects the document with an error.
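The two documents above can also be indexed in a single Bulk API request, which is more efficient for large batches:

```
POST my_vector_index/_bulk
{ "index": { "_id": "1" } }
{ "my_text": "text1", "my_vector": [0.5, 10, 6] }
{ "index": { "_id": "2" } }
{ "my_text": "text2", "my_vector": [-0.5, 10, 10] }
```

Each action line is followed by its document source on the next line; both must be single-line JSON.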
Step 3: Run a kNN search
Approximate kNN (recommended)
Submit a kNN query by specifying a query vector and the number of results to return:
GET my_vector_index/_search
{
"knn": {
"field": "my_vector",
"query_vector": [-5, 9, -12],
"k": 10,
"num_candidates": 100
},
"fields": ["my_text"]
}
| Parameter | Description |
|---|---|
| `k` | Number of nearest neighbors to return. Must be less than or equal to `num_candidates`. Defaults to the `size` value. |
| `num_candidates` | Nearest neighbor candidates to collect per shard before merging results. Must be greater than `k` and less than or equal to 10,000. Defaults to `Math.min(1.5 * k, 10000)`. A higher value improves recall at the cost of latency. |
In HNSW terms, `num_candidates` maps to the `ef` (exploration factor) value at query time: it controls how many candidate documents each shard explores before returning its top results. `k` is the final number of documents returned across all shards.
Exact kNN
Use a script_score query with a vector function for brute-force scoring. This approach scans all documents in the index and does not scale to large datasets.
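As a sketch, an exact search over the example index can use the built-in `cosineSimilarity` vector function inside `script_score`. Adding `1.0` keeps scores non-negative, because Elasticsearch does not allow negative scores:

```
GET my_vector_index/_search
{
  "query": {
    "script_score": {
      "query": { "match_all": {} },
      "script": {
        "source": "cosineSimilarity(params.query_vector, 'my_vector') + 1.0",
        "params": { "query_vector": [-5, 9, -12] }
      }
    }
  }
}
```

Replacing `match_all` with a more selective query reduces the number of documents that must be scored.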
Additional kNN search capabilities
The knn clause supports the following features:
| Feature | Description |
|---|---|
| Filtering | Add a filter clause to restrict results to a subset of documents before running the kNN search. |
| Minimum score threshold | Use the similarity parameter to exclude documents below a minimum similarity score. |
| Nested fields | Run kNN searches on vectors stored in nested fields. |
| Multi-field kNN | Query multiple knn fields in a single request. |
| Rescoring | Rerank approximate kNN results with a script rescore for higher precision. |
For the complete feature reference, see k-nearest neighbor (kNN) search.
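For example, a `filter` inside the `knn` clause restricts candidates before the approximate search runs. This is a sketch against the example index; the `term` value is illustrative:

```
GET my_vector_index/_search
{
  "knn": {
    "field": "my_vector",
    "query_vector": [-5, 9, -12],
    "k": 10,
    "num_candidates": 100,
    "filter": { "term": { "my_text": "text1" } }
  },
  "fields": ["my_text"]
}
```

Because the filter is applied during the kNN search rather than afterward, the request still returns up to `k` matching results.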
Production considerations
Memory planning
The vector engine stores HNSW indexes in off-heap memory. Underestimating memory causes evictions and degrades search latency. Size your data nodes before going to production using the memory calculation guidance in this topic.
`num_candidates` tuning
Start with the default value, `Math.min(1.5 * k, 10000)`. If recall is below your target, increase `num_candidates` incrementally. A larger value improves recall at greater performance cost.
Shard planning
Set number_of_shards and number_of_replicas based on your expected data volume, query throughput, and availability requirements.
Dimension consistency
Set dims to the exact output dimension of your embedding model. Dimension mismatches cause indexing errors.
Keep Elasticsearch up to date
The vector engine receives ongoing improvements in performance and cost efficiency. Run the latest Alibaba Cloud Elasticsearch 8.x version to benefit from these updates.