This topic applies to existing Elasticsearch clusters running earlier versions. For new deployments, use clusters of V8.15 or later, which include native vector search capabilities.
The aliyun-knn plug-in is a vector search engine built by the Alibaba Cloud Elasticsearch team. It uses Proxima, a vector library designed by Alibaba DAMO Academy, to power k-nearest neighbors (kNN) search across scenarios such as image search, video fingerprinting, facial recognition, speech recognition, and product recommendation.
How it works
The plug-in adds a proxima_vector field type and two search algorithms to Elasticsearch:
Hierarchical Navigable Small World (HNSW): graph-based approximate kNN — fast, high-recall, in-memory. Best for low-latency queries on smaller datasets.
Linear Search: brute-force exact kNN — always 100% recall, but latency scales with data volume.
The plug-in is fully compatible with open source Elasticsearch APIs for the cluster versions listed in the version compatibility table below. Both algorithms store indexes in memory and are compatible with standard Elasticsearch features: multiple replica shards, distributed search, real-time incremental synchronization, near-real-time (NRT) updates, and restoration.
Performance benchmark (tested on V6.7.0, two data nodes with 16 vCPUs and 64 GiB each, 20 million SIFT 128-dimensional float vectors):
| Metric | HNSW | Linear Search |
|---|---|---|
| Top-10 recall ratio | 98.6% | 100% |
| Top-50 recall ratio | 97.9% | 100% |
| Top-100 recall ratio | 97.4% | 100% |
| Latency (p99) | 0.093 s | 0.934 s |
| Latency (p90) | 0.018 s | 0.305 s |
p99 = the time within which 99% of queries complete; p90 = the time within which 90% of queries complete.
The plug-in is used in numerous production scenarios, including Pailitao, Image Search, Youku video fingerprinting, Qutoutiao video fingerprinting, Taobao commodity recommendation, customized searches, and Crossmedia searches.
Prerequisites
Before you begin, make sure that:
The aliyun-knn plug-in is installed and compatible with your cluster version (see the version compatibility table below).
Data node specifications are at least 16 vCPUs and 64 GiB of memory. The plug-in creates indexes during both refresh and flush operations, which are CPU- and memory-intensive. Undersized nodes cause bottlenecks or instability.
Your cluster has independent dedicated master nodes.
Off-heap memory is greater than twice the total size of your vector data. During force merge, both old and new data are in memory simultaneously — plan for four times the vector data size if you run force merges. Memory estimate formula:
num_vectors × num_dimensions × bytes_per_element
Example: a 960-dimensional float index with 400 documents uses 960 × 400 × 4 = 1,536,000 bytes (~1.5 MB), so off-heap memory must be greater than 3 MB (1.5 MB × 2). For clusters with 64 GiB or more of total memory, off-heap memory ≈ total memory − 32 GiB.
Write throughput stays below 5,000 TPS per data node (16 vCPUs, 64 GiB). Vector indexing is CPU-intensive; avoid high write concurrency while queries are running.
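The sizing formula above is simple enough to check in a few lines. This is an illustrative helper (the function name is ours, not part of the plug-in) that reproduces the 960-dimensional example and the 2×/4× off-heap headroom rules:

```python
def vector_index_bytes(num_vectors, num_dimensions, bytes_per_element=4):
    """Estimate the in-memory size of a vector index (float elements = 4 bytes)."""
    return num_vectors * num_dimensions * bytes_per_element

# Example from the text: 400 documents of 960-dimensional float vectors.
index_bytes = vector_index_bytes(400, 960)   # 1,536,000 bytes (~1.5 MB)
min_off_heap = 2 * index_bytes               # steady-state requirement (> 2x)
merge_off_heap = 4 * index_bytes             # headroom if force merges are run
```

Plugging in your own document count and dimension gives the minimum off-heap memory to provision before loading data.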
Choose an algorithm:
| Algorithm | When to use | In-memory | Notes |
|---|---|---|---|
| HNSW | Low-latency queries, high recall required | Yes | Based on the greedy search algorithm. Inner product spaces don't obey triangle inequality — convert to Euclidean or spherical spaces before using HNSW with inner product similarity. Run force merge regularly during off-peak hours to reduce latency. |
| Linear Search | Brute-force required, 100% recall required, recall comparison | Yes | Latency increases linearly with data volume. |
Version compatibility:
| Cluster version | Kernel version | Plug-in status |
|---|---|---|
| V6.7.0 | Earlier than V1.2.0 | Manual installation required via the Plug-ins page. Script query and index warm-up features are not available. distance_method is limited to SquaredEuclidean. |
| V6.8, V7.4, V7.7 | — | distance_method is limited to SquaredEuclidean. |
| V6.7.0 | V1.2.0 or later | Integrated into the apack plug-in (installed by default). Script query, index warm-up, and extended functions available with kernel V1.3.0+. If you see a mapping parsing error when creating a vector index, upgrade the kernel to V1.3.0+ and retry. |
| V7.10.0 | V1.4.0 or later | Integrated into the apack plug-in (installed by default). |
| Other versions | — | Vector search is not supported. |
The kernel version is different from the apack plug-in version. Run GET _cat/plugins?v to check the apack plug-in version.
Limitations
Data node specifications must be at least 16 vCPUs and 64 GiB before installing the plug-in. To upgrade, see Upgrade the configuration of a cluster.
If the physical replication feature is enabled on your cluster, disable it before using the aliyun-knn plug-in. See Use the physical replication feature of the apack plug-in.
Data migration via Object Storage Service (OSS) snapshots or DataWorks is not supported. Use Logstash to migrate data.
Create a vector index
Log on to the Kibana console of your Elasticsearch cluster. For details, see Log on to the Kibana console.
Creating a vector index involves three steps:
Configure index settings — choose the codec and algorithm.
Define a vector field in mappings — set the field type, dimensions, and distance function.
Add documents to the index.
Step 1: Create the index
In Dev Tools > Console, run:
PUT test
{
"settings": {
"index.codec": "proxima",
"index.vector.algorithm": "hnsw"
},
"mappings": {
"_doc": {
"properties": {
"feature": {
"type": "proxima_vector",
"dim": 2,
"vector_type": "float",
"distance_method": "SquaredEuclidean"
}
}
}
}
}
The examples in this topic use an Elasticsearch V6.7.0 cluster. Operations may differ for other versions.
Settings parameters:
| Parameter | Default | Description |
|---|---|---|
index.codec | proxima | Set to proxima to create a proxima vector index that supports HNSW and Linear Search queries. Set to null to create forward indexes only — in this case, only script queries are supported (available for V6.7.0 clusters with apack V1.2.1+ and V7.10.0 clusters with apack V1.4.0+). |
index.vector.algorithm | hnsw | The search algorithm. Valid values: hnsw, linear. |
index.vector.general.builder.offline_mode | false | (Optional) Set to true to use offline optimization mode, which significantly reduces the number of segments written and improves write throughput. Use this mode when loading all data at once. Offline mode disables script queries. Available for V6.7.0 clusters with apack V1.2.1+ and V7.10.0 clusters with apack V1.4.0+. |
Mappings parameters:
| Parameter | Required | Default | Description |
|---|---|---|---|
type | Required | — | Set to proxima_vector to define a vector field. |
dim | Required | — | Number of vector dimensions. Valid values: 1–2048. |
vector_type | Optional | float | Data type of vectors. Valid values: float, short, binary. For binary, represent data as an unsigned 32-bit decimal array (uint32) and set dim to a multiple of 32. All three types are supported on V6.7.0 clusters with apack V1.2.1+ and V7.10.0 clusters with apack V1.4.0+. Other clusters support only float. |
distance_method | Optional | SquaredEuclidean | Distance function for similarity scoring. Valid values: SquaredEuclidean, InnerProduct, Cosine, Hamming. Hamming is only available when vector_type is binary. All four values are supported on V6.7.0 clusters with apack V1.2.1+ and V7.10.0 clusters with apack V1.4.0+. V6.8, V7.4, and V7.7 clusters support only SquaredEuclidean. See Distance measurement functions for scoring formulas. |
You can add other Elasticsearch field types alongside the vector field in the same mapping. Run GET /_cat/plugins?v to check your apack plug-in version. If the version doesn't meet the requirements, submit a ticket to have Alibaba Cloud engineers update it.
Step 2: Add a document
POST test/_doc
{
"feature": [1.0, 2.0]
}
If vector_type is binary, the vector data must be a uint32 array and dim must be a multiple of 32. For all other types, the array length must equal dim.
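For binary vectors, each group of 32 bits must be packed into one unsigned 32-bit integer. A minimal sketch of that packing follows; the MSB-first bit order within each word is an assumption for illustration, since the plug-in's exact bit layout is not documented here:

```python
def pack_bits_to_uint32(bits):
    """Pack a 0/1 sequence into a uint32 array (MSB-first within each word;
    this bit order is an assumption, not a documented plug-in contract)."""
    if len(bits) % 32 != 0:
        raise ValueError("dim must be a multiple of 32 for binary vectors")
    words = []
    for i in range(0, len(bits), 32):
        word = 0
        for b in bits[i:i + 32]:
            word = (word << 1) | b
        words.append(word)
    return words

# A 32-bit fingerprint with only the first (most significant) bit set:
doc_vector = pack_bits_to_uint32([1] + [0] * 31)   # [2147483648]
```

The resulting integer array is what you would send as the field value in the indexing request, with dim set to the bit length (here 32).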
Search for a vector
Standard search
GET test/_search
{
"query": {
"hnsw": {
"feature": {
"vector": [1.5, 2.5],
"size": 10
}
}
}
}
| Parameter | Description |
|---|---|
hnsw | The algorithm name. Must match the value of index.vector.algorithm set when creating the index. |
vector | The query vector. The array length must match dim. |
size | The number of documents recalled by the plug-in. Set this to the same value as the Elasticsearch-level size parameter (default: 10). The plug-in recalls the top N documents based on vector similarity; Elasticsearch then applies its own size filter and returns the intersection. |
For advanced HNSW search tuning, see Advanced parameters.
Script query
Script queries let you score documents using a custom formula. All examples use the script_score parameter — X-Pack functions are not supported.
GET test/_search
{
"query": {
"match_all": {}
},
"rescore": {
"query": {
"rescore_query": {
"function_score": {
"functions": [{
"script_score": {
"script": {
"source": "1/(1+l2Squared(params.queryVector, doc['feature']))",
"params": {
"queryVector": [2.0, 2.0]
}
}
}
}]
}
}
}
}
}
Supported script functions:
| Function | Distance metric |
|---|---|
l2Squared(float[] queryVector, DocValues docValues) | Euclidean distance |
hamming(float[] queryVector, DocValues docValues) | Hamming distance |
cosineSimilarity(float[] queryVector, DocValues docValues) | Cosine similarity (use for V6.7 clusters) |
cosine(float[] queryVector, DocValues docValues) | Cosine similarity (use for V7.10 clusters) |
float[] queryVector: the query vector
DocValues docValues: the stored document vectors
Script queries are available on V6.7.0 clusters with apack V1.2.1+ and V7.10.0 clusters with apack V1.4.0+. They are not available for indexes created with offline optimization mode.
Index warm-up
Vector indexes are stored in memory. The first search after a write has high latency because the index must load into memory first. Use the warm-up API to load indexes before serving queries.
Warm up all vector indexes:
POST _vector/warmup
Warm up a specific index:
POST _vector/{indexName}/warmup
If you have many vector indexes but only need fast responses from one, warm up only that index.
Index warm-up is available on V6.7.0 clusters with apack V1.2.1+ and V7.10.0 clusters with apack V1.4.0+.
Distance measurement functions
All distance functions use the same scoring formula:
Score = 1 / (distance + 1)
By default, SquaredEuclidean (Euclidean distance without square root extraction) is used.
| Distance function | Description | Scoring formula | Best for | Example |
|---|---|---|---|---|
SquaredEuclidean | Euclidean distance between vectors — measures absolute spatial separation. | distance = (A1−B1)² + … + (An−Bn)²; score = 1/(distance+1) | Analyzing absolute differences across dimensions, such as user behavior metrics | [0,0] vs [1,2] → distance = 5 |
Cosine | Cosine of the angle between vectors — measures orientation regardless of magnitude. | distance = cosine angle; score = 1/(cosine+1) | Content similarity scoring where magnitude varies (e.g., text or interest vectors) | [1,1] vs [1,0] → cosine = 0.707 |
InnerProduct | Dot product of two vectors — combines angle and magnitude. Equivalent to cosine similarity after normalization. | distance = A1B1 + … + AnBn; score = 1/(distance+1) | Recommendation systems where both orientation and magnitude matter | [1,1] vs [1,5] → inner product = 6 |
Hamming (binary only) | Minimum number of bit positions that differ between two binary strings of equal length. | d(x,y) = ∑ x[i] ⊕ y[i]; score = 1/(distance+1) | Error detection and binary fingerprint comparison | 1011101 vs 1001001 → distance = 2 |
All four distance functions are supported on V6.7.0 clusters with apack V1.2.1+ and V7.10.0 clusters with apack V1.4.0+. All other clusters support only SquaredEuclidean. Run GET _cat/plugins?v to check your version. The Hamming function uses a special implementation: when HNSW or Linear Search is the algorithm, standard kNN searches are not supported for Hamming indexes — only script queries are supported, compatible with the script_score parameter.
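The distance metrics in the table are straightforward to reproduce offline, which is useful for sanity-checking scores the plug-in returns. This sketch implements the metrics and the shared Score = 1 / (distance + 1) formula (the plug-in computes all of this server-side; the function names here are ours):

```python
import math

def squared_euclidean(a, b):
    # (A1-B1)^2 + ... + (An-Bn)^2, no square root extraction
    return sum((x - y) ** 2 for x, y in zip(a, b))

def inner_product(a, b):
    # A1*B1 + ... + An*Bn
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    # Cosine of the angle between the vectors, independent of magnitude.
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return inner_product(a, b) / (norm_a * norm_b)

def hamming(a, b):
    # Number of differing bit positions between equal-length binary strings.
    return sum(x != y for x, y in zip(a, b))

def score(distance):
    return 1 / (distance + 1)

# Examples from the table:
d = squared_euclidean([0, 0], [1, 2])     # 5
ip = inner_product([1, 1], [1, 5])        # 6
cos = cosine_similarity([1, 1], [1, 0])   # ~0.707
hd = hamming("1011101", "1001001")        # 2
```

For instance, the SquaredEuclidean example scores 1 / (5 + 1) ≈ 0.167 under the shared formula.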
Circuit breaker parameters
These cluster-level parameters protect your cluster from memory exhaustion during vector indexing. Check current values with GET _cluster/settings. Keep the default values unless you have a specific reason to change them.
| Parameter | Default | Description |
|---|---|---|
indices.breaker.vector.native.indexing.limit | 70% | Write operations are suspended when off-heap memory usage exceeds this threshold. Writes resume automatically after the system finishes creating indexes and releases memory. If the circuit breaker trips frequently, reduce write throughput. |
indices.breaker.vector.native.total.limit | 80% | Maximum proportion of off-heap memory used for vector indexes. Exceeding this limit may trigger shard reallocation. |
Advanced parameters
HNSW index creation parameters
Set these in the settings block when index.vector.algorithm is hnsw.
| Parameter | Default | Description |
|---|---|---|
index.vector.hnsw.builder.max_scan_num | 100000 | Maximum number of nearest neighbors scanned during graph construction in the worst case. |
index.vector.hnsw.builder.neighbor_cnt | 100 | Maximum nearest neighbors per node at layer 0. Higher values improve graph quality but increase storage. |
index.vector.hnsw.builder.upper_neighbor_cnt | 50 | Maximum nearest neighbors per node at layers above layer 0. Set to 50% of neighbor_cnt. Maximum value: 255. |
index.vector.hnsw.builder.efconstruction | 400 | Nearest neighbors scanned during graph construction. Higher values improve quality but increase indexing time. |
index.vector.hnsw.builder.max_level | 6 | Total number of layers, including layer 0. For 10 million documents with scaling_factor of 30, the result is ceil(log₃₀(10,000,000)) = 5. This parameter has minimal impact on search quality. |
index.vector.hnsw.builder.scaling_factor | 50 | Data volume per layer equals the layer above multiplied by this factor. Valid values: 10–100. Higher values mean fewer layers. |
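The relationship between max_level and scaling_factor in the table above can be checked with a short calculation. This is an illustrative helper (not a plug-in API) reproducing the table's example of 10 million documents with a scaling_factor of 30:

```python
import math

def suggested_max_level(num_docs, scaling_factor):
    """Layer count implied by the table: ceil(log base scaling_factor of
    num_docs). Layer 0 holds all documents; each layer above holds roughly
    1/scaling_factor as many as the layer below it."""
    return math.ceil(math.log(num_docs) / math.log(scaling_factor))

level = suggested_max_level(10_000_000, 30)   # 5, matching the table's example
```

A higher scaling_factor therefore yields fewer layers for the same document count, consistent with the table's note.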
HNSW search parameters
| Parameter | Default | Valid values | Description |
|---|---|---|---|
ef | 100 | 100–1000 | Number of nearest neighbors scanned during an online search. Higher values improve recall but slow down queries. |
Example search with ef tuning:
GET test/_search
{
"query": {
"hnsw": {
"feature": {
"vector": [1.5, 2.5],
"size": 10,
"ef": 100
}
}
}
}
FAQ
How do I evaluate the recall ratio of my HNSW index?
Create two indexes with identical settings — one using HNSW and one using Linear Search. Write the same vector data to both, then run the same query vector against each. Compare the returned document IDs: recall ratio = (IDs returned by both indexes) / (total IDs returned by Linear Search).
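Once you have the two result sets, the recall computation is a simple set intersection. A minimal sketch, with hypothetical document IDs:

```python
def recall_ratio(hnsw_ids, linear_ids):
    """Fraction of the exact (Linear Search) result IDs that HNSW also
    returned, per the formula above."""
    hnsw, linear = set(hnsw_ids), set(linear_ids)
    return len(hnsw & linear) / len(linear)

# Hypothetical top-4 results where HNSW missed one exact neighbor ("d4"):
r = recall_ratio(["d1", "d2", "d3", "d5"], ["d1", "d2", "d3", "d4"])  # 0.75
```

Run this over many query vectors and average the results to estimate the recall ratios reported in the benchmark table.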
My writes returned a `circuitBreakingException` error.
Off-heap memory usage exceeded the indices.breaker.vector.native.indexing.limit threshold (default 70%), and writes were paused. In most cases, writes resume automatically once the system finishes building indexes and frees memory. Add a retry mechanism to your write script to handle this automatically.
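A retry wrapper for this case can be as simple as exponential backoff around your bulk-write call. A sketch, where write_fn stands in for your own function that sends the bulk request and raises on a circuit-breaker error response:

```python
import time

def write_with_retry(write_fn, max_retries=5, base_delay=1.0):
    """Retry a bulk write with exponential backoff.

    write_fn is a placeholder for your own bulk-write function; it should
    raise an exception when the cluster returns circuitBreakingException.
    """
    for attempt in range(max_retries):
        try:
            return write_fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # give up after the final attempt
            # Back off so the node can finish building indexes and free memory.
            time.sleep(base_delay * (2 ** attempt))
```

Pairing backoff with the throughput guidance in the prerequisites (under 5,000 TPS per data node) keeps the breaker from tripping repeatedly.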
Why is CPU usage still high after writes are paused?
The system continues building vector indexes during refresh and flush operations even after the write pause. CPU usage drops after the final refresh completes.
I see a `class_cast_exception: class org.apache.lucene.index.SoftDeletesDirectoryReaderWrapper$SoftDeletesFilterCodecReader cannot be cast to class org.apache.lucene.index.SegmentReader` error.
Disable the physical replication feature on your index. See Use the physical replication feature of the apack plug-in.
Vector searches are slow or suspended due to high memory usage.
Vector indexes are loaded into data node memory during a search. If a data node stores more vector index data than half its total memory, searches slow down or stall. Keep each data node's vector data volume below 50% of its total memory. If node memory is insufficient, upgrade the data nodes. See Upgrade the configuration of a cluster.
Are best practices and real-world examples available?
The Alibaba Cloud developer community provides business scenarios and a best practice guide for the aliyun-knn plug-in.
I cannot use `must_not exists` to filter documents where the vector field is empty.
Vector data is stored differently from regular fields and may not be compatible with standard Query DSL filters. Use the following script query instead:
GET jx-similar-product-v1/_search
{
"query": {
"bool": {
"must": {
"script": {
"script": {
"source": "doc['feature'].empty",
"lang": "painless"
}
}
}
}
}
}