
Elasticsearch: Use the aliyun-knn plug-in

Last Updated: Mar 26, 2026
This topic applies to existing Elasticsearch clusters running earlier versions. For new deployments, use clusters of V8.15 or later, which include native vector search capabilities.

The aliyun-knn plug-in is a vector search engine built by the Alibaba Cloud Elasticsearch team. It uses Proxima, a vector library designed by Alibaba DAMO Academy, to power k-nearest neighbors (kNN) search across scenarios such as image search, video fingerprinting, facial recognition, speech recognition, and product recommendation.

How it works

The plug-in adds a proxima_vector field type and two search algorithms to Elasticsearch:

  • Hierarchical Navigable Small World (HNSW): graph-based approximate kNN — fast, high-recall, in-memory. Best for low-latency queries on smaller datasets.

  • Linear Search: brute-force exact kNN — always 100% recall, but latency scales with data volume.

The plug-in is compatible with the core features of open source Elasticsearch. Both algorithms store indexes in memory and support multiple replica shards, distributed search, real-time incremental synchronization, near-real-time (NRT) updates, and snapshot restoration.

Performance benchmark (tested on V6.7.0, two data nodes with 16 vCPUs and 64 GiB each, 20 million SIFT 128-dimensional float vectors):

| Metric | HNSW | Linear Search |
|---|---|---|
| Top-10 recall ratio | 98.6% | 100% |
| Top-50 recall ratio | 97.9% | 100% |
| Top-100 recall ratio | 97.4% | 100% |
| Latency (p99) | 0.093 s | 0.934 s |
| Latency (p90) | 0.018 s | 0.305 s |

p99 = the time within which 99% of queries receive a response; p90 is defined analogously.

The plug-in is used in numerous production scenarios, including Pailitao, Image Search, Youku video fingerprinting, Qutoutiao video fingerprinting, Taobao commodity recommendation, customized searches, and Crossmedia searches.

Prerequisites

Before you begin, make sure that:

  • The aliyun-knn plug-in is installed and compatible with your cluster version (see the version compatibility table below).

  • Data node specifications are at least 16 vCPUs and 64 GiB of memory. The plug-in creates indexes during both refresh and flush operations, which are CPU- and memory-intensive. Undersized nodes cause bottlenecks or instability.

  • Your cluster has independent dedicated master nodes.

  • Off-heap memory is greater than twice the total size of your vector data. During a force merge, old and new data reside in memory simultaneously, so plan for four times the vector data size if you run force merges. Estimate memory as: num_vectors × num_dimensions × bytes_per_element. For example, a 960-dimensional float index with 400 documents uses 400 × 960 × 4 = 1,536,000 bytes (~1.5 MB), so off-heap memory must exceed 3 MB (1.5 MB × 2). For clusters with 64 GiB or more of total memory, off-heap memory ≈ total memory − 32 GiB.

  • Write throughput stays below 5,000 TPS per data node (16 vCPUs, 64 GiB). Vector indexing is CPU-intensive; avoid high write concurrency while queries are running.
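The sizing rule above can be sketched as a quick calculation. This is a minimal sketch, not part of the plug-in: the helper names are ours, and the 2x/4x multipliers follow the guidance in the bullets.

```python
# Quick calculator for the off-heap sizing rule described above.
# Helper names are ours; multipliers follow the prerequisite text.

BYTES_PER_ELEMENT = {"float": 4, "short": 2}  # bytes per vector element

def vector_index_bytes(num_vectors: int, dim: int, vector_type: str = "float") -> int:
    """num_vectors x num_dimensions x bytes_per_element."""
    return num_vectors * dim * BYTES_PER_ELEMENT[vector_type]

def min_off_heap_bytes(index_bytes: int, force_merge: bool = False) -> int:
    """Off-heap must exceed 2x the vector data, or 4x if you run force merges."""
    return index_bytes * (4 if force_merge else 2)

size = vector_index_bytes(400, 960)  # the 400-doc, 960-dim float example
print(size)                          # 1536000 (~1.5 MB)
print(min_off_heap_bytes(size))      # 3072000 (~3 MB)
```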

Choose an algorithm:

| Algorithm | When to use | In-memory | Notes |
|---|---|---|---|
| HNSW | Low-latency queries, high recall required | Yes | Based on the greedy search algorithm. Inner product spaces do not obey the triangle inequality; convert to Euclidean or spherical spaces before using HNSW with inner product similarity. Run force merge regularly during off-peak hours to reduce latency. |
| Linear Search | Brute-force required, 100% recall required, recall comparison | Yes | Latency increases linearly with data volume. |

Version compatibility:

| Cluster version | Kernel version | Plug-in status |
|---|---|---|
| V6.7.0 | Earlier than V1.2.0 | Manual installation required via the Plug-ins page. Script query and index warm-up features are not available. distance_method is limited to SquaredEuclidean. |
| V6.8, V7.4, V7.7 | | distance_method is limited to SquaredEuclidean. |
| V6.7.0 | V1.2.0 or later | Integrated into the apack plug-in (installed by default). Script query, index warm-up, and extended functions are available with kernel V1.3.0 or later. If you see a mapping parsing error when creating a vector index, upgrade the kernel to V1.3.0 or later and retry. |
| V7.10.0 | V1.4.0 or later | Integrated into the apack plug-in (installed by default). |
| Other versions | | Vector search is not supported. |
The kernel version is different from the apack plug-in version. Run GET _cat/plugins?v to check the apack plug-in version.

Create a vector index

Log on to the Kibana console of your Elasticsearch cluster. For details, see Log on to the Kibana console.

Creating a vector index involves three steps:

  1. Configure index settings — choose the codec and algorithm.

  2. Define a vector field in mappings — set the field type, dimensions, and distance function.

  3. Add documents to the index.

Step 1: Create the index

In Dev Tools > Console, run:

PUT test
{
  "settings": {
    "index.codec": "proxima",
    "index.vector.algorithm": "hnsw"
  },
  "mappings": {
    "_doc": {
      "properties": {
        "feature": {
          "type": "proxima_vector",
          "dim": 2,
          "vector_type": "float",
          "distance_method": "SquaredEuclidean"
        }
      }
    }
  }
}
The examples in this topic use an Elasticsearch V6.7.0 cluster. Operations may differ for other versions.

Settings parameters:

| Parameter | Default | Description |
|---|---|---|
| index.codec | proxima | Set to proxima to create a Proxima vector index that supports HNSW and Linear Search queries. Set to null to create forward indexes only; in this case, only script queries are supported (available for V6.7.0 clusters with apack V1.2.1+ and V7.10.0 clusters with apack V1.4.0+). |
| index.vector.algorithm | hnsw | The search algorithm. Valid values: hnsw, linear. |
| index.vector.general.builder.offline_mode | false | (Optional) Set to true to use offline optimization mode, which significantly reduces the number of segments written and improves write throughput. Use this mode when loading all data at once. Offline mode disables script queries. Available for V6.7.0 clusters with apack V1.2.1+ and V7.10.0 clusters with apack V1.4.0+. |

Mappings parameters:

| Parameter | Required | Default | Description |
|---|---|---|---|
| type | Required | | Set to proxima_vector to define a vector field. |
| dim | Required | | Number of vector dimensions. Valid values: 1 to 2048. |
| vector_type | Optional | float | Data type of vectors. Valid values: float, short, binary. For binary, represent data as an unsigned 32-bit integer (uint32) array and set dim to a multiple of 32. All three types are supported on V6.7.0 clusters with apack V1.2.1+ and V7.10.0 clusters with apack V1.4.0+. Other clusters support only float. |
| distance_method | Optional | SquaredEuclidean | Distance function for similarity scoring. Valid values: SquaredEuclidean, InnerProduct, Cosine, Hamming. Hamming is available only when vector_type is binary. All four values are supported on V6.7.0 clusters with apack V1.2.1+ and V7.10.0 clusters with apack V1.4.0+. V6.8, V7.4, and V7.7 clusters support only SquaredEuclidean. See Distance measurement functions for scoring formulas. |
You can add other Elasticsearch field types alongside the vector field in the same mapping. Run GET /_cat/plugins?v to check your apack plug-in version. If the version doesn't meet the requirements, submit a ticket to have Alibaba Cloud engineers update it.

Step 2: Add a document

POST test/_doc
{
  "feature": [1.0, 2.0]
}

If vector_type is binary, the vector data must be a uint32 array and dim must be a multiple of 32. For all other types, the array length must equal dim.
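To illustrate the binary format, here is a hedged sketch of packing a 0/1 bit list into the uint32 array the field expects. The helper name is ours, and the MSB-first bit ordering is an assumption rather than documented behavior; verify it against your own indexing pipeline.

```python
# Hypothetical helper: pack a list of 0/1 bits into a uint32 array for a
# binary vector field. dim must be a multiple of 32, so every 32 bits become
# one unsigned 32-bit integer. MSB-first bit order is an assumption.

def pack_binary_vector(bits: list) -> list:
    if len(bits) % 32 != 0:
        raise ValueError("dim must be a multiple of 32")
    words = []
    for i in range(0, len(bits), 32):
        word = 0
        for bit in bits[i:i + 32]:
            word = (word << 1) | bit  # shift in one bit at a time, MSB first
        words.append(word)
    return words

print(pack_binary_vector([1] + [0] * 31))  # [2147483648]
```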

Search for a vector

Standard search

GET test/_search
{
  "query": {
    "hnsw": {
      "feature": {
        "vector": [1.5, 2.5],
        "size": 10
      }
    }
  }
}
| Parameter | Description |
|---|---|
| hnsw | The algorithm name. Must match the value of index.vector.algorithm set when the index was created. |
| vector | The query vector. The array length must match dim. |
| size | The number of documents recalled by the plug-in. Set this to the same value as the Elasticsearch-level size parameter (default: 10). The plug-in recalls the top N documents by vector similarity; Elasticsearch then applies its own size filter and returns the intersection. |

For advanced HNSW search tuning, see Advanced parameters.

Script query

Script queries let you score documents using a custom formula. All examples use the script_score parameter; X-Pack functions are not supported.

GET test/_search
{
  "query": {
    "match_all": {}
  },
  "rescore": {
    "query": {
      "rescore_query": {
        "function_score": {
          "functions": [{
            "script_score": {
              "script": {
                "source": "1/(1+l2Squared(params.queryVector, doc['feature']))",
                "params": {
                  "queryVector": [2.0, 2.0]
                }
              }
            }
          }]
        }
      }
    }
  }
}
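Working through the rescore formula above by hand (a sketch in plain Python, not the plug-in's code), for the document added earlier with feature = [1.0, 2.0] and queryVector [2.0, 2.0]:

```python
# Worked example of the rescore score: 1 / (1 + l2Squared(query, doc)).
# Uses the document indexed earlier (feature = [1.0, 2.0]) and query [2.0, 2.0].

def l2_squared(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

score = 1 / (1 + l2_squared([2.0, 2.0], [1.0, 2.0]))
print(score)  # 0.5 (the squared distance is 1.0)
```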

Supported script functions:

| Function | Distance metric |
|---|---|
| l2Squared(float[] queryVector, DocValues docValues) | Euclidean distance |
| hamming(float[] queryVector, DocValues docValues) | Hamming distance |
| cosineSimilarity(float[] queryVector, DocValues docValues) | Cosine similarity (use on V6.7 clusters) |
| cosine(float[] queryVector, DocValues docValues) | Cosine similarity (use on V7.10 clusters) |

  • float[] queryVector: the query vector, passed in the script's params

  • DocValues docValues: the stored document vectors

Script queries are available on V6.7.0 clusters with apack V1.2.1+ and V7.10.0 clusters with apack V1.4.0+. They are not available for indexes created with offline optimization mode.

Index warm-up

Vector indexes are stored in memory. The first search after a write has high latency because the index must load into memory first. Use the warm-up API to load indexes before serving queries.

Warm up all vector indexes:

POST _vector/warmup

Warm up a specific index:

POST _vector/{indexName}/warmup

If you have many vector indexes but only need fast responses from one, warm up only that index.

Index warm-up is available on V6.7.0 clusters with apack V1.2.1+ and V7.10.0 clusters with apack V1.4.0+.

Distance measurement functions

All distance functions use the same scoring formula:

Score = 1 / (distance + 1)

By default, SquaredEuclidean (Euclidean distance without square root extraction) is used.

| Distance function | Description | Scoring formula | Best for | Example |
|---|---|---|---|---|
| SquaredEuclidean | Euclidean distance between vectors without square root extraction; measures absolute spatial separation. | distance = (A1−B1)² + … + (An−Bn)²; score = 1/(distance+1) | Analyzing absolute differences across dimensions, such as user behavior metrics | [0,0] vs [1,2] → distance = 5 |
| Cosine | Cosine of the angle between vectors; measures orientation regardless of magnitude. | distance = cosine of the angle; score = 1/(cosine+1) | Content similarity scoring where magnitude varies (e.g., text or interest vectors) | [1,1] vs [1,0] → cosine = 0.707 |
| InnerProduct | Dot product of two vectors; combines angle and magnitude. Equivalent to cosine similarity after normalization. | distance = A1B1 + … + AnBn; score = 1/(distance+1) | Recommendation systems where both orientation and magnitude matter | [1,1] vs [1,5] → inner product = 6 |
| Hamming (binary only) | Number of bit positions that differ between two binary strings of equal length. | distance = ∑ x[i] ⊕ y[i]; score = 1/(distance+1) | Error detection and binary fingerprint comparison | 1011101 vs 1001001 → distance = 2 |
All four distance functions are supported on V6.7.0 clusters with apack V1.2.1+ and V7.10.0 clusters with apack V1.4.0+. All other clusters support only SquaredEuclidean. Run GET _cat/plugins?v to check your version. Hamming uses a special implementation: on Hamming indexes, standard kNN searches with the HNSW or Linear Search algorithm are not supported; only script queries that use the script_score parameter are supported.
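The Example column above can be reproduced with plain reference implementations. These are our own sketches of the standard definitions, not the plug-in's code:

```python
# Reference implementations of the four distance functions, reproducing the
# Example column of the table above.
import math

def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    norm = lambda v: math.sqrt(sum(x * x for x in v))
    return inner_product(a, b) / (norm(a) * norm(b))

def hamming(x, y):
    return sum(c1 != c2 for c1, c2 in zip(x, y))

print(squared_euclidean([0, 0], [1, 2]))  # 5
print(round(cosine([1, 1], [1, 0]), 3))   # 0.707
print(inner_product([1, 1], [1, 5]))      # 6
print(hamming("1011101", "1001001"))      # 2
```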

Circuit breaker parameters

These cluster-level parameters protect your cluster from memory exhaustion during vector indexing. Check current values with GET _cluster/settings. Keep the default values unless you have a specific reason to change them.

| Parameter | Default | Description |
|---|---|---|
| indices.breaker.vector.native.indexing.limit | 70% | Write operations are suspended when off-heap memory usage exceeds this threshold. Writes resume automatically after the system finishes creating indexes and releases memory. If the circuit breaker trips frequently, reduce write throughput. |
| indices.breaker.vector.native.total.limit | 80% | Maximum proportion of off-heap memory used for vector indexes. Exceeding this limit may trigger shard reallocation. |
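As a rough illustration of how these thresholds combine with the off-heap estimate from the prerequisites (off-heap ≈ total memory − 32 GiB on a 64 GiB node), the numbers below are illustrative, not an exact accounting of cluster memory:

```python
# Rough illustration: breaker thresholds on a 64 GiB data node, using the
# off-heap estimate from the prerequisites (total - 32 GiB).
GIB = 1024 ** 3
total_memory = 64 * GIB
off_heap = total_memory - 32 * GIB   # ~32 GiB available off-heap

indexing_limit = 0.70 * off_heap     # writes pause above this usage
total_limit = 0.80 * off_heap        # ceiling for vector index memory

print(round(indexing_limit / GIB, 1))  # 22.4
print(round(total_limit / GIB, 1))     # 25.6
```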

Advanced parameters

HNSW index creation parameters

Set these in the settings block when index.vector.algorithm is hnsw.

| Parameter | Default | Description |
|---|---|---|
| index.vector.hnsw.builder.max_scan_num | 100000 | Maximum number of nearest neighbors scanned during graph construction in the worst case. |
| index.vector.hnsw.builder.neighbor_cnt | 100 | Maximum nearest neighbors per node at layer 0. Higher values improve graph quality but increase storage. |
| index.vector.hnsw.builder.upper_neighbor_cnt | 50 | Maximum nearest neighbors per node at layers above layer 0. Set to 50% of neighbor_cnt. Maximum value: 255. |
| index.vector.hnsw.builder.efconstruction | 400 | Number of nearest neighbors scanned during graph construction. Higher values improve quality but increase indexing time. |
| index.vector.hnsw.builder.max_level | 6 | Total number of layers, including layer 0. For example, with 10 million documents and a scaling_factor of 30, ceil(log₃₀(10,000,000)) = 5 layers are needed. This parameter has minimal impact on search quality. |
| index.vector.hnsw.builder.scaling_factor | 50 | Data volume per layer equals that of the layer above multiplied by this factor. Valid values: 10 to 100. Higher values mean fewer layers. |
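The max_level example in the table can be checked directly. The helper name is ours; it just restates the layer relationship described above:

```python
# Checking the max_level example: each layer holds scaling_factor times as
# many documents as the layer above it, so the layer count is a ceiling log.
import math

def layers_needed(num_docs, scaling_factor):
    return math.ceil(math.log(num_docs, scaling_factor))

print(layers_needed(10_000_000, 30))  # 5
print(layers_needed(10_000_000, 50))  # 5 (with the default scaling_factor)
```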

HNSW search parameters

| Parameter | Default | Valid values | Description |
|---|---|---|---|
| ef | 100 | 100–1000 | Number of nearest neighbors scanned during an online search. Higher values improve recall but slow down queries. |

Example search with ef tuning:

GET test/_search
{
  "query": {
    "hnsw": {
      "feature": {
        "vector": [1.5, 2.5],
        "size": 10,
        "ef": 100
      }
    }
  }
}

FAQ

How do I evaluate the recall ratio of my HNSW index?

Create two indexes with identical settings — one using HNSW and one using Linear Search. Write the same vector data to both, then run the same query vector against each. Compare the returned document IDs: recall ratio = (IDs returned by both indexes) / (total IDs returned by Linear Search).
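A minimal sketch of that comparison, assuming you have already extracted the hit IDs from each index's search response (the ID lists below are placeholders):

```python
# Recall comparison described above: what share of the exact (Linear Search)
# result set did the HNSW index also return? ID lists are placeholders.

def recall_ratio(hnsw_ids, linear_ids):
    """len(IDs returned by both) / len(IDs returned by Linear Search)."""
    exact = set(linear_ids)
    return len(exact & set(hnsw_ids)) / len(exact)

hnsw_top5 = ["d1", "d2", "d3", "d5", "d8"]
linear_top5 = ["d1", "d2", "d3", "d4", "d5"]
print(recall_ratio(hnsw_top5, linear_top5))  # 0.8
```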

My writes returned a `circuitBreakingException` error.

Off-heap memory usage exceeded the indices.breaker.vector.native.indexing.limit threshold (default 70%), and writes were paused. In most cases, writes resume automatically once the system finishes building indexes and frees memory. Add a retry mechanism to your write script to handle this automatically.
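A hedged sketch of such a retry mechanism: write_batch stands in for your own bulk-write call, and CircuitBreakingException is a placeholder for however your client surfaces the cluster's circuit-breaker error.

```python
# Sketch of a retry wrapper for the write path. write_batch is a placeholder
# for your own bulk-write call; CircuitBreakingException stands in for the
# error your client raises when the breaker trips.
import time

class CircuitBreakingException(Exception):
    pass

def write_with_retry(write_batch, batch, max_retries=5, base_delay=1.0):
    for attempt in range(max_retries):
        try:
            return write_batch(batch)
        except CircuitBreakingException:
            # Back off exponentially; writes resume once index builds free memory.
            time.sleep(base_delay * 2 ** attempt)
    raise RuntimeError("write failed after retries; reduce write throughput")
```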

Why is CPU usage still high after writes are paused?

The system continues building vector indexes during refresh and flush operations even after the write pause. CPU usage drops after the final refresh completes.

I see a `class_cast_exception: class org.apache.lucene.index.SoftDeletesDirectoryReaderWrapper$SoftDeletesFilterCodecReader cannot be cast to class org.apache.lucene.index.SegmentReader` error.

Disable the physical replication feature on your index. See Use the physical replication feature of the apack plug-in.

Vector searches are slow or suspended due to high memory usage.

Vector indexes are loaded into data node memory during a search. If a data node stores more vector index data than half its total memory, searches slow down or stall. Keep each data node's vector data volume below 50% of its total memory. If node memory is insufficient, upgrade the data nodes. See Upgrade the configuration of a cluster.

Are best practices and real-world examples available?

The Alibaba Cloud developer community provides business scenarios and a best practice guide for the aliyun-knn plug-in.

I cannot use `must_not exists` to filter documents where the vector field is empty.

Vector data is stored differently from regular fields and may not be compatible with standard Query DSL filters. Use the following script query instead:

GET jx-similar-product-v1/_search
{
  "query": {
    "bool": {
      "must": {
        "script": {
          "script": {
            "source": "doc['feature'].empty",
            "lang": "painless"
          }
        }
      }
    }
  }
}