Elasticsearch: Guide to the Alibaba Cloud Elasticsearch vector engine

Last Updated: Sep 23, 2025

The Alibaba Cloud Elasticsearch vector engine is designed to process large volumes of vector data by combining the powerful search capabilities of Elasticsearch with vector similarity calculations. This makes it ideal for scenarios such as recommendation systems, image retrieval, and natural language processing. This topic describes how to use the vector engine and provides best practices. The engine is continuously updated. For the best performance, cost-effectiveness, and user experience, use the latest version of Alibaba Cloud Elasticsearch.

Prerequisites

An Alibaba Cloud Elasticsearch instance is created. If you do not have an instance, see Basic edition: From instance creation to data retrieval. Ensure that you create an Alibaba Cloud Elasticsearch 8.x instance of the latest version.

Note

The vector engine requires a large amount of off-heap memory to cache vector indexes. When you select specifications, evaluate the required off-heap memory based on the calculation instructions in this topic. This helps you select the appropriate specifications and number of data nodes.

Procedure

1. Create an index

First, create an index suitable for storing vector data. The following code provides an example of an index definition:

PUT /my_vector_index
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "dense_vector",
        "dims": 3
      },
      "my_text" : {
        "type" : "keyword"
      }
    }
  }
}
Important
  • The values for number_of_shards and number_of_replicas depend on your data scale and performance requirements.

  • The dense_vector type is used to store vector data. The dims parameter specifies the vector dimensions.

  • For more information about the parameters for the dense_vector type, see Dense vector field type.

2. Import data

You can import data into the Elasticsearch vector index in various ways, such as indexing documents individually or using the Bulk API for batch import (see the sketch after the following note). The following code provides an example of indexing individual documents:

PUT my_vector_index/_doc/1
{
  "my_text" : "text1",
  "my_vector" : [0.5, 10, 6]
}

PUT my_vector_index/_doc/2
{
  "my_text" : "text2",
  "my_vector" : [-0.5, 10, 10]
}
Important

Ensure that the dimensions of the vector data match the dimensions defined in the index.
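
If you need to import a large amount of data, the Bulk API is usually more efficient. The following is a minimal sketch that imports the same two documents in a single _bulk request:

POST my_vector_index/_bulk
{ "index": { "_id": "1" } }
{ "my_text": "text1", "my_vector": [0.5, 10, 6] }
{ "index": { "_id": "2" } }
{ "my_text": "text2", "my_vector": [-0.5, 10, 10] }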

3. Search for vectors

You can use the Elasticsearch vector similarity search feature to find the documents that are most similar to a query vector. The following code provides an example:

GET my_vector_index/_search
{
  "knn": {
    "field": "my_vector",
    "query_vector": [-5, 9, -12],
    "k": 10,
    "num_candidates": 100
  },
  "fields": [ "my_text" ]
}

  • k: (Optional) The number of nearest neighbors to return. This value must be less than or equal to num_candidates. By default, this value is the same as `size`.

  • num_candidates: (Optional) The number of nearest neighbor candidates to consider on each shard. This parameter greatly affects performance and the recall rate. A larger num_candidates value increases the recall rate but also has a greater impact on performance. The value must be greater than k (or `size` if k is omitted) and cannot exceed 10,000. Elasticsearch collects num_candidates results from each shard and merges them to find the top k results. Increasing num_candidates usually improves the accuracy of the final k results. The default value is `Math.min(1.5 * k, 10000)`.

Note

Regarding the k and num_candidates parameters: in the Hierarchical Navigable Small World (HNSW) algorithm, num_candidates corresponds to the `ef` value of a query and specifies how many nearest candidates are collected from each shard, and k is the number of documents that Elasticsearch returns in the final result.

Other vector search features include the following:

  • knn supports filter queries. It also supports the similarity parameter, which lets you set a minimum similarity for hit documents, and supports nested fields. For an example that combines a filter with similarity, see the sketch after this list.

  • You can query multiple knn fields at the same time.

  • You can use a script to perform exact knn queries.

  • You can use a script for rescore.

  • For all features, see k-nearest neighbor (k-NN) search.
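
The following sketch, based on the example index in this topic, shows a knn query that combines a filter with a minimum similarity. The filter value and the similarity threshold are illustrative, and the interpretation of the similarity value depends on the similarity function of the field:

GET my_vector_index/_search
{
  "knn": {
    "field": "my_vector",
    "query_vector": [-5, 9, -12],
    "k": 10,
    "num_candidates": 100,
    "filter": {
      "term": {
        "my_text": "text2"
      }
    },
    "similarity": 0.5
  },
  "fields": [ "my_text" ]
}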

Performance optimization

Elasticsearch uses the Hierarchical Navigable Small World (HNSW) algorithm for approximate k-NN (knn) search. HNSW is a graph-based algorithm that performs best when most vector data is in memory. Ensure that your data nodes have sufficient off-heap memory to store both the vector data and the index structures. When you use the vector engine, consider the following optimizations:

Set appropriate parameters

You can set appropriate values for the m and ef_construction parameters. These are advanced parameters of the dense_vector type that you can configure when you create an index, as shown in the example after the following parameter descriptions. For more information, see Dense vector field type.

Note

HNSW is an approximate k-NN search method and cannot guarantee returning the absolute nearest neighbors with 100% accuracy. The main parameters that affect the recall rate are m and ef_construction.

  • m: The number of neighbors for each node. The default value is 16. A larger value increases the recall rate but hurts performance and increases memory usage. If you have strict recall requirements, set this parameter to 64 or higher.

  • ef_construction: The number of candidates to consider when a node is added to the HNSW graph. The default value is 100. A larger ef_construction value increases the recall rate and has a significant impact on performance, but does not affect memory usage. If you have strict recall requirements, set this parameter to 512 or higher.
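
The following example shows one way to set these parameters when you create the index. The m and ef_construction values are illustrative and should be tuned based on your recall and performance requirements, and the similarity value is shown only as an example:

PUT /my_vector_index
{
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "dense_vector",
        "dims": 3,
        "index": true,
        "similarity": "cosine",
        "index_options": {
          "type": "hnsw",
          "m": 64,
          "ef_construction": 512
        }
      }
    }
  }
}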

Reduce memory consumption

Elasticsearch uses quantization to reduce memory usage. Quantization can reduce the memory size of vectors by a factor of 4, 8, or even 32. For example, a value in a default float vector uses 4 bytes. With int8 quantization, each value uses only 1 byte. With int4 quantization, each value uses half a byte. With Better Binary Quantization (BBQ), each value uses only 1 bit, which means 8 values use 1 byte. Compared to unquantized vectors, BBQ reduces memory requirements to 1/32 of the original size.
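
The quantization type is selected by using the index_options.type parameter of the dense_vector field when you create the index. The following is a minimal sketch for bbq quantization; int8_hnsw and int4_hnsw are configured in the same way. The index name is illustrative, and the available quantization types depend on your Elasticsearch version (bbq also typically requires higher-dimension vectors):

PUT /my_quantized_index
{
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "dense_vector",
        "dims": 1024,
        "index": true,
        "index_options": {
          "type": "bbq_hnsw"
        }
      }
    }
  }
}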

To calculate the memory required for vector data:

You need to consider the memory required for both the vector data and the HNSW graph index. With no quantization or int8 quantization, the graph index uses only a small amount of memory relative to the vector data. With bbq quantization, however, the graph index accounts for a significantly larger share of the total memory. When you calculate the memory for vector data, you must also account for the memory usage of the graph index.

To calculate vector data memory (in bytes):

  • element_type: float (no quantization): num_vectors × num_dimensions × 4

  • element_type: float with int8 quantization: num_vectors × (num_dimensions + 4)

  • element_type: float with int4 quantization: num_vectors × (num_dimensions / 2 + 4)

  • element_type: float with bbq quantization: num_vectors × (num_dimensions / 8 + 12)

  • element_type: byte: num_vectors × num_dimensions

  • element_type: bit: num_vectors × (num_dimensions / 8)

If you use the flat type and do not create an HNSW index, the vector data memory is calculated as shown in the preceding list. If you use the HNSW type, you must also calculate the size of the graph index. You can estimate the graph index size as follows:

num_vectors × 4 × HNSW.m bytes. The default value of HNSW.m is 16, so the default size is num_vectors × 4 × 16.

The total memory required for vector data is the sum of these two parts.

You must also consider the number_of_replicas setting. The preceding calculation determines the memory size for one copy of the data (one primary shard). The total memory is the size of one copy multiplied by (1 + the number of replicas). The default value for number_of_replicas is 1. This means you have one primary shard and one replica shard, so the total memory required is twice the amount calculated for a single copy.

Note

When you enable quantization, the index size on disk increases. This is because Elasticsearch stores both the original vectors and the new quantized vectors. The size increase is derived from the first part of the vector data memory calculation. For example, if you apply int8 quantization to 40 GB of floating-point vectors, an additional 10 GB is required to store the quantized vectors. The total disk usage becomes 50 GB, but the memory used for fast search drops to 10 GB.

Check if off-heap memory is sufficient

When you calculate memory capacity and check whether the node memory is sufficient, focus on the off-heap memory of the nodes.

To obtain the amount of off-heap memory, note that a node must reserve sufficient memory for the Java heap. On a node with 64 GB of memory or less, the off-heap memory is typically half of the total memory. On a node with more than 64 GB of memory, the default off-heap memory is the total node memory minus 31 GB. To calculate the exact amount, you can run the following command:

GET _nodes/stats?human&filter_path=**.os.mem.total,**.jvm.mem.heap_max

A node's off-heap memory capacity is calculated as follows: os.mem.total - jvm.mem.heap_max.
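
For example, if the filtered response is shaped roughly like the following (the node ID and values are illustrative only), the off-heap memory of that node is 64 GB - 31 GB = 33 GB:

{
  "nodes": {
    "example_node_id": {
      "jvm": {
        "mem": {
          "heap_max": "31gb"
        }
      },
      "os": {
        "mem": {
          "total": "64gb"
        }
      }
    }
  }
}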

Calculate vector index memory

Example:

Assume that you have 10 million 1,024-dimension vectors. You use the default vector settings, enable int8 quantization, set m=16, and use the default number_of_replicas value of 1. The total memory required for the vector data is calculated as follows:

2 × (10,000,000 × (1,024 + 4) + 10,000,000 × 4 × 16) = 20.34 GB.

If you use two data nodes with 16 GB of memory each to store this index, the total available off-heap memory is (16 GB / 2) × 2 = 16 GB. This amount is not sufficient to store the vector data.

If you use two data nodes with 32 GB of memory each, the total available off-heap memory is (32 GB / 2) × 2 = 32 GB. This amount is sufficient to store the vector data.

In a production environment, you must reserve some off-heap memory for other indexes, source documents, and network traffic from read and write operations. Insufficient off-heap memory often causes the disk `ioutil` metric to run at full capacity with high random read traffic.

Prefetch the file system cache

If the machine that runs Elasticsearch restarts, the file system cache is cleared. The operating system then needs time to load the hot areas of the index into memory to accelerate search operations. You can use the index.store.preload setting to explicitly instruct the operating system to immediately load specific files into memory based on their file name extensions.

Note

If the file system cache is not large enough to hold all the data, eagerly loading data for too many indexes or files can slow down searches. Use this setting with caution.

Example:

PUT /my_vector_index
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1,
    "index.store.preload": ["vex", "veq"]
  },
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "dense_vector",
        "dims": 3
      },
      "my_text" : {
        "type" : "keyword"
      }
    }
  }
}

To determine which index files to load using index.store.preload, see the following descriptions of file name extensions:

The following file name extensions apply to approximate k-NN search. Which files are relevant depends on the quantization type.

  • vex files store the HNSW graph structure.

  • vec files contain all non-quantized vector values. This includes all element types, such as floating-point numbers, bytes, and bits.

  • veq files are for quantized vectors from indexes that use int4 or int8 quantization.

  • veb files are for binary vectors from indexes that use bbq quantization.

  • vem, vemf, vemq, and vemb are for metadata. They are usually small and do not need to be preloaded.

Typically, when you use a quantized index, you should preload only the relevant quantized values and the HNSW graph. Preloading the original vectors is not necessary and can be counterproductive.

Configuration examples:

hnsw: "index.store.preload": ["vex", "vec"]

int8, int4: "index.store.preload": ["vex", "veq"]

bbq: "index.store.preload": ["vex", "veb"]

For an existing index, index.store.preload is a static parameter and cannot be changed after the index is created. If you can accept the index being temporarily unavailable, you can set the parameter by closing the index, applying the setting, and then reopening the index. The following example shows how to perform this operation:

POST my_vector_index/_close

PUT my_vector_index/_settings
{
  "index.store.preload": ["vex", "veq"]
}

POST my_vector_index/_open

Reduce the number of index segments

Elasticsearch shards are composed of segments, which are internal storage elements in the index. For an approximate k-NN search, Elasticsearch stores the vector values of each segment as a separate HNSW graph, so a k-NN search must check each segment. Recent parallelization of k-NN search makes searching across multiple segments faster. However, a k-NN search can still be several times faster if there are fewer segments. By default, Elasticsearch periodically merges smaller segments into larger ones through a background merge process. If this is not sufficient, you can take the following explicit steps to reduce the number of index segments.

1. Increase the maximum segment size

Elasticsearch provides many settings to control the merge process. An important setting is index.merge.policy.max_merged_segment, which controls the maximum size of a segment created during a merge. Increasing this value can reduce the number of segments. The default value is 5 GB, which might be too small for vectors with large dimensions. You can consider increasing this value to 10 GB or 20 GB to help reduce the segment count. The following is an example:

PUT my_vector_index/_settings
{
  "index.merge.policy.max_merged_segment": "10gb"
}

2. Create large segments during bulk indexing

A common workflow is to perform an initial bulk upload and then make the index searchable. You can adjust index settings to encourage Elasticsearch to create larger initial segments instead of forcing merges. During the bulk upload, you can disable refresh by setting index.refresh_interval to -1. This prevents periodic refresh operations and the creation of extra segments, at the cost of newly written data being temporarily unsearchable. You can also configure a larger index buffer so that Elasticsearch can accept more documents before a refresh. By default, indices.memory.index_buffer_size is 10% of the heap size. For large heap sizes, such as 32 GB, this is usually sufficient. To make use of the full index buffer, you must also increase the index.translog.flush_threshold_size limit. An example of adjusting the dynamic index settings is shown below.
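
The following sketch adjusts the dynamic index settings mentioned above before the bulk upload. The 2gb threshold is only an illustrative value, and indices.memory.index_buffer_size is a static node-level setting that must be configured in elasticsearch.yml rather than through this API:

PUT my_vector_index/_settings
{
  "index.refresh_interval": "-1",
  "index.translog.flush_threshold_size": "2gb"
}

After the bulk upload is complete, restore the refresh interval so that the data becomes searchable:

PUT my_vector_index/_settings
{
  "index.refresh_interval": "1s"
}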

Exclude vector fields from _source

Elasticsearch stores the original JSON document that was indexed in the _source field. By default, every hit in a search result includes the full _source document. When documents contain high-dimension dense vector fields, the _source can be very large and expensive to load. This can significantly slow down k-NN searches.

Note

Operations such as reindex, update, and update by query require the _source field. Excluding fields from _source can cause these operations to behave unexpectedly. For example, when you reindex, the dense_vector field might not be included in the new index.

You can use the excludes mapping parameter to exclude dense vector fields from being stored in _source. This prevents large amounts of raw vector data from being loaded and returned during a search and reduces the index size. Vectors excluded from _source can still be used in k-NN searches because that process uses an independent data structure. Before you use the excludes parameter, review the potential disadvantages of excluding fields from _source, as described in the preceding note.

PUT /my_vector_index
{
  "mappings": {
    "_source": {
      "excludes": [
        "my_vector"
      ]
    },
    "properties": {
      "my_vector": {
        "type": "dense_vector",
        "dims": 3
      },
      "my_text": {
        "type": "keyword"
      }
    }
  }
}

To view the vector content in a document on Elasticsearch 8.17 or later, you can use the following command:

GET my_vector_index/_search
{
  "docvalue_fields": ["my_vector"]
}

For other versions, you can use the following command:

GET my_vector_index/_search
{
  "script_fields": {
    "vector_field": {
      "script": {
        "source" : "doc['my_vector'].vectorValue"
      }
    }
  }
}

As an alternative to excluding vector fields from _source, see synthetic _source.

Upgrade the instance type configuration

Vector similarity calculation is a compute-intensive task that requires high CPU performance. Selecting a Turbo instance type can provide more than twice the performance. You can use a blue-green deployment to change to a Turbo instance type of the same specifications.