Elasticsearch: Alibaba Cloud Elasticsearch vector engine usage guide

Last Updated: Mar 25, 2025

The Alibaba Cloud Elasticsearch vector engine is designed for processing large-scale vector data, combining Elasticsearch's robust search capabilities with vector similarity computation. It is ideal for applications such as recommendation systems, image retrieval, and natural language processing. This guide shows how to use the Alibaba Cloud Elasticsearch vector engine effectively and offers best practices for performance, cost efficiency, and user experience. Because the vector engine is continuously improved, we recommend using the latest version of Alibaba Cloud Elasticsearch.

Prerequisites

An Alibaba Cloud Elasticsearch instance is required. If you have not created one, see Quick Start to create an instance of the latest Alibaba Cloud Elasticsearch 8.x version.

Note
  • Instance type: It is recommended to use the turbo type to enhance the vector engine's performance.

  • Specifications: The vector engine requires significant off-heap memory to cache vector indexes. Select data node specifications and quantity based on off-heap memory usage, which can be estimated using the memory calculation instructions provided below.

Operations

1. Create an index

The first step is to create an index suitable for storing vector data. Below is an example index definition:

PUT /my_vector_index
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "dense_vector",
        "dims": 3
      },
      "my_text" : {
        "type" : "keyword"
      }
    }
  }
}
Important
  • The number_of_shards and number_of_replicas settings should be based on your data scale and performance needs.

  • Use the dense_vector type to store vector data, with the dims parameter defining the vector dimensions.

  • For more information on the dense_vector parameters, see the Dense Vector Field Type documentation.

2. Data import

You can import data into the Elasticsearch vector index in several ways, such as using the Bulk API for batch import. The following example writes documents individually; a Bulk API equivalent follows.

PUT my_vector_index/_doc/1
{
  "my_text" : "text1",
  "my_vector" : [0.5, 10, 6]
}

PUT my_vector_index/_doc/2
{
  "my_text" : "text2",
  "my_vector" : [-0.5, 10, 10]
}
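
The same documents can also be written in a single request through the Bulk API. Note that the NDJSON request body must end with a newline:

POST _bulk
{ "index": { "_index": "my_vector_index", "_id": "1" } }
{ "my_text" : "text1", "my_vector" : [0.5, 10, 6] }
{ "index": { "_index": "my_vector_index", "_id": "2" } }
{ "my_text" : "text2", "my_vector" : [-0.5, 10, 10] }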
Important

Ensure the vector data dimensions are consistent with those defined in the index.

3. Vector search

Elasticsearch's vector similarity search feature allows you to find the most similar documents by specifying a query vector. Below is an example query:

GET my_vector_index/_search
{
  "knn": {
    "field": "my_vector",
    "query_vector": [-5, 9, -12],
    "k": 10,
    "num_candidates": 100
  },
  "fields": [ "my_text" ]
}

Parameter descriptions:

  • k: (Optional) The number of nearest neighbors to return. This value must be less than or equal to num_candidates. Defaults to the request's size value.

  • num_candidates: (Optional) The number of nearest-neighbor candidates to consider per shard. This parameter significantly affects both performance and recall: the larger num_candidates is, the higher the recall rate, but the greater the performance cost. The value must be greater than k (or greater than size if k is omitted) and cannot exceed 10,000. Elasticsearch collects num_candidates results from each shard and merges them to find the top k results. Increasing num_candidates often improves the accuracy of the final k results. The default value is Math.min(1.5 * k, 10000).

Note

How k and num_candidates relate: in HNSW terms, num_candidates is the ef value of the query, that is, the number of nearest candidates collected within each shard, while k is the number of documents that Elasticsearch returns in the final results.

For more information on parameters, see the knn Search API.

Additional vector search features include the following:

  • knn supports filter queries, setting a minimum similarity score for hits, and nested fields (see the sketches after this list).

  • Allows querying multiple knn fields simultaneously.

  • Enables exact (brute-force) knn queries using a script.

  • Supports script for rescore.

  • For a comprehensive list of functionalities, refer to the k-Nearest Neighbor (knn) Search documentation.
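
As an illustration of the filter and minimum-similarity points above, the following sketch runs against the index created earlier; the similarity value is illustrative and its meaning depends on the similarity metric of the field:

GET my_vector_index/_search
{
  "knn": {
    "field": "my_vector",
    "query_vector": [-5, 9, -12],
    "k": 10,
    "num_candidates": 100,
    "filter": {
      "term": {
        "my_text": "text1"
      }
    },
    "similarity": 0.9
  },
  "fields": [ "my_text" ]
}

An exact knn query can be issued through a script_score query, which scores all matching documents instead of traversing the HNSW graph:

GET my_vector_index/_search
{
  "query": {
    "script_score": {
      "query": { "match_all": {} },
      "script": {
        "source": "cosineSimilarity(params.query_vector, 'my_vector') + 1.0",
        "params": { "query_vector": [-5, 9, -12] }
      }
    }
  }
}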

Performance Optimization

Elasticsearch employs the HNSW algorithm for approximate knn searches. The HNSW algorithm, which is graph-based, operates effectively only when most vector data resides in memory. Thus, it's crucial to ensure that data nodes possess sufficient off-heap memory to accommodate both vector data and index structures. When utilizing the vector engine, consider the following points for performance optimization:

Set reasonable parameters

Consider setting appropriate m and ef_construction parameters. These are advanced parameters of the dense_vector type, configured through index_options when you create the index; an example mapping follows the parameter descriptions below. For more details, see the Dense Vector Field Type documentation.

Note

HNSW is an approximate knn search method and cannot guarantee that the true nearest neighbors are returned. The primary parameters influencing the recall rate are m and ef_construction.

Parameter descriptions:

  • m: The number of neighbors per node in the HNSW graph. The default value is 16. More neighbors yield a higher recall rate but have a greater impact on performance and increase memory usage. If recall requirements are strict, set it to 64 or larger.

  • ef_construction: The number of candidate neighbors tracked when a node is added during HNSW graph construction. The default value is 100. A larger ef_construction yields a higher recall rate and has a significant impact on indexing performance, but does not affect memory usage. If recall requirements are strict, set it to 512 or larger.
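
For example, both parameters can be raised at index creation through index_options. This is a sketch; tune the values to your own recall and latency requirements:

PUT /my_vector_index
{
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "dense_vector",
        "dims": 3,
        "index_options": {
          "type": "hnsw",
          "m": 64,
          "ef_construction": 512
        }
      }
    }
  }
}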

Reduce memory consumption

Elasticsearch uses quantization technology to decrease memory usage. Quantization can reduce the memory footprint of vectors by 4, 8, or even 32 times. For instance, with the default float type, a vector value occupies 4 bytes. Using int8 quantization, each value only requires 1 byte. With int4 quantization, each value takes up half a byte. BBQ (Better Binary Quantization) quantization reduces the requirement to just 1 bit per value, with 8 values totaling 1 byte. This is only 1/32 of the original memory requirement.
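
Quantization is selected through the index_options type of the dense_vector field. The following is a minimal sketch that enables int8 quantization on a hypothetical index; int4_hnsw and bbq_hnsw follow the same pattern, subject to version support:

PUT /my_quantized_index
{
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "dense_vector",
        "dims": 1024,
        "index_options": {
          "type": "int8_hnsw"
        }
      }
    }
  }
}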

To calculate the memory needed for vector data:

Take into account both the memory for vector data and the memory used by the HNSW graph index. The graph index occupies a smaller portion of memory when unquantized or with int8 quantization. However, with bbq quantization, the graph index's share of memory usage increases substantially. Thus, when estimating the memory requirements for vector data, it's essential to factor in the memory impact of the graph index.

Method for calculating vector data memory:

  • element_type: float: num_vectors * num_dimensions * 4

  • element_type: float with quantization: int8: num_vectors * (num_dimensions + 4)

  • element_type: float with quantization: int4: num_vectors * (num_dimensions/2 + 4)

  • element_type: float with quantization: bbq: num_vectors * (num_dimensions/8 + 12)

  • element_type: byte: num_vectors * num_dimensions

  • element_type: bit: num_vectors * (num_dimensions/8)

When using the flat type without creating an HNSW index, the memory usage for vector data is calculated as described above. However, if the HNSW type is selected, you must also account for the size of the graph index. Below is an estimated size for the graph index:

num_vectors * 4 * HNSW.m, where the default value of HNSW.m is 16, so by default it is num_vectors * 4 * 16.

Therefore, the total memory of vector data is the sum of the sizes of the above two parts.

Additionally, consider number_of_replicas (the number of index replicas). The calculation above covers one copy of the data; multiply it by (number_of_replicas + 1) to obtain the total memory capacity. The default number_of_replicas is 1, so the total memory capacity is twice that of one copy of the data.

Note

After quantization is enabled, the index capacity will be larger than before because Elasticsearch not only retains the original vectors but also adds the quantized vector data. The increase in capacity is attributed to the first part of the vector data memory calculation mentioned earlier. For instance, quantizing 40 GB of floating-point vectors with int8 will result in an additional 10 GB of data for the quantized vectors. Consequently, the total disk usage will amount to 50 GB, while the memory required for quick search will decrease to 10 GB.

Is the off-heap memory capacity sufficient?

When assessing memory capacity and determining if node memory is adequate, pay close attention to the node's off-heap memory.

To obtain off-heap memory: A node must reserve sufficient memory for the Java heap. For nodes with memory up to 64 GB, the off-heap memory is generally half of the total memory. Beyond 64 GB, the default off-heap memory is the node memory minus 31 GB. The precise calculation can be performed using the following command:

GET _nodes/stats?human&filter_path=**.os.mem.total,**.jvm.mem.heap_max

The specific off-heap memory capacity of a node is: os.mem.total - jvm.mem.heap_max.
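
A truncated sample response is shown below; the node ID and values are illustrative. For this node, the off-heap memory is 32 GB - 16 GB = 16 GB:

{
  "nodes": {
    "node_id_example": {
      "os": { "mem": { "total": "32gb" } },
      "jvm": { "mem": { "heap_max": "16gb" } }
    }
  }
}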

Vector index memory calculation

Example:

Assume a dataset of 10 million 1,024-dimensional vectors with the default float element type, int8 quantization enabled, m=16, and the default number_of_replicas of 1. The total memory of the vector data is:

2 * (10,000,000 * (1,024 + 4) + 10,000,000 * 4 * 16) = 20.34 GB.

If two data nodes with 16 GB of memory are used to store this index, the total off-heap memory of the nodes is 16 / 2 * 2 = 16 GB, which is insufficient for the vector data.

If two data nodes with 32 GB of memory are used, the total off-heap memory is 32 / 2 * 2 = 32 GB, which can accommodate the vector data.

In practice, reserve some memory for other indexes, original text, and network traffic from data read and write operations. In production, insufficient off-heap memory often manifests as the disk I/O utilization (ioutil) metric running at 100%, accompanied by heavy random read traffic.

Prefetch file system cache

If the Elasticsearch server restarts, the file system cache is cleared. The operating system will need time to reload the index's hot areas into memory for fast search operations. You can use the index.store.preload setting to tell the operating system which files to load into memory immediately based on the file name extension.

Note

If the file system cache is too small to hold all data, preloading too many indexes or files can slow down search speed. Use the preload feature judiciously.

Example:

PUT /my_vector_index
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1,
    "index.store.preload": ["vex", "veq"]
  },
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "dense_vector",
        "dims": 3
      },
      "my_text" : {
        "type" : "keyword"
      }
    }
  }
}

To load specific index files using index.store.preload, refer to the descriptions of the following file name extension suffixes:

The file name extensions listed below pertain to approximate knn search; each extension corresponds to a specific part of the vector index, such as the graph structure, the raw vectors, or a particular type of quantized vectors.

  • vex files store the HNSW graph structure.

  • vec files contain all non-quantized vector values, including floating-point numbers, bytes, and bits.

  • veq files index quantized vectors with int4 or int8 quantization.

  • veb files index binary vectors with bbq quantization.

  • vem, vemf, vemq, and vemb are metadata files, usually small and not necessary to preload.

Generally, when using quantized indexes, preload only the relevant quantized values and HNSW graph. Preloading original vectors is not required and may be counterproductive.

Configuration example:

hnsw: "index.store.preload": ["vex", "vec"]

int8, int4: "index.store.preload": ["vex", "veq"]

bbq: "index.store.preload": ["vex", "veb"]

For existing indexes, index.store.preload is a static parameter and cannot be modified after index creation. If temporary unavailability of the index is acceptable, close the index to set the parameter, then reopen it. Here's how:

POST my_vector_index/_close

PUT my_vector_index/_settings
{
  "index.store.preload": ["vex", "veq"]
}

POST my_vector_index/_open

Reduce the number of segments in the index

Elasticsearch shards are composed of segments, which are internal storage elements in the index. For approximate knn search, Elasticsearch stores the vector values of each segment as a separate HNSW graph, so knn search must check every segment. The parallelization of recent knn searches makes searching across multiple segments faster, but knn search can still be several times faster when there are fewer segments. By default, Elasticsearch periodically merges smaller segments into larger ones through a background merging process. If this is not sufficient, you can take the following explicit steps to reduce the number of index segments.

1. Increase the maximum segment size

Elasticsearch offers several adjustable settings for the merging process. A key setting is index.merge.policy.max_merged_segment, which dictates the maximum size of segments produced during merging. By raising this value, you can reduce the number of segments. The default is 5 GB, which may be too small for larger vector dimensions. Consider increasing it to 10 GB or 20 GB. Example:

PUT my_vector_index/_settings
{
  "index.merge.policy.max_merged_segment": "10gb"
}

2. Create large segments during bulk indexing

A typical approach is to perform the bulk upload first and then make the index searchable. To encourage Elasticsearch to create larger initial segments and avoid forced merges, adjust the index settings as shown in the sketch below: disable search and set index.refresh_interval to -1 during the bulk upload to prevent refresh operations and extra segment creation, and allocate a large index buffer so that Elasticsearch accumulates more documents before refreshing. The default indices.memory.index_buffer_size is 10% of the heap size, which is usually adequate for large heaps such as 32 GB. To fully utilize the index buffer, also increase the index.translog.flush_threshold_size limit.
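
A sketch of the index-level settings described above; the flush threshold value is illustrative (indices.memory.index_buffer_size is a node-level setting configured in elasticsearch.yml and is not shown here):

PUT my_vector_index/_settings
{
  "index.refresh_interval": "-1",
  "index.translog.flush_threshold_size": "2gb"
}

After the bulk upload completes, restore refreshes so the index becomes searchable:

PUT my_vector_index/_settings
{
  "index.refresh_interval": "1s"
}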

Exclude vector fields from _source

Elasticsearch retains the original JSON document submitted at the time of indexing within the _source field. Typically, each search result hit includes the entire _source document. However, when documents feature dense vector fields with high dimensions, the _source can become quite large, leading to expensive loading times. This can greatly impact the efficiency of knn search operations.

Note

Operations like reindex, update, and update by query typically require the _source field. Excluding fields from _source may lead to unexpected behavior in these operations. For instance, during reindexing, the dense_vector field may not be included in the new index.

You can prevent dense vector fields from being loaded and returned during searches by excluding them from the _source field using the excludes mapping parameter. This not only reduces the volume of raw vector data processed but also shrinks the index size. However, even if vectors are excluded from _source, they can still be utilized in knn searches, which use separate data structures for search operations. It's important to consider the potential disadvantages of omitting the _source field, detailed in the sections above, before applying the excludes parameter.

PUT /my_vector_index
{
  "mappings": {
    "_source": {
      "excludes": [
        "my_vector"
      ]
    },
    "properties": {
      "my_vector": {
        "type": "dense_vector",
        "dims": 3
      },
      "my_text": {
        "type": "keyword"
      }
    }
  }
}

To view the vector content of a document after the field is excluded from _source: if the Elasticsearch version is 8.17 or later, use:

GET my_vector_index/_search
{
  "docvalue_fields": ["my_vector"]
}

For other versions, use:

GET my_vector_index/_search
{
  "script_fields": {
    "vector_field": {
      "script": {
        "source" : "doc['my_vector'].vectorValue"
      }
    }
  }
}

In addition to excluding vector fields from _source, synthetic _source offers an alternative approach.
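
For reference, synthetic _source reconstructs documents from indexed structures instead of storing the original JSON. The following is a sketch of enabling it in the mappings; check whether the feature is generally available in your version before relying on it:

PUT /my_vector_index
{
  "mappings": {
    "_source": {
      "mode": "synthetic"
    },
    "properties": {
      "my_vector": {
        "type": "dense_vector",
        "dims": 3
      },
      "my_text": {
        "type": "keyword"
      }
    }
  }
}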