The Alibaba Cloud Elasticsearch vector engine is designed for processing large-scale vector data, combining Elasticsearch's robust search capabilities with vector similarity computing power. It is ideal for applications such as recommendation systems, image retrieval, and natural language processing. This guide will show you how to effectively utilize the Alibaba Cloud Elasticsearch vector engine and offer best practices for optimal performance, cost-efficiency, and user experience. It is advisable to use the latest version of Alibaba Cloud Elasticsearch for continuous improvements.
Prerequisites
You must have an Elasticsearch instance. If you have not yet created one, see Quick Start to create the latest 8.x version of Alibaba Cloud Elasticsearch.
Instance type: It is recommended to use the turbo type to enhance the vector engine's performance.
Specifications: The vector engine requires significant off-heap memory to cache vector indexes. Select data node specifications and quantity based on off-heap memory usage, which can be estimated using the memory calculation instructions provided below.
Operations
1. Create an index
The first step is to create an index suitable for storing vector data. Below is an example index definition:
PUT /my_vector_index
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 1
},
"mappings": {
"properties": {
"my_vector": {
"type": "dense_vector",
"dims": 3
},
"my_text" : {
"type" : "keyword"
}
}
}
}
The number_of_shards and number_of_replicas settings should be based on your data scale and performance needs. Use the dense_vector type to store vector data, with the dims parameter defining the vector dimensions. For more information on the dense_vector parameters, see the Dense Vector Field Type documentation.
2. Data import
You can import data into the Elasticsearch vector index through various methods, such as the Bulk API for batch data import. Here is an example:
PUT my_vector_index/_doc/1
{
"my_text" : "text1",
"my_vector" : [0.5, 10, 6]
}
PUT my_vector_index/_doc/2
{
"my_text" : "text2",
"my_vector" : [-0.5, 10, 10]
}
Ensure the vector data dimensions are consistent with those defined in the index.
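The Bulk API accepts a newline-delimited JSON body. As a minimal Python sketch (the build_bulk_payload helper and its dimension check are illustrative, not part of any Elasticsearch client), the two documents above could be assembled and validated before being sent to the _bulk endpoint:

```python
import json

EXPECTED_DIMS = 3  # must match the "dims" value defined in the index mapping

def build_bulk_payload(index, docs, expected_dims=EXPECTED_DIMS):
    """Build an NDJSON body for the Elasticsearch Bulk API, rejecting
    vectors whose dimensions do not match the index mapping."""
    lines = []
    for doc_id, doc in docs:
        vector = doc["my_vector"]
        if len(vector) != expected_dims:
            raise ValueError(
                f"doc {doc_id}: expected {expected_dims} dims, got {len(vector)}")
        # Each document is preceded by an action line naming the index and id.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the Bulk API requires a trailing newline

docs = [
    (1, {"my_text": "text1", "my_vector": [0.5, 10, 6]}),
    (2, {"my_text": "text2", "my_vector": [-0.5, 10, 10]}),
]
payload = build_bulk_payload("my_vector_index", docs)
# POST the payload to /_bulk with Content-Type: application/x-ndjson
```

Validating dimensions client-side avoids partial bulk failures when a stray vector of the wrong length slips into the batch.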
3. Vector search
Elasticsearch's vector similarity search feature allows you to find the most similar documents by specifying a query vector. Below is an example query:
GET my_vector_index/_search
{
"knn": {
"field": "my_vector",
"query_vector": [-5, 9, -12],
"k": 10,
"num_candidates": 100
},
"fields": [ "my_text" ]
}
Parameter | Content |
k | (Optional) The number of nearest neighbors to return. This value must be less than or equal to num_candidates. |
num_candidates | (Optional) The number of nearest neighbor candidates to consider per shard. This parameter significantly affects performance and recall: the larger the num_candidates, the higher the recall rate, but the slower the search. |
The k and num_candidates parameters explained: in HNSW, num_candidates corresponds to the ef value of the query, that is, the number of candidate documents collected in each shard, while k is the number of documents Elasticsearch returns in the final results.
For more information on parameters, see the knn Search API.
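To make the meaning of k concrete, here is a self-contained sketch of exact (brute-force) nearest-neighbor search over the sample vectors from the import example, using Euclidean (l2_norm) distance for simplicity. It only illustrates the semantics of k; it is not Elasticsearch's internal HNSW implementation, and the function names are illustrative:

```python
import math

def l2(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def exact_knn(query, docs, k):
    """Return the ids of the k documents whose vectors are closest to the query."""
    ranked = sorted(docs.items(), key=lambda item: l2(query, item[1]))
    return [doc_id for doc_id, _ in ranked[:k]]

# The two documents indexed in the data import example above.
docs = {
    1: [0.5, 10, 6],
    2: [-0.5, 10, 10],
}
print(exact_knn([-5, 9, -12], docs, k=1))
```

An approximate search with num_candidates collects a candidate pool per shard first and then keeps the best k, which is why recall improves as num_candidates grows.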
Additional vector search features include the following:
- knn supports filter queries, allows setting a minimum score for hit docs based on similarity, and accommodates nested fields.
- Multiple knn fields can be queried simultaneously.
- Precise (exact) knn queries can be performed using a script.
- script-based rescore is supported.
For a comprehensive list of functionalities, refer to the k-Nearest Neighbor (knn) Search documentation.
Performance Optimization
Elasticsearch employs the HNSW algorithm for approximate knn searches. HNSW is graph-based and operates effectively only when most vector data resides in memory, so it is crucial to ensure that data nodes have sufficient off-heap memory for both the vector data and the index structures. When utilizing the vector engine, consider the following points for performance optimization:
Set reasonable parameters
Consider setting appropriate m and ef_construction parameters. These are advanced parameters of the dense_vector type, set when creating an index. For more details, see the Dense Vector Field Type documentation.
HNSW is an approximate knn search method and cannot guarantee that the nearest data is returned with 100% accuracy. The primary parameters influencing the recall rate are m and ef_construction.
Parameter | Content |
m | The number of neighbors of each node in the graph. The default value is 16. The more neighbors, the higher the recall rate, but this has a greater impact on performance and increases memory usage. If there are strict requirements for the recall rate, it can be set to 64 or a larger value. |
ef_construction | The number of candidate neighbors tracked during the construction of the HNSW graph. The default value is 100. A larger value produces a higher-quality graph and a higher recall rate, at the cost of slower index construction. |
Reduce memory consumption
Elasticsearch uses quantization technology to decrease memory usage. Quantization can reduce the memory footprint of vectors by 4, 8, or even 32 times. For instance, with the default float type, each vector value occupies 4 bytes. Using int8 quantization, each value requires only 1 byte; with int4 quantization, each value takes up half a byte. BBQ (Better Binary Quantization) reduces the requirement to just 1 bit per value, with 8 values per byte, which is only 1/32 of the original memory requirement.
To calculate the memory needed for vector data:
Take into account both the memory for the vector data itself and the memory used by the HNSW graph index. The graph index occupies a smaller portion of memory when vectors are unquantized or use int8 quantization. However, with bbq quantization, the graph index's share of memory usage increases substantially. Thus, when estimating the memory requirements for vector data, it is essential to factor in the memory impact of the graph index.
Method for calculating vector data memory:
element_type: float: num_vectors * num_dimensions * 4
element_type: float with quantization: int8: num_vectors * (num_dimensions + 4)
element_type: float with quantization: int4: num_vectors * (num_dimensions / 2 + 4)
element_type: float with quantization: bbq: num_vectors * (num_dimensions / 8 + 12)
element_type: byte: num_vectors * num_dimensions
element_type: bit: num_vectors * (num_dimensions / 8)
When the flat type is used and no HNSW index is created, the memory usage of the vector data is calculated as described above. However, if an HNSW type is selected, you must also account for the size of the graph index, estimated as num_vectors * 4 * HNSW.m. The default value of HNSW.m is 16, so by default this is num_vectors * 4 * 16.
Therefore, the total memory of vector data is the sum of the sizes of the above two parts.
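The per-vector formulas and the graph-index estimate above can be combined into a rough estimator. This is a sketch of the arithmetic only (the function names are illustrative, not an Elasticsearch API):

```python
def bytes_per_vector(dims, element_type="float", quantization=None):
    """Bytes per vector for each element_type / quantization combination,
    per the formulas listed above."""
    if element_type == "float":
        if quantization is None:
            return dims * 4
        if quantization == "int8":
            return dims + 4
        if quantization == "int4":
            return dims / 2 + 4
        if quantization == "bbq":
            return dims / 8 + 12
    if element_type == "byte":
        return dims
    if element_type == "bit":
        return dims / 8
    raise ValueError("unsupported element_type / quantization combination")

def vector_index_memory_gb(num_vectors, dims, m=16, replicas=1,
                           element_type="float", quantization=None):
    """Estimate off-heap memory for an HNSW vector index, in GB:
    (vector data + graph index) times the number of data copies
    (primary + replicas)."""
    data = num_vectors * bytes_per_vector(dims, element_type, quantization)
    graph = num_vectors * 4 * m  # HNSW graph index estimate
    copies = 1 + replicas
    return copies * (data + graph) / 1024 ** 3

# 10 million 1024-dimensional float vectors, int8 quantization, m=16, 1 replica:
print(round(vector_index_memory_gb(10_000_000, 1024, quantization="int8"), 2))
```

Note that the estimator counts primary plus replica copies, so the default of one replica doubles the single-copy figure.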
Additionally, consider number_of_replicas (the number of index replicas). The calculation above gives the memory capacity for one copy of the data; multiply it by the total number of copies (1 + number_of_replicas) to get the total memory capacity. The default number_of_replicas is 1, so the memory capacity is twice that of one copy of the data.
After quantization is enabled, the index capacity will be larger than before, because Elasticsearch retains the original vectors in addition to the quantized vector data. The increase in capacity corresponds to the first part of the vector data memory calculation mentioned earlier. For instance, quantizing 40 GB of floating-point vectors with int8 adds 10 GB of quantized vector data, so the total disk usage grows to 50 GB, while the memory required for fast search decreases to 10 GB.
Is the off-heap memory capacity sufficient?
When assessing memory capacity and determining if node memory is adequate, pay close attention to the node's off-heap memory.
To obtain off-heap memory: A node must reserve sufficient memory for the Java heap. For nodes with memory up to 64 GB, the off-heap memory is generally half of the total memory. Beyond 64 GB, the default off-heap memory is the node memory minus 31 GB. The precise calculation can be performed using the following command:
GET _nodes/stats?human&filter_path=**.os.mem.total,**.jvm.mem.heap_max
The specific off-heap memory capacity of a node is: os.mem.total - jvm.mem.heap_max.
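As an illustration of this calculation, the following sketch applies it to a made-up, trimmed stats response (the node name and byte values are sample data; the real API exposes these fields as total_in_bytes and heap_max_in_bytes when the human flag is omitted):

```python
# Hypothetical, trimmed response shaped like the output of
# GET _nodes/stats filtered to OS memory and JVM heap fields.
sample_stats = {
    "nodes": {
        "node-1": {
            "os": {"mem": {"total_in_bytes": 34359738368}},      # 32 GB total
            "jvm": {"mem": {"heap_max_in_bytes": 17179869184}},  # 16 GB heap
        }
    }
}

def off_heap_bytes(node_stats):
    """Off-heap memory = total OS memory minus the JVM max heap."""
    return (node_stats["os"]["mem"]["total_in_bytes"]
            - node_stats["jvm"]["mem"]["heap_max_in_bytes"])

for name, node in sample_stats["nodes"].items():
    print(name, off_heap_bytes(node) / 1024 ** 3, "GB off-heap")
```

For this sample 32 GB node, half the memory goes to the heap, which matches the rule of thumb stated above for nodes up to 64 GB.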
Vector index memory calculation
Example:
Assuming a dataset of 10 million 1024-dimensional vectors, with default vector values, int8 quantization enabled, m=16, and the default index number_of_replicas of 1, the total memory of the vector data is:
2 * (10,000,000 * (1,024 + 4) + 10,000,000 * 4 * 16) = 20.34 GB.
If two data nodes with 16 GB of memory are used to store this index, the total off-heap memory of the nodes is 16 / 2 * 2 = 16 GB, which is insufficient for the vector data.
If two data nodes with 32 GB of memory are used, the total off-heap memory is 32 / 2 * 2 = 32 GB, which can accommodate the vector data.
In practice, reserve some memory for other indexes, original text, and network traffic from data read and write operations. In production, insufficient off-heap memory often shows up as the disk I/O utilization (ioutil) metric running at full capacity, along with significant random read traffic.
Prefetch file system cache
If the Elasticsearch server restarts, the file system cache is cleared, and the operating system needs time to reload the index's hot areas into memory before searches are fast again. You can use the index.store.preload setting to tell the operating system which files to load into memory immediately, based on the file name extension.
If the file system cache is too small to hold all data, preloading too many indexes or files can slow down search speed. Use the preload feature judiciously.
Example:
PUT /my_vector_index
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 1,
"index.store.preload": ["vex", "veq"]
},
"mappings": {
"properties": {
"my_vector": {
"type": "dense_vector",
"dims": 3
},
"my_text" : {
"type" : "keyword"
}
}
}
}
To load specific index files using index.store.preload, refer to the descriptions of the following file name extensions. The extensions listed below pertain to approximate knn search, with each extension corresponding to a specific type of quantization.
- vex files store the HNSW graph structure.
- vec files contain all non-quantized vector values, including floating-point numbers, bytes, and bits.
- veq files index quantized vectors with int4 or int8 quantization.
- veb files index binary vectors with bbq quantization.
- vem, vemf, vemq, and vemb are metadata files, usually small and not necessary to preload.
Generally, when using quantized indexes, preload only the relevant quantized values and HNSW graph. Preloading original vectors is not required and may be counterproductive.
Configuration example:
hnsw: "index.store.preload": ["vex", "vec"]
int8, int4: "index.store.preload": ["vex", "veq"]
bbq: "index.store.preload": ["vex", "veb"]
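For reference, the mapping above can be written down as a small helper that produces the settings fragment for a given index type (the helper and its dictionary keys are illustrative; they loosely follow the dense_vector index_options type names):

```python
# Preload extensions per vector index type, per the configuration examples above.
PRELOAD_BY_TYPE = {
    "hnsw": ["vex", "vec"],       # unquantized: graph + raw vectors
    "int8_hnsw": ["vex", "veq"],  # int8-quantized: graph + quantized vectors
    "int4_hnsw": ["vex", "veq"],  # int4-quantized: graph + quantized vectors
    "bbq_hnsw": ["vex", "veb"],   # BBQ-quantized: graph + binary vectors
}

def preload_setting(index_type):
    """Return the index.store.preload settings fragment for an index type."""
    return {"index.store.preload": PRELOAD_BY_TYPE[index_type]}

print(preload_setting("int8_hnsw"))
```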
For existing indexes, note that index.store.preload is a static parameter and cannot be modified after index creation. If temporary unavailability of the index is acceptable, close the index, set the parameter, and then reopen it. Here's how:
POST my_vector_index/_close
PUT my_vector_index/_settings
{
"index.store.preload": ["vex", "veq"]
}
POST my_vector_index/_open
Reduce the number of segments in the index
Elasticsearch shards are composed of segments, which are internal storage elements of the index. For approximate knn search, Elasticsearch stores the vector values of each segment as a separate HNSW graph, so a knn search must check every segment. The recent parallelization of knn search makes searching across multiple segments faster, but reducing the number of segments can still speed up knn search several times over. By default, Elasticsearch periodically merges smaller segments into larger segments through a background merging process. If this is not enough, you can take the following explicit steps to reduce the number of index segments.
1. Increase the maximum segment size
Elasticsearch offers several adjustable settings for the merging process. A key setting is index.merge.policy.max_merged_segment, which dictates the maximum size of segments produced during merging. Raising this value reduces the number of segments. The default is 5 GB, which may be too small for high-dimensional vectors; consider increasing it to 10 GB or 20 GB. Example:
PUT my_vector_index/_settings
{
"index.merge.policy.max_merged_segment": "10gb"
}
2. Create large segments during bulk indexing
A typical approach is to perform a bulk upload first and then make the index searchable. Adjust index settings to encourage Elasticsearch to create larger initial segments and avoid forced merges: disable search and set index.refresh_interval to -1 during the bulk upload to prevent refresh operations and additional segment creation, and allocate a large index buffer so Elasticsearch accumulates more documents before refreshing. The default indices.memory.index_buffer_size is 10% of the heap size, which is usually adequate for large heaps such as 32 GB. To fully utilize the index buffer, also increase the index.translog.flush_threshold_size limit.
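As a hedged sketch, the index-level settings involved might look as follows (the values are assumptions to tune for your workload, not recommended defaults; indices.memory.index_buffer_size is a node-level setting configured separately and is therefore not included here):

```python
# Illustrative index settings for the bulk-load phase, to be applied
# via the _settings API before the upload begins.
bulk_load_settings = {
    "index.refresh_interval": "-1",                # suppress refreshes during the upload
    "index.translog.flush_threshold_size": "1gb",  # let the index buffer fill before flushing
}

# Illustrative settings to restore once the bulk upload completes,
# re-enabling periodic refreshes so the index becomes searchable.
post_load_settings = {
    "index.refresh_interval": "1s",
}

print(bulk_load_settings)
```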
Exclude vector fields from _source
Elasticsearch retains the original JSON document submitted at indexing time in the _source field, and each search hit normally includes the entire _source document. When documents contain high-dimensional dense vector fields, the _source can become quite large and expensive to load, which can greatly impact the efficiency of knn search operations.
Operations like reindex, update, and update by query typically require the _source field. Excluding fields from _source may lead to unexpected behavior in these operations; for instance, during reindexing, the dense_vector field may not be included in the new index.
You can prevent dense vector fields from being loaded and returned during searches by excluding them from the _source field with the excludes mapping parameter. This not only reduces the volume of raw vector data processed but also shrinks the index size. Even when vectors are excluded from _source, they can still be used in knn searches, which rely on separate data structures. Consider the potential disadvantages of omitting the _source field, described above, before applying the excludes parameter.
PUT /my_vector_index
{
"mappings": {
"_source": {
"excludes": [
"my_vector"
]
},
"properties": {
"my_vector": {
"type": "dense_vector",
"dims": 3
},
"my_text": {
"type": "keyword"
}
}
}
}
To view vector content in a doc when the Elasticsearch version is 8.17 or above, use:
GET my_vector_index/_search
{
"docvalue_fields": ["my_vector"]
}
For other versions, use:
GET my_vector_index/_search
{
"script_fields": {
"vector_field": {
"script": {
"source" : "doc['my_vector'].vectorValue"
}
}
}
}
In addition to excluding vector fields from _source, an alternative method is available: synthetic _source.