The aliyunknn plugin is a vector search engine developed by the Alibaba Cloud Elasticsearch team. It uses the vector library of Proxima, a vector search engine designed by Alibaba DAMO Academy. The plugin can meet your vector search requirements in a variety of scenarios, such as image search, video fingerprinting, facial and speech recognition, and commodity recommendation. This topic describes how to use the aliyunknn plugin.
Prerequisites
 The aliyunknn plugin is installed. Whether this plugin is installed by default is determined by your Elasticsearch cluster version and kernel version.
 Cluster version V6.7.0 with a kernel version earlier than V1.2.0, or cluster version V6.8, V7.4, or V7.7 (no kernel version):
  You must manually install the aliyunknn plugin on the Plugins page of the Elasticsearch console based on the instructions provided in Install and remove a built-in plugin.
  You are not allowed to use the script query and index warmup features of the aliyunknn plugin.
 Cluster version V6.7.0 with kernel version V1.2.0 or later:
  The aliyunknn plugin is integrated into the apack plugin, which is installed by default. If you want to remove or reinstall the aliyunknn plugin, you must perform operations on the apack plugin. For more information, see Use the physical replication feature of the apack plugin.
  You can use the script query and index warmup features of the aliyunknn plugin. Before you use these features, make sure that the version of the apack plugin installed on your cluster is V1.2.1 or later. You can run the GET _cat/plugins?v command to obtain the version of the apack plugin. If the version of the apack plugin is earlier than V1.2.1, you can submit a ticket to ask Alibaba Cloud engineers to update the plugin.
 Cluster version V7.10.0 with kernel version V1.4.0 or later:
  The aliyunknn plugin is integrated into the apack plugin, which is installed by default. If you want to remove or reinstall the aliyunknn plugin, you must perform operations on the apack plugin. For more information, see Use the physical replication feature of the apack plugin.
  If the kernel version of your cluster is V1.4.0 or later, the version of the apack plugin is the latest.
 Other cluster versions: The vector search feature is not supported.
 Note: The kernel version is different from the version of the apack plugin. You can run the GET _cat/plugins?v command to obtain the version of the apack plugin.
 Before you install the aliyunknn plugin, make sure that the data node specifications of your cluster are 2 vCPUs and 8 GiB of memory or higher. The specifications of 2 vCPUs and 8 GiB of memory are used only for functional tests. In a production environment, the data node specifications of your cluster must be 4 vCPUs and 16 GiB of memory or higher. If the data node specifications of your cluster do not meet these requirements, upgrade the data nodes in your cluster. For more information, see Upgrade the configuration of a cluster.
 Your cluster and the indexes that you want to store on the cluster are planned. For more information, see Index planning and Cluster planning.
Background information
 Scenarios
The vector search engine of Alibaba Cloud Elasticsearch is used in numerous production scenarios inside Alibaba Group, such as Pailitao, Image Search, Youku video fingerprinting, Qutoutiao video fingerprinting, Taobao commodity recommendation, customized searches, and cross-media searches.
 Principle
The vector search feature of Alibaba Cloud Elasticsearch is implemented based on the aliyunknn plugin. The plugin is compatible with all versions of open source Elasticsearch, so you can use it without additional learning costs. In addition to real-time incremental synchronization and near-real-time (NRT) searches, vector indexes support the distributed features of open source Elasticsearch, such as multiple replica shards, restoration, and snapshots.
 Algorithms
The aliyunknn plugin supports the Hierarchical Navigable Small World (HNSW) and Linear Search algorithms. These algorithms are suitable for processing small amounts of data stored in memory. The following comparison shows the performance of the two algorithms:
Table 1. Comparison between the performance of HNSW and Linear Search

The following table lists the performance metrics of the two algorithms on an Alibaba Cloud Elasticsearch V6.7.0 cluster. Configurations of the test environment:
 Node configurations: two data nodes (each with 16 vCPUs and 64 GiB of memory) and one 100 GiB standard SSD
 Dataset: SIFT 128-dimensional float vectors
 Total data records: 20 million
 Index settings: default settings

| Performance metric | HNSW | Linear Search |
| Top-10 recall ratio | 98.6% | 100% |
| Top-50 recall ratio | 97.9% | 100% |
| Top-100 recall ratio | 97.4% | 100% |
| Latency (p99) | 0.093s | 0.934s |
| Latency (p90) | 0.018s | 0.305s |

Note: p is short for percentile. For example, latency (p99) indicates the time within which 99% of queries receive a response.
Index planning
| Algorithm | Use scenario | In-memory storage | Remarks |
| HNSW | | Yes | |
| Linear Search | | Yes | None |
Cluster planning
 Data node specifications (required): The minimum specifications of a data node in a production environment are 4 vCPUs and 16 GiB of memory. The specifications of 2 vCPUs and 8 GiB of memory are suitable only for functional tests.
 Maximum volume of data that can be stored on each data node: The data stored on a data node must not exceed 50% of the total memory size of the node.
 Write throttling: Vector indexing is a CPU-intensive job. We recommend that you do not maintain a high write throughput. A peak write throughput lower than 5,000 transactions per second (TPS) is recommended for a data node with 16 vCPUs and 64 GiB of memory. When the system processes queries on vector indexes, it loads all the indexes into node memory. If nodes run out of memory, the system reallocates shards. Therefore, we recommend that you do not write large amounts of data to your cluster while the system is processing queries.
Create a vector index
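For reference, a request to create a vector index that works with the search examples in this topic might look like the following sketch. The index name test, the field name feature, and the proxima_vector field type are assumptions based on the parameters referenced in this topic (index.vector.algorithm, dim, and distance_method); check the plugin documentation for your cluster version.

```
PUT test
{
  "settings": {
    "index.vector.algorithm": "hnsw"
  },
  "mappings": {
    "properties": {
      "feature": {
        "type": "proxima_vector",
        "dim": 2
      }
    }
  }
}
```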
Search for a vector
 Standard search
Run the following command to perform a standard search for a vector:
GET test/_search
{
  "query": {
    "hnsw": {
      "feature": {
        "vector": [1.5, 2.5],
        "size": 10
      }
    }
  }
}
The following table describes the parameters in the preceding command.
 hnsw: The algorithm that is used to search for the vector. The value must be the same as the value of the algorithm parameter that you specified when you created the index.
 vector: The vector for which you want to search. The length of the array must be the same as the value of the dim parameter specified in the mapping.
 size: The number of recalled documents.

Note: The aliyunknn plugin also supports some advanced search parameters. For more information, see Advanced parameters.
 Script query
Script queries are compatible with Elasticsearch domain-specific language (DSL) queries. For example, you can use the script_score parameter to score each document in a query response. The score is calculated by using the following formula: 1/(1 + l2Squared(params.queryVector, doc['feature'])). The following code provides a sample script query:
GET test/_search
{
  "query": {
    "match_all": {}
  },
  "rescore": {
    "query": {
      "rescore_query": {
        "function_score": {
          "functions": [{
            "script_score": {
              "script": {
                "source": "1/(1+l2Squared(params.queryVector, doc['feature']))",
                "params": {
                  "queryVector": [2.0, 2.0]
                }
              }
            }
          }]
        }
      }
    }
  }
}
In addition to the functions supported by script queries in open source Elasticsearch, the script queries in Alibaba Cloud Elasticsearch support the functions described in the following table.
 l2Squared(float[] queryVector, DocValues docValues): Searches for a vector based on a Euclidean distance.
 hamming(float[] queryVector, DocValues docValues): Searches for a vector based on a Hamming distance.
 cosineSimilarity(float[] queryVector, DocValues docValues): Searches for a vector based on a cosine similarity.

Note: You can perform script queries only on V6.7.0 clusters whose apack plugin is V1.2.1 or later and V7.10.0 clusters whose apack plugin is V1.4.0 or later. You can run the GET _cat/plugins?v command to obtain the version of the apack plugin. If the version of the apack plugin does not meet the requirements, you can submit a ticket to ask Alibaba Cloud engineers to update the apack plugin.
 Parameters in the preceding functions:
 float[] queryVector: the query vector. You can pass this as a literal array or as a parameter.
 DocValues docValues: the document vectors.
 You are not allowed to perform script queries on the indexes that are created by using the offline optimization mode.
 Index warmup
Searches are performed on vector indexes in memory. Therefore, after you write data to a vector index, the latency of the first search is high because the index is still being loaded into memory. To address this issue, the aliyunknn plugin provides the index warmup feature. This feature allows you to warm up your vector indexes and load them into node memory before the aliyunknn plugin provides vector search services. This significantly reduces the latency of vector searches.
 Run the following command to warm up all vector indexes:
POST _vector/warmup
 Run the following command to warm up a specific vector index:
POST _vector/{indexName}/warmup
Notice The index warmup feature is available only for V6.7.0 clusters whose apack plugin is of V1.2.1 or later and V7.10.0 clusters whose apack plugin is of V1.4.0 or later. You can run the GET _cat/plugins?v command to obtain the version of the apack plugin. If the version of the apack plugin does not meet the requirements, you can submit a ticket to ask Alibaba Cloud engineers to update the apack plugin.
 If your cluster stores a large number of vector indexes, the indexes store large volumes of data, and vector searches are required only for a specific vector index, we recommend that you warm up only the specific vector index to improve the search performance.
Vector scoring
A unified scoring formula is provided for vector searches. The formula depends on the distance measurement function, which affects the sorting of search results.
Scoring formula:
Score = 1/(Vector distance + 1)
 By default, the scoring mechanism uses Euclidean distances without square root extraction to calculate scores.
 In practice, you can calculate a distance based on a score. Then, optimize vectors based on the distance to increase the score.
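The formula above can be illustrated with a short computation. The following sketch uses the default SquaredEuclidean distance (no square root extraction); the helper functions are illustrative and are not part of the plugin:

```python
def squared_euclidean(a, b):
    # Squared Euclidean distance: sum of squared component differences,
    # without square root extraction (the plugin's default behavior).
    return sum((x - y) ** 2 for x, y in zip(a, b))

def score(distance):
    # Unified scoring formula: Score = 1 / (Vector distance + 1)
    return 1.0 / (distance + 1.0)

d = squared_euclidean([0.0, 0.0], [1.0, 2.0])  # (1-0)^2 + (2-0)^2 = 5
s = score(d)                                   # 1 / (5 + 1)
print(d, round(s, 4))
```

Inverting the formula gives distance = 1/score - 1, which is how you can recover a distance from a returned score.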
Distance measurement functions
 SquaredEuclidean
  Description: Calculates the Euclidean distance between vectors. The Euclidean distance, also called the Euclidean metric, refers to the actual distance between points or the natural length of a vector in an m-dimensional space. Euclidean distances are widely used. A Euclidean distance in a two- or three-dimensional space is the actual distance between points.
  Scoring formula: For two n-dimensional vectors [A1, A2, ..., An] and [B1, B2, ..., Bn], the squared Euclidean distance is (A1 - B1)² + (A2 - B2)² + ... + (An - Bn)², and Score = 1/(1 + distance). Notice: By default, the scoring mechanism uses Euclidean distances without square root extraction to calculate scores.
  Use scenario: A Euclidean distance can reflect the absolute difference between the characteristics of individual numbers. Therefore, Euclidean distances are widely used to analyze the differences between numbers from various dimensions. For example, you can use a user behavior metric to analyze the differences or similarities between user values.
  Example: For the two-dimensional vectors [0, 0] and [1, 2], the Euclidean distance without square root extraction is 5, which is calculated by using the following formula: (1 - 0)² + (2 - 0)² = 5.
 Cosine
  Description: Calculates the cosine similarity between vectors. The function calculates the cosine of the angle between the vectors to obtain the cosine similarity between them.
  Scoring formula: For two n-dimensional vectors [A1, A2, ..., An] and [B1, B2, ..., Bn], the cosine similarity is (A1B1 + ... + AnBn)/(√(A1² + ... + An²) × √(B1² + ... + Bn²)).
  Use scenario: A cosine similarity reflects the difference between vectors in orientation. Cosine similarities are used to score content to obtain the similarities or differences between user interests. In addition, cosine similarities are subject to the orientations of vectors rather than the numbers in them. Therefore, cosine similarities can address the issue that users may use different measurement standards.
  Example: For the two-dimensional vectors [1, 1] and [1, 0], the cosine similarity is 0.707.
 InnerProduct
  Description: Calculates the inner product, also called the dot product, between vectors. A dot product is a dyadic operation that takes two real-number vectors and returns a real-number scalar.
  Scoring formula: For two n-dimensional vectors [A1, A2, ..., An] and [B1, B2, ..., Bn], the inner product is A1B1 + A2B2 + ... + AnBn.
  Use scenario: The inner product takes both the angle between the vectors and their absolute lengths into consideration. After vectors are normalized, the formula for inner products is equivalent to that for cosine similarities.
  Example: For the two-dimensional vectors [1, 1] and [1, 5], the inner product is 6.
 Hamming (available only for vectors of the BINARY data type)
  Description: In information theory, the Hamming distance between strings of the same length equals the number of positions at which the symbols differ. d(x, y) represents the Hamming distance between the strings x and y. In other words, a Hamming distance measures the minimum number of substitutions that are required to change the string x into the string y.
  Scoring formula: For two n-bit binary strings x and y, d(x, y) is the number of bit positions at which x and y differ.
  Use scenario: Hamming distances are used to detect or fix the errors that occur when data is transmitted over computer networks. A Hamming distance can also be used as an error estimation method to determine the number of different characters between binary strings.
  Example: The Hamming distance between 1011101 and 1001001 is 2.
Notice: When you use the aliyunknn plugin, the vector data that you want to write to your vector index must be of the uint32 data type. This indicates that the vector data must be represented by an unsigned 32-bit decimal array. In addition, the value of the dim parameter must be a multiple of 32.

 Only V6.7.0 clusters whose apack plugin is V1.2.1 or later and V7.10.0 clusters whose apack plugin is V1.4.0 or later support all four of the preceding functions. Other clusters support only the SquaredEuclidean function.
 You can run the GET _cat/plugins?v command to obtain the version of the apack plugin for your cluster. If the version of the apack plugin does not meet the requirements, you can submit a ticket to ask Alibaba Cloud engineers to update the apack plugin.
 The distance_method parameter in the mapping configuration is used to specify the distance measurement function that is used for your vector index.
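The four distance measurements above can be reproduced with plain arithmetic. The following sketch verifies the worked examples from the table; the function names are illustrative and do not correspond to the plugin's script functions:

```python
import math

def squared_euclidean(a, b):
    # Euclidean distance without square root extraction.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine_similarity(a, b):
    # Cosine of the angle between the two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def inner_product(a, b):
    # Dot product of the two vectors.
    return sum(x * y for x, y in zip(a, b))

def hamming(x, y):
    # Number of positions at which two equal-length strings differ.
    return sum(c1 != c2 for c1, c2 in zip(x, y))

print(squared_euclidean([0, 0], [1, 2]))              # 5
print(round(cosine_similarity([1, 1], [1, 0]), 3))    # 0.707
print(inner_product([1, 1], [1, 5]))                  # 6
print(hamming("1011101", "1001001"))                  # 2
```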
Circuit breaker parameters
 indices.breaker.vector.native.indexing.limit (default value: 70%): If the off-heap memory usage exceeds the value specified by this parameter, write operations are suspended. After the system creates indexes and releases the memory, the system resumes the write operations. If the circuit breaker is triggered, the consumption of system memory is high. In this case, we recommend that you throttle the write throughput.
 indices.breaker.vector.native.total.limit (default value: 80%): The maximum proportion of off-heap memory that can be used to create vector indexes. If the actual off-heap memory usage exceeds the value specified by this parameter, the system may reallocate shards.
Advanced parameters
 index.vector.hnsw.builder.max_scan_num (default value: 100000): The maximum number of nearest neighbors that can be scanned when a graph is created in the worst case.
 index.vector.hnsw.builder.neighbor_cnt (default value: 100): The maximum number of nearest neighbors that each node can have at layer 0. We recommend that you set this parameter to 100. The quality of a graph increases with the value of this parameter, but inactive indexes consume more storage resources.
 index.vector.hnsw.builder.upper_neighbor_cnt (default value: 50): The maximum number of nearest neighbors that each node can have at a layer other than layer 0. We recommend that you set this parameter to 50% of the value specified for the index.vector.hnsw.builder.neighbor_cnt parameter.
 index.vector.hnsw.builder.efconstruction (default value: 400): The number of nearest neighbors that can be scanned when a graph is created. The quality of a graph increases with the value of this parameter, but a longer time period is required to create indexes. We recommend that you set this parameter to 400.
 index.vector.hnsw.builder.max_level (default value: 6): The total number of layers, which includes layer 0. For example, if you have 10 million documents and the scaling_factor parameter is set to 30, use 30 as the base and round the logarithm of 10,000,000 up to the nearest integer. The result is 5. This parameter does not have a significant impact on vector searches. We recommend that you set this parameter to 6.
 index.vector.hnsw.builder.scaling_factor (default value: 50): A scaling factor. The volume of data at a layer equals the volume of data at its upper layer multiplied by the scaling factor. Valid values: 10 to 100. The number of layers decreases as the value of scaling_factor increases. We recommend that you set this parameter to 50.
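The max_level example above can be checked with a quick computation; the helper function is illustrative:

```python
import math

def max_level(doc_count, scaling_factor):
    # Total number of layers including layer 0: the logarithm of the
    # document count, using scaling_factor as the base, rounded up.
    return math.ceil(math.log(doc_count) / math.log(scaling_factor))

print(max_level(10_000_000, 30))  # 5
```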
Note: You can specify the preceding parameters in the setting configuration only after you set the index.vector.algorithm parameter to hnsw.
The following parameter is specified in the query:
 ef (default value: 100): The number of nearest neighbors that are scanned during an online search. A large value increases the recall ratio but slows down searches. Valid values: 100 to 1000.
The following code provides a sample search request:
GET test/_search
{
  "query": {
    "hnsw": {
      "feature": {
        "vector": [1.5, 2.5],
        "size": 10,
        "ef": 100
      }
    }
  }
}
FAQ

Q: How do I evaluate the recall ratio of documents?
A: You can create two indexes: one that uses the HNSW algorithm and one that uses the Linear Search algorithm. Keep the other index settings the same for the two indexes. Use a client to add the same vector data to both indexes, and then refresh the indexes. Run the same query vector against both indexes and compare the document IDs that they return. Then, identify the document IDs that are returned by both indexes.
Note: Divide the number of document IDs returned by both indexes by the total number of returned document IDs to calculate the recall ratio of the documents.
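The calculation described in the note can be sketched as follows; the document ID lists are hypothetical:

```python
def recall_ratio(hnsw_ids, linear_ids):
    # Treat the exact Linear Search results as the ground truth; recall is
    # the fraction of those document IDs that the HNSW index also returned.
    common = set(hnsw_ids) & set(linear_ids)
    return len(common) / len(linear_ids)

# Hypothetical top-5 results from the two indexes for the same query vector.
hnsw_top5 = ["1", "2", "3", "5", "8"]
linear_top5 = ["1", "2", "3", "4", "5"]
print(recall_ratio(hnsw_top5, linear_top5))  # 0.8
```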
Q: When I write data to my cluster, the system displays the "circuitBreakingException" error. What do I do?
A: This error indicates that the off-heap memory usage exceeds the proportion specified by the indices.breaker.vector.native.indexing.limit parameter and that the write operation is suspended. The default proportion is 70%. In most cases, the write operation is automatically resumed after the system creates indexes and releases memory. We recommend that you add a retry mechanism to the data write script on your client.
Q: Why is the CPU still working after the write operation is suspended?
A: The system creates vector indexes during both the refresh and flush processes. The index creation task may be still running even if the write operation is suspended. Computing resources are released after the final refresh is complete.

Q: Are best practices available for the aliyunknn plugin?
A: The Alibaba Cloud developer community provides business scenarios and best practices for the aliyunknn plugin.