The aliyun-knn plug-in is a vector search engine designed by the Alibaba Cloud Elasticsearch team. It uses the vector library of Proxima, a vector search engine designed by Alibaba DAMO Academy. This plug-in allows you to use vector spaces in different search scenarios, such as image search, video fingerprinting, facial and speech recognition, and commodity recommendation based on your preferences. This topic describes how to use the aliyun-knn plug-in.
Prerequisites
- The aliyun-knn plug-in is installed. Whether this plug-in is installed by default is determined by the version and kernel version of your Elasticsearch cluster.
- If the cluster version is V6.7.0 and the kernel version is V1.2.0 or later, the plug-in is integrated into the apack plug-in. The apack plug-in is installed by default. If you want to install or remove the aliyun-knn plug-in, you must perform operations on the apack plug-in. For more information, see Use the physical replication feature of the apack plug-in.
- If the cluster version is later than V6.7.0, or the cluster version is V6.7.0 and the kernel version is earlier than V1.2.0, you must manually install the aliyun-knn plug-in. For more information, see Install and remove a built-in plug-in.
Notice: Only Elasticsearch clusters of V6.7.0 or later support the aliyun-knn plug-in.
- Before you install the aliyun-knn plug-in, make sure that each data node offers at least 2 vCPUs and 8 GiB of memory. These specifications are only for tests. For a production environment, each data node must offer at least 4 vCPUs and 16 GiB of memory. If your Elasticsearch cluster does not meet these requirements, upgrade the data nodes in the cluster. For more information, see Upgrade the configuration of a cluster.
- Your cluster and the indexes in the cluster are planned. For more information, see Index planning and Cluster planning.
Background information
- Scenarios
The vector search engine of Alibaba Cloud Elasticsearch is used in numerous production scenarios inside Alibaba Group, such as Pailitao, Image Search, Youku video fingerprinting, Qutoutiao video fingerprinting, Taobao commodity recommendation, customized searches, and Crossmedia searches.
- Principle
The vector search feature of Alibaba Cloud Elasticsearch is implemented based on the aliyun-knn plug-in. The plug-in is compatible with all open source Elasticsearch versions, so you can use the vector search engine without extra learning costs. In addition to real-time incremental synchronization and near-real-time (NRT) searches, the engine supports the distributed features of open source Elasticsearch, such as multiple replica shards, restoration, and snapshots.
The aliyun-knn plug-in also provides a scoring mechanism that is based on the Euclidean distance between vectors. For two n-dimensional vectors A and B, the following formulas are used:
- Euclidean distance without square root extraction = (A1 - B1)² + (A2 - B2)² + ... + (An - Bn)²
- Score = 1/(Distance + 1)

The distance is the Euclidean distance without square root extraction. For example, for the two-dimensional vectors [0,0] and [1,2], the Euclidean distance without square root extraction is 5, and the score is 0.167.
Note: In practice, you can calculate a distance based on a score. Then, optimize vectors based on the distance to increase the score.
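As a quick check of the formulas above, the following minimal Python sketch computes the squared Euclidean distance and the score for the example vectors:

```python
def squared_euclidean_distance(a, b):
    """Euclidean distance without square root extraction."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def score(a, b):
    """Score = 1 / (distance + 1), as used by the aliyun-knn plug-in."""
    return 1 / (squared_euclidean_distance(a, b) + 1)

# The example from above: two-dimensional vectors [0, 0] and [1, 2]
print(squared_euclidean_distance([0, 0], [1, 2]))  # 5
print(round(score([0, 0], [1, 2]), 3))             # 0.167
```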
- Algorithms
The vector search engine supports the Hierarchical Navigable Small World (HNSW) and Linear Search algorithms. Both algorithms store data in memory and are suitable for small data volumes. The following table compares the performance of the two algorithms.
Table 1. Comparison between the performance of HNSW and Linear Search

The performance of the two algorithms was measured on an Alibaba Cloud Elasticsearch V6.7.0 cluster in the following test environment:
- Node configuration: two data nodes (each with 16 vCPUs and 64 GiB of memory) and one 100-GiB standard SSD
- Dataset: SIFT 128-dimensional float type vectors
- Total data records: 20 million
- Index settings: default settings

Performance metric | HNSW | Linear Search
---|---|---
Top-10 recall ratio | 98.6% | 100%
Top-50 recall ratio | 97.9% | 100%
Top-100 recall ratio | 97.4% | 100%
Latency (p99) | 0.093s | 0.934s
Latency (p90) | 0.018s | 0.305s

Note: p is short for percentile. For example, latency (p99) indicates the time within which 99% of queries return a response.
Index planning
Algorithm | Use scenario | In-memory storage | Remarks
---|---|---|---
HNSW | | Yes |
Linear Search | | Yes | None.
Cluster planning
Item | Description
---|---
Data node specifications (required) | The minimum data node specifications for a production environment are 4 vCPUs and 16 GiB of memory. The specifications of 2 vCPUs and 8 GiB of memory are only for tests.
Maximum volume of data that can be stored on each data node | The maximum volume of data that can be stored on each data node equals 50% of the total memory space of the data node.
Write throttling | Vector indexing is a CPU-intensive job. We recommend that you do not maintain a high write throughput. A peak write throughput lower than 5,000 TPS (transactions per second) is recommended for a data node with 16 vCPUs and 64 GiB of memory. When Elasticsearch processes queries, it loads all indexes into node memory. If nodes run out of memory, Elasticsearch reallocates shards. Therefore, we recommend that you do not write large amounts of data to Elasticsearch while it is processing queries.
Procedure
Parameters
Parameter | Description | Default value
---|---|---
index.vector.algorithm | The algorithm used by an index. Valid values: hnsw and linear. | hnsw
Parameter | Description | Default value
---|---|---
index.vector.hnsw.builder.max_scan_num | The maximum number of nearest neighbors to scan when a graph is created in the worst case. | 100000
index.vector.hnsw.builder.neighbor_cnt | The maximum number of nearest neighbors that each node can have at layer 0. We recommend that you set this parameter to 100. The quality of a graph increases with the value of this parameter. However, index storage consumption also increases. | 100
index.vector.hnsw.builder.upper_neighbor_cnt | The maximum number of nearest neighbors that each node can have at a layer other than layer 0. We recommend that you set this parameter to 50% of the value of neighbor_cnt. | 50
index.vector.hnsw.builder.efconstruction | The number of nearest neighbors to scan when a graph is created. The quality of a graph increases with the value of this parameter. However, a longer time period is required to create indexes. We recommend that you set this parameter to 400. | 400
index.vector.hnsw.builder.max_level | The total number of layers, including layer 0. For example, if you have 10 million documents and the scaling_factor parameter is set to 30, use 30 as the base number and round up the logarithm of 10,000,000 to the nearest integer. The result is 5. | 6
index.vector.hnsw.builder.scaling_factor | The scaling factor. The volume of data at a layer equals the volume of data at its upper layer multiplied by the scaling factor. Valid values: 10 to 100. The number of layers decreases as the value of scaling_factor increases. We recommend that you set this parameter to 50. | 50
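The max_level example above can be verified with a short Python sketch. The document count and scaling factor are the values from the example:

```python
import math

def hnsw_layers(doc_count, scaling_factor):
    # Round up log base scaling_factor of doc_count to the nearest integer,
    # as described for index.vector.hnsw.builder.max_level.
    return math.ceil(math.log(doc_count) / math.log(scaling_factor))

# 10 million documents with scaling_factor set to 30
print(hnsw_layers(10_000_000, 30))  # 5
```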
Parameter | Description | Default value
---|---|---
ef | The number of nearest neighbors that are scanned during an online search. A large value increases the recall ratio but slows down the search. Valid values: 100 to 1000. | 100
Sample request:

```
GET test/_search
{
  "query": {
    "hnsw": {
      "feature": {
        "vector": [1.5, 2.5],
        "size": 10,
        "ef": 100
      }
    }
  }
}
```
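The request above can also be constructed programmatically. The following Python sketch builds the same query body; the helper name is hypothetical, and the commented-out HTTP call assumes a cluster at localhost:9200 and the index name test from the sample:

```python
import json

def build_hnsw_query(vector, size=10, ef=100):
    # Build the hnsw query body shown in the sample request.
    # "feature" is the name of the vector field in the sample index.
    return {
        "query": {
            "hnsw": {
                "feature": {
                    "vector": vector,
                    "size": size,
                    "ef": ef,
                }
            }
        }
    }

body = build_hnsw_query([1.5, 2.5])
print(json.dumps(body))
# Send the body with any HTTP client, for example:
# requests.get("http://localhost:9200/test/_search", json=body)
```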
Parameter | Description | Default value
---|---|---
indices.breaker.vector.native.indexing.limit | If the off-heap memory usage exceeds the value specified by this parameter, write operations are suspended. After Elasticsearch creates the indexes and releases the memory, it resumes the write operations. If this circuit breaker is triggered, the system memory consumption is high, and we recommend that you throttle the write throughput. If you are a beginner, we recommend that you use the default value. | 70%
indices.breaker.vector.native.total.limit | The maximum proportion of off-heap memory that can be used to create vector indexes. If the actual off-heap memory usage exceeds the value specified by this parameter, Elasticsearch may reallocate shards. If you are a beginner, we recommend that you use the default value. | 80%
FAQ
- Q: How do I evaluate the recall ratio of documents?
  A: Create two indexes: one uses the HNSW algorithm and the other uses the Linear Search algorithm. Keep the other index settings the same for both indexes. Use a client to add the same vector data to both indexes. Then, refresh the indexes. Run the same query vector against both indexes, compare the returned document IDs, and find the document IDs that are returned by both indexes.
  Note: Divide the number of document IDs returned by both indexes by the total number of returned document IDs to calculate the recall ratio of the documents.
- Q: When I write data to Elasticsearch, the system displays the circuitBreakingException error. What do I do?
  A: This error indicates that the off-heap memory usage exceeds the proportion specified by the indices.breaker.vector.native.indexing.limit parameter and that the write operation is suspended. The default proportion is 70%. In most cases, after Elasticsearch creates indexes and releases memory, the write operation is automatically resumed. We recommend that you add a retry mechanism to the data write script on your client.
- Q: Why is the CPU still working after the write operation is suspended?
  A: Elasticsearch creates vector indexes during both the refresh and flush processes. The vector index creation task may still be running even if the write operation is suspended. Computing resources are released after the final refresh is complete.
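The recall evaluation described in the first question can be sketched in Python. The ID lists below are hypothetical placeholders for the top-10 results returned by the two indexes, with the Linear Search results treated as the exact ground truth:

```python
def recall_ratio(hnsw_ids, linear_ids):
    # Recall = (number of IDs returned by both indexes) /
    #          (number of IDs returned by the exact Linear Search index).
    overlap = set(hnsw_ids) & set(linear_ids)
    return len(overlap) / len(linear_ids)

# Hypothetical top-10 results from the two indexes for the same query vector:
hnsw_top10 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 11]
linear_top10 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(recall_ratio(hnsw_top10, linear_top10))  # 0.9
```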