This topic describes the dynamic quantization feature of DashVector.
Background
Quantization is an optimization technique commonly used in vector search. It significantly improves search performance and reduces the memory occupied by index files, at the cost of an acceptable loss of precision (recall rate).
DashVector supports dynamic vector quantization. To use this feature, specify an appropriate quantization policy when you create a collection. For information about how to create a collection, see Create a collection.
Collections with quantization enabled do not support sparse vectors. If you need to use both the quantization and sparse vector features, contact us by joining our DingTalk group. The group ID is: 25130022704.
Enable dynamic quantization
Prerequisites
A cluster is created. For more information, see Create a cluster.
An API key is obtained. For more information, see Manage API keys.
The latest version of the SDK is installed. For more information, see Install DashVector SDK.
Sample code
In the sample code, replace YOUR_API_KEY with your API key and YOUR_CLUSTER_ENDPOINT with the endpoint of your cluster so that the code can run properly.
You can view the endpoint of your cluster on the Cluster Detail page in the DashVector console.
import dashvector
import numpy as np

client = dashvector.Client(
    api_key='YOUR_API_KEY',
    endpoint='YOUR_CLUSTER_ENDPOINT'
)
assert client

# Create a collection with a quantization policy.
ret = client.create(
    'quantize_demo',
    dimension=768,
    extra_params={
        'quantize_type': 'DT_VECTOR_INT8'
    }
)
print(ret)

collection = client.get('quantize_demo')

# Insert a document, which will be automatically quantized based on the quantization policy defined during collection creation.
collection.insert(('1', np.random.rand(768).astype('float32')))

# Obtain the document based on the ID. Note that the obtained vector data is not the original value inserted, but a dequantized approximate value.
doc = collection.fetch('1')

# If you specify to return vector data in a search, the returned vector data is not the original value inserted, but a dequantized approximate value.
docs = collection.query(
    vector=np.random.rand(768).astype('float32'),
    include_vector=True
)
When you obtain a document by using its ID or search for a document with include_vector=True, the vector data obtained is not the original value inserted, but a dequantized approximate value. For more information, see Obtain documents and Search for documents.
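The following snippet sketches one way to observe this behavior by comparing an inserted vector with the vector returned by fetch. It assumes the fetch response exposes the matched document under output, keyed by ID, with a vector attribute; check the SDK reference for the exact response structure of your SDK version.

# A minimal sketch for observing the quantization error. It assumes the fetch
# response exposes the document under `output` keyed by ID with a `vector`
# attribute; adjust it to the actual response structure of your SDK version.
original = np.random.rand(768).astype('float32')
collection.insert(('quantize_check', original))

ret = collection.fetch('quantize_check')
restored = np.array(ret.output['quantize_check'].vector, dtype='float32')

# The dequantized vector is close to, but not identical to, the original.
print('max absolute error:', np.abs(original - restored).max())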
Parameter description
When you create a collection, you can define the quantization policy by setting the quantize_type key of the extra_params: Dict[str, str] parameter. The following value of quantize_type is supported:
DT_VECTOR_INT8: Quantizes an FP32 vector to an INT8 vector.
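For intuition, the following standalone NumPy sketch shows a simple symmetric INT8 quantization and dequantization round trip. It only illustrates why dequantized vectors are approximations of the originals; it is not DashVector's internal quantization algorithm.

import numpy as np

v = np.random.rand(768).astype('float32')

# Symmetric linear quantization: map the float range onto the INT8 range [-127, 127].
scale = np.abs(v).max() / 127.0
q = np.clip(np.round(v / scale), -127, 127).astype('int8')  # quantize: 4 bytes -> 1 byte per dimension
restored = q.astype('float32') * scale                      # dequantize: approximate reconstruction

# The restored vector approximates the original; the small error is the precision
# loss traded for a smaller index and faster search.
print('max absolute error:', np.abs(v - restored).max())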
Performance and recall rate
Dataset: one million 768-dimensional vectors
DashVector cluster specification: P.large
Metric: cosine
TopK: 100
Quantization policy | Index size ratio | QPS | Recall rate |
None | 100% | 495.6 | 99.05% |
DT_VECTOR_INT8 | 33.33% | 733.8 (+48%) | 94.67% |
As the comparison shows, the index size is reduced by about two thirds and the QPS increases by 48%, at the cost of a 4.38-percentage-point drop in the recall rate.
The results are measured on a Cohere dataset and are for reference only. Actual results vary with the data distribution of your dataset.
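If you want to run a similar measurement on your own data, a common approach is to compute the exact top-K neighbors by brute force and compare them with the results returned by the quantized collection. The sketch below outlines a recall@K calculation; exact_topk_cosine and recall_at_k are hypothetical helpers rather than part of the DashVector SDK, and the sketch assumes that documents were inserted with IDs equal to their row indices in dataset and that the query response exposes the matched documents under output.

# Illustrative recall@K measurement; the helpers below are not part of the DashVector SDK.
# `dataset` and `queries` are float32 NumPy arrays that you prepare yourself, and the
# collection is assumed to contain `dataset` with IDs '0', '1', ... matching the row indices.
def exact_topk_cosine(dataset, query, k=100):
    # Brute-force ground truth under the cosine metric.
    d = dataset / np.linalg.norm(dataset, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    scores = d @ q
    return set(np.argsort(-scores)[:k].astype(str))

def recall_at_k(collection, dataset, queries, k=100):
    hits, total = 0, 0
    for query in queries:
        truth = exact_topk_cosine(dataset, query, k)
        ret = collection.query(vector=query, topk=k)
        returned = {doc.id for doc in ret.output}
        hits += len(truth & returned)
        total += k
    return hits / total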
Reference results
Dataset (size, dimensions, metric) | Quantization policy | Index size ratio | Recall rate | QPS increase |
Cohere 10M 768 Cosine | DT_VECTOR_INT8 | 33% | 95.28% | 170% |
GIST 1M 960 L2 | DT_VECTOR_INT8 | 35% | 99.54% | 134% |
OpenAI 5M 1536 Cosine | DT_VECTOR_INT8 | 34% | 67.34% | 189% |
Deep1B 10M 96 Cosine | DT_VECTOR_INT8 | 52% | 99.97% | 135% |
Internal dataset 8M 512 Cosine | DT_VECTOR_INT8 | 38% | 99.92% | 152% |
The preceding table indicates that the DashVector quantization policy is not applicable to all datasets. You must be cautious about using the quantization policy in your production environments.
We recommend that you create two collections, one with and the other without the quantization policy, and determine whether to use the policy in production environments after careful comparison, testing, and verification.
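As a starting point for such a comparison, the following sketch creates the two collections side by side with the same API calls used in the sample code above; demo_raw and demo_quantized are placeholder names, and metric='cosine' is only an example setting. Load the same data into both, run the same queries against each, and compare recall and latency before deciding.

# Create two collections that differ only in the quantization policy,
# then load the same data into both and compare search quality and latency.
client.create('demo_raw', dimension=768, metric='cosine')
client.create(
    'demo_quantized',
    dimension=768,
    metric='cosine',
    extra_params={'quantize_type': 'DT_VECTOR_INT8'}
)

raw = client.get('demo_raw')
quantized = client.get('demo_quantized')

# Insert the same documents into both collections, then run identical queries
# against each and compare the returned IDs, recall, and response times.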