
DashVector: Dynamic vector quantization

Last Updated: Apr 22, 2024

This topic describes the dynamic quantization feature of DashVector.

Background

Quantization is an optimization method commonly used in vector search. It significantly improves search performance and reduces the memory occupied by index files, at the cost of an acceptable loss of precision (recall rate).

DashVector supports dynamic vector quantization. To use this feature, specify an appropriate quantization policy when you create a collection. For information about how to create a collection, see Create a collection.

Important

Collections with quantization enabled do not support sparse vectors. If you need to use both the quantization and sparse vector features, contact us by joining our DingTalk group. The group ID is 25130022704.

Enable dynamic quantization

Sample code

Note
  1. In the sample code, replace YOUR_API_KEY with your API key and YOUR_CLUSTER_ENDPOINT with the endpoint of your cluster so that the code runs properly.

  2. You can view the endpoint of your cluster on the Cluster Detail page in the DashVector console.

import dashvector
import numpy as np

client = dashvector.Client(
    api_key='YOUR_API_KEY',
    endpoint='YOUR_CLUSTER_ENDPOINT'
)
assert client

# Create a collection with a quantization policy.
ret = client.create(
    'quantize_demo',
    dimension=768,
    extra_params={
        'quantize_type': 'DT_VECTOR_INT8'
    }
)
print(ret)

collection = client.get('quantize_demo')

# Insert a document, which will be automatically quantized based on the quantization policy defined during collection creation.
collection.insert(('1', np.random.rand(768).astype('float32')))

# Obtain the document based on the ID. Note that the obtained vector data is not the original value inserted, but a dequantized approximate value.
doc = collection.fetch('1')

# If you specify to return vector data in a search, the returned vector data is not the original value inserted, but a dequantized approximate value.
docs = collection.query(
    vector=np.random.rand(768).astype('float32'),
    include_vector=True
)

Note

When you obtain a document by using its ID or search for a document with include_vector=True, the vector data obtained is not the original value inserted, but a dequantized approximate value. For more information, see Obtain documents and Search for documents.
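
To see the effect of dequantization in practice, you can compare the vector that you insert with the vector that fetch returns. The following sketch continues the sample above. It assumes that the fetch response exposes the matched document through an output mapping keyed by document ID and that each document has a vector attribute; the response shape may differ across SDK versions, so adjust the accessors if necessary.

import dashvector
import numpy as np

client = dashvector.Client(
    api_key='YOUR_API_KEY',
    endpoint='YOUR_CLUSTER_ENDPOINT'
)
collection = client.get('quantize_demo')

# Keep a copy of the original FP32 vector before inserting it.
original = np.random.rand(768).astype('float32')
collection.insert(('2', original))

# Fetch the document back. Assumption: the response exposes documents via
# `output`, a mapping from ID to Doc objects with a `vector` attribute.
ret = collection.fetch('2')
restored = np.array(ret.output['2'].vector, dtype='float32')

# The restored vector is a dequantized approximation of the original.
print('max absolute error:', np.max(np.abs(original - restored)))
print('cosine similarity:', float(
    np.dot(original, restored)
    / (np.linalg.norm(original) * np.linalg.norm(restored))
))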

Parameter description

When you create a collection, you can use the quantize_type field of the extra_params: Dict[str, str] parameter to define the quantization policy. The following value of quantize_type is supported:

  • DT_VECTOR_INT8: Quantizes an FP32 vector to an INT8 vector.
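
For intuition only, the following sketch applies a generic symmetric INT8 scheme to an FP32 vector in NumPy. DashVector does not document its internal quantization algorithm, so this is not a description of the service's implementation; it merely illustrates why mapping FP32 values to INT8 shrinks storage and introduces the approximation error discussed above.

import numpy as np

vec = np.random.rand(768).astype('float32')

# Illustrative symmetric INT8 quantization: map [-max_abs, +max_abs] to [-127, 127].
# This is an assumption for illustration, not DashVector's internal scheme.
scale = np.max(np.abs(vec)) / 127.0
quantized = np.clip(np.round(vec / scale), -127, 127).astype('int8')  # 1 byte per dimension
dequantized = quantized.astype('float32') * scale                     # approximate FP32 values

print('bytes per vector:', vec.nbytes, '(FP32) vs', quantized.nbytes, '(INT8)')
print('max absolute error:', np.max(np.abs(vec - dequantized)))

The 4x reduction in raw vector bytes is consistent in spirit with the index size ratios reported below, which stay above 25% presumably because the index also contains structures other than the raw vectors.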

Performance and recall rate

The following results are based on a dataset of one million 768-dimensional vectors and this configuration:

  • DashVector cluster specification: P.large

  • Metric: cosine

  • TopK: 100

Quantization policy | Index size ratio | QPS          | Recall rate
None                | 100%             | 495.6        | 99.05%
DT_VECTOR_INT8      | 33.33%           | 733.8 (+48%) | 94.67%

Note
  1. As indicated in the comparison, the index size is reduced by about two thirds and the QPS is increased by 48%, at the cost of a recall rate decrease of 4.38%.

  2. The results are measured on a Cohere dataset and are for reference only. The actual results vary with the data distribution of your dataset.

References

Dataset                        | Quantization policy | Index size ratio | Recall rate | QPS increase
Cohere 10M 768 Cosine          | DT_VECTOR_INT8      | 33%              | 95.28%      | 170%
GIST 1M 960 L2                 | DT_VECTOR_INT8      | 35%              | 99.54%      | 134%
OpenAI 5M 1536 Cosine          | DT_VECTOR_INT8      | 34%              | 67.34%      | 189%
Deep1B 10M 96 Cosine           | DT_VECTOR_INT8      | 52%              | 99.97%      | 135%
Internal dataset 8M 512 Cosine | DT_VECTOR_INT8      | 38%              | 99.92%      | 152%

Important

As the preceding table shows, the DashVector quantization policy is not suitable for every dataset. Exercise caution before you use quantization in production environments.

We recommend that you create two collections, one with and one without the quantization policy, and decide whether to use quantization in production only after careful comparison, testing, and verification, as sketched in the example below.
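
A minimal sketch of such a comparison: build one collection with quantization and one without from the same vectors, query both with the same queries, and use the overlap of the returned IDs (treating the non-quantized results as the reference) as a recall proxy. The collection names, dataset size, and top-k value are placeholders, and the code assumes that the query response exposes matched Doc objects through an output list with an id attribute; verify both against your SDK version and replace the random vectors with a sample of your production data.

import dashvector
import numpy as np

client = dashvector.Client(
    api_key='YOUR_API_KEY',
    endpoint='YOUR_CLUSTER_ENDPOINT'
)

# Placeholder collection names and dataset size for illustration only.
client.create('demo_float32', dimension=768)
client.create('demo_int8', dimension=768,
              extra_params={'quantize_type': 'DT_VECTOR_INT8'})
baseline = client.get('demo_float32')
quantized = client.get('demo_int8')

# Insert the same vectors into both collections, in small batches.
vectors = [(str(i), np.random.rand(768).astype('float32')) for i in range(10000)]
for start in range(0, len(vectors), 100):
    batch = vectors[start:start + 100]
    baseline.insert(batch)
    quantized.insert(batch)

# Assumption: the query response exposes matched Doc objects via `output`,
# each with an `id` attribute; adjust to your SDK version if needed.
def top_ids(collection, query, k=100):
    ret = collection.query(vector=query, topk=k)
    return {doc.id for doc in ret.output}

# Average top-100 overlap, with the non-quantized results as the reference.
overlaps = []
for _ in range(20):
    q = np.random.rand(768).astype('float32')
    reference = top_ids(baseline, q)
    candidate = top_ids(quantized, q)
    overlaps.append(len(reference & candidate) / len(reference))

print('average top-100 overlap (recall proxy):', sum(overlaps) / len(overlaps))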