Before you use Vector Retrieval Service for Milvus, estimate your compute resource requirements. A good estimate helps keep the system stable and keeps costs under control. A resource calculator is available to estimate the required resources from factors such as dataset size, vector dimensions, and index type. Treat its output as a baseline, and always adjust the final configuration based on your own performance testing.
Resource calculator
The service maintains a fixed vCPU-to-memory ratio of 1:4. For example, an instance with 2 vCPUs comes with 8 GiB of memory. To keep instances stable, the service provisions more memory than the estimated requirement.
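The 1:4 ratio is simple arithmetic, but it is handy when you turn a memory estimate into a vCPU count. The sketch below is a minimal illustration, assuming a hypothetical list of candidate vCPU sizes (not the instance types the service actually offers). It ignores the extra memory headroom that the service adds on its own.

```python
MEMORY_GIB_PER_VCPU = 4  # fixed ratio: 1 vCPU to 4 GiB of memory

# Hypothetical, illustrative vCPU sizes; check the console for the sizes actually offered.
CANDIDATE_VCPUS = (2, 4, 8, 16, 32, 64)


def memory_for_vcpus(vcpus: int) -> int:
    """Return the memory (GiB) paired with a given vCPU count at the 1:4 ratio."""
    return vcpus * MEMORY_GIB_PER_VCPU


def smallest_vcpus_for_memory(required_gib: float) -> int:
    """Return the smallest candidate vCPU count whose paired memory covers required_gib."""
    for vcpus in sorted(CANDIDATE_VCPUS):
        if memory_for_vcpus(vcpus) >= required_gib:
            return vcpus
    raise ValueError(f"no candidate size provides {required_gib} GiB of memory")


print(memory_for_vcpus(2))            # 8
print(smallest_vcpus_for_memory(20))  # 8 (8 vCPUs come with 32 GiB)
```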
Use the resource calculator to get a baseline estimate for your instance resources. Enter the number of vectors (in millions) and the vector dimension, then select an index type. The calculator recommends an appropriate instance configuration. For example, for a dataset of 8 million 768-dimensional vectors using the HNSW index type (with M=4), the resource calculator displays a recommended configuration, as shown in the figure below. Use this recommendation as a starting point for instance selection and performance testing.
Configure resource parameters
Enable high availability
The high-availability (HA) configuration ensures the stability of production clusters by maintaining two replicas of each node. This significantly improves system fault tolerance and reliability.
Enabling HA doubles the resource requirements compared to a non-HA configuration. Fully assess your resource needs before enabling this feature for a production cluster.
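The doubling rule can be written as a small helper that scales a base estimate by the replica count. This is a hypothetical sketch: `Resources` and `with_high_availability` are illustrative names, not part of any service API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    vcpus: int
    memory_gib: int


def with_high_availability(base: Resources, enabled: bool) -> Resources:
    """Scale a non-HA estimate: HA keeps two replicas of each node, doubling resources."""
    replicas = 2 if enabled else 1
    return Resources(vcpus=base.vcpus * replicas, memory_gib=base.memory_gib * replicas)


base = Resources(vcpus=8, memory_gib=32)
print(with_high_availability(base, enabled=True))   # Resources(vcpus=16, memory_gib=64)
print(with_high_availability(base, enabled=False))  # Resources(vcpus=8, memory_gib=32)
```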
Specify the resource scale
Parameter | Description |
Vector Count (Millions) | The total number of vectors in your dataset. A larger dataset requires more storage and increases the computational complexity of index building and queries. This can lead to longer processing times and higher hardware demands. |
Vector Dimension | The number of dimensions in each vector. Higher-dimensional vectors increase index complexity and the computational cost of similarity searches. This impacts both storage costs and query speed, especially without effective dimensionality reduction or quantization strategies. |
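To see how these two parameters drive storage, the sketch below computes the size of the raw vector data. It assumes each dimension is stored as a 32-bit float (4 bytes) and ignores index structures, metadata, and the headroom the service adds. The function name is hypothetical.

```python
GIB = 1024 ** 3
BYTES_PER_FLOAT32 = 4  # assumption: each dimension is a 32-bit float


def raw_vector_bytes(vector_count_millions: float, dimension: int) -> int:
    """Size in bytes of the raw vectors, excluding index structures and metadata."""
    vector_count = round(vector_count_millions * 1_000_000)
    return vector_count * dimension * BYTES_PER_FLOAT32


size = raw_vector_bytes(8, 768)
print(f"{size:,} bytes = {size / GIB:.1f} GiB")  # 24,576,000,000 bytes = 22.9 GiB
```

For the 8 million 768-dimensional vectors in the calculator example, the raw data alone is roughly 22.9 GiB before any index overhead, which is why both count and dimension weigh so heavily on the recommendation.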
Select an index type
The index type is a key factor that determines resource requirements and query performance. Different index algorithms have different trade-offs between query speed, recall, and resource consumption.
Index type | Description & use cases |
HNSW | A graph-based index that offers ultrafast query responses, especially for high-dimensional data. Best for: Use cases that require the fastest possible query speed where sufficient memory and compute resources are available. |
IVF_FLAT | Partitions vectors into clusters and searches only the clusters nearest to the query, while keeping vectors at full precision. This gives high accuracy and significantly faster queries than FLAT at a lower compute cost. Best for: Large datasets that require a good balance between performance and cost. |
DISKANN | A disk-based ANN index designed for efficient retrieval on massive datasets that do not fit in memory. It uses the Vamana graph algorithm. Best for: Extremely large, high-dimensional datasets where memory is a constraint. |
SCANN | An efficient ANN index that uses search space pruning and quantization. It excels at maximum inner-product search (MIPS). Best for: Large datasets where retrieval speed is more critical than precision. |
FLAT | Performs an exact, brute-force search. Best for: Smaller datasets (for example, fewer than a few million vectors) where perfect precision is mandatory and slower queries are acceptable. |
IVF_SQ8 | Uses 8-bit scalar quantization to compress vectors, which reduces memory usage and accelerates retrieval. Its precision may be lower than that of HNSW. Best for: Large datasets where resources are limited and a small loss in recall is acceptable. |
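As a rough illustration of how the index choice changes memory, the sketch below applies common rules of thumb: FLAT and IVF_FLAT hold full-precision float32 vectors, IVF_SQ8 stores 1 byte per dimension after 8-bit scalar quantization, and HNSW adds about 2 * M neighbor IDs of 4 bytes each per vector on its base graph layer (an hnswlib-style layout, assumed here; M is the parameter described under Set index parameters). DISKANN and SCANN are not modeled. The figures exclude metadata, cluster centroids, and headroom, so rely on the resource calculator for real sizing.

```python
BYTES_PER_FLOAT32 = 4       # assumption: float32 vectors
BYTES_PER_NEIGHBOR_ID = 4   # assumption: 32-bit neighbor IDs in the HNSW graph


def approx_index_memory_bytes(index_type: str, vector_count: int, dimension: int, m: int = 16) -> int:
    """Very rough in-memory footprint of an index, excluding metadata and headroom."""
    raw = vector_count * dimension * BYTES_PER_FLOAT32
    if index_type in ("FLAT", "IVF_FLAT"):
        return raw  # full-precision vectors stay in memory
    if index_type == "IVF_SQ8":
        return vector_count * dimension  # 1 byte per dimension after 8-bit quantization
    if index_type == "HNSW":
        # Raw vectors plus roughly 2 * M neighbor links per vector on the base layer.
        return raw + vector_count * 2 * m * BYTES_PER_NEIGHBOR_ID
    raise ValueError(f"{index_type} is not modeled by this sketch; use the resource calculator")


n, dim = 8_000_000, 768
for name in ("FLAT", "IVF_SQ8", "HNSW"):
    gib = approx_index_memory_bytes(name, n, dim, m=16) / 1024 ** 3
    print(f"{name}: {gib:.1f} GiB")
```

The comparison makes the trade-offs in the table concrete: quantization (IVF_SQ8) cuts vector memory to about a quarter, while HNSW pays extra memory for its graph in exchange for query speed.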
Set index parameters
HNSW: Configure the M parameter, which defines the number of neighbors for each node. A larger M value increases recall and precision, but also increases index build time and memory usage. A smaller M value results in a faster build and a lower memory footprint, potentially at the cost of some precision. A recommended starting value for M is lg(N), where N is the total number of vectors. You can then fine-tune this value (such as 16, 32, or 64) based on your query performance.
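IVF_FLAT and IVF_SQ8: Set the number of vectors in each cluster. Together with the total number of vectors, this value determines how many clusters the vector space is partitioned into: roughly the total vector count divided by the number of vectors per cluster.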
SCANN: Use the with_raw_data switch to control whether the index stores raw vector data. If your system mainly performs fast approximate searches and rarely accesses raw data, set this switch to False to save the memory that raw data would occupy.
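The settings above map onto Milvus index build parameters. The sketch below assembles a dictionary of the shape that pymilvus `Collection.create_index` accepts as `index_params`, using the Milvus parameter names `M`, `nlist`, and `with_raw_data`. Deriving `nlist` as the ceiling of total vectors divided by vectors per cluster, clamping it to the range 1 to 65536, and the default values shown are assumptions of this sketch; other parameters such as `efConstruction` are omitted.

```python
import math


def build_index_params(index_type: str, vector_count: int, metric_type: str = "L2", *,
                       m: int = 16, vectors_per_cluster: int = 1024,
                       with_raw_data: bool = True) -> dict:
    """Build a Milvus-style index_params dict from the settings described above."""
    params: dict = {}
    if index_type == "HNSW":
        params["M"] = m  # neighbors per node: larger M means higher recall and more memory
    elif index_type in ("IVF_FLAT", "IVF_SQ8"):
        if vectors_per_cluster <= 0:
            raise ValueError("vectors_per_cluster must be positive")
        # Number of clusters = total vectors / vectors per cluster, clamped to 1..65536.
        nlist = math.ceil(vector_count / vectors_per_cluster)
        params["nlist"] = max(1, min(nlist, 65536))
    elif index_type == "SCANN":
        params["with_raw_data"] = with_raw_data  # False skips storing raw vectors
    elif index_type not in ("FLAT", "DISKANN"):
        raise ValueError(f"unsupported index type: {index_type}")
    return {"index_type": index_type, "metric_type": metric_type, "params": params}


print(build_index_params("IVF_SQ8", 8_000_000, vectors_per_cluster=4096))
# {'index_type': 'IVF_SQ8', 'metric_type': 'L2', 'params': {'nlist': 1954}}
print(build_index_params("SCANN", 8_000_000, with_raw_data=False))
# {'index_type': 'SCANN', 'metric_type': 'L2', 'params': {'with_raw_data': False}}
```

The metric type (`L2` here) is a placeholder; choose the metric that matches how your embeddings were trained.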
Set scalar fields
If you enable Scalar Fields, set Average Size of Data per Row. The service uses this value to plan memory, storage, and sharding for the index, and to optimize query performance.
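A quick way to sanity-check total data volume is to add the scalar payload to the raw vectors. This sketch assumes one row per vector, float32 vector storage, and that Average Size of Data per Row is expressed in bytes. The function name and the 256-byte figure are hypothetical, and the result excludes index and metadata overhead.

```python
BYTES_PER_FLOAT32 = 4  # assumption: float32 vector storage


def total_data_bytes(vector_count: int, dimension: int, avg_scalar_bytes_per_row: int) -> int:
    """Raw vector bytes plus scalar bytes, assuming one row per vector."""
    vector_bytes_per_row = dimension * BYTES_PER_FLOAT32
    return vector_count * (vector_bytes_per_row + avg_scalar_bytes_per_row)


# Hypothetical example: 8 million 768-dimensional vectors with about 256 bytes of scalar data per row.
total = total_data_bytes(8_000_000, 768, 256)
print(f"{total / 1024 ** 3:.1f} GiB")  # 24.8 GiB
```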