Vector Retrieval Service for Milvus: Milvus resource estimation and configuration recommendations

Last Updated: Jun 20, 2026

Before using Vector Retrieval Service for Milvus (Milvus), estimate your compute resource needs to ensure system stability and control costs. Milvus provides a resource calculator that estimates the required resources from inputs such as vector count, vector dimension, and index type, based on experimental data. Because these are estimates, adjust the configuration for your actual deployment based on your own test results.

Resource calculator

Note

Milvus maintains a CPU-to-memory ratio of 1:4, that is, 1 CPU core for every 4 GB of memory. To ensure instance stability, the service provisions more memory than the estimated requirement.

Use the resource calculator to estimate the required instance resources. Enter the vector count and dimension and select an index type, and the calculator displays a recommended instance configuration in real time. Use these recommendations as a starting point for instance selection and performance testing.

The resource calculator supports the following input parameters:

  • High Availability: a toggle for the high availability configuration.
  • Vector Count (Millions)
  • Vector Dimension
  • Index Type, such as HNSW
  • Index Parameter M: the number of neighbors per node, in the range 4–64.
  • Scalar Field: a toggle; when enabled, you can also set the average size of data per row.

For example, for 8 million 768-dimensional vectors with an HNSW index (M=4), the calculator estimates 31.2 GB of required memory and 22.9 GB for raw data, and recommends a cluster configuration of 40 CPU cores and 160 GB of memory. The recommended configuration for each Milvus component is:

  • Metadata service: 1 instance with 4 cores and 16 GB
  • Proxy: 1 instance with 2 cores and 8 GB
  • Query Node: 2 instances, each with 4 cores and 16 GB
  • Index Node: 1 instance with 4 cores and 16 GB
  • Data Node: 1 instance with 2 cores and 8 GB

These estimates are based on experimental data. Before deploying to production, adjust the configuration based on your own test results.
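The 22.9 GB raw-data figure in this example is consistent with storing each vector dimension as a 4-byte float32 value and reporting the size in GiB (2^30 bytes). The sketch below reproduces only that arithmetic; the float32 element size, the GiB unit, and the helper name raw_vector_gib are assumptions for illustration. It does not reproduce the calculator's 31.2 GB memory estimate.

```python
# Hypothetical helper: raw size of float32 vectors, assuming 4 bytes per dimension.
BYTES_PER_FLOAT32 = 4
BYTES_PER_GIB = 1024 ** 3


def raw_vector_gib(num_vectors: int, dimension: int) -> float:
    """Return the raw size of num_vectors float32 vectors of the given dimension, in GiB."""
    total_bytes = num_vectors * dimension * BYTES_PER_FLOAT32
    return total_bytes / BYTES_PER_GIB


# 8 million 768-dimensional vectors, as in the calculator example above.
size = raw_vector_gib(8_000_000, 768)
print(f"{size:.1f} GiB")  # prints "22.9 GiB"
```

Because the raw size is a simple product, doubling either the vector count or the dimension doubles it.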

Resource parameters

High availability configuration

The high availability configuration stabilizes your online cluster through a dual-node replica mechanism and loads two replicas of your data by default, which significantly improves fault tolerance and reliability. Enabling high availability doubles the resource requirements compared with a non-HA configuration, so thoroughly assess and plan the resource needs of your production cluster before enabling it.
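The two sizing rules in this document, the 1:4 CPU-to-memory ratio and the doubling of resources under high availability, can be combined into a quick sanity check. This is a minimal sketch, not the calculator's actual logic: the helper size_cluster, the ClusterSize type, and the choice to round memory up to a whole multiple of 4 GB are all hypothetical.

```python
import math
from dataclasses import dataclass

# Ratio stated in the Note above: 1 CPU core per 4 GB of memory.
GB_PER_CORE = 4


@dataclass
class ClusterSize:
    cores: int
    memory_gb: int


def size_cluster(memory_gb_needed: float, high_availability: bool) -> ClusterSize:
    """Hypothetical sizing helper: round memory up to a multiple of 4 GB,
    derive cores at 1:4, and double both when high availability is enabled."""
    cores = math.ceil(memory_gb_needed / GB_PER_CORE)
    size = ClusterSize(cores=cores, memory_gb=cores * GB_PER_CORE)
    if high_availability:
        # HA loads two replicas, so resource requirements double.
        size = ClusterSize(cores=size.cores * 2, memory_gb=size.memory_gb * 2)
    return size


print(size_cluster(30, high_availability=False))  # ClusterSize(cores=8, memory_gb=32)
print(size_cluster(30, high_availability=True))   # ClusterSize(cores=16, memory_gb=64)
```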

Input resource scale

Vector Count (Millions)

The vector count directly determines the index size and the amount of data to be scanned during queries. A larger vector count requires more storage, increases the computational complexity of index building and queries, and results in longer processing times and greater hardware demands.

Vector Dimension

The vector dimension affects the index complexity and precision. Higher-dimensional vectors increase index complexity and the computational cost of similarity searches. This impacts both storage costs and query speed, especially without effective dimensionality reduction or quantization strategies.

Index type

The index type is a key factor that determines resource requirements and query performance. Different index algorithms have different requirements for memory, CPU, and query time. The supported index types are described below.

HNSW

HNSW (Hierarchical Navigable Small World) is a graph-based index that provides high query efficiency, especially in high-dimensional data spaces. However, it consumes more memory and computing resources than most other index types. It is suitable for scenarios that require the fastest query speed, have sufficient resources, and involve high-dimensional data.

IVF_FLAT

Provides a balance between precision and query speed, making it suitable for most scenarios. It partitions vectors into clusters and searches only the clusters closest to the query, which reduces computation and significantly improves performance over the FLAT index with relatively low resource consumption. It is ideal for large datasets that require a balance between query performance and resource costs.

DISKANN

DISKANN is a disk-based approximate nearest neighbor (ANN) search technique designed for fast and efficient retrieval on large-scale datasets. It uses the Vamana graph algorithm to achieve efficient vector indexing and retrieval even with limited memory, making it suitable for processing ultra-large-scale, high-dimensional data.

SCANN

SCANN is an efficient approximate nearest neighbor (ANN) search index structure suitable for large-scale datasets that require fast retrieval but not high precision. It balances speed and resource consumption by using search space pruning and quantization, with a focus on maximum inner product search (MIPS).

FLAT

Provides the highest query precision by performing an exhaustive (brute-force) search that compares the query with every vector, but this sacrifices query speed. Performance may be unsatisfactory on large-scale datasets. It is suitable for scenarios with relatively small data volumes (for example, tens of millions of vectors) that have strict requirements for query precision and can tolerate slower query speeds.

IVF_SQ8

Compresses each vector dimension from a 32-bit float to an 8-bit integer (scalar quantization), which reduces memory usage and accelerates retrieval. However, its query precision may be lower than that of HNSW. It is ideal for large-scale datasets in resource-constrained scenarios that still require a high recall rate.
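To illustrate the trade-off between FLAT and the IVF family described above, the following toy compares an exhaustive search, which computes the distance to every vector, with an IVF-style search that scans only the inverted lists of the nprobe closest centroids (nprobe is the name Milvus uses for this IVF search parameter). It is a pure-Python sketch, not how Milvus implements these indexes: the centroids are hand-picked rather than trained with k-means, and all function names are hypothetical.

```python
from collections import defaultdict

Vector = tuple[float, ...]


def sq_l2(a: Vector, b: Vector) -> float:
    """Squared Euclidean (L2) distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def flat_search(data: list[Vector], query: Vector, k: int) -> list[int]:
    """FLAT-style exhaustive search: score every vector and return the k nearest ids."""
    return sorted(range(len(data)), key=lambda i: sq_l2(data[i], query))[:k]


def build_ivf(data: list[Vector], centroids: list[Vector]) -> dict[int, list[int]]:
    """Assign each vector id to the inverted list of its nearest centroid."""
    lists: dict[int, list[int]] = defaultdict(list)
    for i, v in enumerate(data):
        nearest = min(range(len(centroids)), key=lambda c: sq_l2(v, centroids[c]))
        lists[nearest].append(i)
    return lists


def ivf_search(data: list[Vector], centroids: list[Vector],
               lists: dict[int, list[int]], query: Vector, k: int, nprobe: int) -> list[int]:
    """IVF-style search: score only vectors in the lists of the nprobe closest centroids."""
    probe = sorted(range(len(centroids)), key=lambda c: sq_l2(query, centroids[c]))[:nprobe]
    candidates = [i for c in probe for i in lists.get(c, [])]
    return sorted(candidates, key=lambda i: sq_l2(data[i], query))[:k]


data = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9), (9.0, 0.0)]
centroids = [(0.0, 0.0), (5.0, 5.0), (9.0, 0.0)]  # hand-picked, not trained
lists = build_ivf(data, centroids)
query = (4.8, 5.1)

print(flat_search(data, query, k=2))  # [2, 3], after scoring all 5 vectors
print(ivf_search(data, centroids, lists, query, k=2, nprobe=1))  # [2, 3], after scoring 2 vectors
```

On real data, a small nprobe can miss true neighbors that fall in an unscanned list; that is the precision IVF gives up in exchange for scoring fewer vectors.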

Index parameters

  • HNSW: You need to set the M value, which determines the number of neighbors for each node. A larger M value increases the index's recall and precision but also increases the index build time and memory usage. A smaller M value builds the index faster with less memory, but may sacrifice some precision. A recommended initial value for M is lg(N), where N is the total number of vectors. You can then fine-tune this value based on actual query performance. For example, try setting M to 16, 32, or 64 and adjust based on the results.

  • IVF_FLAT and IVF_SQ8: You need to set the number of clusters (inverted lists), the nlist parameter, to partition the vector space. A larger number of clusters increases the index's precision and recall but also increases the index build time and query computation cost. A smaller number of clusters reduces index complexity but may lead to a drop in precision.

  • SCANN: You can use the with_raw_data parameter to control whether to store the raw data within the index. If your system is primarily used for fast, approximate searches and does not require frequent access to raw data, we recommend setting this to False. Otherwise, set it to True.
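The parameters above are passed as index build settings. The sketch below wraps them in plain dictionaries shaped like the index_params argument of pymilvus Collection.create_index (keys index_type, metric_type, and params). The parameter names M, efConstruction, nlist, and with_raw_data follow the Milvus documentation, but the helper build_index_params, its validation rules, and the chosen values and metrics are assumptions of this sketch; Milvus itself may apply defaults, so check the documentation for your version.

```python
# Hypothetical helper: builds index_params dictionaries in the shape accepted by
# pymilvus Collection.create_index(field_name=..., index_params=...).
# The values used below are illustrative starting points, not tuned recommendations.

M_RANGE = (4, 64)  # range of M accepted by the resource calculator
NEEDS_NLIST = ("IVF_FLAT", "IVF_SQ8", "SCANN")  # this sketch expects nlist for these


def build_index_params(index_type: str, metric_type: str = "L2", **params) -> dict:
    """Validate a few build parameters discussed above and wrap them in a dict."""
    if index_type == "HNSW":
        m = params.get("M")
        if m is None or not M_RANGE[0] <= m <= M_RANGE[1]:
            raise ValueError(f"HNSW requires M in {M_RANGE}, got {m!r}")
    if index_type in NEEDS_NLIST and "nlist" not in params:
        raise ValueError(f"this sketch expects nlist (number of clusters) for {index_type}")
    return {"index_type": index_type, "metric_type": metric_type, "params": params}


# HNSW: start with a moderate M and tune against measured recall and latency.
hnsw = build_index_params("HNSW", M=16, efConstruction=200)
# IVF_FLAT: nlist sets the number of clusters (inverted lists).
ivf = build_index_params("IVF_FLAT", nlist=1024)
# SCANN: with_raw_data=False suits fast approximate search without raw-data access.
scann = build_index_params("SCANN", metric_type="IP", nlist=1024, with_raw_data=False)
# FLAT: no build parameters are set in this sketch.
flat = build_index_params("FLAT")

print(hnsw)
# {'index_type': 'HNSW', 'metric_type': 'L2', 'params': {'M': 16, 'efConstruction': 200}}
```

Each dictionary could then be passed as index_params when creating the index on the vector field.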

Scalar fields

When you enable the Scalar Fields option, you can set the Average Size of Data per Row parameter. This helps the indexing system allocate memory, storage, and sharding resources efficiently to optimize query performance.
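The scalar data volume grows linearly with the row count: roughly the number of rows multiplied by the average size of data per row. A minimal sketch, assuming the average row size is given in bytes and ignoring compression and per-row overhead (both assumptions); the helper name scalar_data_gib and the 256-byte example value are hypothetical.

```python
BYTES_PER_GIB = 1024 ** 3


def scalar_data_gib(num_rows: int, avg_row_bytes: float) -> float:
    """Hypothetical estimate of raw scalar-field data size in GiB,
    ignoring compression and per-row overhead."""
    return num_rows * avg_row_bytes / BYTES_PER_GIB


# 8 million rows with an assumed average of 256 bytes of scalar data per row.
print(f"{scalar_data_gib(8_000_000, 256):.2f} GiB")  # prints "1.91 GiB"
```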