Vector Retrieval Service for Milvus: Comparison of multi-zone basic and HA editions

Last Updated: May 21, 2026

To keep a zone-level failure from making the entire Vector Retrieval Service for Milvus (Milvus) service unavailable, Milvus provides two cross-zone deployment modes: Multi-zone Deployment (basic) and Multi-zone Deployment (HA). This topic describes the architecture of the two modes and outlines their differences.

Background

In a single-zone Milvus instance, all services are deployed in a single zone. In extreme cases, such as a data center failure, a single availability zone (AZ) can become unavailable, which causes the entire Milvus service to become unavailable.

For Multi-zone Deployment (basic) and Multi-zone Deployment (HA) instances, the metadata service, Message Service, and data are distributed across multiple data centers. This design ensures data integrity and metadata service availability even if a zone fails. The overall service availability depends on the compute and storage resources in the remaining active zones.

The main differences between Multi-zone Deployment (basic) and Multi-zone Deployment (HA) are as follows:

  • Multi-zone Deployment (basic): This edition has one set of compute resources and is more cost-effective. It requires some time to recover from a failure, with a recovery time objective (RTO) of less than 1 hour.

  • Multi-zone Deployment (HA): This edition has two sets of compute resources. If a service failure occurs, the system fails over to the secondary cluster. The RTO is less than 3 minutes.
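The RTO figures above can drive a simple selection rule. The following is a minimal sketch, not an official tool: the edition labels and the `pick_edition` helper are hypothetical names, and the cost ordering only assumes that one set of compute resources is cheaper than two.

```python
from typing import Optional

# Documented RTO ceilings in seconds (basic: < 1 hour, HA: < 3 minutes).
RTO_CEILING_SECONDS = {
    "multi-zone-basic": 3600,
    "multi-zone-ha": 180,
}

# Assumed order from cheapest to most expensive: basic runs one set of
# compute resources, HA runs two.
COST_ORDER = ["multi-zone-basic", "multi-zone-ha"]


def pick_edition(max_rto_seconds: float) -> Optional[str]:
    """Return the cheapest edition whose documented RTO ceiling fits the target.

    Returns None when neither multi-zone edition meets the target.
    """
    for edition in COST_ORDER:
        if RTO_CEILING_SECONDS[edition] <= max_rto_seconds:
            return edition
    return None
```

For example, a workload that tolerates 10 minutes of recovery time rules out the basic edition, so `pick_edition(600)` returns the HA label.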

Deployment architectures

Multi-zone Deployment (basic)

[Figure: Multi-zone Deployment (basic) architecture]

The architecture of the Multi-zone Deployment (basic) edition ensures high availability for data and services in the following ways:

  • Cross-data center deployment of the metadata service: All metadata nodes are deployed across three data centers to ensure high availability for metadata.

  • Cross-data center deployment of the Message Service: All Message Service nodes are deployed across three data centers to ensure high availability for message data.

  • Single data center deployment of compute nodes: Compute nodes are deployed in the primary zone by default to minimize latency between services. If the primary zone fails, compute nodes are relaunched in the secondary zone.

  • Upgraded OSS deployment mode: OSS is upgraded to use zone-redundant storage. This ensures stability for cold storage and high availability for data.

Multi-zone Deployment (HA)

[Figure: Multi-zone Deployment (HA) architecture]

The architecture of the Multi-zone Deployment (HA) edition ensures high availability for data and services in the following ways:

  • Cross-data center deployment of the metadata service: All metadata nodes are deployed across three data centers to ensure high availability for metadata.

  • Cross-data center deployment of the Message Service: All Message Service nodes are deployed across three data centers to ensure high availability for message data.

  • Dual-zone deployment of compute nodes: Compute nodes are deployed in the primary zone by default to minimize service latency. Backup compute nodes are deployed in the secondary zone and synchronously load data. If the primary zone fails, the Vector Retrieval Service for Milvus backend switches the primary and secondary zone roles, and the former secondary zone continues to provide read and write services.

  • Upgraded OSS deployment mode: OSS is upgraded to use zone-redundant storage. This ensures stability for cold storage and high availability for data.
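The two editions differ mainly in how compute recovers from a primary-zone failure. The following toy model is only a sketch of that difference; the `recover` function, the action names, and the zone labels are hypothetical and do not correspond to any service API.

```python
from typing import Tuple


def recover(edition: str, primary: str, secondary: str, failed: str) -> Tuple[str, str]:
    """Return (recovery action, zone that serves compute afterwards).

    edition is "basic" or "ha" (illustrative labels only).
    """
    if failed != primary:
        # Compute in the primary zone is untouched, so it keeps serving.
        return ("none", primary)
    if edition == "ha":
        # Standby compute already has data loaded: only the roles switch.
        return ("switch-roles", secondary)
    # Basic edition: compute nodes must be relaunched in the secondary zone.
    return ("relaunch-compute", secondary)
```

In both editions the secondary zone ends up serving traffic. The difference is whether compute nodes are already running there (HA) or must first be started (basic), which is what separates the two RTO figures.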

High availability comparison

The following table compares single-zone, Multi-zone Deployment (basic), Multi-zone Deployment (HA), and cross-region HA instances of Vector Retrieval Service for Milvus in terms of availability, cost, and performance. Select a deployment mode based on your business requirements.

| Item | Single-zone | Multi-zone (basic) | Multi-zone (HA) | Cross-region (HA) |
| --- | --- | --- | --- | --- |
| Number of AZs for compute nodes | 1 | 2 | 2 | Regions >= 2, AZs >= 3 |
| RPO and RTO during a data center failure | No data center-level disaster recovery capability | RPO = 0; RTO < 1 hour (with sufficient resource inventory) | RPO = 0; RTO < 3 minutes | RPO < 10 seconds; RTO < 3 minutes |
| SLA | 99.9% | 99.9% | 99.95% | 99.99% |
| Cost | 1x (baseline) | 1x to 1.2x; storage: +20%; compute: no change | 2x; storage: +20%; compute: +100% (strict active-standby) | 3x or more; storage: +100% or more; compute: +200% or more; network: additional cross-region synchronization traffic fees |
| Performance | 1 (baseline) | Metadata service latency may slightly increase due to cross-data center communication; query latency and insert latency remain largely unaffected | Metadata service latency may slightly increase due to cross-data center communication; query latency and insert latency remain largely unaffected | Normal read and write performance remains largely unaffected; cross-region synchronization has second-level replication latency; access latency after failover depends on the network distance to the secondary region; compute performance after failover is largely consistent with the same specifications |
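The SLA and cost figures in this comparison can be turned into rough planning numbers. The sketch below is illustrative only: the edition labels and helper names are hypothetical, the 30-day month is an assumption, and the cross-region uplifts are the documented lower bounds ("or more") with network fees excluded.

```python
# SLA percentages copied from the comparison.
SLA_PERCENT = {
    "single-zone": 99.9,
    "multi-zone-basic": 99.9,
    "multi-zone-ha": 99.95,
    "cross-region-ha": 99.99,
}

# (storage uplift, compute uplift) relative to a single-zone instance.
UPLIFT = {
    "single-zone": (0.0, 0.0),
    "multi-zone-basic": (0.2, 0.0),
    "multi-zone-ha": (0.2, 1.0),
    "cross-region-ha": (1.0, 2.0),  # lower bounds; network fees not included
}


def allowed_downtime_minutes(edition: str, days: int = 30) -> float:
    """Downtime budget implied by the SLA over a period of `days` days."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - SLA_PERCENT[edition] / 100)


def estimated_cost(edition: str, storage_cost: float, compute_cost: float) -> float:
    """Scale single-zone storage and compute costs by the documented uplifts."""
    storage_uplift, compute_uplift = UPLIFT[edition]
    return storage_cost * (1 + storage_uplift) + compute_cost * (1 + compute_uplift)
```

Over a 30-day month, the SLAs correspond to downtime budgets of about 43.2 minutes (99.9%), 21.6 minutes (99.95%), and 4.3 minutes (99.99%). The overall cost multiplier depends on the storage-to-compute ratio of the workload, which is why the basic edition is listed as a range rather than a single number.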

Limitations

  • Multi-zone Deployment (HA) requires one set of compute nodes in each of the primary and secondary zones. Therefore, the number of nodes must be a multiple of 2 when you create or scale out an instance.

  • Multi-zone Deployment (HA) does not support Milvus 2.4.
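Both limitations can be checked before an HA instance is created or scaled out. The following pre-flight check is a sketch under stated assumptions: `validate_ha_request` is a hypothetical helper, not part of any SDK, and it assumes the version is passed as a string such as "2.4" or "2.4.1".

```python
from typing import List


def validate_ha_request(node_count: int, milvus_version: str) -> List[str]:
    """Return a list of problems; an empty list means the request passes."""
    problems = []
    # One set of compute nodes per zone, so the total must be a positive even number.
    if node_count <= 0 or node_count % 2 != 0:
        problems.append("node count must be a positive multiple of 2")
    # Multi-zone Deployment (HA) does not support Milvus 2.4.
    if milvus_version == "2.4" or milvus_version.startswith("2.4."):
        problems.append("Milvus 2.4 is not supported by Multi-zone Deployment (HA)")
    return problems
```

Returning every problem at once, rather than raising on the first one, lets a caller report all issues with a request in a single pass.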