To keep a zone-level failure from making Vector Retrieval Service for Milvus (Milvus) entirely unavailable, Milvus provides two cross-zone deployment modes: Multi-zone Deployment (basic) and Multi-zone Deployment (HA). This topic describes the architecture of the two modes and the differences between them.
Background
In a single-zone Milvus instance, all services are deployed in one availability zone (AZ). If that zone becomes unavailable, for example because of a data center failure, the entire Milvus service becomes unavailable.
For Multi-zone Deployment (basic) and Multi-zone Deployment (HA) instances, the metadata service, Message Service, and data are distributed across multiple data centers. This design ensures data integrity and metadata service availability even if a zone fails. The overall service availability depends on the compute and storage resources in the remaining active zones.
The main differences between Multi-zone Deployment (basic) and Multi-zone Deployment (HA) are as follows:
Multi-zone Deployment (basic): This edition has one set of compute resources and is more cost-effective. Because compute nodes must be relaunched in the secondary zone after a failure, recovery takes longer: the recovery time objective (RTO) is less than 1 hour.
Multi-zone Deployment (HA): This edition has two sets of compute resources. If a service failure occurs, the system fails over to the secondary cluster. The RTO is less than 3 minutes.
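The RTO bounds above can be used as a first filter when choosing a mode. The following is a minimal sketch, not part of any Milvus API: the function name `modes_meeting_rto` and the dictionary are hypothetical, and only the two RTO figures come from this topic.

```python
from typing import List

# RTO upper bounds in minutes, taken from the two list items above.
RTO_BOUND_MINUTES = {
    "Multi-zone Deployment (basic)": 60,  # RTO < 1 hour
    "Multi-zone Deployment (HA)": 3,      # RTO < 3 minutes
}


def modes_meeting_rto(target_rto_minutes: float) -> List[str]:
    """Return the modes whose documented RTO bound fits within the target.

    Hypothetical helper for illustration; it only compares numbers and
    does not call any cloud service.
    """
    return [
        mode
        for mode, bound in RTO_BOUND_MINUTES.items()
        if bound <= target_rto_minutes
    ]


# A workload that tolerates at most 10 minutes of recovery time.
print(modes_meeting_rto(10))
```

With a 10-minute target, only the HA mode remains. Cost and the other factors in the comparison table still need to be weighed separately.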
Deployment architectures
Multi-zone Deployment (basic)

The architecture of the Multi-zone Deployment (basic) edition ensures high availability for data and services in the following ways:
Cross-data center deployment of the metadata service: All metadata nodes are deployed across three data centers to ensure high availability for metadata.
Cross-data center deployment of the Message Service: All Message Service nodes are deployed across three data centers to ensure high availability for message data.
Single data center deployment of compute nodes: Compute nodes are deployed in the primary zone by default to minimize latency between services. If the primary zone fails, compute nodes are relaunched in the secondary zone.
Upgraded OSS deployment mode: Object Storage Service (OSS) is upgraded to use zone-redundant storage. This ensures stability for cold storage and high availability for data.
Multi-zone Deployment (HA)

The architecture of the Multi-zone Deployment (HA) edition ensures high availability for data and services in the following ways:
Cross-data center deployment of the metadata service: All metadata nodes are deployed across three data centers to ensure high availability for metadata.
Cross-data center deployment of the Message Service: All Message Service nodes are deployed across three data centers to ensure high availability for message data.
Dual-zone deployment of compute nodes: Compute nodes are deployed in the primary zone by default to minimize service latency. Backup compute nodes are deployed in the secondary zone and load data synchronously. If the primary zone fails, the Vector Retrieval Service for Milvus backend switches the roles of the primary and secondary zones, and the former secondary zone continues to provide read and write services.
Upgraded OSS deployment mode: OSS is upgraded to use zone-redundant storage. This ensures stability for cold storage and high availability for data.
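The difference in compute-node recovery between the two modes can be illustrated with a toy model. This is a sketch under stated assumptions: the class `ComputePlacement`, the mode labels `"basic"` and `"ha"`, and the returned strings are invented for illustration and do not reflect the actual backend implementation. It models a single failure of the zone that currently hosts the compute nodes.

```python
from dataclasses import dataclass, field


@dataclass
class ComputePlacement:
    """Toy model of where compute nodes serve traffic (hypothetical)."""

    mode: str            # "basic" or "ha"; labels invented for this sketch
    primary_zone: str
    secondary_zone: str
    compute_zone: str = field(init=False)  # zone currently serving traffic

    def __post_init__(self) -> None:
        # In both modes, compute nodes serve from the primary zone by default.
        self.compute_zone = self.primary_zone

    def on_zone_failure(self, failed_zone: str) -> str:
        """Apply the recovery action for one zone failure and describe it."""
        if failed_zone != self.compute_zone:
            return "no compute recovery needed"
        if self.mode == "ha":
            # HA: standby nodes already hold the data, so the zones swap roles.
            self.primary_zone, self.secondary_zone = (
                self.secondary_zone,
                self.primary_zone,
            )
            self.compute_zone = self.primary_zone
            return "switched to standby compute nodes"
        # Basic: compute nodes are relaunched in the secondary zone.
        self.compute_zone = self.secondary_zone
        return "relaunched compute nodes"


ha = ComputePlacement("ha", "zone-a", "zone-b")
print(ha.on_zone_failure("zone-a"), "->", ha.compute_zone)
```

The model only captures which zone serves traffic afterward. The actual recovery time depends on the relaunch or switchover, which is why the two modes have different RTOs.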
High availability comparison
The following table compares single-zone, Multi-zone Deployment (basic), Multi-zone Deployment (HA), and cross-region HA instances of Vector Retrieval Service for Milvus in terms of availability, cost, and performance. Select a deployment mode based on your business requirements.
Item | Single-zone | Multi-zone (basic) | Multi-zone (HA) | Cross-region (HA) |
Number of AZs for compute nodes | 1 | 2 | 2 | Regions >= 2, AZs >= 3 |
RPO and RTO during a data center failure | No data center-level disaster recovery capability | | | |
SLA | 99.9% | 99.9% | 99.95% | 99.99% |
Cost | 1 | | | |
Performance | 1 | | | |
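To make the SLA row easier to compare, each percentage can be converted into a maximum amount of downtime. The sketch below assumes a 30-day measurement period, which this topic does not state, and the function name `max_downtime_minutes` is hypothetical.

```python
# SLA percentages copied from the comparison table above.
SLA_PERCENT = {
    "Single-zone": 99.9,
    "Multi-zone (basic)": 99.9,
    "Multi-zone (HA)": 99.95,
    "Cross-region (HA)": 99.99,
}


def max_downtime_minutes(sla_percent: float, period_days: int = 30) -> float:
    """Downtime budget implied by an SLA over the given period.

    The 30-day default is an assumption made for this sketch.
    """
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


for mode, sla in SLA_PERCENT.items():
    # Round for display; floating-point arithmetic leaves tiny residues.
    print(f"{mode}: {max_downtime_minutes(sla):.2f} minutes per 30 days")
```

Under that assumption, 99.9% corresponds to about 43.2 minutes of downtime in 30 days, 99.95% to about 21.6 minutes, and 99.99% to about 4.3 minutes.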
Limitations
Multi-zone Deployment (HA) requires one set of compute nodes in each of the primary and secondary zones. Therefore, the number of nodes must be a multiple of 2 when you create or scale out an instance.
Multi-zone Deployment (HA) does not support Milvus version 2.4.
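Both limitations can be checked before an instance is created or scaled out. The following is a minimal sketch, assuming the version is given as a dotted string such as "2.4.1". The helper `validate_ha_config` is hypothetical and not part of any SDK or console API.

```python
from typing import List


def validate_ha_config(node_count: int, milvus_version: str) -> List[str]:
    """Return the Multi-zone Deployment (HA) limitations a request violates.

    Hypothetical pre-check for illustration; an empty list means no
    violation of the two limitations listed above was found.
    """
    problems = []
    # One set of compute nodes per zone means an even, positive node count.
    if node_count <= 0 or node_count % 2 != 0:
        problems.append("node count must be a positive multiple of 2")
    # Compare only the major.minor part of the version string.
    major_minor = ".".join(milvus_version.split(".")[:2])
    if major_minor == "2.4":
        problems.append("Milvus version 2.4 is not supported")
    return problems


print(validate_ha_config(3, "2.4.1"))
```

A request with 3 nodes on version 2.4.1 fails both checks, while 4 nodes on a version other than 2.4 passes.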