Vector Retrieval Service for Milvus: Quick start: Create a Milvus instance

Last Updated: Jun 18, 2026

This topic describes how to create a Vector Retrieval Service for Milvus (Milvus) instance.

Usage notes

Vector Retrieval Service for Milvus (Milvus) is available in Standard Edition and Basic Edition:

  • The Standard Edition is a distributed vector retrieval solution designed for enterprise-grade applications and large-scale production environments. Deployed as a multi-availability zone cluster, it provides a production-level Service Level Agreement (SLA) for high availability and supports independent horizontal scaling of compute and storage resources. This edition is ideal for production scenarios that require high reliability, high concurrency, and large-scale data processing.

  • The Basic Edition is a lightweight vector retrieval solution designed for individual developers and small teams. Deployed as a single process, it does not support horizontal scaling. This edition is suitable for development, learning, feature validation, and initial testing, but is not recommended for production environments.

| Feature | Basic Edition | Standard Edition |
| --- | --- | --- |
| Primary use case | Development, testing, and feature validation | Production environments and large-scale applications |
| Deployment architecture | Single-process deployment in a single availability zone | Distributed cluster with support for multi-availability zone high availability |
| Service Level Agreement (SLA) | Does not include a production-level availability guarantee | Provides a production-level high availability SLA |
| Scalability | Does not support horizontal scaling; only vertical scaling is supported | Supports independent horizontal scaling of compute and storage resources |
| Instance upgrade | Cannot be directly upgraded to a Standard Edition cluster; data migration is required | Supports seamless scaling within the cluster |
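
The edition comparison above can be condensed into a small decision helper. The following Python sketch is illustrative only: `Requirements` and `choose_edition` are hypothetical names invented for this example, not part of any Alibaba Cloud SDK, and the rules simply restate the comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirements:
    """Hypothetical description of a workload (not an Alibaba Cloud API type)."""
    production: bool = False          # needs a production-level SLA
    horizontal_scaling: bool = False  # needs to scale out compute or storage
    multi_zone: bool = False          # needs a multi-availability zone deployment


def choose_edition(req: Requirements) -> str:
    """Return the edition that the comparison points to for these requirements."""
    # Each of these needs is met only by the Standard Edition.
    if req.production or req.horizontal_scaling or req.multi_zone:
        return "Standard Edition"
    # Development, testing, and feature validation fit the Basic Edition.
    return "Basic Edition"


if __name__ == "__main__":
    print(choose_edition(Requirements()))
    print(choose_edition(Requirements(production=True)))
```

Because a Basic Edition instance cannot be upgraded directly and requires data migration, it may be worth choosing the Standard Edition up front if a production rollout is likely.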

Prerequisites

  • You have an Alibaba Cloud account. If you do not have an account, you must first register one. For more information, see the documentation on registering an Alibaba Cloud account.

  • When you create an instance for the first time, you must grant Milvus permissions to access other cloud resources. For more information, see Alibaba Cloud account role authorization.

  • If you are using a RAM user, you must grant the required permissions. For more information, see RAM user authorization.

Procedure

  1. Go to the Vector Retrieval Service for Milvus page.

    1. Log on to the Vector Retrieval Service for Milvus console.

    2. In the navigation pane on the left, click Instances.

  2. On the Instances page, click Create Instance and configure the following parameters.

    Parameter

    Description

    Billing Method

    Both subscription and pay-as-you-go billing methods are supported.

    Duration

    For the subscription billing method, the default duration is one month. The available subscription durations are displayed on the page.

    Region

    The physical location of the instance.

    Important

    Choose the region carefully. The region cannot be changed after the instance is created.

    Network

    • A Virtual Private Cloud (VPC) is an isolated network environment that you create in Alibaba Cloud. You have full control over your VPC.

      Select an existing VPC. To create a new one, click Go to console to create. For more information, see Create and manage a VPC.

    • A vSwitch is a basic network module of a VPC that connects different cloud resources.

      Select an existing vSwitch. To create a new one, click Create in console. For more information, see Create and manage a vSwitch.

    Deployment Model

    • Single-zone: Suitable for development and test environments. This option offers lower costs and simpler deployment but does not provide cross-zone disaster recovery.

    • Multi-zone Deployment (Basic): Contains a single set of compute resources. In case of a failure, the recovery time objective (RTO) is within one hour. This option is highly cost-effective.

    • Multi-zone Deployment (HA): Contains two sets of compute resources. In case of a service disruption, the system can switch to the standby cluster, with an RTO of less than three minutes.

    For more information about the differences, use cases, and architectures of the Multi-zone Deployment (Basic) and Multi-zone Deployment (HA) options, see Comparison of Multi-availability Zone Basic and High Availability options.

    Version

    Versions 2.4, 2.5, and 2.6 are supported.

    Instance Series

    • Standard Edition: A distributed, multi-availability zone cluster with a production-level SLA and independent horizontal scaling of compute and storage resources. Choose this edition for production workloads.

    • Basic Edition: A lightweight, single-process deployment that does not support horizontal scaling. Choose this edition for development, learning, feature validation, and initial testing. It is not recommended for production environments.

    For the full comparison, see Usage notes.

    Service Node

    This parameter is required when you create a Standard Edition instance.

    Service nodes handle client requests and manage the cluster status. They distribute query requests to the appropriate compute nodes, collect the results, and return them to the user. They also maintain the cluster's metadata to ensure that requests are correctly routed to the corresponding compute nodes.

    • Streaming Node: Responsible for real-time data writes and incremental data consumption.

      Recommendation: To support frequent writes and low-latency searches, increase the number of nodes or upgrade their specifications to improve real-time processing.

    • DataNode: Responsible for data writing, persistence, and management.

      Recommendation: For large data volumes and frequent imports, we strongly recommend increasing the number of nodes or upgrading their specifications to improve overall write bandwidth and stability.

    • Proxy: Responsible for receiving and routing client requests.

      Recommendation: If you have a high number of client connections and high-concurrency requests, we strongly recommend increasing the number of nodes or upgrading their specifications to improve request handling and forwarding capacity.

    • Metadata Service: Responsible for resource scheduling and task coordination.

      Recommendation: For large-scale clusters with many data partitions, we strongly recommend upgrading the specifications to ensure stable scheduling.

    Compute Node

    QueryNode: Responsible for vector search and filtering.

    Action Required: If memory usage exceeds 70%, you must increase the number of nodes or upgrade their specifications to maintain query performance and stability. For more information about nodes, see Compute node specifications and performance comparison.

    Data Replicas

    Enabling multiple data replicas significantly improves cluster availability. We recommend enabling this feature for production workloads.

    Data Storage

    By default, locally redundant storage is used. You are billed hourly for the storage space (in GB) you use.

    Automatic Backup

    Important

    Using the backup feature incurs storage costs. For more information, see Billing items.

    The automatic backup feature is enabled by default. This feature ensures the data security of your instance and helps guarantee the Service Level Agreement (SLA). If data is accidentally lost, you can use this feature to recover it.

    Note

    To disable this feature, go to the Backup Snapshot tab after the instance is created and turn it off. For more information, see Backup and recovery.

    Password

    Set the password for the root (administrator) account of the Milvus instance. You use this password to log on to the database.

    Note

    If you forget the password, see FAQ.

    OSS Data Encryption

    OSS data encryption depends on Key Management Service (KMS). If KMS is not yet enabled, enable it in the Key Management Service console.

    Resource Group

    Select an existing resource group. To create a new one, click Create Resource Group. Resource groups allow you to group your cloud resources based on dimensions such as purpose, permissions, and ownership. For more information, see What is a resource group?.

    Tag

    Add tags during or after instance creation to help identify and manage your resources. For more information about tags, see What is a tag?.
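
The sizing guidance in the Service Node and Compute Node parameters can be restated as a lookup plus a threshold check. This is a minimal Python sketch, not an official sizing tool: the symptom keys and function names are hypothetical, and only the node roles, the recommended actions, and the 70% memory threshold come from the parameter descriptions above.

```python
# Guidance restated from the Service Node parameter.
# The symptom keys are hypothetical labels chosen for this sketch.
SCALING_HINTS = {
    "frequent_writes_low_latency_search": ("Streaming Node", "add nodes or upgrade specifications"),
    "large_volume_frequent_imports": ("DataNode", "add nodes or upgrade specifications"),
    "many_connections_high_concurrency": ("Proxy", "add nodes or upgrade specifications"),
    "large_cluster_many_partitions": ("Metadata Service", "upgrade specifications"),
}

# Threshold stated in the Compute Node parameter.
QUERYNODE_MEMORY_THRESHOLD = 0.70


def querynode_needs_scaling(used_gib: float, total_gib: float) -> bool:
    """Return True when QueryNode memory usage exceeds the 70% threshold."""
    if total_gib <= 0:
        raise ValueError("total_gib must be positive")
    return used_gib / total_gib > QUERYNODE_MEMORY_THRESHOLD


def scaling_advice(symptoms, used_gib: float, total_gib: float) -> list:
    """Collect one advice string per recognized symptom, plus the memory rule."""
    advice = []
    for symptom in symptoms:
        if symptom in SCALING_HINTS:
            node, action = SCALING_HINTS[symptom]
            advice.append(f"{node}: {action}")
    if querynode_needs_scaling(used_gib, total_gib):
        advice.append("QueryNode: add nodes or upgrade specifications")
    return advice


if __name__ == "__main__":
    for line in scaling_advice(["many_connections_high_concurrency"], 24, 32):
        print(line)
```

The memory figures passed in are assumed to come from your own monitoring; the sketch does not query the instance.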

  3. Confirm the configurations, accept the terms of service, and click Create Instance.

    The instance is ready once its status changes to Running.
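
Once the instance is Running, a quick way to confirm that the root password works is to connect with the open-source `pymilvus` client. The sketch below makes several assumptions: the `MILVUS_ENDPOINT` and `MILVUS_PASSWORD` environment variable names are hypothetical, port 19530 is the open-source Milvus default and may differ for your instance, and the machine running the code is assumed to have network access to the instance's VPC. Check the instance details page in the console for the actual endpoint and port.

```python
import os


def build_connection_args(endpoint: str, password: str,
                          port: int = 19530, user: str = "root") -> dict:
    """Build keyword arguments for pymilvus.MilvusClient.

    The port default is the open-source Milvus default (an assumption here).
    """
    if not endpoint or not password:
        raise ValueError("endpoint and password are required")
    return {
        "uri": f"http://{endpoint}:{port}",
        # Milvus accepts credentials as a "user:password" token.
        "token": f"{user}:{password}",
    }


def main() -> None:
    # Imported here so the helper above works without pymilvus installed.
    from pymilvus import MilvusClient

    args = build_connection_args(
        endpoint=os.environ["MILVUS_ENDPOINT"],
        password=os.environ["MILVUS_PASSWORD"],
    )
    client = MilvusClient(**args)
    # A new instance typically has no collections yet.
    print(client.list_collections())


if __name__ == "__main__":
    if "MILVUS_ENDPOINT" in os.environ and "MILVUS_PASSWORD" in os.environ:
        main()
    else:
        print("Set MILVUS_ENDPOINT and MILVUS_PASSWORD to run the connection check.")
```

Reading the password from an environment variable keeps it out of source files.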