Vector Retrieval Service for Milvus: Migrate to Vector Retrieval Service for Milvus by using a migration tool image

Last Updated: Jan 23, 2026

If your source Milvus instance is self-managed and inaccessible over the public network, you can securely migrate data to Alibaba Cloud Milvus by deploying a data migration tool container either locally or within an Alibaba Cloud Virtual Private Cloud (VPC). This process uses the taihao-executor container image, which supports batch migration for multiple collections while ensuring data consistency and high reliability.

Restrictions and configuration requirements

Pre-migration preparations (required)

  1. Operation status control

    • Source cluster: Stop all data modification operations, including write, delete, and update operations. Ensure that the cluster is in a read-only state to prevent data inconsistencies during migration.

    • Destination cluster: Pause all data operations, including query, write, delete, and update operations. Keep the cluster unavailable to avoid data conflicts with the migration.

  2. Version compatibility

    • Source cluster version: Must be later than 2.3.6 (v2.3.7 or later).

    • Destination cluster version: Must be the same as or later than the source cluster version.

Migration task limits

  1. Task management

    • Concurrency limit: Only one migration task can run at a time.

  2. Data scope

    • Database limit: Each migration task can migrate collections from only one database.

    • Collection limit: Each migration task supports a maximum of five collections.

    • Total data size: The total number of entities across all collections cannot exceed 500 million.

  3. Data state

    • Source instance requirement: The collections to be migrated must be in a loaded state.

    • Destination instance requirement: The destination instance must be empty and contain no existing entity data.
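
The following is a minimal sketch of how you might verify both data state requirements before you start a task. It assumes that pymilvus is installed and uses placeholder addresses, credentials, and collection names from this topic:

from pymilvus import MilvusClient

# Placeholder endpoints and credentials; replace them with your own values.
source = MilvusClient(uri="http://<source_instance_address>:19530", token="<username>:<password>")
destination = MilvusClient(uri="http://<destination_alibaba_cloud_milvus_address>:19530", token="<destination_instance_token>")

# Source instance: every collection to be migrated must be in a loaded state.
for name in ["col_a", "col_b"]:
    print(name, source.get_load_state(collection_name=name))   # Expect a Loaded state

# Destination instance: it must not contain any collections or entity data.
print(destination.list_collections())                          # Expect an empty list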

Network requirements

The network where the container is deployed must provide access to both the source Milvus instance and the destination Alibaba Cloud Milvus instance. For optimal performance, deploy the container in the same VPC as the destination instance.
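
As a quick reachability check from the deployment environment, the following sketch attempts a TCP connection to port 19530 (the default Milvus port used throughout this topic) on both instances. The host names are placeholders:

import socket

def can_reach(host: str, port: int = 19530, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Replace the placeholders with your actual addresses.
for host in ["<source_instance_address>", "<destination_alibaba_cloud_milvus_address>"]:
    print(host, "reachable" if can_reach(host) else "NOT reachable")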

Steps

Step 1: Pull the VTS image

docker pull registry.cn-hangzhou.aliyuncs.com/taihao-executor/taihao-executor:release_2.22.0-ali

Step 2: Start the container and enter its environment

  1. Start the container in the background.

    docker run -d -it \
      --name milvus-migration \
      registry.cn-hangzhou.aliyuncs.com/taihao-executor/taihao-executor:release_2.22.0-ali \
      /bin/bash
  2. View the container ID and access the container.

    # Query for the container
    docker ps
    
    # Enter the container (replace with your actual container ID)
    docker exec -it <container_id> bash

    Example:

    docker exec -it 55ac98f3b054 bash

Step 3: Create the migration.conf configuration file

Create the configuration file inside the container:

vi migration.conf

Configuration template

env {
  parallelism = 1           # Concurrency. Set the initial value to 1.
  job.mode = "BATCH"        # Batch mode.
}

source {
  Milvus {
    url = "http://<source_instance_address>:19530"       # An internal network address is supported.
    token = "<username>:<password>"                 # Example: root:Test123456@
    database = "default"                    # The default is "default". You can run list_databases to query for other databases.
    collections = ["col_a", "col_b"]        # A list of collections to migrate.
    batch_size = 10000                      # The number of entries to read at a time. You can increase this value for large tables.
  }
}

sink {
  Milvus {
    url = "http://<destination_alibaba_cloud_milvus_address>:19530"
    token = "<destination_instance_token>"
    database = "default"
    batch_size = 1000
    enable_auto_id = false                 # If the source collection has auto-incrementing IDs, set this to false. Otherwise, set it to true.
  }
}

Notes

  • Load the source collections: You must load all collections that you want to migrate using the load() method. Otherwise, an error occurs.

  • To migrate all collections: Delete the collections line from the configuration file to automatically synchronize all loaded collections.

  • Use an internal network address: If the container and the destination instance are in the same region, use the internal network endpoint of the destination instance to improve the data transfer speed.
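
The notes above mention list_databases and the load() method. The following sketch, which assumes pymilvus and the placeholder connection details from the template, shows how you might list the databases on the source instance and load the collections that you plan to migrate:

from pymilvus import MilvusClient

source = MilvusClient(uri="http://<source_instance_address>:19530", token="<username>:<password>")

# List databases to find the value for the `database` field in migration.conf.
print(source.list_databases())

# Load every collection that appears in the `collections` list.
for name in ["col_a", "col_b"]:
    source.load_collection(collection_name=name)
    print(name, source.get_load_state(collection_name=name))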


Step 4: Start the migration task

Method 1: Local mode (single-machine operation)

nohup ./bin/seatunnel.sh --config ./migration.conf -m local > migration.log 2>&1 &

Customize memory parameters (Optional)

Edit the config/jvm_client_options file:

-Xms4g
-Xmx8g

Set the heap memory size based on your machine's available resources.

Method 2: Cluster mode (Recommended for high performance)

Suitable for migrating large data volumes:

# Create a log directory
mkdir -p ./logs

# Start the cluster service
./bin/seatunnel-cluster.sh -d

# Submit the task
nohup ./bin/seatunnel.sh --config ./migration.conf > migration.log 2>&1 &

Step 5: Build and load indexes on the destination instance (Optional)

After the migration is complete, log on to Attu or use an SDK to perform the following operations on the destination collections:

  1. Create an index.

    from pymilvus import MilvusClient

    # Connect to the destination Alibaba Cloud Milvus instance
    milvus_client = MilvusClient(
        uri="http://<destination_alibaba_cloud_milvus_address>:19530",
        token="<destination_instance_token>"
    )

    index_params = milvus_client.prepare_index_params()
    index_params.add_index(
        field_name="vector",          # Name of the vector field to be indexed
        index_type="HNSW",            # Type of the index to create
        index_name="vector_index",    # Name of the index to create
        metric_type="L2",             # Metric type used to measure similarity
        params={
            "M": 64,                  # Maximum number of neighbors each node can connect to in the graph
            "efConstruction": 100     # Number of candidate neighbors considered during index construction
        }                             # Index building params
    )
    milvus_client.create_index("collectionName", index_params)
  2. Load the collection into memory.

    milvus_client.load_collection(collection_name="collectionName")

    Create the index before you load the collection. Otherwise, accelerated retrieval cannot be enabled.

    Key parameters in migration.conf:

    • url: Log on to the Alibaba Cloud Milvus console. On the Security Configuration tab, view the public or internal network address. We recommend that you use the internal network address for better performance.

    • token: The format is username:password, for example, root:YourPassword123@. Log on to the Alibaba Cloud Milvus console. On the Security Configuration tab, view the password for the root account.

    • database: The default is "default". If you use the multi-database feature, you can query for other databases by calling the list_databases() API.

    Complete configuration:

    env {
      parallelism = 1
      job.mode = "BATCH"
    }
    
    source {
      Milvus {
        url = "http://xx.xx.xx.xx:19530"
        token = "root:SourcePass123@"
        database = "default"
        collections = ["medium_articles"]
        batch_size = 10000
      }
    }
    
    sink {
      Milvus {
        url = "http://proxy-bj.vpc.milvus.aliyuncs.com:19530"
        token = "root:TargetPass123@"
        database = "default"
        batch_size = 10000
        enable_auto_id = false
      }
    }

FAQ

Q1: The error "Collection not loaded" occurs during migration. What should I do?

A: Ensure that all source collections for migration are loaded into memory using the .load() method.

Q2: Can I migrate only specific fields?

A: No, you cannot. The current version supports migrating only entire collections. Field filtering is not supported.

Q3: How can I monitor the migration progress?

A: You can check the output in the migration.log file. You can also monitor the change in the number of rows in the destination collection using Attu.
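
If you prefer a scripted check over Attu, the following sketch (assuming pymilvus and the placeholder connection details used in this topic) reads the row count of a destination collection so that you can watch it increase during migration:

from pymilvus import MilvusClient

destination = MilvusClient(uri="http://<destination_alibaba_cloud_milvus_address>:19530", token="<destination_instance_token>")

# row_count reflects the number of entities written to the collection so far.
stats = destination.get_collection_stats(collection_name="medium_articles")   # Example collection from the complete configuration
print(stats["row_count"])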