Vector Retrieval Service for Milvus: Migrate data to Alibaba Cloud Milvus with an image tool

Last Updated: Apr 13, 2026

If your self-managed source Milvus instance is not publicly accessible, deploy a data migration tool container on your local machine or within an Alibaba Cloud VPC to securely synchronize your data to Alibaba Cloud Milvus. The process uses the taihao-executor container image, which supports batch migration of multiple collections while ensuring data consistency and reliability.

Limitations and configuration requirements

Pre-migration preparations (required)

  1. Operation status control

    • Source cluster: Stop all data modification operations. This includes write, delete, and update operations. Ensure the cluster is in a read-only state to prevent data inconsistencies during migration.

    • Destination cluster: Pause all data operations. This includes query, write, delete, and update operations. Keep the cluster unavailable to avoid data conflicts with the migration.

  2. Version compatibility

    • Source cluster version: Must be later than 2.3.6 (≥ v2.3.7).

    • Destination cluster version: Must be the same as or later than the source cluster version.
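The version rule above can be checked before a task is started. The following is a minimal sketch, not part of the migration tool: the helper names parse_version and versions_compatible are hypothetical, and it assumes version strings look like "2.3.7" or "v2.4.1" (any suffix after the patch number is ignored).

```python
import re

# Minimum source version, taken from the requirement "later than 2.3.6".
MIN_SOURCE_VERSION = (2, 3, 7)


def parse_version(text: str) -> tuple:
    """Parse a string such as 'v2.3.7' or '2.4.1' into a tuple of three ints."""
    match = re.match(r"^v?(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"Unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def versions_compatible(source: str, destination: str) -> bool:
    """Return True if source >= 2.3.7 and destination >= source."""
    src = parse_version(source)
    dst = parse_version(destination)
    return src >= MIN_SOURCE_VERSION and dst >= src
```

For example, versions_compatible("2.3.7", "2.4.1") passes, while a 2.3.6 source or a destination older than the source does not.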

Migration task limits

  1. Task management

    • Concurrency limit: Only one migration task can run at a time.

  2. Data scope

    • Database limit: Each migration task can migrate collections from only one database.

    • Collection limit: Each migration task supports a maximum of five collections.

    • Total data size: The total number of entities across all collections cannot exceed 500 million.

  3. Data state

    • Source instance requirement: The collections to be migrated must be in a loaded state.

    • Destination instance requirement: The destination instance must be empty and contain no existing entity data.
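The limits above lend themselves to a pre-flight check. The sketch below is illustrative only: CollectionInfo and check_migration_plan are hypothetical names, and the entity counts and load states are assumed to have been collected beforehand (for example with Attu or an SDK).

```python
from dataclasses import dataclass

MAX_COLLECTIONS = 5                 # collection limit per task
MAX_TOTAL_ENTITIES = 500_000_000    # total entities across all collections


@dataclass
class CollectionInfo:
    name: str
    database: str
    entity_count: int
    loaded: bool


def check_migration_plan(collections, destination_entity_count):
    """Return a list of violated limits; an empty list means the plan fits."""
    problems = []
    if len(collections) > MAX_COLLECTIONS:
        problems.append(f"{len(collections)} collections exceed the limit of {MAX_COLLECTIONS}")
    databases = {c.database for c in collections}
    if len(databases) > 1:
        problems.append(f"collections span multiple databases: {sorted(databases)}")
    total = sum(c.entity_count for c in collections)
    if total > MAX_TOTAL_ENTITIES:
        problems.append(f"{total} entities exceed the limit of {MAX_TOTAL_ENTITIES}")
    not_loaded = [c.name for c in collections if not c.loaded]
    if not_loaded:
        problems.append(f"collections not loaded on the source: {not_loaded}")
    if destination_entity_count > 0:
        problems.append("destination instance already contains entity data")
    return problems
```

A plan is ready to submit when the returned list is empty. Otherwise, each entry names the limit that was exceeded.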

Network requirements

The container must have network access to both the source Milvus instance and the target Alibaba Cloud Milvus instance. For optimal performance, deploy the container in the same VPC as the target instance.
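Network access can be verified from inside the container before a task is started. The following sketch only tests whether a TCP connection can be opened. It does not authenticate or speak the Milvus protocol. The function name can_connect is hypothetical, and port 19530 is the port used in the configuration template in this topic.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Call it once for the source endpoint and once for the target endpoint, for example can_connect("<source-instance-endpoint>", 19530).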

Procedure

Step 1: Pull the migration image

docker pull registry.cn-hangzhou.aliyuncs.com/taihao-executor/taihao-executor:release_2.22.0-ali

Step 2: Start and enter the container

  1. Start the container in detached mode.

    docker run -d -it \
      --name milvus-migration \
      registry.cn-hangzhou.aliyuncs.com/taihao-executor/taihao-executor:release_2.22.0-ali \
      /bin/bash

  2. Find the container ID and enter the container.

    # Find the container
    docker ps
    
    # Enter the container (replace with your actual container ID)
    docker exec -it <container_id> bash

    Example:

    docker exec -it 55ac98f3b054 bash

Step 3: Create the configuration file migration.conf

Inside the container, create the configuration file:

vi migration.conf

Configuration template

env {
  parallelism = 1           # Concurrency level. We recommend setting this to 1 initially.
  job.mode = "BATCH"        # Batch processing mode.
}

source {
  Milvus {
    url = "http://<source-instance-endpoint>:19530"       # An internal endpoint is supported.
    token = "<username>:<password>"                 # Example: root:Test123456@
    database = "default"                    # The database to migrate from. Defaults to `default`.
    collections = ["col_a", "col_b"]        # A list of collections to migrate.
    batch_size = 10000                      # Entities to read per batch. Increase this for large collections.
  }
}

sink {
  Milvus {
    url = "http://<target-Alibaba-Cloud-Milvus-endpoint>:19530"
    token = "<target-instance-token>"
    database = "default"
    batch_size = 1000
    enable_auto_id = false                 # Set to false to preserve auto-generated IDs from the source. Otherwise, set to true.
  }
}
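If you maintain several migration tasks, the file can be generated instead of edited by hand. The sketch below is one possible approach and is not provided by the tool: render_migration_conf is a hypothetical helper that reproduces the layout of the template above, and it assumes JSON-style quoting is acceptable for string and list values, as the template suggests.

```python
import json


def render_migration_conf(
    source_url: str,
    source_token: str,
    target_url: str,
    target_token: str,
    collections: list,
    database: str = "default",
    source_batch_size: int = 10000,
    sink_batch_size: int = 1000,
    enable_auto_id: bool = False,
) -> str:
    """Render the text of migration.conf following the template layout."""
    quote = json.dumps  # quotes strings and lists the same way the template does
    lines = [
        "env {",
        "  parallelism = 1",
        '  job.mode = "BATCH"',
        "}",
        "",
        "source {",
        "  Milvus {",
        f"    url = {quote(source_url)}",
        f"    token = {quote(source_token)}",
        f"    database = {quote(database)}",
        f"    collections = {quote(list(collections))}",
        f"    batch_size = {source_batch_size}",
        "  }",
        "}",
        "",
        "sink {",
        "  Milvus {",
        f"    url = {quote(target_url)}",
        f"    token = {quote(target_token)}",
        f"    database = {quote(database)}",
        f"    batch_size = {sink_batch_size}",
        f"    enable_auto_id = {str(enable_auto_id).lower()}",
        "  }",
        "}",
    ]
    return "\n".join(lines) + "\n"
```

Write the returned string to migration.conf inside the container. Because the file then contains credentials in plain text, restrict access to it accordingly.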

Notes

  • To prevent task failure, all collections for migration must be loaded into memory using the load() method.

  • To migrate all loaded collections, omit the collections parameter from the configuration file.

  • If the container and the target instance are in the same region, use an internal endpoint to improve transfer speed.
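Loading the source collections can be scripted. The following sketch assumes a client object that exposes list_collections() and load_collection(collection_name=...), as pymilvus MilvusClient does. The helper name load_collections is hypothetical, and no connection is created here.

```python
def load_collections(client, names=None):
    """Load each named collection; if names is None, load every collection the client lists."""
    if names is None:
        names = client.list_collections()
    loaded = []
    for name in names:
        # Loading a collection that is already loaded is expected to be harmless.
        client.load_collection(collection_name=name)
        loaded.append(name)
    return loaded
```

With pymilvus installed, a client created as MilvusClient(uri="http://<source-instance-endpoint>:19530", token="<username>:<password>") can be passed as client. Note that loading is a write to cluster state, so run it before you put the source cluster into a read-only state.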


Step 4: Start the migration task

Option 1: Run in local mode (single machine)

nohup ./bin/seatunnel.sh --config ./migration.conf -m local > migration.log 2>&1 &

Customize memory parameters (Optional)

Edit the config/jvm_client_options file:

-Xms4g
-Xmx8g

Set the heap memory size based on your machine's resources to prevent Out-of-Memory (OOM) errors.

Option 2: Run in cluster mode (Recommended for high performance)

This mode is recommended for migrating large data volumes.

# Create a log directory
mkdir -p ./logs

# Start the cluster service
./bin/seatunnel-cluster.sh -d

# Submit the task
nohup ./bin/seatunnel.sh --config ./migration.conf > migration.log 2>&1 &

Step 5: Index and load collection (Optional)

After the migration, connect to the target cluster by using Attu or an SDK and perform the following steps for each target collection:

  1. Create an index.

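    from pymilvus import MilvusClient

    # Connect to the target instance with the same endpoint and token as the sink in migration.conf
    milvus_client = MilvusClient(
        uri="http://<target-Alibaba-Cloud-Milvus-endpoint>:19530",
        token="<target-instance-token>",
    )

    # Build the index parameters, then create the index on the target collection
    index_params = milvus_client.prepare_index_params()
    index_params.add_index(
        field_name="vector",  # Name of the vector field to be indexed
        index_type="HNSW",  # Type of the index to create
        index_name="vector_index",  # Name of the index to create
        metric_type="L2",  # Metric type used to measure similarity
        params={
            "M": 64,  # Maximum number of neighbors each node can connect to in the graph
            "efConstruction": 100,  # Number of candidate neighbors considered for connection during index construction
        },  # Index building params
    )
    milvus_client.create_index("collectionName", index_params)
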
  2. Load the collection into memory.

    milvus_client.load_collection("collectionName")

    You must create an index before loading the collection to enable accelerated search.

    The key connection parameters in migration.conf are described below:

    • url: Log in to the Alibaba Cloud Milvus console. On the Security Configuration tab, view the public or internal endpoint. For better performance, use an internal endpoint.

    • token: The token format is username:password (e.g., root:YourPassword123@). Log in to the Alibaba Cloud Milvus console. On the Security Configuration tab, view the password that corresponds to the root account.

    • database: The default value is default. If you use the multi-database feature, find other database names by calling the list_databases() API.

    Full configuration example:

    env {
      parallelism = 1
      job.mode = "BATCH"
    }
    
    source {
      Milvus {
        url = "http://xx.xx.xx.xx:19530"
        token = "root:SourcePass123@"
        database = "default"
        collections = ["medium_articles"]
        batch_size = 10000
      }
    }
    
    sink {
      Milvus {
        url = "http://proxy-bj.vpc.milvus.aliyuncs.com:19530"
        token = "root:TargetPass123@"
        database = "default"
        batch_size = 10000
        enable_auto_id = false
      }
    }

FAQ

Q1: Why do I see a "Collection not loaded" error during migration?

A: Ensure all source collections are loaded into memory by using the .load() method.

Q2: Can I migrate only specific fields?

A: No. The current version only supports migrating entire collections. Filtering specific fields is not supported.

Q3: How can I monitor the migration progress?

A: You can monitor the migration in two ways: check the output in the migration.log file, or use Attu to observe the row count in the target collection.
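The row-count comparison can also be done with an SDK instead of Attu. The sketch below assumes two client objects whose get_collection_stats(collection_name=...) returns a dictionary with a row_count entry, as pymilvus MilvusClient does. The helper name migration_progress is hypothetical. Reported row counts may lag behind the data actually written, so treat the ratio as approximate.

```python
def migration_progress(source, target, names):
    """Return {name: (target_rows, source_rows, ratio)} for each collection."""
    progress = {}
    for name in names:
        src_rows = int(source.get_collection_stats(collection_name=name)["row_count"])
        dst_rows = int(target.get_collection_stats(collection_name=name)["row_count"])
        # An empty source collection counts as complete; cap the ratio at 1.0.
        ratio = 1.0 if src_rows == 0 else min(dst_rows / src_rows, 1.0)
        progress[name] = (dst_rows, src_rows, ratio)
    return progress
```

If a target collection has not been created yet, the stats call is likely to raise an error, so poll only after the task has started writing.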