Migrate to Vector Retrieval Service for Milvus using an image tool

If your source Milvus instance is self-managed and inaccessible over the public network, you can securely migrate data to Alibaba Cloud Milvus by deploying a data migration tool container either locally or within an Alibaba Cloud Virtual Private Cloud (VPC). This process uses the taihao-executor container image, which supports batch migration for multiple collections while ensuring data consistency and high reliability.

Restrictions and configuration requirements

Pre-migration preparations (required)

Operation status control

Cluster type	Requirement	Description
Source cluster	Stop all data modification operations	This includes write, delete, and update operations. Ensure the cluster is in a read-only state to prevent data inconsistencies during migration.
Destination cluster	Pause all data operations	This includes query, write, delete, and update operations. Keep the cluster out of service until the migration finishes so that no client operation conflicts with the migrated data.

Version compatibility

Requirement	Specification
Source cluster version	Must be later than 2.3.6 (≥ v2.3.7)
Destination cluster version	Must be the same as or later than the source cluster version
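
The version rule can be checked before a task is started. The following is a minimal sketch, assuming that you already have both version strings at hand (for example, from the console or from your deployment manifests). The helper names `parse_version` and `versions_compatible` are illustrative and are not part of any Milvus tool.

```python
import re

# Lowest source version allowed by the compatibility table (later than 2.3.6).
MIN_SOURCE_VERSION = (2, 3, 7)


def parse_version(text):
    """Turn a string such as 'v2.4.1' or '2.4.1' into the tuple (2, 4, 1)."""
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def versions_compatible(source, destination):
    """Return True if the source is >= 2.3.7 and the destination is >= the source."""
    src = parse_version(source)
    dst = parse_version(destination)
    return src >= MIN_SOURCE_VERSION and dst >= src
```

Comparing tuples of integers avoids the pitfall of plain string comparison, where "2.3.10" would sort before "2.3.7".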

Migration task limits

Task management
- Concurrency limit: Only one migration task can run at a time.
Data scope
- Database limit: Each migration task can migrate collections from only one database.
- Collection limit: Each migration task supports a maximum of five collections.
- Total data size: The total number of entities across all collections cannot exceed 500 million.
Data state
- Source instance requirement: The collections to be migrated must be in a loaded state.
- Destination instance requirement: The destination instance must be empty and contain no existing entity data.
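
The data scope limits above lend themselves to a quick pre-flight check. This sketch assumes that you have already collected the entity count of each collection that you plan to migrate (for example, from Attu). The function name `check_task_limits` and the input dictionary are hypothetical.

```python
# Limits taken from the "Data scope" list above.
MAX_COLLECTIONS_PER_TASK = 5
MAX_TOTAL_ENTITIES = 500_000_000


def check_task_limits(entity_counts):
    """Validate one migration task.

    entity_counts maps a collection name to its number of entities.
    Returns a list of human-readable problems; an empty list means the
    task is within the documented limits.
    """
    problems = []
    if len(entity_counts) > MAX_COLLECTIONS_PER_TASK:
        problems.append(
            f"{len(entity_counts)} collections requested, "
            f"limit is {MAX_COLLECTIONS_PER_TASK} per task"
        )
    total = sum(entity_counts.values())
    if total > MAX_TOTAL_ENTITIES:
        problems.append(
            f"{total} entities in total, limit is {MAX_TOTAL_ENTITIES}"
        )
    return problems
```

For example, `check_task_limits({"col_a": 1_200_000, "col_b": 800_000})` returns an empty list because both limits are respected.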

Network requirements

The network where the container is deployed must provide access to both the source Milvus instance and the destination Alibaba Cloud Milvus instance. For optimal performance, deploy the container in the same VPC as the destination instance.
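
Before you configure the task, you may want to confirm that both endpoints are reachable from the host or container that will run the migration. The following sketch only tests whether a TCP connection can be opened. It does not verify credentials or the Milvus version. The function name `endpoint_reachable` is illustrative, and port 19530 is used as the fallback because it is the port shown in the configuration template of this topic.

```python
import socket
from urllib.parse import urlparse


def endpoint_reachable(url, timeout=3.0):
    """Return True if a TCP connection to the host and port in url succeeds."""
    parsed = urlparse(url)
    host = parsed.hostname
    port = parsed.port or 19530  # fallback: port used in this topic's examples
    if host is None:
        raise ValueError(f"no host found in URL: {url!r}")
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

Run it once against the source address and once against the destination address. If either call returns False, review the VPC, security group, or whitelist settings before you continue.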

Steps

Step 1: Pull the VTS image

docker pull registry.cn-hangzhou.aliyuncs.com/taihao-executor/taihao-executor:release_2.22.0-ali

Step 2: Start the container and enter its environment

Start the container in the background.

docker run -d -it \
  --name milvus-migration \
  registry.cn-hangzhou.aliyuncs.com/taihao-executor/taihao-executor:release_2.22.0-ali \
  /bin/bash

View the container ID and access the container.

# Query for the container
docker ps

# Enter the container (replace with your actual container ID)
docker exec -it <container_id> bash

Example:

docker exec -it 55ac98f3b054 bash

Step 3: Create the `migration.conf` configuration file

Create the configuration file inside the container:

vi migration.conf

Configuration template

env {
  parallelism = 1           # Concurrency. Set the initial value to 1.
  job.mode = "BATCH"        # Batch mode.
}

source {
  Milvus {
    url = "http://<source_instance_address>:19530"       # An internal network address is supported.
    token = "<username>:<password>"                 # Example: root:Test123456@
    database = "default"                    # The default is "default". You can run list_databases to query for other databases.
    collections = ["col_a", "col_b"]        # A list of collections to migrate.
    batch_size = 10000                      # The number of entries to read at a time. You can increase this value for large tables.
  }
}

sink {
  Milvus {
    url = "http://<destination_alibaba_cloud_milvus_address>:19530"
    token = "<destination_instance_token>"
    database = "default"
    batch_size = 1000
    enable_auto_id = false                 # If the source collection has auto-incrementing IDs, set this to false. Otherwise, set it to true.
  }
}

Notes

Load the source collections: You must load every collection that you want to migrate by calling the load() method on the source instance. Otherwise, the migration fails with a "Collection not loaded" error.
To migrate all collections: Delete the collections line from the configuration file to automatically synchronize all loaded collections.
Use an internal network address: If the container and the destination instance are in the same region, use the internal network endpoint of the destination instance to improve the data transfer speed.
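
The first note can be scripted. The sketch below assumes the pymilvus `MilvusClient` interface (`load_collection` and `get_load_state`). The helper name `load_collections` is illustrative, and the state check compares text because the exact type of the returned state object may differ between pymilvus versions.

```python
def load_collections(client, names):
    """Load each collection and return the names that are not in the loaded state.

    client is expected to behave like pymilvus.MilvusClient.
    """
    not_loaded = []
    for name in names:
        client.load_collection(collection_name=name)
        state = client.get_load_state(collection_name=name)
        # Assumption: the loaded state prints as a string containing "Loaded".
        if "Loaded" not in str(state.get("state")):
            not_loaded.append(name)
    return not_loaded


def main():
    # Imported here so that the helper above can be exercised without pymilvus.
    from pymilvus import MilvusClient

    client = MilvusClient(
        uri="http://<source_instance_address>:19530",  # placeholder
        token="<username>:<password>",                 # placeholder
    )
    missing = load_collections(client, ["col_a", "col_b"])
    print("Not loaded:", missing)
```

Call `main()` from a host that can reach the source instance. An empty result indicates that every listed collection reported the loaded state.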

Step 4: Start the migration task

Method 1: Local mode (single-machine operation)

nohup ./bin/seatunnel.sh --config ./migration.conf -m local > migration.log 2>&1 &

Customize memory parameters (Optional)

Edit the config/jvm_client_options file:

-Xms4g
-Xmx8g

Set the heap size (-Xms is the initial size and -Xmx is the maximum size) based on the memory available on your machine.

Method 2: Cluster mode (Recommended for high performance)

Cluster mode is suitable for migrating large data volumes:

# Create a log directory
mkdir -p ./logs

# Start the cluster service
./bin/seatunnel-cluster.sh -d

# Submit the task
nohup ./bin/seatunnel.sh --config ./migration.conf > migration.log 2>&1 &

Step 5: Build and load indexes on the destination instance (Optional)

After the migration is complete, log on to Attu or use an SDK to perform the following operations on the destination collections:

Create an index.

from pymilvus import MilvusClient

# Connect to the destination instance (same address and token as the sink in migration.conf)
milvus_client = MilvusClient(
    uri="http://<destination_alibaba_cloud_milvus_address>:19530",
    token="<destination_instance_token>",
)

index_params = milvus_client.prepare_index_params()
index_params.add_index(
    field_name="vector",  # Name of the vector field to be indexed
    index_type="HNSW",  # Type of the index to create
    index_name="vector_index",  # Name of the index to create
    metric_type="L2",  # Metric type used to measure similarity
    params={
        "M": 64,  # Maximum number of neighbors each node can connect to in the graph
        "efConstruction": 100,  # Number of candidate neighbors considered during index construction
    },  # Index building params
)
milvus_client.create_index("collectionName", index_params)

Load the collection into memory.

milvus_client.load_collection("collectionName")

Create the index before you load the collection. Otherwise, accelerated retrieval cannot be enabled.

Key parameters in migration.conf:

Parameter	How to obtain
url	Log on to the Alibaba Cloud Milvus console. On the Security Configuration tab, view the public or internal network address. We recommend that you use the internal network address for better performance.
token	The format is `username:password`, for example, `root:YourPassword123@`. Log on to the Alibaba Cloud Milvus console. On the Security Configuration tab, view the password for the root account.
database	The default is `default`. If you use the multi-database feature, you can query for other databases using the `list_databases()` API.

Complete configuration:

env {
  parallelism = 1
  job.mode = "BATCH"
}

source {
  Milvus {
    url = "http://xx.xx.xx.xx:19530"
    token = "root:SourcePass123@"
    database = "default"
    collections = ["medium_articles"]
    batch_size = 10000
  }
}

sink {
  Milvus {
    url = "http://proxy-bj.vpc.milvus.aliyuncs.com:19530"
    token = "root:TargetPass123@"
    database = "default"
    batch_size = 10000
    enable_auto_id = false
  }
}
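
If you run several migration tasks, generating the file from a script can reduce copy-and-paste mistakes. The following sketch renders text in the same layout as the complete configuration above. The function name `render_migration_conf` and its parameter names are hypothetical, and the default values mirror the example rather than any documented default of the tool. When `collections` is omitted, the `collections` line is left out, which matches the note about migrating all loaded collections.

```python
import json


def render_migration_conf(source_url, source_token, sink_url, sink_token,
                          collections=None, database="default",
                          read_batch=10000, write_batch=10000,
                          enable_auto_id=False):
    """Return the text of a migration.conf file."""
    # json.dumps produces double-quoted strings, lowercase booleans, and
    # bracketed lists, which is the syntax used in the example above.
    source = [
        f"url = {json.dumps(source_url)}",
        f"token = {json.dumps(source_token)}",
        f"database = {json.dumps(database)}",
    ]
    if collections:
        source.append(f"collections = {json.dumps(list(collections))}")
    source.append(f"batch_size = {read_batch}")

    sink = [
        f"url = {json.dumps(sink_url)}",
        f"token = {json.dumps(sink_token)}",
        f"database = {json.dumps(database)}",
        f"batch_size = {write_batch}",
        f"enable_auto_id = {json.dumps(enable_auto_id)}",
    ]

    def block(name, settings):
        inner = "\n".join(f"    {line}" for line in settings)
        return f"{name} {{\n  Milvus {{\n{inner}\n  }}\n}}"

    env = 'env {\n  parallelism = 1\n  job.mode = "BATCH"\n}'
    return "\n\n".join([env, block("source", source), block("sink", sink)]) + "\n"
```

Write the returned string to `migration.conf` inside the container, then start the task as described in Step 4.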

FAQ

Q1: The error "Collection not loaded" occurs during migration. What should I do?

A: Ensure that all source collections for migration are loaded into memory using the .load() method.

Q2: Can I migrate only specific fields?

A: No, you cannot. The current version supports migrating only entire collections. Field filtering is not supported.

Q3: How can I monitor the migration progress?

A: You can check the output in the migration.log file. You can also monitor the change in the number of rows in the destination collection using Attu.
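
As an alternative to watching Attu, row counts can be compared with an SDK. This sketch assumes the pymilvus `MilvusClient.get_collection_stats` call, which returns a dictionary that contains `row_count`. The function name `migration_progress` is illustrative. Treat the percentages as approximate, because the reported row count on the destination may lag behind the data that has actually been written, and the call may fail if the destination collection has not been created yet.

```python
def migration_progress(source_client, sink_client, names):
    """Compare row counts per collection.

    Both clients are expected to behave like pymilvus.MilvusClient.
    Returns {name: (destination_rows, source_rows, percent_done)}.
    """
    progress = {}
    for name in names:
        src = int(source_client.get_collection_stats(collection_name=name)["row_count"])
        dst = int(sink_client.get_collection_stats(collection_name=name)["row_count"])
        # An empty source collection is reported as complete.
        percent = 100.0 * dst / src if src else 100.0
        progress[name] = (dst, src, percent)
    return progress
```

Calling the function every few minutes and printing the result gives a rough view of how far each collection has progressed.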
