If your self-managed source Milvus instance is not publicly accessible, deploy a data migration tool container on your local machine or within an Alibaba Cloud VPC to securely synchronize your data to Alibaba Cloud Milvus. The process uses the taihao-executor container image, which supports batch migration of multiple collections while ensuring data consistency and reliability.
Limitations and configuration requirements
Pre-migration preparations (required)
Operation status control
| Cluster type | Requirement | Description |
|---|---|---|
| Source cluster | Stop all data modification operations | This includes write, delete, and update operations. Keep the cluster read-only to prevent data inconsistencies during migration. |
| Destination cluster | Pause all data operations | This includes query, write, delete, and update operations. Do not use the cluster until the migration finishes, to avoid conflicts with the migrated data. |
Version compatibility
| Requirement | Specification |
|---|---|
| Source cluster version | v2.3.7 or later (must be later than v2.3.6) |
| Destination cluster version | Must be the same as or later than the source cluster version |
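The version rules above can be checked before you start. The following is a minimal sketch, not part of the migration tool: `parse_version` and `check_versions` are hypothetical helpers, and they assume you already have both version strings, for example from pymilvus `utility.get_server_version()` or from Attu.

```python
import re

MIN_SOURCE_VERSION = (2, 3, 7)  # The source must be later than v2.3.6


def parse_version(version: str) -> tuple:
    """Turn a version string such as "v2.3.7" or "2.4.5-gpu" into (2, 3, 7)."""
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def check_versions(source: str, target: str) -> list:
    """Return a list of problems; an empty list means the pair is compatible."""
    problems = []
    src, dst = parse_version(source), parse_version(target)
    if src < MIN_SOURCE_VERSION:
        problems.append(f"source {source} is older than v2.3.7")
    if dst < src:
        problems.append(f"target {target} is older than source {source}")
    return problems
```

For example, `check_versions("v2.3.7", "v2.4.5")` returns an empty list, while a v2.3.6 source paired with a v2.3.5 target reports both problems.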
Migration task limits
| Category | Item | Limit |
|---|---|---|
| Task management | Concurrency | Only one migration task can run at a time. |
| Data scope | Database | Each migration task can migrate collections from only one database. |
| Data scope | Collections | Each migration task supports a maximum of five collections. |
| Data scope | Total entity count | The total number of entities across all collections in a task cannot exceed 500 million. |
| Data state | Source instance | The collections to be migrated must be in a loaded state. |
| Data state | Destination instance | The destination instance must be empty and contain no existing entity data. |
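The task limits and the empty-destination rule can be validated from row counts. This is a minimal sketch with a hypothetical helper, `check_task_limits`; it assumes you have collected `{collection_name: row_count}` dictionaries yourself, for example from pymilvus `MilvusClient.get_collection_stats(name)["row_count"]` on each instance.

```python
MAX_COLLECTIONS_PER_TASK = 5
MAX_ENTITIES_PER_TASK = 500_000_000


def check_task_limits(source_counts: dict, target_counts: dict) -> list:
    """Validate one migration task against the documented limits.

    source_counts: {collection_name: row_count} for the collections in this
                   task (all from the same source database).
    target_counts: {collection_name: row_count} for collections that already
                   exist in the destination database.
    Returns a list of problems; an empty list means the task looks valid.
    """
    problems = []
    if len(source_counts) > MAX_COLLECTIONS_PER_TASK:
        problems.append(
            f"{len(source_counts)} collections; the limit is {MAX_COLLECTIONS_PER_TASK}"
        )
    total = sum(source_counts.values())
    if total > MAX_ENTITIES_PER_TASK:
        problems.append(f"{total} entities; the limit is {MAX_ENTITIES_PER_TASK}")
    # The destination must hold no entity data before migration starts.
    non_empty = sorted(name for name, rows in target_counts.items() if rows > 0)
    if non_empty:
        problems.append("destination is not empty: " + ", ".join(non_empty))
    return problems
```

If the check fails on the collection or entity limit, split the collections across several tasks and run them one after another, since only one task can run at a time.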
Network requirements
The container must have network access to both the source Milvus instance and the target Alibaba Cloud Milvus instance. For optimal performance, deploy the container in the same VPC as the target instance.
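Before you start the task, you can confirm from the container host that both endpoints accept TCP connections. This is a minimal sketch using only the Python standard library; `can_reach` is a hypothetical helper, and it assumes the endpoints use the `http://host:port` form from the configuration file (port 19530 if none is given). It only checks that the port is open, not that the credentials are valid.

```python
import socket
from urllib.parse import urlparse


def can_reach(url: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the host and port in `url` succeeds."""
    parsed = urlparse(url)
    if parsed.hostname is None:
        raise ValueError(f"Expected a URL such as http://host:19530, got {url!r}")
    port = parsed.port or 19530  # Default Milvus port used in this guide
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:  # Covers refused connections, timeouts, and DNS failures
        return False
```

Call it once for each endpoint, for example `can_reach("http://<source-instance-endpoint>:19530")`. A `False` result usually points to a security group, whitelist, or VPC routing issue.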
Procedure
Step 1: Pull the migration image
docker pull registry.cn-hangzhou.aliyuncs.com/taihao-executor/taihao-executor:release_2.22.0-ali
Step 2: Start and enter the container
Start the container in detached mode.
docker run -d -it \
  --name milvus-migration \
  registry.cn-hangzhou.aliyuncs.com/taihao-executor/taihao-executor:release_2.22.0-ali \
  /bin/bash
Find the container ID and enter the container.
# Find the container
docker ps
# Enter the container (replace <container_id> with your actual container ID)
docker exec -it <container_id> bash
Example:
docker exec -it 55ac98f3b054 bash
Step 3: Create the configuration file migration.conf
Inside the container, create the configuration file:
vi migration.conf
Configuration template
env {
  parallelism = 1  # Concurrency level. We recommend setting this to 1 initially.
  job.mode = "BATCH"  # Batch processing mode.
}
source {
  Milvus {
    url = "http://<source-instance-endpoint>:19530"  # An internal endpoint is supported.
    token = "<username>:<password>"  # Example: root:Test123456@
    database = "default"  # The database to migrate from. Defaults to `default`.
    collections = ["col_a", "col_b"]  # The collections to migrate (up to five per task).
    batch_size = 10000  # Entities to read per batch. Increase this for large collections.
  }
}
sink {
  Milvus {
    url = "http://<target-Alibaba-Cloud-Milvus-endpoint>:19530"
    token = "<target-instance-token>"  # Format: username:password
    database = "default"  # The destination database.
    batch_size = 1000  # Entities to write per batch.
    enable_auto_id = false  # false: keep the source primary key values, including auto-generated IDs. true: let the target generate new IDs.
  }
}
Notes
To prevent task failure, load every collection to be migrated into memory before you start, by calling load() (Collection.load() or MilvusClient.load_collection() in pymilvus).
To migrate all loaded collections in the database, omit the collections parameter from the configuration file.
If the container and the target instance are in the same region, use an internal endpoint to improve transfer speed.
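If you run several tasks (for example, to stay within the five-collection limit), generating migration.conf from a script avoids copy-and-paste mistakes. This is a minimal sketch with a hypothetical helper, `render_migration_conf`, that is not part of taihao-executor. It assumes the source and sink use the same database name, as in the template, and it relies on the fact that JSON-quoted strings and lists are valid HOCON values.

```python
import json
from urllib.parse import urlparse

MAX_COLLECTIONS_PER_TASK = 5


def render_migration_conf(source_url, source_token, target_url, target_token,
                          database="default", collections=None,
                          source_batch_size=10000, sink_batch_size=1000,
                          enable_auto_id=False):
    """Return migration.conf text that mirrors the configuration template.

    collections=None (or an empty list) omits the `collections` key, so the
    task migrates every loaded collection in `database`.
    """
    for url in (source_url, target_url):
        if urlparse(url).scheme not in ("http", "https"):
            raise ValueError(f"url must start with http:// or https://: {url!r}")
    for token in (source_token, target_token):
        if ":" not in token:
            raise ValueError("token must use the username:password format")
    if collections and len(collections) > MAX_COLLECTIONS_PER_TASK:
        raise ValueError("a migration task supports at most five collections")

    lines = [
        "env {",
        "  parallelism = 1",
        '  job.mode = "BATCH"',
        "}",
        "source {",
        "  Milvus {",
        f"    url = {json.dumps(source_url)}",
        f"    token = {json.dumps(source_token)}",
        f"    database = {json.dumps(database)}",
    ]
    if collections:
        lines.append(f"    collections = {json.dumps(list(collections))}")
    lines += [
        f"    batch_size = {source_batch_size}",
        "  }",
        "}",
        "sink {",
        "  Milvus {",
        f"    url = {json.dumps(target_url)}",
        f"    token = {json.dumps(target_token)}",
        f"    database = {json.dumps(database)}",
        f"    batch_size = {sink_batch_size}",
        f"    enable_auto_id = {'true' if enable_auto_id else 'false'}",
        "  }",
        "}",
    ]
    return "\n".join(lines) + "\n"
```

To write the file inside the container, pass the result to `pathlib.Path("migration.conf").write_text(...)`.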
Step 4: Start the migration task
Option 1: Run in local mode (single machine)
nohup ./bin/seatunnel.sh --config ./migration.conf -m local > migration.log 2>&1 &
Customize memory parameters (Optional)
Edit the config/jvm_client_options file:
-Xms4g
-Xmx8g
-Xms sets the initial heap size and -Xmx sets the maximum heap size. Set them based on your machine's resources to prevent Out-of-Memory (OOM) errors.
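If you adjust the heap settings often, a small script can rewrite them without leaving duplicate flags in the file. This is a minimal sketch with a hypothetical helper, `set_heap_options`; it assumes config/jvm_client_options holds one JVM option per line, and it does not choose sizes for you.

```python
import re
from pathlib import Path


def set_heap_options(path, xms="4g", xmx="8g"):
    """Replace any -Xms/-Xmx lines in a JVM options file with the given sizes.

    Other lines are kept in order; the new heap flags are appended at the end.
    Returns the lines that were written.
    """
    file = Path(path)
    lines = file.read_text().splitlines() if file.exists() else []
    # Drop existing heap settings so each flag appears exactly once.
    kept = [line for line in lines if not re.match(r"\s*-Xm[sx]", line)]
    kept += [f"-Xms{xms}", f"-Xmx{xmx}"]
    file.write_text("\n".join(kept) + "\n")
    return kept
```

For example, `set_heap_options("config/jvm_client_options", "4g", "8g")` produces the two lines shown above.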
Option 2: Run in cluster mode (Recommended for high performance)
This mode is recommended for migrating large data volumes.
# Create a log directory
mkdir -p ./logs
# Start the cluster service
./bin/seatunnel-cluster.sh -d
# Submit the task
nohup ./bin/seatunnel.sh --config ./migration.conf > migration.log 2>&1 &
Step 5: Index and load collections (Optional)
After the migration, connect to the target cluster by using Attu or an SDK and perform the following steps for each target collection:
Create an index.
from pymilvus import MilvusClient

# Connect to the target instance (replace the placeholders with your values)
milvus_client = MilvusClient(
    uri="http://<target-Alibaba-Cloud-Milvus-endpoint>:19530",
    token="<username>:<password>",
)

index_params = milvus_client.prepare_index_params()
index_params.add_index(
    field_name="vector",  # Name of the vector field to be indexed
    index_type="HNSW",  # Type of the index to create
    index_name="vector_index",  # Name of the index to create
    metric_type="L2",  # Metric type used to measure similarity
    params={
        "M": 64,  # Maximum number of neighbors each node can connect to in the graph
        "efConstruction": 100,  # Number of candidate neighbors considered during index construction
    },  # Index building params
)
milvus_client.create_index("collectionName", index_params)
Load the collection into memory.
milvus_client.load_collection("collectionName")
You must create an index before loading the collection, because Milvus cannot load a collection whose vector field has no index. The key parameters in migration.conf are described below:
| Parameter | How to obtain |
|---|---|
| url | Log in to the Alibaba Cloud Milvus console. On the Security Configuration tab, view the public or internal endpoint. For better performance, use an internal endpoint. |
| token | The token format is username:password (for example, root:YourPassword123@). Log in to the Alibaba Cloud Milvus console. On the Security Configuration tab, view the password that corresponds to the root account. |
| database | The default value is default. If you use the multi-database feature, find other database names by calling the list_databases() API. |

Full configuration example:
env {
  parallelism = 1
  job.mode = "BATCH"
}
source {
  Milvus {
    url = "http://xx.xx.xx.xx:19530"
    token = "root:SourcePass123@"
    database = "default"
    collections = ["medium_articles"]
    batch_size = 10000
  }
}
sink {
  Milvus {
    url = "http://proxy-bj.vpc.milvus.aliyuncs.com:19530"
    token = "root:TargetPass123@"
    database = "default"
    batch_size = 10000
    enable_auto_id = false
  }
}
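The index and load steps in Step 5 must be repeated for every migrated collection. This is a minimal sketch, assuming `client` is a pymilvus `MilvusClient` connected to the target, that every collection has a vector field named `vector`, and that HNSW with the L2 metric matches your source index. All three are assumptions; adjust them to your schema. `index_and_load` is a hypothetical helper.

```python
def index_and_load(client, collection_names, field_name="vector", metric_type="L2"):
    """Create an HNSW index on `field_name`, then load each collection.

    Returns {collection_name: load state as a string}.
    """
    states = {}
    for name in collection_names:
        index_params = client.prepare_index_params()
        index_params.add_index(
            field_name=field_name,
            index_type="HNSW",
            index_name=f"{field_name}_index",
            metric_type=metric_type,
            params={"M": 64, "efConstruction": 100},
        )
        client.create_index(name, index_params)
        client.load_collection(name)  # Needs the index created above
        states[name] = str(client.get_load_state(name)["state"])
    return states
```

For example: `index_and_load(MilvusClient(uri="http://<target-endpoint>:19530", token="root:<password>"), ["medium_articles"])`.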
FAQ
Q1: Why do I see a "Collection not loaded" error during migration?
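A: The source collections must be loaded into memory before the task starts. Load each one by calling Collection.load() or MilvusClient.load_collection() in pymilvus, or load it from Attu.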
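To avoid this error, you can load any unloaded source collection before you start the task. This is a minimal sketch with a hypothetical helper, `ensure_loaded`; it assumes `client` is a pymilvus `MilvusClient` connected to the source and that `str()` of the returned load state yields its name (such as `Loaded` or `NotLoad`), as in recent pymilvus versions.

```python
def ensure_loaded(client, collection_names):
    """Load every listed collection that is not already in the Loaded state.

    Returns the names that had to be loaded.
    """
    loaded_now = []
    for name in collection_names:
        state = str(client.get_load_state(name)["state"])
        if state != "Loaded":
            # Raises if the vector field has no index or the collection is missing.
            client.load_collection(name)
            loaded_now.append(name)
    return loaded_now
```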
Q2: Can I migrate only specific fields?
A: No. The current version only supports migrating entire collections. Filtering specific fields is not supported.
Q3: How can I monitor the migration progress?
A: You can monitor the migration in two ways: check the output in the migration.log file, or use Attu to observe the row count in the target collection.
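Besides Attu, you can compare row counts with the SDK. This is a minimal sketch, assuming `source` and `target` are pymilvus `MilvusClient` objects, that each target collection has the same name as its source collection, and that the `row_count` reported by `get_collection_stats` may be approximate while writes are in progress. `migration_progress` and `watch` are hypothetical helpers.

```python
import time


def migration_progress(source, target, collection_names):
    """Return {collection: fraction copied} by comparing row counts."""
    progress = {}
    for name in collection_names:
        total = source.get_collection_stats(name)["row_count"]
        copied = target.get_collection_stats(name)["row_count"]
        # An empty source collection counts as complete.
        progress[name] = 1.0 if total == 0 else min(copied / total, 1.0)
    return progress


def watch(source, target, collection_names, interval_s=60):
    """Print progress every `interval_s` seconds until all collections reach 100%."""
    while True:
        progress = migration_progress(source, target, collection_names)
        print(", ".join(f"{n}: {p:.1%}" for n, p in progress.items()))
        if all(p >= 1.0 for p in progress.values()):
            return progress
        time.sleep(interval_s)
```

Row counts only show how much data has arrived. Still check migration.log for errors before you treat the task as finished.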