Lindorm: Migrate data from a self-managed HDFS cluster to LindormDFS

Last Updated: Mar 28, 2026

Use the Apache Hadoop distributed copy (DistCp) tool to migrate full or incremental data from a self-managed Hadoop Distributed File System (HDFS) cluster to LindormDFS. For more information about DistCp, see the DistCp Guide.

Prerequisites

Before you begin, make sure you have:

Verify connectivity

Run the following command on the self-managed Hadoop cluster to verify that it can reach LindormDFS:

hadoop fs -ls hdfs://<instance-id>/

Replace <instance-id> with your Lindorm instance ID. If the command lists the files in LindormDFS, the cluster is connected and you can proceed with the migration.
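
In a migration script, the same check can gate the later steps on the command's exit status. The following is a minimal sketch, not part of the documented procedure: the function name check_lindormdfs and the HADOOP_BIN override variable are hypothetical, and the sketch assumes that hadoop fs -ls exits with a non-zero status when the destination cannot be reached.

```shell
# Hypothetical helper: returns 0 if the LindormDFS root directory can be listed.
# HADOOP_BIN is an illustrative override (not a Hadoop setting) that lets you
# substitute another executable when dry-running the script.
check_lindormdfs() {
  instance_id="$1"
  "${HADOOP_BIN:-hadoop}" fs -ls "hdfs://${instance_id}/" >/dev/null 2>&1
}

# Usage (replace <instance-id> with your Lindorm instance ID):
#   if check_lindormdfs "<instance-id>"; then
#     echo "LindormDFS is reachable"
#   else
#     echo "LindormDFS is not reachable" >&2
#   fi
```

The listing output is discarded here because only the exit status matters; run the plain command above when you want to inspect the files.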

Migrate data to LindormDFS

If the Elastic Compute Service (ECS) instances that host the self-managed Hadoop cluster are in the same virtual private cloud (VPC) as LindormDFS, you can migrate data to LindormDFS over the VPC. Run the following DistCp command to copy data:

hadoop distcp -m 1000 -bandwidth 30 hdfs://oldcluster:8020/user/hive/warehouse hdfs://<instance-id>/user/hive/warehouse

| Parameter | Description |
| --- | --- |
| -m 1000 | Maximum number of Map tasks that run in parallel. Increase this value to speed up migration on large clusters; decrease it to reduce load on the source cluster. |
| -bandwidth 30 | Bandwidth limit per Map task, in MB/s. |
| hdfs://oldcluster:8020/... | Source path. Replace oldcluster with the IP address or domain name of a NameNode in the self-managed Hadoop cluster. |
| hdfs://<instance-id>/... | Destination path. Replace <instance-id> with your Lindorm instance ID. |
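
For the incremental migration mentioned in the introduction, DistCp provides the standard -update option, which copies only files that are missing at the destination or that differ from the destination copy. This page does not prescribe a specific incremental command, so treat the following as a sketch that assumes -update suits your workflow. The build_distcp_cmd function and its argument order are hypothetical; it only prints the command so that you can review it before running it.

```shell
# Hypothetical helper: prints (does not run) a DistCp command.
# Args: mode ("full" or "incremental"), source URI, destination URI.
build_distcp_cmd() {
  mode="$1"; src="$2"; dst="$3"
  opts="-m 1000 -bandwidth 30"
  # -update is a standard DistCp option; using it for incremental runs
  # is an assumption of this sketch, not a documented Lindorm requirement.
  if [ "$mode" = "incremental" ]; then
    opts="-update $opts"
  fi
  echo "hadoop distcp $opts $src $dst"
}

# Print the incremental variant of the command shown above.
build_distcp_cmd incremental \
  hdfs://oldcluster:8020/user/hive/warehouse \
  "hdfs://<instance-id>/user/hive/warehouse"
```

With -update, DistCp copies the contents of the source directory into the destination directory instead of creating the source directory beneath it, so keep the source and destination paths aligned as in this example.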

FAQ

How do I estimate migration time for large datasets?

Migration time depends on the total data size and the network throughput between the self-managed cluster and LindormDFS. Migrate a few representative directories first, measure the time, and extrapolate to estimate the full duration.
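
The extrapolation is proportional arithmetic. The following sketch assumes that throughput stays roughly constant between the sample run and the full run, which may not hold if the sample directories have a different mix of file sizes. The function name estimate_seconds and all figures are hypothetical; you can obtain directory sizes in bytes with hadoop fs -du -s.

```shell
# Hypothetical helper: linear extrapolation of migration time.
# Args: sample size (bytes), sample duration (seconds), total size (bytes).
estimate_seconds() {
  sample_bytes="$1"; sample_secs="$2"; total_bytes="$3"
  # Integer arithmetic; the result is truncated to whole seconds.
  # Very large inputs can overflow 64-bit shell arithmetic.
  echo $(( total_bytes * sample_secs / sample_bytes ))
}

# Illustrative figures only: a 50 GiB sample that took 600 s, 10 TiB in total.
sample=$(( 50 * 1024 * 1024 * 1024 ))
total=$(( 10 * 1024 * 1024 * 1024 * 1024 ))
secs=$(estimate_seconds "$sample" 600 "$total")
echo "Estimated duration: $secs s (about $(( secs / 3600 )) h)"
```

Treat the result as a rough planning figure and add headroom for retries and for contention on the source cluster.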

If you can only migrate during specific maintenance windows, split the source directory into smaller subdirectories and migrate them in sequence across multiple windows.
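
One way to sketch the sequential approach is to generate one DistCp command per first-level subdirectory and run as many of them as fit in each window. The plan_distcp function below is hypothetical and only prints commands. It reads subdirectory names from standard input, so the listing step (for example, the directory entries reported by hadoop fs -ls) stays under your control. The subdirectory names in the example are made up.

```shell
# Hypothetical helper: prints one DistCp command per subdirectory name read
# from standard input. Args: source base URI, destination base URI.
plan_distcp() {
  src_base="$1"; dst_base="$2"
  while IFS= read -r subdir; do
    [ -n "$subdir" ] || continue   # skip blank lines
    echo "hadoop distcp -m 1000 -bandwidth 30 $src_base/$subdir $dst_base/$subdir"
  done
}

# Illustrative subdirectory names; replace them with your own listing.
printf '%s\n' sales.db logs.db | plan_distcp \
  hdfs://oldcluster:8020/user/hive/warehouse \
  "hdfs://<instance-id>/user/hive/warehouse"
```

Printing the plan first makes it easy to divide the commands among maintenance windows and to record which subdirectories have already been migrated.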

How do I handle client writes during a full migration?

Stop all client writes to the self-managed cluster before starting a full migration. If stopping writes is not feasible, configure clients to write simultaneously to both the self-managed cluster and LindormDFS during the migration period. Once migration completes, update the client configuration to write only to LindormDFS.