Adolph

HBase data migration - cross-network migration from physical server room clusters to EMR clusters

Posted time: Oct 24, 2016 17:23
When designing the data migration plan, I followed the steps below, because the HBase and Hadoop versions differed between the two clusters:
1. Before you start the migration, stop HBase on the old cluster, or at least stop writing data into it, so that the migrated data is consistent.
2. Both clusters must be reachable over internet (public) IP addresses, and every server in the new cluster must have hosts entries that map each host name in the old cluster to its internet IP address.
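3. On the new cluster, run hadoop distcp -skipcrccheck -update -i -m 200 <src> <target>. Briefly: -update copies only files that are missing or differ at the target, -skipcrccheck skips the checksum comparison (checksums may not match between different Hadoop versions), -i ignores failures, and -m 200 allows at most 200 simultaneous map tasks; see the official DistCp documentation for details. The command I used for the migration was:
hadoop distcp -skipcrccheck -update -i -m 200 hftp://xxx.xxx.xxx.xxx:50070/hbase/<table name> hdfs://xxx.xxx.xxx.xxx:9000/hbase/data/default/<table name>
The source and target paths can be found in each cluster's HBase storage configuration (the hbase.rootdir setting).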
4. After the copy finishes, run the following repair commands on the new cluster. In general, the first three commands are enough; if they are not, also run the fourth. For convenience, I wrote a batch script to run these repairs. Each command prints a status summary when it finishes; if it shows OK, the operation succeeded.
hbase hbck -fixTableOrphans <table name>
hbase hbck -fixMeta <table name>
hbase hbck -fixAssignments <table name>
hbase hbck -repair <table name>
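For step 2, the hosts entries can be generated instead of typed by hand. The following is a minimal Python sketch, not part of the original post: the host names and IP addresses are hypothetical placeholders (203.0.113.0/24 is a documentation range), and the script only prints the lines, leaving it to you to append them to /etc/hosts on each server of the new cluster.

```python
# Sketch: build /etc/hosts lines for the old cluster's servers (step 2).
# The host names and IPs used below are hypothetical placeholders.
from typing import Dict, List


def hosts_entries(old_cluster: Dict[str, str]) -> List[str]:
    """Return one 'IP<TAB>hostname' line per old-cluster server, sorted by host name."""
    return [f"{ip}\t{host}" for host, ip in sorted(old_cluster.items())]


if __name__ == "__main__":
    # Hypothetical mapping of old-cluster host name -> internet IP address.
    old_cluster = {
        "old-master1": "203.0.113.10",
        "old-worker1": "203.0.113.11",
        "old-worker2": "203.0.113.12",
    }
    # Append this output to /etc/hosts on every server of the new cluster.
    print("\n".join(hosts_entries(old_cluster)))
```

Sorting by host name just keeps the output stable, so the same file can be diffed across servers.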
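When many tables have to be copied, the step 3 command can be generated per table. This is a minimal Python sketch, not the author's script: the NameNode host names and table names are hypothetical, and the two root paths assume the same layouts as the command in step 3 (/hbase/<table> on the old cluster, /hbase/data/default/<table> on the new one). It only prints the commands, so they can be reviewed before anything runs.

```python
# Sketch: generate one DistCp command per table, mirroring the step 3 command.
# Host names and table names are hypothetical placeholders.
import shlex
from typing import List

SRC_ROOT = "hftp://old-namenode:50070/hbase"              # old layout: /hbase/<table>
DST_ROOT = "hdfs://new-namenode:9000/hbase/data/default"  # new layout: /hbase/data/default/<table>


def distcp_command(table: str, maps: int = 200) -> List[str]:
    """Return the argument list for copying one table's directory."""
    return [
        "hadoop", "distcp",
        "-skipcrccheck", "-update", "-i",
        "-m", str(maps),
        f"{SRC_ROOT}/{table}",
        f"{DST_ROOT}/{table}",
    ]


if __name__ == "__main__":
    for table in ["table_a", "table_b"]:  # hypothetical table names
        # Print a shell-quoted command line for review (dry run).
        print(shlex.join(distcp_command(table)))
```

Returning an argument list rather than a string means the same value can later be passed straight to subprocess.run(..., check=True) without extra quoting.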
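The batch repair script mentioned in step 4 is not shown in the post; below is a hypothetical Python sketch of one. It runs the first three hbck commands for each table and falls back to -repair only if the last report does not show OK. It assumes an hbck that accepts these options (the HBase 1.x-era tool) and that a successful run prints a line containing "Status: OK"; the `run` parameter is there so the logic can be tried without a cluster.

```python
# Sketch of a batch hbck repair script (hypothetical, not the author's script).
# Assumes an hbck that accepts these fix options and prints "Status: OK" on success.
import subprocess
import sys
from typing import Callable, List, Tuple

FIX_OPTIONS = ["-fixTableOrphans", "-fixMeta", "-fixAssignments"]
FALLBACK_OPTION = "-repair"


def run_hbck(option: str, table: str) -> str:
    """Run 'hbase hbck <option> <table>' and return its combined stdout/stderr."""
    result = subprocess.run(
        ["hbase", "hbck", option, table],
        stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True,
    )
    return result.stdout


def is_ok(report: str) -> bool:
    return "Status: OK" in report


def repair_table(table: str, run: Callable[[str, str], str] = run_hbck) -> Tuple[List[str], bool]:
    """Run the three fix options, then -repair if the last report is not OK.

    Returns (options that were run, whether the final report showed OK).
    """
    ran: List[str] = []
    report = ""
    for option in FIX_OPTIONS:
        report = run(option, table)
        ran.append(option)
    if not is_ok(report):
        report = run(FALLBACK_OPTION, table)
        ran.append(FALLBACK_OPTION)
    return ran, is_ok(report)


if __name__ == "__main__":
    # Usage: python repair_tables.py table_a table_b ...
    for table in sys.argv[1:]:
        ran, ok = repair_table(table)
        print(f"{table}: {'OK' if ok else 'NOT OK'} after {' '.join(ran)}")
```

Checking only the last of the three reports is a choice made here, on the assumption that the final hbck run reflects the table's current state; a stricter script could stop at the first report that is not OK.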


When migrating a large amount of data, adjust the bandwidth to speed up the migration: increase the old cluster's bandwidth as appropriate, and make sure the new cluster's bandwidth is sufficient for the transfer. At the same time, keep an eye on data transfer charges, since the data travels over the internet; for an intranet migration this can be ignored. Migration will inevitably raise various problems, and they can be a real challenge. I hope your migration goes smoothly.
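To judge whether the bandwidth is sufficient, a rough transfer-time estimate helps. The Python sketch below is back-of-the-envelope arithmetic, not part of the original post: the data size, link speed, and the `efficiency` factor (the fraction of nominal bandwidth actually achieved) are hypothetical inputs, and decimal units are assumed (1 GB = 8,000 megabits).

```python
# Sketch: estimate how long an internet transfer takes at a given bandwidth.
# Inputs are hypothetical; real DistCp throughput depends on overhead and retries.


def transfer_hours(data_gb: float, bandwidth_mbps: float, efficiency: float = 1.0) -> float:
    """Hours to move data_gb gigabytes over a bandwidth_mbps (megabits/s) link.

    efficiency (0 < efficiency <= 1) is the fraction of nominal bandwidth achieved.
    """
    if bandwidth_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("bandwidth must be positive and efficiency in (0, 1]")
    megabits = data_gb * 8000  # decimal units: 1 GB = 8,000 megabits
    seconds = megabits / (bandwidth_mbps * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical example: 900 GB over a 100 Mbps link at 80% efficiency.
    print(transfer_hours(900, 100, 0.8))  # 25.0
```

Doubling the usable bandwidth halves the estimate, which is why raising the old cluster's outbound bandwidth is usually the first lever to pull.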