ApsaraDB for HBase: Migrate full and incremental data across Phoenix clusters

Last Updated: Feb 02, 2021

This topic describes the scenarios, features, benefits, and limits of migrating full and incremental data across Phoenix clusters and shows you how to create data migration tasks.

Scenarios

  • Migrate data across regions. For example, migrate data from a data center in the China (Qingdao) region to a data center in the China (Beijing) region.

  • Upgrade cluster specifications. For example, you can upgrade a cluster from 4 cores and 8 GB of memory to 8 cores and 16 GB of memory.

  • Split business. You can migrate a portion of your business to a new cluster.

Features

  • Supports migration only across Phoenix clusters that run the same major version. For example, you cannot migrate data from a cluster that runs Phoenix 4.x to a cluster that runs Phoenix 5.x.

  • Supports table schema migration, real-time data synchronization, and full data migration.

  • Supports full database migration, namespace migration, and table-level migration.

  • Allows you to rename a table when you migrate the table data.

  • Allows you to specify the time range, the rowkey range, and the columns when you migrate data.

  • Provides APIs and allows you to call API operations to create migration tasks.

Migration benefits

  • Data migration does not require service downtime. A single task can migrate historical data and synchronize real-time incremental data.

  • During migration, ApsaraDB for HBase does not read data through the source HBase cluster. It reads data only from the Hadoop Distributed File System (HDFS) of the source cluster, which minimizes the impact on the online business that runs on the source cluster.

  • In most cases, compared with migration through the API layer, replication at the file layer reduces the amount of data transferred by more than 50% and improves efficiency.

  • Each node can migrate data at a rate of up to 150 MB/s, which meets stability requirements for data migration. You can add nodes to scale out and migrate terabytes or even petabytes of data. For example, ten nodes provide an aggregate throughput of about 1.5 GB/s, enough to move roughly 10 TB in under two hours.

  • ApsaraDB for HBase implements a comprehensive retry mechanism when an error occurs. It monitors the task speed and the task progress in real time, and raises alerts when tasks fail.

  • ApsaraDB for HBase automatically synchronizes schemas to ensure consistency among partitions.

Limits

  • HBase clusters that have Kerberos enabled are not supported.

  • Single-node ApsaraDB for HBase instances are not supported.

  • ApsaraDB for HBase instances in the classic network are not supported because of network connectivity limitations.

  • ApsaraDB for HBase implements an asynchronous mode to synchronize incremental data based on write-ahead logging (WAL). Data that is imported through BulkLoad and data that is not written to WAL are not synchronized.
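
The last limit above is easiest to see from the HBase client API. The following is a minimal sketch, assuming the HBase Java client is on the classpath; the ZooKeeper quorum and the table name ns:demo_table are hypothetical placeholders. The first put is written to the WAL and can be picked up by WAL-based incremental synchronization, while the second put skips the WAL and therefore is not synchronized.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Durability;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class WalDurabilityDemo {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            conf.set("hbase.zookeeper.quorum", "zk1,zk2,zk3"); // hypothetical quorum
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Table table = conn.getTable(TableName.valueOf("ns:demo_table"))) {
                // Written to the WAL: picked up by WAL-based incremental synchronization.
                Put logged = new Put(Bytes.toBytes("row1"));
                logged.addColumn(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("v1"));
                table.put(logged);

                // Skips the WAL: NOT picked up by WAL-based incremental synchronization.
                Put unlogged = new Put(Bytes.toBytes("row2"));
                unlogged.addColumn(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("v2"));
                unlogged.setDurability(Durability.SKIP_WAL);
                table.put(unlogged);
            }
        }
    }

The same applies to data imported through BulkLoad, because BulkLoad writes HFiles directly and does not go through the WAL.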

Notes

  • Before you migrate data, make sure that the HDFS capacity of the destination cluster is sufficient. This prevents the migrated data from exhausting the storage capacity of the destination cluster during migration.

  • Before you submit an incremental synchronization task, we recommend that you increase the log retention period of the source cluster to reserve time for ApsaraDB for HBase to handle incremental synchronization errors. For example, set hbase.master.logcleaner.ttl in hbase-site.xml to a value greater than 12 hours and restart the HBase Master (see the configuration sketch after this list).

  • You do not need to create tables in the destination cluster. Lindorm Tunnel Service (LTS) automatically creates tables that have the same schema and partition information as the tables in the source cluster. If you create a table to store the migrated data yourself, its partitioning scheme may differ from that of the source table. As a result, the table that you create may be frequently split or compacted after the migration is complete, and if it stores a large amount of data, this process can take a long time.

  • If the source table has a coprocessor, ensure that the destination cluster contains the corresponding JAR package of the coprocessor when you create a destination table.
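
For the log retention note above, the following is a minimal sketch of the hbase-site.xml change on the source cluster. The property value is in milliseconds; 86400000 ms (24 hours) is only an assumed example that satisfies the greater-than-12-hours recommendation, and you can choose another value that fits your recovery window.

    <property>
      <name>hbase.master.logcleaner.ttl</name>
      <!-- Retention of archived WALs in milliseconds; 86400000 ms = 24 hours (example value) -->
      <value>86400000</value>
    </property>

This setting only lengthens how long archived WALs are kept in the oldWALs directory, so verify that the HDFS of the source cluster has enough space for the additional logs.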

Before you begin

  1. Check the network connectivity among the source cluster, the destination cluster, and the LTS service. A connectivity spot-check sketch follows this list.

  2. Add a Phoenix data source.

  3. Log on to the LTS web UI.
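
For step 1, the following is a minimal sketch of a reachability spot check, assuming you run it from a machine in the same network (for example, the same VPC) that LTS and the clusters use. The host names and the ZooKeeper port 2181 are hypothetical placeholders; replace them with the actual endpoints of your source and destination clusters. The sketch only confirms that the TCP ports are reachable from that machine.

    import java.net.InetSocketAddress;
    import java.net.Socket;

    public class ConnectivityCheck {
        public static void main(String[] args) {
            String[] endpoints = {
                "src-zk1.example.internal:2181",   // source cluster ZooKeeper (placeholder)
                "dst-zk1.example.internal:2181"    // destination cluster ZooKeeper (placeholder)
            };
            for (String endpoint : endpoints) {
                String[] parts = endpoint.split(":");
                try (Socket socket = new Socket()) {
                    // Fail fast if the port is unreachable from this network.
                    socket.connect(new InetSocketAddress(parts[0], Integer.parseInt(parts[1])), 3000);
                    System.out.println("reachable: " + endpoint);
                } catch (Exception e) {
                    System.out.println("NOT reachable: " + endpoint + " (" + e.getMessage() + ")");
                }
            }
        }
    }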

Create a task

  1. In the left-side navigation pane, choose Migration > Phoenix -> Phoenix.

  2. Click create new job and configure the following parameters:

  • Enter a task name. The name can contain only letters and digits. This parameter is optional; by default, the task ID is used as the task name.

  • Select a source cluster and a destination cluster. If the specified cluster does not exist, create a cluster as prompted.

  • Select operations based on your requirements.

    • Migration table schema: creates tables in the destination cluster. These tables have the same schema and partition information as the source tables. If a table already exists in the destination cluster, the data in this table is not migrated.

    • Real time data replication: synchronizes incremental data from the source cluster in real time.

    • History data migration: migrates all historical data by copying the underlying files at the file level.

  • Enter table names, one on each line. For more information, click explanation.

  • You can leave the advanced configuration field empty.

View task details

  1. In the left-side navigation pane, choose Migration > Phoenix -> Phoenix.

  2. On the page that appears, click the name of the task that you want to view.

  3. View the execution status of the task.

Perform a switchover

  1. Wait until the full migration task is complete and the incremental synchronization latency drops to several seconds or hundreds of milliseconds.

  2. Enable LTS data sampling and verification. When you sample and verify large tables, make sure that the sampling ratio is appropriate so that the verification does not affect the online business. A manual spot-check sketch follows this list.

  3. Verify your business.

  4. Perform a switchover on your business.
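
As a manual complement to the LTS data sampling and verification in step 2, the following is a minimal sketch that spot-checks row counts for a single table on both clusters over Phoenix JDBC. The ZooKeeper endpoints and the table name NS.DEMO_TABLE are hypothetical placeholders, and the sketch assumes that the Phoenix thick-client JAR is on the classpath. For large tables, restrict the query to a bounded rowkey or time range so that the check does not affect the online business.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class MigrationSpotCheck {
        // Hypothetical placeholders; replace with your own ZooKeeper endpoints and table.
        private static final String SRC_URL = "jdbc:phoenix:src-zk1,src-zk2,src-zk3:2181";
        private static final String DST_URL = "jdbc:phoenix:dst-zk1,dst-zk2,dst-zk3:2181";
        private static final String QUERY = "SELECT COUNT(*) FROM NS.DEMO_TABLE";

        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.phoenix.jdbc.PhoenixDriver"); // Phoenix thick-client driver
            long srcCount = count(SRC_URL);
            long dstCount = count(DST_URL);
            System.out.println("source=" + srcCount + ", destination=" + dstCount
                    + (srcCount == dstCount ? " (match)" : " (MISMATCH)"));
        }

        private static long count(String url) throws Exception {
            try (Connection conn = DriverManager.getConnection(url);
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery(QUERY)) {
                rs.next();
                return rs.getLong(1);
            }
        }
    }

While incremental synchronization is still running and the source cluster continues to take writes, small transient differences between the two counts are expected until writes to the source cluster stop.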