Synchronize full and incremental data

Last Updated: Feb 02, 2021

This topic describes the scenarios, features, benefits, and limits of synchronizing full and incremental data, and the operations for creating data synchronization tasks.

Scenarios

  • Upgrade major versions. You can upgrade HBase from V1.x to V2.x.

  • Migrate data across regions. For example, migrate data from a data center in the China (Qingdao) region to a data center in the China (Beijing) region.

  • Upgrade cluster specifications. You can upgrade the cluster specifications from 4-core 8 GB to 8-core 16 GB.

  • Decouple business. You can migrate a portion of your business to a new cluster.

Features

  • Supports migration without service downtime between any two of the following versions: HBase 0.94, HBase 0.98, HBase V1.x, HBase V2.x, and Lindorm.

  • Supports table schema migration, real-time data synchronization, and full data migration.

  • Supports full database migration, namespace migration, and table-level migration.

  • Allows you to rename a table when you migrate the table data.

  • Allows you to specify the time range, the rowkey range, and the columns when you migrate data.

  • Provides APIs and allows you to call API operations to create migration tasks.

Migration benefits

  • Data migration causes no service downtime. In a single task, ApsaraDB for HBase can migrate historical data and synchronize incremental data in real time.

  • When data is being migrated, ApsaraDB for HBase does not interact with source HBase clusters. ApsaraDB for HBase only reads data from the Hadoop Distributed File System (HDFS) of the source clusters. This minimizes the impact on the online business that runs on the source clusters.

  • In most cases, data migration at the file layer outperforms data migration at the API layer because it reduces traffic by more than 50%, which improves efficiency.

  • Each node can migrate data at a rate of up to 150 Mbit/s to meet stability requirements on data migration. You can add nodes to migrate terabytes or petabytes of data.

  • ApsaraDB for HBase implements a comprehensive retry mechanism when an error occurs. It monitors the task speed and the task progress in real time, and raises alerts when tasks fail.

  • ApsaraDB for HBase automatically synchronizes schemas to ensure consistency among partitions.
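The per-node rate above allows a rough capacity estimate. The following Python sketch assumes that throughput scales linearly with the number of nodes and that every node sustains the 150 Mbit/s upper limit, which real migrations may not reach. The function names and the 10 TB example are illustrative and are not part of LTS.

```python
import math

# Upper limit per node quoted in this topic; treat it as a best case.
PER_NODE_MBIT_PER_S = 150


def estimate_hours(data_bytes, nodes, per_node_mbit_s=PER_NODE_MBIT_PER_S):
    """Estimate the hours needed to move data_bytes with the given node count."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    total_bits = data_bytes * 8
    bits_per_second = nodes * per_node_mbit_s * 1_000_000
    return total_bits / bits_per_second / 3600


def nodes_needed(data_bytes, deadline_hours, per_node_mbit_s=PER_NODE_MBIT_PER_S):
    """Estimate the smallest node count that finishes within deadline_hours."""
    required_bits_per_second = data_bytes * 8 / (deadline_hours * 3600)
    return math.ceil(required_bits_per_second / (per_node_mbit_s * 1_000_000))


ten_tb = 10 * 10**12  # hypothetical data volume: 10 TB (decimal)
print(round(estimate_hours(ten_tb, nodes=4), 1))   # about 37 hours
print(nodes_needed(ten_tb, deadline_hours=24))     # 7 nodes
```

Because the estimate ignores retries, verification, and rate fluctuations, leave headroom when you plan the migration window.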

Limits

  • Clusters for which Kerberos is enabled are not supported.

  • Single-node ApsaraDB for HBase instances are not supported.

  • ApsaraDB for HBase instances within the classic network are not supported due to network issues.

  • ApsaraDB for HBase synchronizes incremental data asynchronously based on write-ahead logging (WAL). Data that is imported through BulkLoad and data that is not written to the WAL are not synchronized.

Notes

  • Before you migrate data, make sure that the HDFS capacity of the destination cluster is sufficient. This helps you prevent the data from exhausting the capacity during migration.

  • Before you submit an incremental synchronization task, we recommend that you modify the log retention period for the source cluster. You must reserve time for ApsaraDB for HBase to handle incremental synchronization errors. For example, you can change the hbase.master.logcleaner.ttl setting of hbase-site.xml to a value greater than 12 hours and restart the HBase Master.

  • You do not need to create tables in the destination cluster. The synchronization service of Lindorm Tunnel Service (LTS) automatically creates tables that have the same schema and partition information as those in the source cluster. If you create a table yourself to store the migrated data, the partitioning scheme of the table may differ from that of the source table. As a result, the table that you create may be frequently split or compacted after the migration is complete. If the table stores a large amount of data, the entire process can take a long time.

  • If the source table has a coprocessor, ensure that the destination cluster contains the corresponding JAR package of the coprocessor when you create a destination table.
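The log retention change in the notes above can be sketched as follows. In HBase, hbase.master.logcleaner.ttl is expressed in milliseconds, so a retention period in hours must be converted before it is written to hbase-site.xml. This Python sketch only renders the property fragment. The 24-hour value is an illustrative choice that exceeds the 12-hour minimum, and the helper names are hypothetical.

```python
from xml.sax.saxutils import escape


def hours_to_millis(hours):
    """Convert a retention period in hours to milliseconds."""
    return int(hours * 60 * 60 * 1000)


def render_property(name, value):
    """Render one <property> element in the hbase-site.xml layout."""
    return (
        "<property>\n"
        f"  <name>{escape(name)}</name>\n"
        f"  <value>{escape(str(value))}</value>\n"
        "</property>"
    )


# Assumption: 24 hours of WAL retention, which is greater than 12 hours.
ttl_ms = hours_to_millis(24)
print(render_property("hbase.master.logcleaner.ttl", ttl_ms))
```

Add the printed fragment inside the configuration element of hbase-site.xml on the source cluster, and then restart the HBase Master as described above.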

Before you begin

  1. Check the network connectivity among the source cluster, the destination cluster, and the Lindorm Tunnel Service (LTS) cluster.

  2. Add the HBase and Lindorm data sources.

  3. Log on to the LTS web UI.

Create a task

  1. In the left-side navigation pane, choose Migration > Quick Migration.

  2. Click create new job, and then configure the following settings:

  • The task name can contain only letters and digits. This parameter is optional. By default, the task ID is used as the task name.

  • Select a source cluster and a destination cluster. If the specified cluster does not exist, create a cluster as prompted.

  • Select operations based on your requirements.

    • Migration table schema: creates tables in the destination cluster. These tables have the same schema and partition information as the source tables. If a table already exists in the destination cluster, the data in this table is not migrated.

    • Real time data replication: synchronizes incremental data from the source cluster in real time.

    • History data migration: physically migrates all files at a file level.

  • Enter table names, one on each line.

  • You can leave Advanced Settings empty.
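Before you fill in the form, you can check the inputs locally. The following Python sketch validates a task name against the rule stated above and prepares a table list with one name on each line. It assumes that "letters and digits" means ASCII characters. The helper names and the table names orders and events are hypothetical.

```python
import re

# Assumption: only ASCII letters and digits are accepted in a task name.
_TASK_NAME = re.compile(r"[A-Za-z0-9]+")


def validate_task_name(name):
    """Return True if the name is empty (task ID is used) or alphanumeric."""
    return name == "" or _TASK_NAME.fullmatch(name) is not None


def table_lines(tables):
    """Join table names one per line, dropping blanks and extra whitespace."""
    cleaned = [t.strip() for t in tables if t.strip()]
    return "\n".join(cleaned)


print(validate_task_name("migrate2021"))    # True
print(validate_task_name("migrate-2021"))   # False: hyphen is not allowed
print(table_lines(["orders", " events ", ""]))
```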

View a task

In the left-side navigation pane, choose Migration > Quick Migration.

View task details

  1. In the left-side navigation pane, choose Migration > Quick Migration.

  2. On the page that appears, click the name of the task that you want to view. View the execution status of the task.

Perform a switchover

  1. Wait until the full migration task is completed and the latency of incremental synchronization is as low as several seconds or hundreds of milliseconds.

  2. Enable sampling and verification on LTS data. When you sample and verify large tables, choose a sampling ratio that is low enough to avoid affecting your online business.

  3. Verify your business.

  4. Perform a switchover on your business.
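The effect of the sampling ratio in step 2 can be illustrated with a small Python sketch. It does not use LTS. It treats the source and destination tables as in-memory dictionaries that map rowkeys to values, which is an assumption made only for illustration, and the function name is hypothetical. A lower ratio reads fewer rows from the source and therefore puts less load on online business, at the cost of checking less data.

```python
import random


def sample_and_compare(source, dest, ratio, seed=0):
    """Compare a random sample of source rows with dest; return mismatched keys."""
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in the range (0, 1]")
    rng = random.Random(seed)
    keys = sorted(source)
    # Sample at least one row when the table is not empty.
    sample_size = max(1, int(len(keys) * ratio)) if keys else 0
    sampled = rng.sample(keys, sample_size)
    return sorted(
        key for key in sampled
        if key not in dest or dest[key] != source[key]
    )


source = {f"row{i}": i for i in range(100)}
dest = dict(source)
dest["row5"] = -1  # simulate one row that differs after migration

print(sample_and_compare(source, dest, ratio=1.0))  # ['row5']
```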