
Synchronize full and incremental data

Last Updated: Jul 22, 2021

This topic describes the scenarios, features, benefits, and limits of full and incremental data synchronization. It also explains how to create and view synchronization tasks and how to switch your business over to the destination cluster.

Scenarios

  • Upgrade major versions. You can upgrade HBase from V1.x to V2.x.

  • Migrate data across regions. For example, migrate data from a data center in the China (Qingdao) region to a data center in the China (Beijing) region.

  • Upgrade cluster specifications. You can upgrade the cluster specifications from 4 cores and 8 GB to 8 cores and 16 GB.

  • Decouple workloads. You can migrate part of your business to a new cluster.

Features

  • Supports migration without service downtime between any two of the following versions: HBase 0.94, HBase 0.98, HBase V1.x, HBase V2.x, and ApsaraDB for Lindorm.

  • Supports table schema migration, real-time data synchronization, and full data migration.

  • Supports full database migration, namespace migration, and table-level migration.

  • Allows you to rename a table when you migrate the table data.

  • Allows you to specify the time range, the rowkey range, and the columns when you migrate data.

  • Provides API operations that you can call to create migration tasks.
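To show how the time range, rowkey range, and column options narrow a migration, here is a minimal sketch. It assumes HBase Scan conventions (start row inclusive, stop row exclusive, time range [min, max)). Whether LTS applies exactly the same boundaries is an assumption. `Cell`, `MigrationFilter`, and `select` are hypothetical names for illustration, not part of the LTS API.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Sequence


@dataclass(frozen=True)
class Cell:
    row: bytes
    family: str
    qualifier: str
    timestamp: int  # milliseconds since the epoch, as in HBase
    value: bytes


@dataclass(frozen=True)
class MigrationFilter:
    """Hypothetical filter; boundaries follow HBase Scan semantics (assumed)."""
    start_row: bytes = b""                   # inclusive; empty = from the first row
    stop_row: bytes = b""                    # exclusive; empty = to the last row
    min_ts: int = 0                          # inclusive
    max_ts: Optional[int] = None             # exclusive; None = no upper bound
    columns: Optional[Sequence[str]] = None  # "family:qualifier"; None = all columns

    def accepts(self, cell: Cell) -> bool:
        # Rowkeys compare as raw bytes, which matches HBase's lexicographic order.
        if cell.row < self.start_row:
            return False
        if self.stop_row and cell.row >= self.stop_row:
            return False
        if cell.timestamp < self.min_ts:
            return False
        if self.max_ts is not None and cell.timestamp >= self.max_ts:
            return False
        if self.columns is not None and f"{cell.family}:{cell.qualifier}" not in self.columns:
            return False
        return True


def select(cells: Iterable[Cell], f: MigrationFilter) -> Iterator[Cell]:
    """Yield only the cells that the filter would migrate."""
    return (c for c in cells if f.accepts(c))
```

For example, `MigrationFilter(start_row=b"user010", stop_row=b"user100", min_ts=1500, columns=["cf:name"])` keeps only `cf:name` cells written at or after timestamp 1500 for rowkeys from `user010` up to, but not including, `user100`.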

Benefits

  • Data migration causes no service downtime. A single Lindorm Tunnel Service (LTS) task can migrate historical data and synchronize real-time incremental data.

  • During migration, LTS does not send requests to the source HBase cluster. It reads data only from the Hadoop Distributed File System (HDFS) of the source cluster, which minimizes the impact on the online business that runs on the source cluster.

  • In most cases, replicating data at the file layer instead of migrating it through the API layer reduces data traffic by more than 50% and improves efficiency.

  • Each node can migrate data at up to 150 MB/s while keeping the migration stable. To migrate terabytes or petabytes of data, add nodes to scale out horizontally.

  • LTS automatically retries when errors occur, monitors task speed and progress in real time, and raises alerts when tasks fail.

  • LTS automatically synchronizes table schemas so that the partitions of the destination tables stay consistent with those of the source tables.
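The per-node rate listed above allows a rough estimate of how long a migration takes and how many nodes a deadline requires. The sketch below assumes perfectly linear scaling and a sustained 150 MB/s per node, with 1 TB = 1024 × 1024 MB. Because 150 MB/s is an upper bound, the results are best-case estimates. The function names are illustrative only.

```python
import math

MB_PER_TB = 1024 * 1024


def estimate_hours(data_tb: float, nodes: int, mb_per_s_per_node: float = 150.0) -> float:
    """Best-case transfer time in hours, assuming linear scaling across nodes."""
    if nodes < 1:
        raise ValueError("nodes must be >= 1")
    total_mb = data_tb * MB_PER_TB
    seconds = total_mb / (mb_per_s_per_node * nodes)
    return seconds / 3600


def nodes_needed(data_tb: float, max_hours: float, mb_per_s_per_node: float = 150.0) -> int:
    """Smallest node count whose best-case transfer time fits within max_hours."""
    if max_hours <= 0:
        raise ValueError("max_hours must be positive")
    total_mb = data_tb * MB_PER_TB
    mb_per_node_in_window = mb_per_s_per_node * max_hours * 3600
    return max(1, math.ceil(total_mb / mb_per_node_in_window))
```

For example, `estimate_hours(10, 4)` gives about 4.9 hours for 10 TB on 4 nodes, and `nodes_needed(100, 12)` returns 17 nodes for 100 TB in 12 hours. In practice, leave margin for retries and for the load that incremental synchronization adds.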

Limits

  • HBase clusters that have Kerberos enabled are not supported.

  • Single-node ApsaraDB for HBase instances are not supported.

  • ApsaraDB for HBase instances within the classic network are not supported due to network issues.

  • Incremental data is synchronized asynchronously based on the write-ahead log (WAL). Data imported through BulkLoad and data written without the WAL are not synchronized.
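The WAL limitation follows from where incremental synchronization reads its input. The toy model below shows how a reader that consumes only the WAL misses writes that bypass it. It is a simplification and not the HBase or LTS implementation. `ToySourceCluster` and `replicate_from_wal` are hypothetical. In the real HBase client, a write skips the WAL when its durability is set to `Durability.SKIP_WAL`.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ToySourceCluster:
    """Toy model of a source cluster; not the HBase or LTS API."""
    wal: List[Tuple[str, str]] = field(default_factory=list)  # what incremental sync can read
    store: Dict[str, str] = field(default_factory=dict)       # data actually stored

    def put(self, row: str, value: str, write_to_wal: bool = True) -> None:
        # A normal write is logged to the WAL and then applied to the store.
        # A write that skips the WAL (e.g. Durability.SKIP_WAL) only reaches the store.
        if write_to_wal:
            self.wal.append((row, value))
        self.store[row] = value

    def bulk_load(self, rows: Dict[str, str]) -> None:
        # BulkLoad adds prepared files directly to the store; no WAL entries are created.
        self.store.update(rows)


def replicate_from_wal(source: ToySourceCluster) -> Dict[str, str]:
    """Incremental sync that, like the process described above, reads only the WAL."""
    dest: Dict[str, str] = {}
    for row, value in source.wal:
        dest[row] = value
    return dest
```

If `r1` is written normally, `r2` is written without the WAL, and `r3` is bulk loaded, then `replicate_from_wal` copies only `r1`, and `r2` and `r3` are missing from the destination.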

Precautions

  • Before you migrate data, make sure that the destination cluster has enough HDFS capacity so that storage is not exhausted during migration.

  • Before you submit an incremental synchronization task, we recommend that you extend the WAL retention period of the source cluster. This gives LTS time to recover from incremental synchronization errors before the logs are deleted. For example, set hbase.master.logcleaner.ttl in hbase-site.xml to a value greater than 12 hours (the value is in milliseconds) and restart the HBase Master.

  • You do not need to create tables in the destination cluster manually. LTS automatically creates the tables and their regions to match the source cluster. If you create a destination table yourself, its partitioning scheme may differ from that of the source table. As a result, the table may be split or compacted frequently after the migration is complete, and for a table that stores a large amount of data, this can take a long time.

  • If the source table uses a coprocessor, make sure that the JAR package of the coprocessor is available in the destination cluster before the destination table is created.
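Two of these precautions reduce to simple arithmetic. The sketch below checks HDFS capacity and renders the hbase.master.logcleaner.ttl property. The capacity check rests on assumptions that are not LTS rules. It assumes that each migrated byte is stored `dest_replication` times on the destination (HDFS defaults to 3 replicas through `dfs.replication`). It also applies a 20% headroom for WALs, compactions, and ongoing writes, which is an illustrative value. The 24-hour retention is likewise an example that only satisfies the "greater than 12 hours" guidance. Both function names are hypothetical.

```python
import xml.etree.ElementTree as ET

MS_PER_HOUR = 60 * 60 * 1000


def has_enough_capacity(
    source_logical_bytes: int,
    dest_free_bytes: int,
    dest_replication: int = 3,
    headroom: float = 0.2,
) -> bool:
    """Return True if the destination HDFS can hold the migrated data.

    Assumes every migrated byte is stored dest_replication times, plus an
    extra `headroom` fraction reserved for WALs, compactions, and new writes.
    """
    required = source_logical_bytes * dest_replication * (1 + headroom)
    return dest_free_bytes >= required


def logcleaner_ttl_property(hours: float) -> str:
    """Render an hbase-site.xml <property> for hbase.master.logcleaner.ttl.

    The property value is in milliseconds; the retention must exceed 12 hours.
    """
    if hours <= 12:
        raise ValueError("choose a retention period longer than 12 hours")
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "hbase.master.logcleaner.ttl"
    ET.SubElement(prop, "value").text = str(int(hours * MS_PER_HOUR))
    return ET.tostring(prop, encoding="unicode")


TIB = 1024 ** 4
print(has_enough_capacity(2 * TIB, 8 * TIB))  # True: about 7.2 TiB is required
print(has_enough_capacity(2 * TIB, 7 * TIB))  # False
print(logcleaner_ttl_property(24))
```

The rendered element belongs inside the `<configuration>` element of the source cluster's hbase-site.xml, and the HBase Master must be restarted afterward.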

Before you begin

  1. Check the network connectivity among the source cluster, the destination cluster, and Lindorm Tunnel Service (LTS).

  2. Add the source and destination clusters (HBase or ApsaraDB for Lindorm) to LTS as data sources.

  3. Log on to the LTS web UI.
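A quick way to perform the connectivity check in step 1 is to open a TCP connection from each side to the ports that the other side must reach. The following is a minimal sketch that uses only the Python standard library. The hosts and ports you pass in are your own. Which ports LTS actually needs is not stated here, so treat any port list as an assumption to confirm for your environment.

```python
import socket
from typing import Dict, Iterable, Tuple

Endpoint = Tuple[str, int]


def check_endpoints(endpoints: Iterable[Endpoint], timeout: float = 3.0) -> Dict[Endpoint, bool]:
    """Try a TCP connection to each (host, port) and record whether it succeeded."""
    results: Dict[Endpoint, bool] = {}
    for host, port in endpoints:
        try:
            # create_connection resolves the host and connects within the timeout.
            with socket.create_connection((host, port), timeout=timeout):
                results[(host, port)] = True
        except OSError:
            # Covers DNS failures, refused connections, and timeouts.
            results[(host, port)] = False
    return results
```

For example, `check_endpoints([("source-zk.example.internal", 2181), ("source-nn.example.internal", 8020)])` would probe a ZooKeeper client port and a NameNode RPC port. The hostnames are hypothetical, and 2181 and 8020 are common defaults that may differ in your clusters. The check confirms only TCP reachability, not authentication or service health.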

Create a task

  1. In the left-side navigation pane, choose Migration > Quick Migration.

  2. Click create new job and configure the following parameters:

  • job name: Optional. The task name can contain only letters and digits. If you leave it blank, the task ID is used as the task name.

  • Source Cluster and Target Cluster: Configure the source and destination clusters as prompted.

  • Select the operations that you need:

    • migration table schema: creates tables in the destination cluster. These tables have the same schema and partition information as the source tables. If a table already exists in the destination cluster, the data in this table is not migrated.

    • real time data replication: synchronizes incremental data from the source cluster in real time.

    • history data migration: migrates all existing data by physically copying the underlying files.

  • table mapping: Enter the names of the tables to migrate, one table name per line.

  • advanced configuration: Optional. You can leave this parameter blank.
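Before you submit the form, you can check your inputs against the rules above with a small local model. The sketch below is hypothetical and is not the LTS API or its request format. `MigrationJobSpec` and its field names are illustrative. The following behaviors are assumptions: "letters and digits" means ASCII only, blank lines in the table mapping are ignored, and at least one operation must be selected.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

_JOB_NAME = re.compile(r"[A-Za-z0-9]+")  # assumption: ASCII letters and digits only


@dataclass
class MigrationJobSpec:
    """Hypothetical local model of the create-job form (not the LTS API)."""
    source_cluster: str
    target_cluster: str
    table_mapping: str                 # one table name per line, as typed into the form
    job_name: Optional[str] = None     # optional; the task ID is used when empty
    migrate_schema: bool = True        # "migration table schema"
    realtime_replication: bool = True  # "real time data replication"
    history_migration: bool = True     # "history data migration"

    def tables(self) -> List[str]:
        # One table per line; surrounding whitespace and blank lines are ignored.
        return [line.strip() for line in self.table_mapping.splitlines() if line.strip()]

    def effective_name(self, task_id: str) -> str:
        # Mirrors the documented default: fall back to the task ID.
        return self.job_name or task_id

    def validate(self) -> None:
        if self.job_name and not _JOB_NAME.fullmatch(self.job_name):
            raise ValueError("job name may contain only letters and digits")
        if not (self.migrate_schema or self.realtime_replication or self.history_migration):
            raise ValueError("select at least one operation")
        if not self.tables():
            raise ValueError("table mapping must list at least one table")
```

For example, a mapping of `"ns1:orders\n\n  users \n"` yields the tables `["ns1:orders", "users"]`, and the job name `job-01` is rejected because of the hyphen.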

View a task

  1. In the left-side navigation pane, choose Migration > Quick Migration to view tasks.

View the details of a task

  1. In the left-side navigation pane, choose Migration > Quick Migration.

  2. On the page that appears, click the name of the task to view its execution status.

Perform a switchover

  1. Wait until the full data migration is complete and the incremental synchronization latency drops to a few seconds or hundreds of milliseconds.

  2. Enable LTS data sampling and verification. For large tables, choose a sampling ratio that is low enough to keep verification from affecting the online business.

  3. Verify that your business works correctly with the destination cluster.

  4. Switch your business over to the destination cluster.
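Steps 1 and 2 can be scripted as a readiness gate. The sketch below rests on several assumptions. `get_latency_ms` is a hypothetical callback that you would wire to wherever you read the incremental synchronization latency, such as the task details page or an LTS API operation; its exact form is not described here. The 1000 ms threshold and three consecutive readings are illustrative values. The sampling step uses a swapped-in technique: it hashes each rowkey with SHA-256 to select a deterministic subset of roughly `ratio` of the rows. This is not how LTS sampling works internally. In-memory dicts stand in for tables, and because this loop reads every source row, a real verifier for large tables would need to sample without a full scan.

```python
import hashlib
import time
from typing import Callable, Dict, List, Mapping


def wait_for_low_latency(
    get_latency_ms: Callable[[], float],
    threshold_ms: float = 1000.0,
    consecutive: int = 3,
    poll_interval_s: float = 10.0,
    timeout_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until incremental-sync latency stays below threshold_ms.

    Returns True once `consecutive` readings in a row are below the threshold,
    or False if timeout_s elapses first.
    """
    deadline = clock() + timeout_s
    streak = 0
    while clock() < deadline:
        if get_latency_ms() < threshold_ms:
            streak += 1
            if streak >= consecutive:
                return True
        else:
            streak = 0  # a slow reading resets the streak
        sleep(poll_interval_s)
    return False


def in_sample(rowkey: bytes, ratio: float) -> bool:
    """Deterministically select roughly `ratio` of all rowkeys by hashing them."""
    # The first 8 bytes of the digest, read as an integer in [0, 2**64).
    bucket = int.from_bytes(hashlib.sha256(rowkey).digest()[:8], "big")
    return bucket < ratio * 2**64


def sample_and_compare(
    source: Mapping[bytes, bytes],
    dest: Mapping[bytes, bytes],
    ratio: float,
) -> Dict[str, List[bytes]]:
    """Report sampled rowkeys that are missing from, or differ in, the destination."""
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1]")
    missing: List[bytes] = []
    mismatched: List[bytes] = []
    for rowkey, value in source.items():
        if not in_sample(rowkey, ratio):
            continue
        if rowkey not in dest:
            missing.append(rowkey)
        elif dest[rowkey] != value:
            mismatched.append(rowkey)
    return {"missing": missing, "mismatched": mismatched}
```

Because the same rowkeys are always selected for a given ratio, you can rerun the comparison after synchronization catches up and check the same rows again. Injecting `sleep` and `clock` keeps the polling loop testable without real waiting.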