Lindorm: Migrate and synchronize data from HBase clusters to Lindorm

Last Updated: Mar 19, 2024

You can use Lindorm Tunnel Service (LTS) to migrate existing data and synchronize real-time data from a self-managed HBase cluster or ApsaraDB for HBase cluster to LindormTable. This topic describes how to migrate and synchronize data from HBase clusters.

Scenarios

  • Data migration from self-managed HBase clusters to Lindorm.

  • Data migration across regions. For example, you can migrate data from a data center in the China (Qingdao) region to a data center in the China (Beijing) region.

  • Workload distribution. For example, you can migrate some of your business to a new cluster.

Features and benefits

Features

  • Data can be migrated from HBase V1.x and V2.x clusters to Lindorm without business interruption.

  • Table schema migration, real-time data synchronization, and full data migration are supported.

  • Migration based on databases, namespaces, and tables is supported.

  • Table renaming during migration is supported.

  • Time ranges, row key ranges, and columns can be specified during migration (see the sketch after this list).

  • An API operation is supported to create migration tasks.
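
The time range, row key range, and column filters correspond to the scope of a standard HBase read. The following sketch is only an illustration, not an LTS API: it uses the Apache HBase 2.x client API with placeholder table, row key, and column names to show how the same migration scope would be expressed as an HBase Scan.

    import java.io.IOException;

    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class MigrationScopeExample {
        // The same scope that a migration task can declare: a row key range,
        // a single column, and a cell timestamp range (in milliseconds).
        static Scan buildScopeScan() throws IOException {
            return new Scan()
                    .withStartRow(Bytes.toBytes("user_0000"))
                    .withStopRow(Bytes.toBytes("user_9999"))
                    .addColumn(Bytes.toBytes("cf"), Bytes.toBytes("order_amount"))
                    .setTimeRange(1672531200000L, 1704067200000L);
        }
    }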

Benefits

  • Your business is not interrupted during migration. Both historical data and real-time incremental data can be migrated in a single migration task.

  • When data is being migrated, data is not read through the API layer of the source HBase cluster. Instead, the data files are read directly from the HDFS of the source cluster. This minimizes the impact on the online business that runs on the source cluster.

  • In most cases, compared with data migration at the API layer, data replication at the file layer reduces resource usage by more than 50% and improves efficiency.

  • Each node can migrate data at a rate of up to 150 MB/s while maintaining migration stability. You can add nodes for horizontal scaling to migrate terabytes or petabytes of data (see the estimate after this list).

  • LTS implements a robust retry mechanism to handle errors. It monitors the task speed and progress in real time and generates alerts when tasks fail.

  • LTS automatically synchronizes table schemas to ensure that the destination tables have the same schema and partitions as the source tables.
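
For example, at the 150 MB/s per-node rate above, migrating 10 TB of data on a single node takes roughly 10 × 1024 × 1024 MB ÷ 150 MB/s ≈ 19.4 hours, and about a quarter of that on four nodes. This is a rough estimate that ignores overhead such as network contention and compactions; actual throughput depends on your workload and cluster configuration.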

Limits

  • Data cannot be migrated to self-managed HBase clusters.

  • HBase clusters for which Kerberos is enabled are not supported.

  • Single-node ApsaraDB for HBase instances are not supported.

  • ApsaraDB for HBase instances that are deployed in the classic network are not supported due to network compatibility issues.

  • Data cannot be migrated or synchronized to a Lindorm standalone instance.

  • LTS implements an asynchronous mode to synchronize incremental data based on write-ahead logging (WAL). Data that is imported by using bulk loading and data that is not written to the WAL are not synchronized (see the example after this list).

  • Search indexes cannot be migrated.
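
The following sketch illustrates the WAL limit above. It is not an LTS API: it uses the standard Apache HBase 2.x client with placeholder table, row, and column names to show a write whose durability is set to SKIP_WAL. Because such a write never reaches the write-ahead log, LTS incremental synchronization cannot capture it.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Durability;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class SkipWalWriteExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            try (Connection connection = ConnectionFactory.createConnection(conf);
                 Table table = connection.getTable(TableName.valueOf("demo_table"))) {
                Put put = new Put(Bytes.toBytes("row-001"));
                put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes("value"));
                // SKIP_WAL bypasses the write-ahead log, so this write is not
                // visible to WAL-based incremental synchronization.
                put.setDurability(Durability.SKIP_WAL);
                table.put(put);
            }
        }
    }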

Usage notes

  • Before you migrate data, make sure that the HDFS capacity of the destination cluster is sufficient. This prevents the migrated data from exhausting the storage capacity of the destination cluster during migration.

  • Before you submit an incremental synchronization task, we recommend that you set the log retention period of the source cluster to more than 12 hours to reserve time for LTS to handle incremental synchronization errors. To set the log retention period, modify the hbase.master.logcleaner.ttl parameter in the hbase-site.xml configuration file of the source cluster and then restart the source cluster. The parameter is measured in milliseconds. For example, if you set hbase.master.logcleaner.ttl to 43200000, logs are retained for 12 hours (see the sample configuration after this list).

  • You do not need to manually create tables in the destination cluster. LTS will automatically create tables and regions in the same way as those in the source cluster. The partitioning scheme of a manually created table may be different from that of the source table. As a result, the manually created table may be frequently split or compacted after the migration process is complete. If the table stores large amounts of data, the entire process may take a long time.

  • If the source table has a coprocessor, make sure that the destination cluster contains the corresponding JAR file of the coprocessor when you create a destination table.

  • If log data is not consumed after you enable the incremental synchronization feature, the log data is retained for 48 hours by default. After the period expires, the subscription is automatically canceled and the retained data is automatically deleted.
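
The following snippet shows the hbase-site.xml change described above, assuming a 12-hour retention period. Place the property inside the <configuration> element on the source cluster, adjust the value to your error-handling window, and restart the cluster afterward.

    <property>
      <name>hbase.master.logcleaner.ttl</name>
      <!-- 43200000 ms = 12 hours of WAL retention -->
      <value>43200000</value>
    </property>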

Prerequisites

Create a task

  1. Log on to LTS. For more information, see Log on to the web UI of LTS.

  2. In the left-side navigation pane, choose Migration > Quick Migration.

  3. On the page that appears, click create.

  4. In the job name (optional) field, enter a name for the task. The task name can contain only letters and digits. If you leave this field empty, the task ID is used as the task name.

  5. Configure the Source Cluster and Target Cluster parameters as prompted.

  6. Select the operations that you want to perform.

    • migration table schema: creates tables in the destination cluster. These tables have the same schema and partition information as the source tables. If a table with the same name already exists in the destination cluster, the table is skipped.

    • real time data replication: synchronizes real-time incremental data from the source cluster to the destination cluster.

    • history data migration: migrates all existing data by physically copying files at the file level.

  7. Enter the required information in the table mapping and advance configuration fields. The advance configuration field is optional.

  8. Click create.

View the details of a task

  1. In the left-side navigation pane, choose Migration > Quick Migration to view the task that you created.

  2. On the page that appears, click the name of the task that you want to view. On the details page, view the execution status of the task.

Perform a switchover

  1. Wait until the full migration task is complete and the latency of incremental synchronization is reduced to several seconds or even hundreds of milliseconds.

  2. Enable LTS data sampling and verification. When you sample and verify large tables, make sure that the sampling ratio is appropriate to prevent the online business from being affected.

  3. Verify your business.

  4. Perform a switchover on your business.

FAQ

Q: Why is the data in a task not consumed?

A: Possible causes include the following: the LTS cluster is released while the task is still running, the synchronization task is suspended, or the task is abnormally blocked.