ApsaraDB for HBase provides Lindorm Tunnel Service (LTS) for data migration and real-time data synchronization between ApsaraDB for HBase instances of various versions. LTS also allows you to synchronize real-time data from ApsaraDB RDS or LogHub to ApsaraDB for HBase. DataX is an offline data synchronization tool that is widely used in the Alibaba Group. DataX synchronizes data between various heterogeneous data sources such as MySQL, Oracle, SQL Server, PostgreSQL, Hadoop Distributed File System (HDFS), Hive, AnalyticDB for MySQL, HBase, TableStore (OTS), MaxCompute (ODPS), and PolarDB-X.

Prerequisites

The IP address of the ECS instance on which DataX is deployed is added to the whitelist of the ApsaraDB for HBase Performance-enhanced Edition instance so that DataX can access the instance. If the ECS instance and the ApsaraDB for HBase Performance-enhanced Edition instance are not in the same virtual private cloud (VPC), you must use the public endpoint.
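
Before you update the whitelist, it can help to check locally which client addresses a set of whitelist entries would cover. The following is a minimal sketch that uses only the Python standard library. The function name is_whitelisted and the sample addresses are hypothetical and are not part of any ApsaraDB for HBase tool.

```python
import ipaddress


def is_whitelisted(client_ip, whitelist_entries):
    """Return True if client_ip falls inside any whitelist entry.

    Each entry may be a single IP address or a CIDR block.
    """
    address = ipaddress.ip_address(client_ip)
    for entry in whitelist_entries:
        # strict=False accepts entries such as 192.168.0.5/24 that have host bits set.
        network = ipaddress.ip_network(entry, strict=False)
        if address in network:
            return True
    return False


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    entries = ["192.168.0.0/24", "10.0.0.8"]
    print(is_whitelisted("192.168.0.17", entries))
    print(is_whitelisted("172.16.0.1", entries))
```

The check is purely local. It does not read or modify the whitelist of the instance.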

Use DataX to synchronize data

You can use one of the following methods to configure DataX synchronization tasks:
  • Use the Data Integration service provided by Alibaba Cloud DataWorks to configure synchronization tasks in DataX.
  • Use the open source DataX to configure synchronization tasks.

Use DataWorks to configure the parameters of DataX

  1. Create a workspace. For more information, see Create a workspace.
  2. Create a resource group.

    We recommend that you use the resources from an exclusive resource group or a custom resource group to connect to an ApsaraDB for HBase instance to synchronize data.

    The following resource group types are available:

    • Exclusive resource groups
      Reference: Exclusive resource group mode
      Characteristics and remarks: DataWorks automatically subscribes to and manages exclusive resources. This ensures high service performance and availability. Exclusive resources cannot be shared among regions. For example, the exclusive resources in the China (Shanghai) region can be used only by workspaces in the China (Shanghai) region. The resources cannot be allocated to virtual private clouds (VPCs) in other regions. You can use resources from an exclusive resource group to connect only to ApsaraDB for HBase instances that are attached to the same vSwitch.

    • Custom resource groups
      Reference: Create a custom resource group for Data Integration
      Characteristics and remarks: Only DataWorks Enterprise Edition or a more advanced edition supports custom resource groups. You can purchase the ECS instances that belong to a custom resource group based on your business requirements. You can deploy the ECS instances in the VPC of the ApsaraDB for HBase instance to access the instance over the internal network. If the ECS instances and the ApsaraDB for HBase instance are not deployed in the same VPC, you can access the instance only over the Internet. You have all permissions on the instances that belong to the custom resource group, but you must install, manage, maintain, and upgrade DataX on them yourself. For more information, see Custom resource group.

    • Default resource group
      Reference: None
      Characteristics and remarks: You can use ECS instances that belong to the default resource group to access the ApsaraDB for HBase instance in the VPC over the Internet. You cannot use these ECS instances to access the ApsaraDB for HBase instance over the internal network. If you use DataWorks to access the ApsaraDB for HBase instance over the Internet, additional costs are incurred.
  3. Configure the network for the exclusive resource group or the custom resource group.
    • Configure the network for the exclusive resource group.
      1. Bind the exclusive resource group to the VPC in which the ApsaraDB for HBase instance is deployed. For more information, see Exclusive resource group mode.
      2. In the VPC console, check the CIDR blocks of the VPC and the vSwitch to which the exclusive resource group is bound. Because the exact IP addresses of the ECS instances in the exclusive resource group are unavailable, you must add the CIDR block to the whitelist of your ApsaraDB for HBase instance to allow connections.
      3. Add the CIDR block to the whitelist of your ApsaraDB for HBase instance. For more information, see Configure a whitelist.
    • Configure the network for the custom resource group.

      The exact IP address of each ECS instance in the custom resource group is available because the instances are purchased. You can add all the IP addresses to the whitelist of your ApsaraDB for HBase instance. For more information, see Configure a whitelist.

    • Configure the network for the default resource group.

      View the CIDR block of the instances in the default resource group. For more information, see Configure a whitelist. Add the CIDR block of the region to the whitelist of your ApsaraDB for HBase instance.

  4. Create a synchronization task and bind a resource group.
    1. Create a synchronization task. For more information, see Configure a synchronization node by using the codeless UI.
    2. Modify the configuration of HBase Writer and HBase Reader. The HBase Reader plug-in is used to read data from ApsaraDB for HBase instances. The HBase Writer plug-in is used to write data to ApsaraDB for HBase instances. For more information about the HBase Reader and HBase Writer plug-ins, see HBase Writer and HBase Reader.
      For more information about how to configure the plug-ins, see the help documentation for the plug-ins. When you configure the hbaseConfig parameter for an ApsaraDB for HBase Performance-enhanced Edition instance, you must specify the endpoint parameter instead of the hbase.zookeeper.quorum parameter. The following sample code shows how to configure the connection parameters:
      "hbaseConfig": {
        "hbase.client.connection.impl" : "com.alibaba.hbase.client.AliHBaseUEConnection",
        "hbase.client.endpoint" : "host:30020",
        "hbase.client.username" : "root",
        "hbase.client.password" : "root"
      }
      Note
      • The value of the hbase.client.connection.impl parameter is fixed.
      • The hbase.client.endpoint parameter specifies the Java API endpoint that is displayed in the ApsaraDB for HBase console. You use this endpoint to access an ApsaraDB for HBase Performance-enhanced Edition instance. For more information about how to obtain the endpoint, see Connect to an instance.
      • The hbase.client.username and hbase.client.password parameters specify the user-defined username and password. Make sure that the account has the read and write permissions on ApsaraDB for HBase Performance-enhanced Edition tables. The default username and password are root. This account has the read and write permissions on all tables.
      • Select ApsaraDB for HBase V1.1.x as the ApsaraDB for HBase version.
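
The connection parameters described above can be checked before they are pasted into a synchronization task. The following is a minimal sketch, assuming that the four keys shown in the sample are required and that the endpoint has the form host:port. The function name check_hbase_config is hypothetical and is not part of DataX or DataWorks.

```python
REQUIRED_KEYS = (
    "hbase.client.connection.impl",
    "hbase.client.endpoint",
    "hbase.client.username",
    "hbase.client.password",
)


def check_hbase_config(config):
    """Return a list of problems found in an hbaseConfig dictionary."""
    problems = []
    for key in REQUIRED_KEYS:
        if not config.get(key):
            problems.append("missing " + key)
    # The Performance-enhanced Edition is addressed by endpoint, not by ZooKeeper quorum.
    if "hbase.zookeeper.quorum" in config:
        problems.append("hbase.zookeeper.quorum is set; use hbase.client.endpoint instead")
    endpoint = config.get("hbase.client.endpoint") or ""
    host, _, port = endpoint.rpartition(":")
    if endpoint and (not host or not port.isdigit()):
        problems.append("hbase.client.endpoint should look like host:port")
    return problems


if __name__ == "__main__":
    sample = {
        "hbase.client.connection.impl": "com.alibaba.hbase.client.AliHBaseUEConnection",
        "hbase.client.endpoint": "host:30020",
        "hbase.client.username": "root",
        "hbase.client.password": "root",
    }
    print(check_hbase_config(sample))
```

An empty list means that the sketch found no problems. It does not confirm that the endpoint is reachable or that the credentials are valid.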

Use the open source DataX to configure synchronization tasks

  1. Download the DataX installation package.

    Download the DataX installation package that integrates the JAR file that is required to access the ApsaraDB for HBase Performance-enhanced Edition instance. Extract the JAR file from the downloaded TAR file.

    If DataX is already installed, or if you downloaded the latest version of DataX from GitHub, you must add the required JAR file yourself by performing the following steps:

    Download the alihbase-connector-1.x JAR file from the "Plug-ins used to connect to ApsaraDB for HBase Performance-enhanced Edition" section in Install ApsaraDB for HBase SDK for Java. You need to download only this JAR file. In the file name, 1.x indicates the version of the alihbase-connector plug-in. Download the latest version of the plug-in. To write data to ApsaraDB for HBase Performance-enhanced Edition instances, save the JAR file in the datax/plugin/writer/hbase11xwriter/libs directory. To read data from ApsaraDB for HBase Performance-enhanced Edition instances, save the JAR file in the datax/plugin/reader/hbase11xreader/libs directory.

  2. Edit the configuration file.

    In DataX, the hbase11xreader plug-in is used to read data from ApsaraDB for HBase Performance-enhanced Edition instances, and the hbase11xwriter plug-in is used to write data to them. For more information about how to configure each plug-in, see Configuration examples.

    Only the hbaseConfig configuration differs from that of the open source HBase. All other configurations for read and write operations are the same. When you configure hbaseConfig for an ApsaraDB for HBase Performance-enhanced Edition instance, you must specify the endpoint parameter instead of the hbase.zookeeper.quorum parameter. The following sample code shows how to configure the connection parameters:

    ...
    "hbaseConfig": {
      "hbase.client.connection.impl" : "com.alibaba.hbase.client.AliHBaseUEConnection",
      "hbase.client.endpoint" : "host:30020",
      "hbase.client.username" : "root",
      "hbase.client.password" : "root"
    }
    ...

    Set the hbase.client.connection.impl parameter to com.alibaba.hbase.client.AliHBaseUEConnection to use the connection configuration of ApsaraDB for HBase Performance-enhanced Edition. The hbase.client.endpoint parameter specifies the Java API endpoint that is displayed in the ApsaraDB for HBase console. You use this endpoint to access an ApsaraDB for HBase Performance-enhanced Edition instance. For more information about how to obtain the endpoint, see Use the ApsaraDB for HBase Java API to access an ApsaraDB for HBase Performance-enhanced Edition instance. The hbase.client.username and hbase.client.password parameters specify the user-defined username and password. Make sure that the account has the read and write permissions on ApsaraDB for HBase Performance-enhanced Edition tables. The default username and password are both root. This account has the read and write permissions on all tables.

  3. Use DataX to migrate data. For more information about how to use DataX, see Official DataX documentation.
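
The steps above can be sketched as a small script that assembles a job file and the command that starts DataX. This is an illustration under stated assumptions, not a reference configuration: the streamreader source, the table name demo_table, the column cf1:q1, and the path /opt/datax are hypothetical placeholders. The parameter layout follows the open source DataX plug-in conventions, so check the field names against Configuration examples for your DataX release.

```python
import json
import os


def build_job(hbase_config, table):
    """Build a DataX job dictionary that writes one sample row to an HBase table."""
    reader = {
        "name": "streamreader",  # placeholder source that emits constant rows
        "parameter": {
            "column": [
                {"type": "string", "value": "row-1"},  # field 0: used as the rowkey
                {"type": "string", "value": "hello"},  # field 1: written to cf1:q1
            ],
            "sliceRecordCount": 1,
        },
    }
    writer = {
        "name": "hbase11xwriter",
        "parameter": {
            "hbaseConfig": hbase_config,
            "table": table,  # hypothetical table name supplied by the caller
            "mode": "normal",
            "rowkeyColumn": [{"index": 0, "type": "string"}],
            "column": [{"index": 1, "name": "cf1:q1", "type": "string"}],
            "encoding": "utf-8",
        },
    }
    return {
        "job": {
            "setting": {"speed": {"channel": 1}},
            "content": [{"reader": reader, "writer": writer}],
        }
    }


def datax_command(datax_home, job_path, python="python"):
    """Return the command line that starts DataX for the given job file."""
    return [python, os.path.join(datax_home, "bin", "datax.py"), job_path]


if __name__ == "__main__":
    config = {
        "hbase.client.connection.impl": "com.alibaba.hbase.client.AliHBaseUEConnection",
        "hbase.client.endpoint": "host:30020",
        "hbase.client.username": "root",
        "hbase.client.password": "root",
    }
    print(json.dumps(build_job(config, "demo_table"), indent=2))
    print(" ".join(datax_command("/opt/datax", "job.json")))
```

Save the printed JSON as job.json, then run the printed command on the ECS instance on which DataX is deployed. Pass the interpreter that your DataX release supports as the python argument.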

Additional considerations

Before you migrate data, you must add the IP address of the ECS instance on which DataX is deployed to the whitelist. For more information, see Use the ApsaraDB for HBase Java API to access an ApsaraDB for HBase Performance-enhanced Edition instance. If the ECS instance and the ApsaraDB for HBase Performance-enhanced Edition instance are not in the same VPC, you must use the public endpoint to access the ApsaraDB for HBase Performance-enhanced Edition instance.
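
To check from the ECS instance whether the endpoint is reachable at the network level, a plain TCP probe may be enough. The following is a minimal sketch that uses only the Python standard library. The function name can_connect is hypothetical. A failed probe may point to a whitelist, VPC, or endpoint problem, but the sketch cannot tell these apart, and a successful probe does not verify the username or password.

```python
import socket


def can_connect(endpoint, timeout=3.0):
    """Return True if a TCP connection to a host:port endpoint succeeds."""
    host, _, port = endpoint.rpartition(":")
    if not host:
        # No host part, for example "30020" or ":30020".
        return False
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except (OSError, ValueError):
        # OSError covers refused, unreachable, and timed-out connections;
        # ValueError covers a port that is not a number.
        return False


if __name__ == "__main__":
    # Replace the placeholder with the endpoint shown in the console.
    print(can_connect("host:30020"))
```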