ApsaraDB for HBase Performance-enhanced Edition provides Lindorm Tunnel Service (LTS) for data migration and real-time data synchronization between ApsaraDB for HBase clusters of various versions. LTS also allows you to synchronize real-time data from ApsaraDB RDS or LogHub to ApsaraDB for HBase.
DataX is an offline data synchronization tool for moving data between heterogeneous sources, including MySQL, Oracle, SQL Server, PostgreSQL, Hadoop Distributed File System (HDFS), Hive, AnalyticDB for MySQL, HBase, Tablestore, MaxCompute, and Distributed Relational Database Service (DRDS). Use DataX when you need to bulk-import data from these sources into ApsaraDB for HBase Performance-enhanced Edition.
To run DataX against ApsaraDB for HBase Performance-enhanced Edition, choose one of the following approaches:
-
DataWorks Data Integration (recommended) — a managed service that provisions and operates the DataX runtime for you.
-
Open source DataX — download and run DataX directly on a self-managed machine.
Prerequisites
Before you begin, ensure that you have:
-
An ApsaraDB for HBase cluster running version 2.4.3 or later. To check or upgrade your cluster version, see Minor version updates.
-
The client IP address added to the cluster whitelist. See Configure a whitelist and a security group.
Usage notes
-
ApsaraDB for HBase clusters are accessible only over a virtual private cloud (VPC). If you need Internet access, update the SDK before importing data. See Use ApsaraDB for HBase SDK for Java to replace an open source HBase version with an ApsaraDB for HBase version.
-
If your application runs on an Elastic Compute Service (ECS) instance and accesses the cluster over a VPC, the ECS instance and the cluster must meet the following conditions:
-
Both are deployed in the same region. Deploying them in the same zone reduces network latency.
-
Both belong to the same VPC.
Use DataWorks to configure a synchronization task
Step 1: Create a workspace
Create a workspace in DataWorks. See Create a workspace.
Step 2: Choose and configure a resource group
DataWorks runs DataX jobs inside a resource group. The table below describes the three types and their trade-offs.
| Resource group type | How it accesses the cluster | Key constraints |
|---|---|---|
| Exclusive resource group (recommended) | Associates directly with the cluster's VPC and vSwitch — no Internet traffic | Resources are region-scoped; cannot be shared across regions. You can use resources from an exclusive resource group to access ApsaraDB for HBase clusters that are attached to the same vSwitch. |
| Custom resource group | Deploys ECS instances inside the cluster's VPC (same-VPC access). If the ECS instances and the ApsaraDB for HBase cluster are not deployed in the same VPC, you can access the ApsaraDB for HBase cluster only over the Internet. | Requires DataWorks Enterprise Edition or higher; you install, manage, and update DataX yourself |
| Default resource group | Internet access only | Incurs additional data transfer costs |
Use an exclusive resource group or a custom resource group to keep traffic inside the VPC.
The default resource group accesses the cluster over the Internet, which incurs additional costs and introduces network latency. Avoid it for production workloads.
After choosing a resource group, configure its network:
-
Exclusive resource group
-
Associate the resource group with the VPC of your ApsaraDB for HBase cluster. See Exclusive resource group mode.
-
In the VPC console, find the IPv4 CIDR block of the associated VPC and vSwitch, then add those addresses to the cluster whitelist. See Configure a whitelist.
-
Custom resource group
-
Add the IP addresses of your ECS instances to the cluster whitelist. See Configure a whitelist.
-
Default resource group
-
Find the CIDR blocks listed for the region of your DataWorks workspace. See the "Add the IP addresses or CIDR blocks of the servers in the region where the DataWorks workspace resides to an IP address whitelist of a data source" section in the Configure an IP address whitelist topic.
-
Add those CIDR blocks to the cluster whitelist. See Configure a whitelist.
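Before you run a task, it can help to confirm that the address a job connects from is covered by the whitelist entries you added. The following is a minimal sketch that uses only the Python standard library. The function name `is_whitelisted` and the sample addresses are hypothetical and chosen for illustration; the check runs locally and does not query the cluster's actual whitelist.

```python
import ipaddress


def is_whitelisted(client_ip, whitelist_entries):
    """Return True if client_ip matches any entry (a single IP or a CIDR block)."""
    ip = ipaddress.ip_address(client_ip)
    for entry in whitelist_entries:
        # strict=False also accepts entries written with host bits set,
        # for example 192.168.0.10/24. A bare IP becomes a /32 network.
        network = ipaddress.ip_network(entry, strict=False)
        if ip in network:
            return True
    return False


# Hypothetical values: replace them with the vSwitch CIDR block or ECS IP
# addresses that you added to the cluster whitelist.
whitelist = ["192.168.0.0/24", "10.0.5.17"]
print(is_whitelisted("192.168.0.42", whitelist))
print(is_whitelisted("172.16.1.1", whitelist))
```

The first address falls inside the sample CIDR block and the second does not, so a client at the second address would need its own whitelist entry.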
Step 3: Create a synchronization task
-
Create a batch synchronization task. See Configure a batch synchronization task by using the codeless UI.
-
Associate the task with the resource group you configured in step 2.
-
Configure the HBase Reader plug-in and HBase Writer plug-in. See HBase Writer and HBase Reader for the full parameter reference.
Configure hbaseConfig for ApsaraDB for HBase Performance-enhanced Edition
ApsaraDB for HBase Performance-enhanced Edition uses a different connection mechanism than standard HBase. Set hbase.client.endpoint instead of hbase.zookeeper.quorum:
| Parameter | Required | Default | Description |
|---|---|---|---|
| hbase.client.connection.impl | Yes | None | Fixed value: com.alibaba.hbase.client.AliHBaseUEConnection. Do not change this. |
| hbase.client.endpoint | Yes | None | The Java API endpoint of your cluster, in the format host:30020. Get this value from the ApsaraDB for HBase console. See Access a cluster. |
| hbase.client.username | Yes | root | Username with read and write permissions on the target tables. The default root account has read and write permissions on all tables. |
| hbase.client.password | Yes | root | Password for the specified user. |
The following example shows a complete hbaseConfig block:
"hbaseConfig": {
  "hbase.client.connection.impl": "com.alibaba.hbase.client.AliHBaseUEConnection",
  "hbase.client.endpoint": "host:30020",
  "hbase.client.username": "testuser",
  "hbase.client.password": "password"
}
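If you generate task configurations from a script, the hbaseConfig block can be assembled and sanity-checked before you paste it into the task. The sketch below is one possible approach, not part of DataWorks or DataX: the helper names `build_hbase_config` and `missing_keys` are hypothetical, and the only rules it applies are the ones in the parameter table (four required keys, endpoint in host:port form, root as the default username and password).

```python
import json

# The four keys that the parameter table marks as required.
REQUIRED_KEYS = (
    "hbase.client.connection.impl",
    "hbase.client.endpoint",
    "hbase.client.username",
    "hbase.client.password",
)


def build_hbase_config(endpoint, username="root", password="root"):
    """Build an hbaseConfig dict for ApsaraDB for HBase Performance-enhanced Edition."""
    host, sep, port = endpoint.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"endpoint must look like host:port, got {endpoint!r}")
    return {
        # Fixed value from the parameter table.
        "hbase.client.connection.impl": "com.alibaba.hbase.client.AliHBaseUEConnection",
        "hbase.client.endpoint": endpoint,
        "hbase.client.username": username,
        "hbase.client.password": password,
    }


def missing_keys(config):
    """Return the required keys that are absent or empty in config."""
    return [key for key in REQUIRED_KEYS if not config.get(key)]


# "host:30020" is the placeholder used in this topic, not a real endpoint.
config = build_hbase_config("host:30020", "testuser", "password")
print(json.dumps({"hbaseConfig": config}, indent=2))
print(missing_keys(config))
```

Calling `missing_keys` on a hand-edited configuration is a quick way to catch a dropped parameter before the task fails at connection time.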
When you configure the HBase Reader and HBase Writer plug-ins for ApsaraDB for HBase Performance-enhanced Edition, select HBase version 1.1.X.
The table below shows the key difference between a standard HBase configuration and the ApsaraDB for HBase Performance-enhanced Edition configuration:
| Setting | Standard HBase | ApsaraDB for HBase Performance-enhanced Edition |
|---|---|---|
| Connection parameter | hbase.zookeeper.quorum | hbase.client.endpoint |
| Connection class | Default HBase client | com.alibaba.hbase.client.AliHBaseUEConnection |
| Example value | ld-bp150tns0sjxs****-proxy-hbaseue.hbaseue.rds.aliyuncs.com:30020 | host:30020 (Java API endpoint from console) |
Use open source DataX to configure a synchronization task
-
Download the DataX installation package from the official DataX website and decompress it.
-
Configure the hbase11xreader plug-in and hbase11xwriter plug-in:
-
hbase11xreader reads data from ApsaraDB for HBase Performance-enhanced Edition. See hbase11xreader configuration examples.
-
hbase11xwriter writes data to ApsaraDB for HBase Performance-enhanced Edition. See hbase11xwriter configuration examples.
For the open source DataX path, use the VPC endpoint as the value of hbase.zookeeper.quorum:
"hbaseConfig": {
  "hbase.zookeeper.quorum": "ld-bp150tns0sjxs****-proxy-hbaseue.hbaseue.rds.aliyuncs.com:30020"
}
-
Run the synchronization job. For usage details, see the official DataX documentation.
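As an illustration of the open source steps, the following sketch assembles a small DataX job description in Python and prints it as JSON. It rests on several assumptions: the job layout (job, setting, content) and the streamreader and hbase11xwriter parameter names follow the open source DataX plug-in documentation as commonly published, so verify them against the plug-in version you downloaded. The endpoint, the table name demo_table, and the column cf1:q1 are hypothetical placeholders.

```python
import json


def build_job(endpoint, table):
    """Assemble a DataX job dict that writes generated test rows into an HBase table."""
    reader = {
        # streamreader generates constant records, which is enough for a smoke test.
        "name": "streamreader",
        "parameter": {
            "column": [
                {"type": "string", "value": "row-1"},   # field 0: row key
                {"type": "string", "value": "hello"},   # field 1: cell value
            ],
            "sliceRecordCount": 10,
        },
    }
    writer = {
        "name": "hbase11xwriter",
        "parameter": {
            # Open source path: the VPC endpoint goes into hbase.zookeeper.quorum.
            "hbaseConfig": {"hbase.zookeeper.quorum": endpoint},
            "table": table,
            "mode": "normal",
            "rowkeyColumn": [{"index": 0, "type": "string"}],
            "column": [{"index": 1, "name": "cf1:q1", "type": "string"}],
            "encoding": "utf-8",
        },
    }
    return {
        "job": {
            "setting": {"speed": {"channel": 1}},
            "content": [{"reader": reader, "writer": writer}],
        }
    }


# Hypothetical placeholder: use the VPC endpoint shown in your console.
job = build_job("your-vpc-endpoint:30020", "demo_table")
print(json.dumps(job, indent=2))
# Save the output as job.json, then run it from the DataX directory with:
#   python bin/datax.py job.json
```

Because every generated record carries the same row key, this job overwrites a single row repeatedly. That is enough to confirm connectivity and permissions before you swap in a real reader.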