ApsaraDB for HBase: Import data using DataWorks or DataX

Last Updated: Mar 30, 2026

ApsaraDB for HBase Performance-enhanced Edition provides Lindorm Tunnel Service (LTS) for data migration and real-time data synchronization between ApsaraDB for HBase clusters of various versions. LTS can also synchronize real-time data from ApsaraDB RDS or LogHub to ApsaraDB for HBase.

DataX is an offline data synchronization tool that moves data between heterogeneous sources, including MySQL, Oracle, SQL Server, PostgreSQL, Hadoop Distributed File System (HDFS), Hive, AnalyticDB for MySQL, HBase, Tablestore, MaxCompute, and Distributed Relational Database Service (DRDS). Use DataX when you need to bulk-import data from these sources into ApsaraDB for HBase Performance-enhanced Edition.

To run DataX against ApsaraDB for HBase Performance-enhanced Edition, choose one of the following approaches:

  • DataWorks Data Integration (recommended) — a managed service that provisions and operates the DataX runtime for you.

  • Open source DataX — download and run DataX directly on a self-managed machine.

Prerequisites

Before you begin, ensure that you have:

Usage notes

  • ApsaraDB for HBase clusters are accessible only over a virtual private cloud (VPC). If you need Internet access, update the SDK before importing data. See Use ApsaraDB for HBase SDK for Java to replace an open source HBase version with an ApsaraDB for HBase version.

  • If your application runs on an Elastic Compute Service (ECS) instance and accesses the cluster over a VPC, the ECS instance and the cluster must meet the following conditions:

    • Both are deployed in the same region. Deploying them in the same zone reduces network latency.

    • Both belong to the same VPC.

Use DataWorks to configure a synchronization task

Step 1: Create a workspace

Create a workspace in DataWorks. See Create a workspace.

Step 2: Choose and configure a resource group

DataWorks runs DataX jobs inside a resource group. The table below describes the three types and their trade-offs.

| Resource group type | How it accesses the cluster | Key constraints |
| --- | --- | --- |
| Exclusive resource group (recommended) | Associates directly with the cluster's VPC and vSwitch; no Internet traffic | Resources are region-scoped; cannot be shared across regions. You can use resources from an exclusive resource group to access ApsaraDB for HBase clusters that are attached to the same vSwitch. |
| Custom resource group | Deploys ECS instances inside the cluster's VPC (same-VPC access). If the ECS instances and the ApsaraDB for HBase cluster are not deployed in the same VPC, you can access the ApsaraDB for HBase cluster only over the Internet. | Requires DataWorks Enterprise Edition or higher; you install, manage, and update DataX yourself |
| Default resource group | Internet access only | Incurs additional data transfer costs |

Use an exclusive resource group or a custom resource group to keep traffic inside the VPC.

Important

The default resource group accesses the cluster over the Internet, which incurs additional costs and introduces network latency. Avoid it for production workloads.

After choosing a resource group, configure its network:

Step 3: Create a synchronization task

  1. Create a batch synchronization task. See Configure a batch synchronization task by using the codeless UI.

  2. Associate the task with the resource group you configured in step 2.

  3. Configure the HBase Reader plug-in and HBase Writer plug-in. See HBase Writer and HBase Reader for the full parameter reference.

Configure hbaseConfig for ApsaraDB for HBase Performance-enhanced Edition

ApsaraDB for HBase Performance-enhanced Edition uses a different connection mechanism than standard HBase. Set hbase.client.endpoint instead of hbase.zookeeper.quorum:

| Parameter | Required | Default | Description |
| --- | --- | --- | --- |
| hbase.client.connection.impl | Yes | | Fixed value: com.alibaba.hbase.client.AliHBaseUEConnection. Do not change this. |
| hbase.client.endpoint | Yes | | The Java API endpoint of your cluster, in the format host:30020. Get this value from the ApsaraDB for HBase console. See Access a cluster. |
| hbase.client.username | Yes | root | Username with read and write permissions on the target tables. The default root account has read and write permissions on all tables. |
| hbase.client.password | Yes | root | Password for the specified user. |

The following example shows a complete hbaseConfig block:

"hbaseConfig": {
  "hbase.client.connection.impl": "com.alibaba.hbase.client.AliHBaseUEConnection",
  "hbase.client.endpoint": "host:30020",
  "hbase.client.username": "testuser",
  "hbase.client.password": "password"
}

Use ApsaraDB for HBase 1.1.X when running DataX jobs against ApsaraDB for HBase Performance-enhanced Edition.

The table below shows the key difference between a standard HBase configuration and the ApsaraDB for HBase Performance-enhanced Edition configuration:

| | Standard HBase | ApsaraDB for HBase Performance-enhanced Edition |
| --- | --- | --- |
| Connection parameter | hbase.zookeeper.quorum | hbase.client.endpoint |
| Connection class | Default HBase client | com.alibaba.hbase.client.AliHBaseUEConnection |
| Example value | ld-bp150tns0sjxs****-proxy-hbaseue.hbaseue.rds.aliyuncs.com:30020 | host:30020 (Java API endpoint from console) |
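
To show where the hbaseConfig block sits inside a plug-in definition, the following is a minimal sketch of a writer parameter block, not a definitive configuration. The table name example_table, the column family and qualifier cf1:q1, and the column indexes are hypothetical placeholders. The field names (table, mode, encoding, rowkeyColumn, column) follow the open source hbase11xwriter plug-in; check them against the HBase Writer parameter reference before use.

```json
"parameter": {
  "hbaseConfig": {
    "hbase.client.connection.impl": "com.alibaba.hbase.client.AliHBaseUEConnection",
    "hbase.client.endpoint": "host:30020",
    "hbase.client.username": "testuser",
    "hbase.client.password": "password"
  },
  "table": "example_table",
  "mode": "normal",
  "encoding": "utf-8",
  "rowkeyColumn": [
    { "index": 0, "type": "string" }
  ],
  "column": [
    { "index": 1, "name": "cf1:q1", "type": "string" }
  ]
}
```

In this sketch, field 0 of each source record becomes the row key and field 1 is written to cf1:q1. The sketch assumes that the target table and its column family already exist in the cluster.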

Use open source DataX to configure a synchronization task

  1. Download the DataX installation package from the official DataX website and decompress it.

  2. Configure the hbase11xreader and hbase11xwriter plug-ins. For the open source DataX path, set hbase.zookeeper.quorum to the VPC endpoint of the cluster:

    "hbaseConfig": {
      "hbase.zookeeper.quorum": "ld-bp150tns0sjxs****-proxy-hbaseue.hbaseue.rds.aliyuncs.com:30020"
    }

  3. Run the synchronization job. For usage details, see the official DataX documentation.
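
As an illustration of the open source path, a complete job file might look like the following sketch. It pairs the streamreader plug-in, which generates constant test records, with hbase11xwriter so that connectivity can be checked without a real source database. The table name example_table, the column cf1:q1, the sample values, and the channel count are hypothetical placeholders, and the table is assumed to exist already. The endpoint value is the masked example from this page; replace it with the VPC endpoint of your own cluster.

```json
{
  "job": {
    "setting": {
      "speed": { "channel": 1 }
    },
    "content": [
      {
        "reader": {
          "name": "streamreader",
          "parameter": {
            "sliceRecordCount": 10,
            "column": [
              { "type": "string", "value": "row-0001" },
              { "type": "string", "value": "hello" }
            ]
          }
        },
        "writer": {
          "name": "hbase11xwriter",
          "parameter": {
            "hbaseConfig": {
              "hbase.zookeeper.quorum": "ld-bp150tns0sjxs****-proxy-hbaseue.hbaseue.rds.aliyuncs.com:30020"
            },
            "table": "example_table",
            "mode": "normal",
            "encoding": "utf-8",
            "rowkeyColumn": [
              { "index": 0, "type": "string" }
            ],
            "column": [
              { "index": 1, "name": "cf1:q1", "type": "string" }
            ]
          }
        }
      }
    ]
  }
}
```

Because streamreader emits the same values for every record, all ten records target the same row key and the job leaves a single row, which is enough for a connectivity check. Assuming the file is saved as job.json, the job is typically started with python bin/datax.py job.json from the DataX installation directory. For a real import, replace the reader with the plug-in for your source, such as mysqlreader.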
