Import data to Lindorm

DataWorks is an end-to-end big data development and administration platform from Alibaba Cloud. It provides features such as data integration, data development, and data administration. You can use DataWorks to configure import tasks to perform a full import of data to LindormTable from sources such as MySQL, PolarDB, PostgreSQL, Oracle, SQL Server, and Cassandra. This topic describes how to configure a Lindorm import task in DataWorks.

Prerequisites

You have added the client IP address to the Lindorm whitelist.

Notes

If you use public network access or a single-node Lindorm instance, you must upgrade the software development kit (SDK) and change the configuration before you perform the steps in this topic. For more information, see Step 1 in Connect to and use LindormTable through the HBase Java API.
If your application is deployed on an ECS instance and accesses the Lindorm instance over a virtual private cloud (VPC), make sure that the two instances meet the following conditions so that they can reach each other:
- The instances are in the same region. For lower network latency, place them in the same zone.
- The ECS instance and the Lindorm instance are in the same VPC.

Step 1: Create a workspace

Create a workspace in DataWorks to manage data development and tasks.

Step 2: Create a resource group

A resource group helps you allocate resources within your account and manage user authorizations.

The following table describes the types of resource groups you can create.

Resource group type	Configuration document	Notes
Exclusive resource group	Exclusive resource group mode	Exclusive resources cannot be used across regions. For example, an exclusive resource in the China (Shanghai) region can only be used by workspaces in the China (Shanghai) region. It cannot be attached to a VPC in another region. In addition, an exclusive resource group cannot access a Lindorm cluster across vSwitches.
Default resource group	None	Accessing Lindorm over the public network incurs additional fees in DataWorks.

Step 3: Configure the network

Configure the network based on the resource group type to ensure connectivity between DataWorks and the Lindorm instance.

Exclusive resource group

1. On the Instance Details page of the Lindorm instance, obtain the virtual private cloud (VPC) of the instance.
2. Attach the DataWorks exclusive resource group to the VPC of the Lindorm instance.
3. In the VPC console, obtain the IPv4 CIDR block of the VPC and vSwitch that are attached to the DataWorks exclusive resource group.
4. Add the obtained IPv4 CIDR block to the Lindorm whitelist.
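
Before adding a CIDR block to the whitelist, it can help to confirm that the block covers the addresses the resource group connects from. The following is a minimal sketch that uses Python's standard `ipaddress` module. The helper name `is_covered` is hypothetical and is not part of DataWorks or Lindorm, and the CIDR block and IP addresses are placeholders chosen for illustration.

```python
import ipaddress
from typing import Iterable


def is_covered(ip: str, whitelist: Iterable[str]) -> bool:
    """Return True if `ip` falls inside any CIDR block in `whitelist`."""
    addr = ipaddress.ip_address(ip)
    # strict=False accepts blocks written with host bits set, e.g. 10.0.0.1/8.
    return any(
        addr in ipaddress.ip_network(block, strict=False)
        for block in whitelist
    )


# Placeholder values for illustration only (assumption, not real addresses).
whitelist = ["192.168.0.0/24"]
print(is_covered("192.168.0.15", whitelist))  # True
print(is_covered("192.168.1.15", whitelist))  # False
```

If the check returns False for an address that the resource group uses, the whitelist entry is likely too narrow or refers to the wrong vSwitch.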

Default resource group

To obtain the IP addresses of the machines in the default resource group, see Add a whitelist. Then, add the IP addresses that correspond to the region to the Lindorm whitelist.

Step 4: Create a sync task

Create an offline sync task to import the data.

Step 5: Modify the task configuration

If you access Lindorm using Lindorm SQL, see the TableService model configuration in the Lindorm data source documentation.
If you access Lindorm using the HBase-compatible mode, see the WideColumn model configuration in the Lindorm data source documentation.

Important

The lindorm.client.seedserver parameter in the sample script specifies the HBase Java API-compatible endpoint for LindormTable.
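
A malformed endpoint is an easy mistake to make when editing the script, so a small pre-check can catch it before the task runs. The sketch below assumes that the endpoint is written as `host:port` and that the parameter is held in a flat key-value mapping. Both are assumptions for illustration, as are the helper name `parse_seedserver` and the placeholder endpoint. Take the real value from the Lindorm console.

```python
def parse_seedserver(configuration: dict) -> tuple:
    """Split the lindorm.client.seedserver value into (host, port).

    Assumes a "host:port" string and raises ValueError otherwise.
    """
    value = configuration.get("lindorm.client.seedserver", "")
    # Split on the last colon so the port is always the final component.
    host, sep, port = value.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {value!r}")
    port_number = int(port)
    if not 1 <= port_number <= 65535:
        raise ValueError(f"port out of range: {port_number}")
    return host, port_number


# Placeholder endpoint (assumption), not a real instance address.
configuration = {"lindorm.client.seedserver": "lindorm-host.example.com:30020"}
print(parse_seedserver(configuration))
```

The check only validates the shape of the value. It does not verify that the endpoint is reachable or that it is the HBase Java API-compatible endpoint.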

Step 6: Submit and publish the task

If the task needs to run on a schedule, publish it to the production environment. For more information about publishing tasks, see Publish a task.