Sync HBase Data in Batch Pipelines via Input Component - Dataphin

The HBase input component reads data from an HBase data source. When you need to synchronize data from an HBase data source to other data sources, you must first configure the HBase input component to read the data source, and then configure the target data source for data synchronization. This topic describes how to configure an HBase input component.

Prerequisites

To configure primary/secondary links for data sources, you have purchased and enabled the high availability (HA) feature of the DataService Studio or Tag Service module.
You have created an HBase data source. For more information, see Create an HBase data source.
The account used to configure the HBase input component properties must have read-through permission on the data source. If you do not have the permission, you need to request it. For more information, see Request, renew, and return permissions on a data source.

Procedure

In the top navigation bar of the Dataphin homepage, choose Develop > Data Integration.
In the top navigation bar of the integration page, select a project. In Dev-Prod mode, you must also select an environment.
In the left-side navigation pane, click Batch Pipeline. In the Batch Pipeline list, click the batch pipeline that you want to develop to open its configuration page.
Click Component Library in the upper-right corner of the page to open the Component Library panel.
In the left-side navigation pane of the Component Library panel, select Inputs. Find the HBase component in the input component list on the right and drag it to the canvas.
Click the icon in the HBase input component card to open the HBase Input Configuration dialog box.

In the HBase Input Configuration dialog box, configure the parameters.

Parameter	Description
Step Name	The name of the HBase input component. Dataphin automatically generates a step name, which you can modify based on your business scenario. The name must meet the following requirements: It can contain only Chinese characters, letters, underscores (_), and digits. It cannot exceed 64 characters in length.
Datasource	The dropdown list displays all HBase data sources in the current Dataphin instance, including those for which you may not have read-through permission. Click the icon to copy the current data source name. For data sources for which you do not have read-through permission, you can click Request next to the data source to request read-through permission. For more information, see Request permission on a data source. If you do not have an HBase data source, click Create to create one. For more information, see Create an HBase data source.
Select Link	If you have enabled the high availability feature of Tag Service and the selected HBase data source has active and standby links configured, you can select either Active Link or Standby Link for integration. This setting affects only the production data source.
Table	You can enter a keyword to search for tables or enter the exact table name and click Exact Match. Click the icon to copy the name of the selected table.
Output Mode	Select an output mode. The options are Normal Mode and Multi-version Mode (Vertical Table).
maxversion	If you select Multi-version Mode (Vertical Table) as the output mode, you need to specify maxversion. maxversion specifies the number of versions to read. A value of -1 indicates that all versions are read.
File Encoding	Select a file encoding format. Supported formats include UTF-8 and GBK.
Start Rowkey	Specifies the rowkey at which the scan starts. All rows whose rowkeys are lexicographically greater than or equal to the start rowkey are included in the scan results. For example, `aaa` (string) or `10110` (binary).
End Rowkey	Specifies the rowkey at which the scan ends. All rows whose rowkeys are lexicographically less than the end rowkey are scanned, and the row with the end rowkey itself is excluded. The scan therefore covers a left-closed, right-open interval. For example, if you set the start rowkey to `user0001` and the end rowkey to `user9999`, the scan returns the rows with rowkeys from `user0001` up to, but not including, `user9999`. Rowkeys are compared lexicographically, not numerically: `user10000` sorts before `user9999`, so it cannot serve as the upper bound of this range.
Start Rowkey Type	Select the type of the start rowkey. The options are String and Binary.
Output Fields	Displays the output fields. You can add fields in batches, create fields one at a time, and manage the fields that you have added.
	Batch add fields: Click Batch Add, configure the fields in JSON or TEXT format, and then click OK.
	JSON format example: `[{ "name": "cf1:q1", "type": "string" }, { "name": "cf1:q2", "type": "string" }, { "name": "cf1:q3", "type": "string" }]`. Note: name specifies the column family and field name to import, and type specifies the field type. For example, `"name":"cf1:a","type":"String"` imports the field `a` in the column family `cf1` with the field type `String`.
	TEXT format example: `cf1:q1,string cf1:q2,string cf1:q3,string`. The row delimiter separates the definitions of individual fields. The default is a line feed (\n). Supported delimiters include line feed (\n), semicolon (;), and period (.). The column delimiter separates the field name from the field type. The default is a comma (,).
	Create a new output field: Click Create Output Field, enter the Column Family and Column, and select the Type as prompted.
	Manage output fields: You can perform the following operations on added fields. Drag the icon next to the field's Column to change the position of the field. Click the corresponding icon in the Operation column to edit an existing field. Click the corresponding icon in the Operation column to delete an existing field.
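
The JSON and TEXT formats for Batch Add carry the same information, so one can be derived from the other. The following Python sketch is illustrative only and is not part of Dataphin: the function name `text_to_json_fields` and its parameters are hypothetical. It assumes the TEXT layout described in the table (one `columnFamily:column,type` definition per row) and produces the corresponding JSON list.

```python
import json


def text_to_json_fields(text, row_delimiter="\n", column_delimiter=","):
    """Convert TEXT-format field definitions into the JSON-format list.

    Hypothetical helper for illustration; delimiters default to the
    defaults named in the parameter table (line feed and comma).
    """
    fields = []
    for row in text.split(row_delimiter):
        row = row.strip()
        if not row:
            continue  # ignore empty rows, e.g. a trailing delimiter
        name, sep, field_type = row.partition(column_delimiter)
        if not sep:
            raise ValueError(f"missing column delimiter in row: {row!r}")
        name = name.strip()
        field_type = field_type.strip()
        family, colon, column = name.partition(":")
        if not colon or not family or not column:
            raise ValueError(f"expected 'columnFamily:column' but got: {name!r}")
        fields.append({"name": name, "type": field_type})
    return fields


# Same three fields as the examples in the Output Fields row.
text_config = "cf1:q1,string\ncf1:q2,string\ncf1:q3,string"
print(json.dumps(text_to_json_fields(text_config), indent=2))
```

Passing `row_delimiter=";"` handles a TEXT definition written on a single line, such as `cf1:q1,string;cf1:q2,string`.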

Click OK to complete the property configuration of the HBase input component.
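
The Start Rowkey and End Rowkey semantics in the parameter table can be checked with a small simulation before you configure a scan range. The following Python sketch is an assumption-laden illustration, not Dataphin or HBase code: `scan_range` is a hypothetical function that models the left-closed, right-open interval by comparing rowkeys byte by byte, and the sample rowkeys are made up.

```python
def scan_range(rowkeys, start_rowkey=None, end_rowkey=None):
    """Return the rowkeys in [start_rowkey, end_rowkey), in sorted order.

    Python compares bytes objects lexicographically by byte value,
    which is used here to model lexicographic rowkey ordering.
    """
    selected = []
    for key in sorted(rowkeys):
        if start_rowkey is not None and key < start_rowkey:
            continue  # before the start rowkey: excluded
        if end_rowkey is not None and key >= end_rowkey:
            continue  # at or after the end rowkey: excluded
        selected.append(key)
    return selected


# Made-up sample rowkeys for illustration.
rows = [b"user0001", b"user0500", b"user1000", b"user10000", b"user9999"]

# The end rowkey itself is excluded from the result.
print(scan_range(rows, b"user0001", b"user9999"))

# Appending a zero byte gives the smallest rowkey that sorts after
# user9999, so user9999 is now included.
print(scan_range(rows, b"user0001", b"user9999\x00"))
```

In this model, `user10000` falls inside the range that ends at `user9999`, because the comparison is lexicographic rather than numeric. Zero-padding numeric rowkey suffixes to a fixed width keeps the lexicographic order aligned with the numeric order.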