Configure PolarDB-X Input for Batch Data Synchronization - Dataphin

The PolarDB-X input component reads data from a PolarDB-X data source. To synchronize data from PolarDB-X to another data source, first configure the source data source information in the PolarDB-X input component, and then configure the target data source. This topic describes how to configure the PolarDB-X input component.

Prerequisites

A PolarDB-X data source has been created. For more information, see Create PolarDB-X data source.
The account used to configure the PolarDB-X input component properties must have read-through permission on the data source. If the account lacks this permission, request it first. For more information, see Request, renew, and return data source permissions.

Procedure

Select Development > Data Integration from the menu bar at the top of the Dataphin home page.
In the menu bar at the top of the integration page, select Project (Dev-Prod mode requires selecting an environment).
In the left-side navigation pane, click Batch Pipeline. In the Batch Pipeline list, click the offline pipeline that you want to develop to open its configuration page.
To open the Component Library panel, click Component Library located in the upper-right corner of the page.
In the Component Library panel's left-side navigation pane, select Input. Locate the PolarDB-X (formerly DRDS) component within the right-side list of input components, and then drag it onto the canvas.
Click the icon on the PolarDB-X (formerly DRDS) input component card to open the PolarDB-X Input Configuration dialog box.

In the PolarDB-X (formerly DRDS) Input Configuration dialog box, configure the following parameters.

Parameter	Description
Step Name	The name of the PolarDB-X input component. Dataphin automatically generates the step name, and you can modify it to suit your business scenario. The name can contain only Chinese characters, letters, underscores (_), and digits, and cannot exceed 64 characters.
Datasource	The data source drop-down list displays all PolarDB-X type data sources in the current Dataphin, including data sources for which you have read-through permission and those for which you do not. Click the icon to copy the current data source name. For data sources without read-through permission, you can click Request after the data source to request read-through permission. For more information, see Request, renew, and return data source permissions. If you do not have a PolarDB-X type data source, click Create to create a data source. For more information, see Create PolarDB-X data source.
Table	Select the source table for data synchronization. You can enter a table name keyword to search or enter the exact table name and click Exact Search. After selecting a table, the system will automatically perform table status detection. Click the icon to copy the name of the currently selected table.
Batch Read Count (optional)	The number of records read at one time. Instead of reading records from the source database one by one, you can configure a batch read count (such as 1024 records). This reduces the number of interactions with the data source, improves I/O efficiency, and reduces the impact of network latency.
Input Filter (optional)	The filter conditions for extracting data. Configure them as follows: Static field: extracts the data that matches a fixed value, such as `ds=20211111`. Variable parameter: extracts a part of the data determined by a variable, such as `ds=${bizdate}`.
Output Fields	The output fields area displays all fields matched by the selected table and filter conditions. You can perform the following operations:
Field Management: If certain fields do not need to be output to downstream components, you can delete them.
Single field deletion: To delete a small number of fields, click the icon in the Operation column of each field that you want to delete.
Batch field deletion: To delete many fields, click Field Management, select multiple fields in the Field Management dialog box, click the shift left icon to move the selected input fields to the unselected input fields, and then click Confirm.
Batch Add: Click Batch Add to configure fields in batches in JSON, TEXT, or DDL format.
Note: After you complete the batch addition and click Confirm, the previously configured field information is overwritten.
Batch configuration in JSON format, for example: `[{ "index": 1, "name": "id", "type": "int(10)", "mapType": "Long", "comment": "comment1" }, { "index": 2, "name": "user_name", "type": "varchar(255)", "mapType": "String", "comment": "comment2" }]`
Note: index indicates the column number of the specified object, name indicates the field name after import, and type indicates the field type after import. For example, `"index":3,"name":"user_id","type":"String"` indicates that the fourth column is imported with the field name user_id and the field type String.
Batch configuration in TEXT format, for example:
`1,id,int(10),Long,comment1`
`2,user_name,varchar(255),String,comment2`
The row delimiter separates the information of each field. Line feed (\n), semicolon (;), and period (.) are supported, and the default is a line feed (\n). The column delimiter separates the field name and the field type. The default is a comma (,). The field type can be omitted.
Batch configuration in DDL format, for example: `CREATE TABLE tablename ( user_id serial, username VARCHAR(50), password VARCHAR(50), email VARCHAR(255), created_on TIMESTAMP );`
Create New Output Field: Click +Create New Output Field, fill in Column, Type, and Comment according to the page prompts, and select a Mapping Type. After you finish configuring the current row, click the icon to save it.
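
The JSON and TEXT batch formats described for Output Fields carry the same five items per field (index, name, type, mapType, comment). The following Python sketch shows one way the two formats relate by converting TEXT rows into the JSON structure. It is illustrative only: the function name `text_to_json_fields` and its parameters are hypothetical, it is not Dataphin's parser, and it assumes that no item contains the column delimiter.

```python
import json

# Item order in a TEXT row, matching the keys used in the JSON format.
FIELD_KEYS = ("index", "name", "type", "mapType", "comment")


def text_to_json_fields(text, row_delim="\n", col_delim=","):
    """Convert TEXT-format field rows into a list of JSON-style field dicts.

    Hypothetical helper for illustration; assumes items such as the type
    do not themselves contain the column delimiter.
    """
    fields = []
    for row in text.split(row_delim):
        row = row.strip()
        if not row:
            continue  # skip empty rows, e.g. a trailing delimiter
        parts = [part.strip() for part in row.split(col_delim)]
        field = dict(zip(FIELD_KEYS, parts))
        field["index"] = int(field["index"])  # index is numeric in JSON
        fields.append(field)
    return fields


text_config = (
    "1,id,int(10),Long,comment1\n"
    "2,user_name,varchar(255),String,comment2"
)
fields = text_to_json_fields(text_config)
print(json.dumps(fields, indent=2))
```

A type such as `decimal(10,2)` would be split incorrectly by this sketch, which is one reason to prefer the JSON or DDL format for such columns.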

Click Confirm to finalize the PolarDB-X input component properties.
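
To make the Input Filter and Batch Read Count parameters more concrete, the following Python sketch resolves a variable filter such as `ds=${bizdate}` and then reads the matching rows in batches of 1024. It is a rough illustration under stated assumptions: Python's built-in `sqlite3` module stands in for PolarDB-X, the table `orders` and the function `read_in_batches` are hypothetical, and Dataphin's actual read mechanism may differ.

```python
import sqlite3
from string import Template


def read_in_batches(conn, table, filter_template, params, batch_size=1024):
    """Yield lists of rows that match the resolved filter, batch by batch.

    Hypothetical helper: the filter is spliced into the SQL text for
    illustration only, so it must come from a trusted configuration.
    """
    # Resolve variable parameters, e.g. "ds=${bizdate}" -> "ds=20211111".
    where = Template(filter_template).substitute(params)
    cursor = conn.execute(f"SELECT * FROM {table} WHERE {where}")
    while True:
        rows = cursor.fetchmany(batch_size)  # one fetch per batch
        if not rows:
            break
        yield rows


# In-memory stand-in for the source table (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, ds INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(i, 20211111) for i in range(2500)] + [(i, 20211110) for i in range(500)],
)

batch_sizes = [
    len(batch)
    for batch in read_in_batches(
        conn, "orders", "ds=${bizdate}", {"bizdate": "20211111"}
    )
]
print(batch_sizes)  # [1024, 1024, 452]
```

With the filter applied, 2500 of the 3000 rows match, and they arrive in three fetches rather than 2500 single-row reads.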