The PolarDB-X 2.0 input component reads data from a PolarDB-X 2.0 data source. To synchronize data from a PolarDB-X 2.0 data source to another data source, you must first configure the PolarDB-X 2.0 input component and then configure the component for the destination data source. This topic describes how to configure the PolarDB-X 2.0 input component.
Prerequisites
A PolarDB-X 2.0 data source has been created. For more information, see Create a PolarDB-X 2.0 data source.
The account that you use to configure the PolarDB-X 2.0 input component has the sync read permission for the data source. If you do not have this permission, request it for the data source. For more information, see Request, renew, and return data source permissions.
Procedure
On the Dataphin home page, in the top menu bar, choose Development > Data Integration.
In the top menu bar of the integration page, select a Project. In Dev-Prod mode, you must also select an environment.
In the navigation pane on the left, click Batch Pipeline. In the Batch Pipeline list, click the offline pipeline that you want to develop to open its configuration page.
In the upper-right corner of the page, click Component Library to open the Component Library panel.
In the Component Library panel, in the navigation pane on the left, select Input. In the list of input components on the right, find the PolarDB-X 2.0 component and drag it to the canvas.
On the PolarDB-X 2.0 input component card, click the icon to open the PolarDB-X 2.0 Input Configuration dialog box.
In the PolarDB-X 2.0 Input Configuration dialog box, configure the following parameters.
Parameter
Description
Step Name
The name of the PolarDB-X 2.0 input component. Dataphin automatically generates a step name. You can also change it as needed. The naming convention is as follows:
Can contain only Chinese characters, letters, underscores (_), and digits.
Cannot exceed 64 characters in length.
Source Table Quantity
Select the number of source tables. The options are Single Table and Multiple Tables:
Single Table: Use this option to synchronize data from one table to one destination table.
Multiple Tables: Use this option to synchronize data from multiple tables to the same destination table. When data from multiple tables is written to the same destination table, the union algorithm is used.
For more information about union, see INTERSECT, UNION, and EXCEPT.
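As a rough illustration (not Dataphin's actual implementation), the union merge behaves like concatenating rows from identically structured source tables into one stream before they are written to the single destination table:

```python
# Two hypothetical source tables with the same structure.
table_001 = [{"id": 1, "name": "alice"}]
table_002 = [{"id": 2, "name": "bob"}]

# The union merge combines all rows into one stream for the destination
# table. Illustrative only: UNION ALL-like semantics (duplicates kept)
# are assumed here for simplicity.
merged = table_001 + table_002
print(merged)
# prints: [{'id': 1, 'name': 'alice'}, {'id': 2, 'name': 'bob'}]
```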
Datasource
The drop-down list displays all PolarDB-X 2.0 data sources, including both data sources for which you have the sync read permission and those for which you do not. Click the icon to copy the name of the current data source.
For a data source for which you lack the sync read permission, click Request next to the data source to request read permission. For more information, see Request, renew, and return data source permissions.
If you do not have a PolarDB-X 2.0 data source, click Create Data Source to create one. For more information, see Create a PolarDB-X 2.0 data source.
Database (Optional)
Select the database where the table is located. If you leave this blank, the database specified during data source registration is used.
If you set Source Table Quantity to Multiple Tables, you can select multiple databases. Click the icon to view all selected databases in the Database List dialog box.
Table
Select the source table:
If you set Source Table Quantity to Single Table, you can enter a keyword to search for the table name, or enter the exact table name and click Exact Match. After you select a table, the system automatically checks its status. Click the icon to copy the name of the currently selected table.
If you set Source Table Quantity to Multiple Tables, perform the following steps to add tables:
In the input box, enter an expression to filter for tables with the same structure.
The system supports enumeration, regular expression-like patterns, and a mix of both. For example, table_[001-100];table_102.
Click Exact Match to view the list of matching tables in the Confirm Match Details dialog box.
Click Confirm.
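The enumeration syntax above can be illustrated with a small sketch. The helper below is hypothetical (Dataphin performs this matching internally); it only shows how a pattern such as table_[001-100];table_102 expands into explicit table names:

```python
import re

def expand_pattern(expr: str) -> list[str]:
    """Expand a semicolon-separated table pattern such as
    'table_[001-100];table_102' into explicit table names.
    Hypothetical helper for illustration only."""
    names = []
    for part in expr.split(";"):
        m = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", part)
        if m:
            prefix, lo, hi, suffix = m.groups()
            width = len(lo)  # zero-padding width taken from the lower bound
            for i in range(int(lo), int(hi) + 1):
                names.append(f"{prefix}{str(i).zfill(width)}{suffix}")
        else:
            names.append(part)  # plain enumeration, no range to expand
    return names

print(expand_pattern("table_[001-003];table_102"))
# prints: ['table_001', 'table_002', 'table_003', 'table_102']
```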
Shard Key (Optional)
The system shards data based on the configured shard key field. You can use this with the concurrency setting to enable concurrent reads. You can use a column from the source table as the shard key. For better transfer performance, use a primary key or an indexed column as the shard key.
Important: If you select a date or time type field as the shard key, the system identifies its maximum and minimum values and performs a rough split based on the total time range and the concurrency. The splits are not guaranteed to be even.
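The splitting behavior can be sketched as follows for a numeric shard key. This is a simplified illustration under assumed semantics (contiguous, roughly even ranges, one per concurrent reader); the actual split logic is internal to Dataphin:

```python
def split_ranges(min_val: int, max_val: int, concurrency: int) -> list[tuple[int, int]]:
    """Roughly split [min_val, max_val] of a shard key into `concurrency`
    contiguous ranges, one per concurrent reader. Hypothetical sketch."""
    step = (max_val - min_val + 1) // concurrency or 1
    ranges = []
    start = min_val
    for i in range(concurrency):
        if start > max_val:
            break
        # The last reader takes whatever remains, so the final
        # range may be larger: splits are not guaranteed to be even.
        end = max_val if i == concurrency - 1 else start + step - 1
        ranges.append((start, end))
        start = end + 1
    return ranges

print(split_ranges(1, 100, 4))
# prints: [(1, 25), (26, 50), (51, 75), (76, 100)]
```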
Input Filter (Optional)
Enter the filter condition for the input data. For example, ds=${bizdate}. The Input Filter is suitable for the following two scenarios:
Filtering a fixed portion of data.
Filtering by parameters.
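For the parameter scenario, a placeholder such as ${bizdate} is replaced with the pipeline parameter's value at run time before the filter is applied to the source read. A minimal sketch of that substitution (the helper name and exact mechanics are assumptions for illustration):

```python
import re

def render_filter(filter_expr: str, params: dict[str, str]) -> str:
    """Substitute ${param} placeholders in an input filter expression,
    yielding the WHERE-style condition applied to the source read.
    Hypothetical sketch of how parameter substitution behaves."""
    return re.sub(r"\$\{(\w+)\}",
                  lambda m: params[m.group(1)],
                  filter_expr)

print(render_filter("ds=${bizdate}", {"bizdate": "20240601"}))
# prints: ds=20240601
```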
Output Fields
The Output Fields section displays all fields from the selected tables that match the filter criteria. The following operations are supported:
Manage Fields: If you do not need to output certain fields to downstream components, you can delete them:
To delete a single field: click the icon in the Actions column of the field that you want to remove.
To delete fields in batch: click Manage Fields. In the Manage Fields dialog box, select the fields that you want to remove, click the left arrow icon to move them to the unselected input fields list, and then click OK to complete the batch deletion.
Batch Add: Click Batch Add to configure fields in batch using JSON, TEXT, or DDL format.
Note: After you add fields in batch and click OK, the existing field configuration is overwritten.
To configure in batch using JSON format, for example:
[
  { "index": 0, "name": "id", "type": "int(10)", "mapType": "Long", "comment": "comment1" },
  { "index": 1, "name": "user_name", "type": "varchar(255)", "mapType": "String", "comment": "comment2" }
]
Note: `index` specifies the column number of the object, `name` specifies the field name after import, and `type` specifies the field type after import. For example, "index":3,"name":"user_id","type":"String" means that the fourth column of the file is imported with the field name `user_id` and the field type `String`.
To configure in batch using TEXT format, for example:
0,id,int(10),Long,comment1
1,user_name,varchar(255),String,comment2
The row delimiter separates the information for each field. The default is a line feed (\n); semicolons (;) and periods (.) are also supported.
The column delimiter separates the parts of each field definition, such as the field name and the field type. The default and only supported delimiter is a comma (,). The field type can be omitted.
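The TEXT rows carry the same information as the JSON form. A sketch of that correspondence, using a hypothetical converter (the function is for illustration; Dataphin parses these formats internally):

```python
def text_to_fields(text: str) -> list[dict]:
    """Convert TEXT-format rows (index,name,type,mapType,comment)
    into the JSON field structure shown above. Hypothetical sketch;
    assumes '\n' row delimiter and ',' column delimiter."""
    fields = []
    for row in text.strip().splitlines():
        parts = row.split(",")
        field = {"index": int(parts[0]), "name": parts[1], "type": parts[2]}
        if len(parts) > 3:          # mapType may be omitted
            field["mapType"] = parts[3]
        if len(parts) > 4:          # comment may be omitted
            field["comment"] = parts[4]
        fields.append(field)
    return fields

print(text_to_fields("0,id,int(10),Long,comment1\n"
                     "1,user_name,varchar(255),String,comment2"))
```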
To configure in batch using DDL format, for example:
CREATE TABLE tablename (
  user_id serial,
  username VARCHAR(50),
  password VARCHAR(50),
  email VARCHAR(255),
  created_on TIMESTAMP
);
Create Output Field: Click + Create Output Field. Follow the prompts to enter the Column, Type, and Remarks, and select a Mapping Type. After you finish configuring the current row, click the icon to save it.
Click Confirm to complete the configuration of the PolarDB-X 2.0 input component.