The PolarDB-X 2.0 input component reads data from a PolarDB-X 2.0 data source. To synchronize data from a PolarDB-X 2.0 data source to another data source, you must first configure the PolarDB-X 2.0 input component and then configure the component for the destination data source. This topic describes how to configure the PolarDB-X 2.0 input component.
Prerequisites
A PolarDB-X 2.0 data source has been created. For more information, see Create a PolarDB-X 2.0 data source.
The account that you use to configure the PolarDB-X 2.0 input component has the sync read permission for the data source. If you do not have this permission, request it for the data source. For more information, see Request, renew, and return data source permissions.
Procedure
On the Dataphin home page, in the top menu bar, choose Development > Data Integration.
In the top menu bar of the integration page, select a Project. In Dev-Prod mode, you must also select an environment.
In the navigation pane on the left, click Batch Pipeline. In the Batch Pipeline list, click the offline pipeline that you want to develop to open its configuration page.
In the upper-right corner of the page, click Component Library to open the Component Library panel.
In the Component Library panel, in the navigation pane on the left, select Input. In the list of input components on the right, find the PolarDB-X 2.0 component and drag it to the canvas.
On the PolarDB-X 2.0 input component card, click the icon to open the PolarDB-X 2.0 Input Configuration dialog box.
In the PolarDB-X 2.0 Input Configuration dialog box, configure the following parameters.
Parameter
Description
Step Name
The name of the PolarDB-X 2.0 input component. Dataphin automatically generates a step name. You can also change it as needed. The naming convention is as follows:
Can contain only Chinese characters, letters, underscores (_), and digits.
Cannot exceed 64 characters in length.
Source Table Quantity
Select the number of source tables. The options are Single Table and Multiple Tables:
Single Table: Use this option to synchronize data from one table to one destination table.
Multiple Tables: Use this option to synchronize data from multiple tables to the same destination table. When data from multiple tables is written to the same destination table, the union algorithm is used.
For more information about union, see INTERSECT, UNION, and EXCEPT.
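As a rough illustration (not Dataphin's actual implementation), the union merge behaves like concatenating rows from identically structured source tables into one stream before they are written to the single destination table:

```python
# Two hypothetical source tables with the same structure.
table_001 = [{"id": 1, "name": "alice"}]
table_002 = [{"id": 2, "name": "bob"}]

# The union merge combines all rows into one stream for the destination
# table. Illustrative only: UNION ALL-like semantics (duplicates kept)
# are assumed here for simplicity.
merged = table_001 + table_002
print(merged)
# prints: [{'id': 1, 'name': 'alice'}, {'id': 2, 'name': 'bob'}]
```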
Datasource
The drop-down list displays all PolarDB-X 2.0 data sources, including both data sources for which you have the sync read permission and those for which you do not. Click the icon to copy the name of the current data source.
For a data source for which you lack the sync read permission, click Request next to the data source to request read permission. For more information, see Request, renew, and return data source permissions.
If you do not have a PolarDB-X 2.0 data source, click Create Data Source to create one. For more information, see Create a PolarDB-X 2.0 data source.
Database (Optional)
Select the database where the table is located. If you leave this blank, the database specified during data source registration is used.
If you set Source Table Quantity to Multiple Tables, you can select multiple databases. Click the icon to view all selected databases in the Database List dialog box.
Table
Select the source table:
If you set Source Table Quantity to Single Table, you can enter a keyword to search for the table name, or enter the exact table name and click Exact Match. After you select a table, the system automatically checks its status. Click the icon to copy the name of the currently selected table.
If you set Source Table Quantity to Multiple Tables, perform the following steps to add tables:
In the input box, enter an expression to filter for tables with the same structure.
The system supports enumeration, regular expression-like patterns, and a mix of both. For example, table_[001-100];table_102.
Click Exact Match to view the list of matching tables in the Confirm Match Details dialog box.
Click Confirm.
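The enumeration syntax above can be illustrated with a small sketch. The helper below is hypothetical (Dataphin performs this matching internally); it only shows how a pattern such as table_[001-100];table_102 expands into explicit table names:

```python
import re

def expand_pattern(expr: str) -> list[str]:
    """Expand a semicolon-separated table pattern such as
    'table_[001-100];table_102' into explicit table names.
    Hypothetical helper for illustration only."""
    names = []
    for part in expr.split(";"):
        m = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", part)
        if m:
            prefix, lo, hi, suffix = m.groups()
            width = len(lo)  # zero-padding width taken from the lower bound
            for i in range(int(lo), int(hi) + 1):
                names.append(f"{prefix}{str(i).zfill(width)}{suffix}")
        else:
            names.append(part)  # plain enumeration, no range to expand
    return names

print(expand_pattern("table_[001-003];table_102"))
# prints: ['table_001', 'table_002', 'table_003', 'table_102']
```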
Shard Key (Optional)
The system shards data based on the configured shard key field. You can use this with the concurrency setting to enable concurrent reads. You can use a column from the source table as the shard key. For better transfer performance, use a primary key or an indexed column as the shard key.
Important: If you select a date or time type field as the shard key, the system identifies its maximum and minimum values and performs a rough split based on the total time range and the concurrency. The splits are not guaranteed to be even.
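The splitting behavior can be sketched as follows for a numeric shard key. This is a simplified illustration under assumed semantics (contiguous, roughly even ranges, one per concurrent reader); the actual split logic is internal to Dataphin:

```python
def split_ranges(min_val: int, max_val: int, concurrency: int) -> list[tuple[int, int]]:
    """Roughly split [min_val, max_val] of a shard key into `concurrency`
    contiguous ranges, one per concurrent reader. Hypothetical sketch."""
    step = (max_val - min_val + 1) // concurrency or 1
    ranges = []
    start = min_val
    for i in range(concurrency):
        if start > max_val:
            break
        # The last reader takes whatever remains, so the final
        # range may be larger: splits are not guaranteed to be even.
        end = max_val if i == concurrency - 1 else start + step - 1
        ranges.append((start, end))
        start = end + 1
    return ranges

print(split_ranges(1, 100, 4))
# prints: [(1, 25), (26, 50), (51, 75), (76, 100)]
```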
Input Filter (Optional)
Enter the filter condition for the input data. For example, ds=${bizdate}. The Input Filter is suitable for the following two scenarios:
Filtering a fixed portion of data.
Filtering by parameters.
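For the parameter scenario, a placeholder such as ${bizdate} is replaced with the pipeline parameter's value at run time before the filter is applied to the source read. A minimal sketch of that substitution (the helper name and exact mechanics are assumptions for illustration):

```python
import re

def render_filter(filter_expr: str, params: dict[str, str]) -> str:
    """Substitute ${param} placeholders in an input filter expression,
    yielding the WHERE-style condition applied to the source read.
    Hypothetical sketch of how parameter substitution behaves."""
    return re.sub(r"\$\{(\w+)\}",
                  lambda m: params[m.group(1)],
                  filter_expr)

print(render_filter("ds=${bizdate}", {"bizdate": "20240601"}))
# prints: ds=20240601
```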
Output Fields
The Output Fields section displays all fields from the selected tables that match the filter criteria. The following operations are supported:
Manage Fields: If you do not need to output certain fields to downstream components, you can delete them:
To delete a single field: click the icon in the Actions column of the field that you want to remove.
To delete fields in batch: click Manage Fields. In the Manage Fields dialog box, select the fields that you want to remove, click the left arrow icon to move them to the unselected input fields list, and then click OK to complete the batch deletion.
Batch Add: Click Batch Add to configure fields in batch using JSON, TEXT, or DDL format.
Note: After you add fields in batch and click OK, the existing field configuration is overwritten.
To configure in batch using JSON format, for example:
[
  { "index": 0, "name": "id", "type": "int(10)", "mapType": "Long", "comment": "comment1" },
  { "index": 1, "name": "user_name", "type": "varchar(255)", "mapType": "String", "comment": "comment2" }
]
Note: `index` specifies the column number of the object, `name` specifies the field name after import, and `type` specifies the field type after import. For example, "index":3,"name":"user_id","type":"String" means that the fourth column of the file is imported with the field name `user_id` and the field type `String`.
To configure in batch using TEXT format, for example:
0,id,int(10),Long,comment1
1,user_name,varchar(255),String,comment2
The row delimiter separates the information for each field. The default is a line feed (\n); semicolons (;) and periods (.) are also supported.
The column delimiter separates the parts of each field definition, such as the field name and the field type. The default and only supported delimiter is a comma (,). The field type can be omitted.
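The TEXT rows carry the same information as the JSON form. A sketch of that correspondence, using a hypothetical converter (the function is for illustration; Dataphin parses these formats internally):

```python
def text_to_fields(text: str) -> list[dict]:
    """Convert TEXT-format rows (index,name,type,mapType,comment)
    into the JSON field structure shown above. Hypothetical sketch;
    assumes '\n' row delimiter and ',' column delimiter."""
    fields = []
    for row in text.strip().splitlines():
        parts = row.split(",")
        field = {"index": int(parts[0]), "name": parts[1], "type": parts[2]}
        if len(parts) > 3:          # mapType may be omitted
            field["mapType"] = parts[3]
        if len(parts) > 4:          # comment may be omitted
            field["comment"] = parts[4]
        fields.append(field)
    return fields

print(text_to_fields("0,id,int(10),Long,comment1\n"
                     "1,user_name,varchar(255),String,comment2"))
```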
To configure in batch using DDL format, for example:
CREATE TABLE tablename (
  user_id serial,
  username VARCHAR(50),
  password VARCHAR(50),
  email VARCHAR(255),
  created_on TIMESTAMP
);
Create Output Field: Click + Create Output Field. Follow the prompts to enter the Column, Type, and Remarks, and select a Mapping Type. After you finish configuring the current row, click the icon to save it.
Click Confirm to complete the configuration of the PolarDB-X 2.0 input component.