PolarDB input components are used to read data from PolarDB data sources. When you need to synchronize data from PolarDB data sources to other data sources, you must first configure the source PolarDB data source, and then configure the destination data source. This topic describes how to configure PolarDB input components.
Prerequisites
A PolarDB data source is created. For more information, see Create a PolarDB data source.
The account used to configure the properties of the PolarDB input component must have the read-through permission on the data source. If you do not have this permission, request it first. For more information, see Request, renew, and return permissions on data sources.
Procedure
In the top navigation bar of the Dataphin homepage, choose Develop > Data Integration.
In the top navigation bar of the Integration page, select a project. In Dev-Prod mode, you must also select an environment.
In the navigation pane on the left, click Batch Pipeline. In the Batch Pipeline list, click the offline pipeline that you want to develop to open its configuration page.
Click Component Library in the upper-right corner of the page to open the Component Library panel.
In the navigation pane on the left of the Component Library panel, select Inputs. Find the PolarDB component in the list of input components on the right, and drag it to the canvas.
Click the icon on the PolarDB input component card to open the PolarDB Input Configuration dialog box. In the PolarDB Input Configuration dialog box, configure the following parameters.
Parameter
Description
Step Name
The name of the PolarDB input component. Dataphin automatically generates a step name. You can also modify the name based on your business scenario. The name must meet the following requirements:
It can contain only Chinese characters, letters, underscores (_), and digits.
It cannot exceed 64 characters in length.
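The naming rules above can be sketched as a simple validation check. This is an illustration only, not Dataphin's actual validator; it assumes the common CJK Unified Ideographs range for Chinese characters:

```python
import re

def is_valid_step_name(name):
    """Sketch of the stated rules: Chinese characters, letters,
    underscores, and digits only, at most 64 characters."""
    return bool(re.fullmatch(r"[\u4e00-\u9fa5A-Za-z0-9_]{1,64}", name))

print(is_valid_step_name("polardb_input_01"), is_valid_step_name("bad-name"))
# → True False (hyphens are not allowed)
```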
Datasource
The data source list displays all PolarDB data sources, including those for which you have the read-through permission and those for which you do not. Click the icon next to a data source to copy its name.
For data sources for which you do not have the read-through permission, click Request next to the data source to request the read-through permission. For more information, see Request, renew, and return permissions on data sources.
If no PolarDB data source is available, click Create to create one. For more information, see Create a PolarDB data source.
Time Zone
Time-formatted data is processed based on the time zone. By default, this is the time zone configured in the selected data source. This setting cannot be changed.
Note: For nodes created before V5.1.2, you can select Data Source Default Configurations or Channel Configuration Time Zone. The default option is Channel Configuration Time Zone.
Data Source Default Configurations: The default time zone of the selected data source.
Channel Configuration Time Zone: The time zone for the current integration node, which is configured in Properties > Channel Configuration.
Source Table Quantity
Select the number of source tables for data synchronization. The source table quantity includes Single Table and Multiple Tables:
Single Table: This option is applicable to scenarios where business data from one table is synchronized to one destination table.
Multiple Tables: This option is applicable to scenarios where business data from multiple tables is synchronized to the same destination table. When data from multiple tables is written to the same data table, the union algorithm is used.
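A minimal sketch of the union merge described above, assuming the source tables share an identical column structure (analogous to SQL UNION ALL; this helper is illustrative, not Dataphin's implementation):

```python
def union_rows(*tables):
    """Merge rows from several same-structured source tables into one
    list, the way a multi-table read unions data before writing it to
    a single destination table."""
    merged = []
    for rows in tables:
        merged.extend(rows)
    return merged

t1 = [("1", "a"), ("2", "b")]
t2 = [("3", "c")]
print(union_rows(t1, t2))  # → [('1', 'a'), ('2', 'b'), ('3', 'c')]
```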
Table
Select a source table:
If you set Source Table Quantity to Single Table, enter a keyword to search for a table, or enter an exact table name and click Exact Match. After you select a table, the system automatically checks the table status. Click the icon to copy the name of the selected table.
If you set Source Table Quantity to Multiple Tables, perform the following steps to add tables:
In the input field, enter a table expression to filter tables with the same structure.
The system supports enumeration, regular expressions, and a combination of both, for example, table_[001-100];table_102.
Click Exact Match. In the Confirm Matching Details dialog box, view the list of matched tables.
Click OK.
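The enumeration part of a table expression such as table_[001-100];table_102 can be sketched as follows. This is an illustration of how such an expression could expand, not Dataphin's actual matcher, which also supports full regular expressions:

```python
import re

def expand_table_expression(expr):
    """Expand an enumeration-style table expression, e.g.
    'table_[001-100];table_102', into concrete table names.
    Items are separated by semicolons; '[start-end]' expands to a
    zero-padded numeric range."""
    names = []
    for item in expr.split(";"):
        m = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", item)
        if m:
            prefix, start, end, suffix = m.groups()
            width = len(start)  # preserve zero padding, e.g. 001
            for n in range(int(start), int(end) + 1):
                names.append(f"{prefix}{str(n).zfill(width)}{suffix}")
        else:
            names.append(item)
    return names

tables = expand_table_expression("table_[001-100];table_102")
print(len(tables), tables[0], tables[-1])  # → 101 table_001 table_102
```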
Shard Key (optional)
The system shards data based on the configured shard key field. You can use this parameter with the concurrent reading configuration to implement concurrent reading. You can use a column in the source data table as the shard key. We recommend that you use the primary key or a column with an index as the shard key to ensure transmission performance.
Important: If you select a field of a date or time type, the system identifies its maximum and minimum values and forcibly shards the data based on the total time range and the concurrency. Even distribution is not guaranteed.
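The forced-sharding behavior can be sketched as an equal-width split of the key's value range across the configured concurrency. This sketch is an assumption about the general technique, not Dataphin's exact algorithm; it also shows why a skewed time column yields shards with unequal row counts:

```python
def shard_ranges(min_val, max_val, concurrency):
    """Split [min_val, max_val] into equal-width ranges, one per
    concurrent reader. Equal-width ranges do NOT guarantee equal row
    counts: if most rows cluster in one sub-range, that shard does
    most of the work."""
    step = (max_val - min_val) / concurrency
    return [(min_val + i * step, min_val + (i + 1) * step)
            for i in range(concurrency)]

# e.g. a numeric shard key spanning 0..100 read with 4 concurrent readers
print(shard_ranges(0, 100, 4))
# → [(0.0, 25.0), (25.0, 50.0), (50.0, 75.0), (75.0, 100.0)]
```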
Batch Read Count (optional)
The number of data records to read at a time. When reading data from the source database, you can configure a specific batch read count (such as 1,024 records) instead of reading records one by one. This reduces the number of interactions with the data source, improves I/O efficiency, and reduces network latency.
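The effect of a batch read count can be demonstrated with the standard fetchmany() pattern. The sketch below uses SQLite purely as a stand-in for the source database; the point is the reduced number of fetch round trips, not the specific driver:

```python
import sqlite3

# Stand-in source table with 3,000 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (id INTEGER)")
conn.executemany("INSERT INTO src VALUES (?)", [(i,) for i in range(3000)])

cur = conn.execute("SELECT id FROM src")
batches = 0
total = 0
while True:
    rows = cur.fetchmany(1024)  # one fetch per 1,024 rows instead of per row
    if not rows:
        break
    batches += 1
    total += len(rows)
print(batches, total)  # → 3 3000: three fetches instead of 3,000
```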
Input Filter (optional)
Enter filter information for the input fields, for example, ds=${bizdate}. Input filters are used in the following two scenarios:
Filtering for a fixed portion of the data.
Parameter-based filtering.
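Conceptually, a filter such as ds=${bizdate} has its parameters substituted and is then applied as a row-selection condition. The helper below is hypothetical, as is treating the filter as a SQL WHERE clause; only the ${bizdate} syntax comes from the example above:

```python
def apply_input_filter(base_sql, filter_expr, params):
    """Sketch: substitute ${name} placeholders in a filter expression
    and append the result as a WHERE clause. Illustrative only; not
    Dataphin's actual parameter-substitution mechanism."""
    for key, value in params.items():
        filter_expr = filter_expr.replace("${%s}" % key, "'%s'" % value)
    return f"{base_sql} WHERE {filter_expr}"

sql = apply_input_filter("SELECT * FROM orders", "ds=${bizdate}",
                         {"bizdate": "20240101"})
print(sql)  # → SELECT * FROM orders WHERE ds='20240101'
```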
Output Fields
The Output Fields section displays all fields in the selected table and fields that match the filter conditions. You can perform the following operations:
Field Management: If you do not need to output specific fields to downstream components, you can delete these fields:
Delete a single field: To delete a small number of fields, click the delete icon in the Operation column for each unnecessary field.
Delete multiple fields in batches: To delete many fields, click Field Management. In the Field Management dialog box, select the fields, click the left arrow icon to move them from the selected input fields to the unselected input fields, and then click OK to complete the batch deletion.
Batch Add: Click Batch Add to configure fields in JSON, TEXT, or DDL format.
Note: After you complete batch addition and click OK, the system overwrites the existing field configuration.
Configure fields in JSON format, for example:
[
  { "index": 1, "name": "id", "type": "int(10)", "mapType": "Long", "comment": "comment1" },
  { "index": 2, "name": "user_name", "type": "varchar(255)", "mapType": "String", "comment": "comment2" }
]
Note: The index parameter specifies the column number of the object, the name parameter specifies the field name after import, and the type parameter specifies the field type after import. For example, "index":3,"name":"user_id","type":"String" indicates that the fourth column in the file is imported, the field name is user_id, and the field type is String.
Configure fields in TEXT format, for example:
1,id,int(10),Long,comment1
2,user_name,varchar(255),String,comment2
The row delimiter separates the definitions of different fields. The default value is the line feed (\n). The system supports the line feed (\n), semicolon (;), and period (.).
The column delimiter separates the attributes of a field, such as the field name and field type. The default value is a comma (,). The field type is optional.
Configure fields in DDL format, for example:
CREATE TABLE tablename (
  user_id serial,
  username VARCHAR(50),
  password VARCHAR(50),
  email VARCHAR(255),
  created_on TIMESTAMP
);
Create an output field: Click + Create Output Field. Follow the on-page prompts to enter Column, Type, and Description, and select Mapping Type. After you complete the configuration of the current row, click the icon to save the configuration.
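The TEXT-format field definitions described above (index, name, type, mapType, comment per row) could be parsed as follows. The helper is hypothetical and shown only to make the row/column delimiter structure concrete:

```python
def parse_text_fields(text, row_delim="\n", col_delim=","):
    """Sketch: parse TEXT-format field definitions of the form
    'index,name,type,mapType,comment' into dictionaries, using the
    default row delimiter (\n) and column delimiter (,)."""
    keys = ["index", "name", "type", "mapType", "comment"]
    fields = []
    for row in text.strip().split(row_delim):
        fields.append(dict(zip(keys, row.split(col_delim))))
    return fields

fields = parse_text_fields(
    "1,id,int(10),Long,comment1\n2,user_name,varchar(255),String,comment2"
)
print(fields[0]["name"], fields[1]["mapType"])  # → id String
```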
Click OK to complete the property configuration of the PolarDB input component.