
Dataphin: Configure the PostgreSQL input component

Last Updated: Mar 05, 2026

The PostgreSQL input component reads data from a PostgreSQL data source. To sync data from a PostgreSQL source to another destination, you must first configure the PostgreSQL input component to read from the source. Then, you must configure the destination component. This topic describes how to configure the PostgreSQL input component.

Prerequisites

Procedure

  1. On the Dataphin homepage, select Developer > Data Integration from the top menu bar.

  2. On the Integration page, in the top menu bar, select a Project. In Dev-Prod mode, you must also select an environment.

  3. In the navigation pane on the left, click Offline Integration. In the Offline Integration list, click the offline pipeline that you want to develop to open its configuration page.

  4. Click Component Library in the upper-right corner of the page. This opens the Component Library panel.

  5. In the left navigation pane of the Component Library panel, select Input. Then, locate the PostgreSQL component in the input component list on the right and drag it to the canvas.

  6. Click the image icon on the PostgreSQL input component card. This opens the PostgreSQL Input Configuration dialog box.

  7. In the PostgreSQL Input Configuration dialog box, configure the following parameters.

    Parameter

    Description

    Step Name

    The name of the PostgreSQL input component. Dataphin automatically generates the step name. You can modify it as needed. Naming rules are as follows:

    • Can contain only Chinese characters, letters, underscores (_), and numbers.

    • Length cannot exceed 64 characters.

    Datasource

    The data source drop-down list displays all PostgreSQL data sources, including those for which you have read permission and those for which you do not. Click the image icon to copy the current data source name.

    • If you do not have read permission on a data source, click Request next to the data source to request read permission. For the detailed steps, see Request, renew, and return data source permissions.

    • If you do not have a PostgreSQL data source yet, click New Data Source to create one. For instructions, see Creating a PostgreSQL Data Source.

    Time Zone

    Time-formatted data is processed based on the current time zone. By default, this is the time zone configured in the selected data source and cannot be modified.

    Note

    For tasks created before V5.1.2, select Data Source Default Configuration or Channel Configuration Time Zone. The default selection is Channel Configuration Time Zone.

    • Data Source Default Configuration: The default time zone of the selected data source.

    • Channel Configuration Time Zone: The time zone configured for the current integration task in Properties > Channel Configuration.

    Schema (Optional)

    Supports cross-schema table selection. Select the schema where the table resides. If not specified, the default is the schema configured in the data source.

    Source Table Quantity

    Select the source table quantity. Valid values are Single Table and Multiple Tables:

    • Single Table: Use this for scenarios where you sync business data from one table to one destination table.

    • Multiple Tables: Use this for scenarios where you sync business data from multiple tables to the same destination table. When data from multiple tables is written to the same data table, the union algorithm is used.

    Table Matching Method

    You can select General Rule or Database Regex.

    Note

    Configure this option only when Source Table Quantity is set to Multiple Tables.

    Table

    Select the source table:

    • If Source Table Quantity is set to Single Table, enter a table name keyword to search. Alternatively, enter the exact table name and click Precise Search. After selecting a table, the system automatically detects its status. Click the image icon to copy the name of the selected table.

    • If Source Table Quantity is set to Multiple Tables, add tables by entering different expressions based on the table matching method.

      • If General Rule is selected for the table matching method: In the input box, enter a table expression that filters for tables with the same structure. Enumeration, regex-like, and mixed forms are supported. For example, table_[001-100];table_102.

      • If Database Regex is selected for the table matching method: In the input box, enter a regular expression that the current database supports. The system matches tables in the source database against this regular expression. At run time, the task re-evaluates the expression, so newly created tables that match are included in the synchronization.

      After entering the expression, click Precise Search. This displays a list of matched tables in the Confirm Match Details dialog box.
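As an illustration only (not Dataphin's actual parser), the following sketch shows how an enumeration-plus-range expression such as table_[001-100];table_102 from the example above could expand into concrete table names. The expression syntax comes from the example; the parsing logic and function name are assumptions.

```python
import re

def expand_table_expression(expr: str) -> list[str]:
    """Expand a semicolon-separated table expression.

    Supports plain names (table_102) and zero-padded numeric
    ranges (table_[001-100]), mirroring the example syntax.
    This is an illustrative sketch, not Dataphin's implementation.
    """
    tables = []
    for part in filter(None, (p.strip() for p in expr.split(";"))):
        m = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", part)
        if m:
            prefix, lo, hi, suffix = m.groups()
            width = len(lo)  # preserve zero-padding, e.g. 001
            for n in range(int(lo), int(hi) + 1):
                tables.append(f"{prefix}{n:0{width}d}{suffix}")
        else:
            tables.append(part)
    return tables

names = expand_table_expression("table_[001-003];table_102")
print(names)  # ['table_001', 'table_002', 'table_003', 'table_102']
```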

    Shard Key (Optional)

    The system partitions data based on the configured shard key field. Use this parameter together with the concurrency configuration to achieve concurrent reads. You can use a column in the source table as the shard key. We recommend a primary key or an indexed column as the shard key to ensure transfer performance.

    Important

    When you select a datetime-type shard key, the system identifies the minimum and maximum values and splits the total time range evenly by the configured concurrency. This does not guarantee an even distribution of the data.
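A minimal sketch of the kind of split described above, assuming the system divides the min-to-max range into equal time spans by concurrency (function and variable names are illustrative, not Dataphin's API). Because each shard covers an equal span of time, not an equal number of rows, the row distribution can be uneven:

```python
from datetime import datetime

def split_time_range(lo: datetime, hi: datetime, concurrency: int):
    """Split [lo, hi] into `concurrency` contiguous sub-ranges.

    Each sub-range covers an equal span of *time*, which is why
    this does not guarantee an even distribution of *rows*.
    """
    step = (hi - lo) / concurrency
    return [(lo + i * step, lo + (i + 1) * step) for i in range(concurrency)]

shards = split_time_range(datetime(2026, 1, 1), datetime(2026, 1, 5), 4)
for start, end in shards:
    print(start, "->", end)  # four one-day shards
```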

    Batch Read Count (Optional)

    The number of data records read at one time. When reading data from the source database, configure a specific batch read count (such as 1024 records) instead of reading one record at a time. This reduces interactions with the data source, improves I/O efficiency, and lowers network latency.
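To illustrate why batching reduces round trips, here is a hedged sketch in the style of a database cursor's fetchmany interface; the `fake_fetchmany` stand-in simulates a source table and is purely illustrative:

```python
def read_in_batches(fetchmany, batch_size=1024):
    """Yield rows in batches rather than one round trip per row.

    `fetchmany` stands in for a database cursor's fetchmany():
    one call returns up to `batch_size` rows, so a large table
    costs far fewer network interactions than row-at-a-time reads.
    """
    while True:
        batch = fetchmany(batch_size)
        if not batch:
            return
        yield batch

# Simulate a source table of 2,500 rows.
rows = iter(range(2500))
fake_fetchmany = lambda n: [r for _, r in zip(range(n), rows)]

batches = list(read_in_batches(fake_fetchmany, batch_size=1024))
print([len(b) for b in batches])  # [1024, 1024, 452]
```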

    Input Filter (Optional)

    Enter a filter condition for the input fields, for example, ds=${bizdate}. Input filtering applies to the following two scenarios:

    • Filtering for a fixed subset of data.

    • Filtering with scheduling parameters, such as ${bizdate}.
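As a sketch of the parameter-filtering scenario, the following shows how a filter such as ds=${bizdate} could resolve to a concrete condition at run time. The ${bizdate} placeholder is taken from the example above; the substitution logic here is an assumption, not Dataphin's implementation:

```python
import re

def resolve_filter(filter_expr: str, params: dict) -> str:
    """Replace ${name} placeholders with concrete parameter values."""
    return re.sub(r"\$\{(\w+)\}", lambda m: str(params[m.group(1)]), filter_expr)

where = resolve_filter("ds=${bizdate}", {"bizdate": "20260305"})
print(where)  # ds=20260305
```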

    Output Fields

    The Output Fields area displays all fields in the selected table and those matching the filter conditions. It supports the following operations:

    • Field Management: If you do not need to output certain fields to downstream components, delete the corresponding fields:

      • Single field deletion: To delete a small number of fields, click the delete icon in the Operation column of each unwanted field.

      • Batch field deletion: To delete a large number of fields, click Field Management. In the Field Management dialog box, select the fields to remove, click the left-moving icon to move them to the unselected fields list, and then click OK to complete the batch deletion.


    • Batch Add: Click Batch Add. This supports batch configuration in JSON, TEXT, and DDL formats.

      Note

      After batch addition, clicking OK will overwrite the configured field information.

      • Batch configure in JSON format. For example:

        [
          {
            "index": 0,
            "name": "id",
            "type": "int(10)",
            "mapType": "Long",
            "comment": "comment1"
          },
          {
            "index": 1,
            "name": "user_name",
            "type": "varchar(255)",
            "mapType": "String",
            "comment": "comment2"
          }
        ]
        Note

        index indicates the column number of the specified object, name indicates the field name after import, and type indicates the field type after import. For example, "index":3,"name":"user_id","type":"String" means that the fourth column in the file is imported with the field name user_id and the field type String.

      • Batch configure in TEXT format. For example:

        0,id,int(10),Long,comment1
        1,user_name,varchar(255),String,comment2
        • The row delimiter separates information for each field. The default is a line feed (\n). It supports line feed (\n), semicolon (;), and period (.).

        • The column delimiter separates the attributes of a field, such as the field name and the field type. Only the comma (,) is supported. The field type is optional.

      • Batch configure in DDL format. For example:

        CREATE TABLE tablename (
            user_id serial,
            username VARCHAR(50),
            password VARCHAR(50),
            email VARCHAR(255),
            created_on TIMESTAMP
        );
    • Create Output Field: Click + Create Output Field, and as prompted, specify Column, Type, Description, and select a Mapping Type. After you configure this row, click the image icon to save.
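The JSON batch-configuration format shown above can be sanity-checked with standard JSON tooling. The sketch below is illustrative (the required-key set is inferred from the example; the function name is hypothetical) and simply parses the field list and orders it by index:

```python
import json

def parse_field_config(text: str) -> list[dict]:
    """Parse and sanity-check a JSON field list like the batch-add example."""
    fields = json.loads(text)
    for f in fields:
        missing = {"index", "name", "type"} - f.keys()
        if missing:
            raise ValueError(f"field {f!r} missing keys: {missing}")
    return sorted(fields, key=lambda f: f["index"])

cfg = '''[
  {"index": 1, "name": "user_name", "type": "varchar(255)", "mapType": "String"},
  {"index": 0, "name": "id", "type": "int(10)", "mapType": "Long"}
]'''
fields = parse_field_config(cfg)
print([f["name"] for f in fields])  # ['id', 'user_name']
```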

  8. Click OK to complete the property configuration for the PostgreSQL input component.