Configure PostgreSQL output component

The PostgreSQL output component writes data to a PostgreSQL data source. When you synchronize data from another data source to PostgreSQL, configure the source component first, and then configure the PostgreSQL output component as the target. This topic describes how to configure a PostgreSQL output component.

Prerequisites

You have added a PostgreSQL data source. For more information, see Create a PostgreSQL data source.
The account used to configure the PostgreSQL output component properties has the write-through permission for the data source. If you do not have the permission, you need to request the data source permission. For more information, see Request, renew, and return data source permissions.

Procedure

In the top navigation bar of the Dataphin homepage, choose Develop > Data Integration.
In the top navigation bar of the integration page, select a project. In Dev-Prod mode, you also need to select an environment.
In the navigation pane on the left, click Batch Pipeline. In the Batch Pipeline list, click the offline pipeline that you want to develop to open its configuration page.
Click Component Library in the upper-right corner of the page to open the Component Library panel.
In the navigation pane on the left of the Component Library panel, select Outputs. Find the PostgreSQL component in the output component list on the right and drag it to the canvas.
Click and drag the icon of the target input component to connect it to the current PostgreSQL output component.
Click the icon in the PostgreSQL output component card to open the PostgreSQL Output Configuration dialog box.

In the PostgreSQL Output Configuration dialog box, configure the parameters.

Parameter		Description
Basic Settings	Step Name	The name of the PostgreSQL output component. Dataphin automatically generates a step name, which you can modify based on your business scenario. The name must meet the following requirements: It can contain only Chinese characters, letters, underscores (_), and digits. It can be up to 64 characters in length.
	Datasource	The data source dropdown list displays all PostgreSQL data sources, including those for which you have and do not have the write-through permission. Click the icon to copy the current data source name. For data sources for which you do not have the write-through permission, you can click Request after the data source to request the write-through permission. For more information, see Request data source permissions. If you do not have a PostgreSQL data source, click the icon to create one. For more information, see Create a PostgreSQL data source.
	Time Zone	The time zone used to process time format data. The default is the time zone configured in the selected data source and cannot be modified. Note For tasks created before V5.1.2, you can select Default Data Source Configuration or Channel Configuration Time Zone. The default is Channel Configuration Time Zone. Default Data Source Configuration: the default time zone of the selected data source. Channel Configuration Time Zone: the time zone configured in Properties > Channel Configuration for the current integration task.
	Schema (optional)	Supports selecting tables across schemas. Select the schema where the table is located. If not specified, the default is the schema configured in the data source.
	Table	Select the target table for the output data. You can search by entering a keyword from the table name, or enter the exact table name and click Exact Match. After you select a table, the system automatically checks the table status. Click the icon to copy the name of the selected table. If the target table does not exist in the PostgreSQL data source, you can use one-click table creation to generate it. Perform the following steps: Click One-Click Table Creation. Dataphin automatically generates the code to create the target table, including the target table name (the source table name by default), field types (initially converted from the Dataphin field types), and other information. Modify the SQL script for creating the target table as needed, and then click Create. After the target table is created, Dataphin automatically sets it as the target table for the output data. One-click table creation creates target tables for data synchronization in the development and production environments. Dataphin selects the production environment for table creation by default. If a table with the same name and structure already exists in the production environment, you do not need to select the production environment for table creation. Note If a table with the same name already exists in the development or production environment, Dataphin reports an error when you click Create. If no matching table is found, you can also enter a table name manually and run the integration based on that name.
	Loading Policy	Select the policy for writing data to the target table. The following policies are available: Append Data (INSERT INTO): When a primary key or constraint conflict occurs, a dirty data error is reported. Update On Primary Key Conflict (ON CONFLICT DO UPDATE SET): When a primary key or constraint conflict occurs, the mapped fields of the existing record are updated.
	Synchronous Write	The primary key update syntax is not an atomic operation. If the written data has duplicate primary keys, you need to enable synchronous write. Otherwise, parallel write is used. Synchronous write has lower performance than parallel write. Note This option is only available when the loading policy is set to Update on Primary Key Conflict.
	Batch Write Data Size (optional)	The maximum amount of data written in one batch. The default is 32 MB. You can also set Batch Write Records. The system writes a batch as soon as either limit is reached.
	Batch Write Records (optional)	The maximum number of records written in one batch. The default is 2048. Data is written in batches controlled by two parameters, Batch Write Records and Batch Write Data Size. When the accumulated data reaches either limit, the system considers the batch full and immediately writes it to the target. It is recommended to set Batch Write Data Size to 32 MB and to adjust Batch Write Records based on the actual size of a single record, usually to a larger value so that batch writing is used to full advantage. For example, if a single record is about 1 KB and Batch Write Data Size is set to 16 MB, set Batch Write Records to a value greater than 16 MB divided by 1 KB (that is, greater than 16384 records), for example, 20000. With this configuration, batch writes are triggered by Batch Write Data Size, and a write is executed each time the accumulated data reaches 16 MB.
	Prepare Statement (optional)	The SQL script to be executed on the database before data import. For example, to keep a service continuously available, the prepare statement creates a target table Target_A, and the current step then writes data to Target_A. After the step finishes writing, a post statement renames the table that is currently serving requests, Service_B, to Temp_C, renames Target_A to Service_B, and finally deletes Temp_C.
	Post Statement (optional)	The SQL script to be executed on the database after data import.
Field Mapping	Input Fields	Displays the input fields based on the upstream output.
	Output Fields	Displays the output fields. You can perform the following operations: Field Management: Click Field Management to select output fields. Click the icon to move Selected Input Fields to Unselected Input Fields. Click the icon to move Unselected Input Fields to Selected Input Fields. Batch Add: Click Batch Add to configure fields in JSON, TEXT, or DDL format. Batch configuration in JSON format, for example: `// Example: [{ "name": "user_id", "type": "String" }, { "name": "user_name", "type": "String" }]` Note name is the name of the imported field, and type is the field type after import. For example, `"name":"user_id","type":"String"` imports a field named user_id and sets its type to String. Batch configuration in TEXT format, for example, two rows with the default delimiters: `user_id,String` and `user_name,String`. The row delimiter separates the entries for individual fields. The default is a line feed (\n). Supported values are a line feed (\n), a semicolon (;), and a period (.). The column delimiter separates the field name from the field type. The default is a comma (,). Batch configuration in DDL format, for example: `CREATE TABLE tablename ( id INT PRIMARY KEY, name VARCHAR(50), age INT );` Create Output Field: Click +Create Output Field, enter the Column, and select the Type as prompted. After you complete the configuration for the current row, click the icon to save.
	Mapping	Manually select field mappings between the upstream input fields and the target table fields. Quick Mapping includes Same Row Mapping and Same Name Mapping. Same Name Mapping: Maps fields that have the same name. Same Row Mapping: Maps fields that are in the same row. Use this option when the field names in the source and target tables differ but the fields in corresponding rows need to be mapped.
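
The two loading policies in the table above correspond to two forms of the PostgreSQL `INSERT` statement. The following is a minimal sketch of what each form looks like, assuming a hypothetical table `target_a` with primary key `id`. The helper `build_statement` and the policy names passed to it are illustrative assumptions. The SQL that Dataphin actually generates is not documented here and may differ.

```python
def build_statement(table, columns, key_columns, policy):
    """Build a parameterized PostgreSQL write statement for a loading policy.

    Hypothetical helper for illustration only; identifiers are not quoted
    or escaped, so it must not be used with untrusted names.
    """
    col_list = ", ".join(columns)
    placeholders = ", ".join(["%s"] * len(columns))
    insert = f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})"

    if policy == "append":
        # Plain insert: a key conflict makes the statement fail.
        return insert

    if policy == "update_on_conflict":
        key_list = ", ".join(key_columns)
        # Update every mapped column that is not part of the key.
        updates = ", ".join(
            f"{c} = EXCLUDED.{c}" for c in columns if c not in key_columns
        )
        if not updates:
            # Only key columns are mapped, so there is nothing to update.
            return f"{insert} ON CONFLICT ({key_list}) DO NOTHING"
        return f"{insert} ON CONFLICT ({key_list}) DO UPDATE SET {updates}"

    raise ValueError(f"unknown loading policy: {policy}")


append_sql = build_statement("target_a", ["id", "name"], ["id"], "append")
upsert_sql = build_statement("target_a", ["id", "name"], ["id"], "update_on_conflict")
```

In the second form, `EXCLUDED` refers to the row that was proposed for insertion, so the existing record takes the incoming values for the mapped columns.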

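The TEXT format for Batch Add is a list of name and type pairs split by a row delimiter and a column delimiter. The sketch below shows one way such input could be parsed into the same shape as the JSON format. The function `parse_text_fields` is a hypothetical name, and Dataphin's own parsing rules (for example, how whitespace is handled) are an assumption here.

```python
def parse_text_fields(text, row_delimiter="\n", column_delimiter=","):
    """Parse 'name,type' rows into a list of {"name", "type"} dicts.

    Hypothetical helper; the defaults mirror the documented defaults
    (line feed between rows, comma between name and type).
    """
    fields = []
    for row in text.split(row_delimiter):
        row = row.strip()
        if not row:
            continue  # skip empty rows such as a trailing delimiter
        # Split on the first column delimiter only, so a type such as
        # NUMERIC(10,2) keeps its inner comma.
        name, sep, field_type = row.partition(column_delimiter)
        if not sep:
            raise ValueError(f"missing column delimiter in row: {row!r}")
        fields.append({"name": name.strip(), "type": field_type.strip()})
    return fields


fields = parse_text_fields("user_id,String\nuser_name,String")
```
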
Click OK to complete the property configuration of the PostgreSQL output component.
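
The interaction between Batch Write Records and Batch Write Data Size can be sketched as follows. This is a simplified model, not Dataphin's implementation: the function `batch_records` is hypothetical, records are represented as `bytes`, and the record size is taken to be its length in bytes.

```python
def batch_records(records, max_records=2048, max_bytes=32 * 1024 * 1024):
    """Yield batches; a batch is emitted when either limit is reached."""
    batch, batch_bytes = [], 0
    for record in records:
        batch.append(record)
        batch_bytes += len(record)
        if len(batch) >= max_records or batch_bytes >= max_bytes:
            yield batch
            batch, batch_bytes = [], 0
    if batch:
        # Emit the remaining records that did not fill a whole batch.
        yield batch


# Worked example from the parameter table: 1 KB records, 16 MB data size
# limit, and a record limit of 20000.
records = [b"x" * 1024] * 40000
sizes = [len(b) for b in batch_records(records, max_records=20000,
                                       max_bytes=16 * 1024 * 1024)]
```

With these values each full batch holds 16384 records, because the 16 MB data size limit is reached before the 20000-record limit. With the default limits, the same 1 KB records would instead fill a batch at 2048 records, or about 2 MB, well below the 32 MB data size limit.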