Sync Data to openGauss by Configuring the openGauss Output Component - Dataphin

The openGauss output component writes data to an openGauss data source. When you sync data from another data source to openGauss, configure the openGauss output component after you finish configuring the source data source. This topic describes how to configure the openGauss output component.

Prerequisites

You have created an openGauss data source. For more information, see Create an openGauss data source.
The account used to configure the openGauss output component properties must have sync-write permission on the data source. If the account does not have this permission, request it. For more information, see Request data source permissions.

Procedure

In the top menu bar on the Dataphin homepage, choose Develop > Data Integration.
In the top menu bar on the Data Integration page, select a Project. In Dev-Prod mode, also select an Environment.
In the navigation pane on the left, click Offline Integration. In the Offline Integration list, click the offline pipeline that you want to develop to open its configuration page.
In the upper-right corner of the page, click Component Library to open the Component Library panel.
In the navigation pane on the left of the Component Library panel, click Output. In the output component list on the right, find the openGauss component and drag it onto the canvas.
Click and drag the icon of a target input, transform, or flow component to connect it to the openGauss output component.
Click the icon in the openGauss output component card to open the openGauss Output Configuration dialog box.

In the openGauss Output Configuration dialog box, configure the parameters.

Section	Parameter	Description
Basic Settings	Step Name	The name of the openGauss output component. Dataphin generates a step name automatically, and you can rename it to suit your business scenario. The name can contain only Chinese characters, letters, digits, and underscores (_), and can be up to 64 characters long.
	Datasource	The drop-down list shows all openGauss data sources, including those for which you have sync-write permission and those for which you do not. Click the icon to copy the current data source name. If you do not have sync-write permission for a data source, click Request next to the data source to request sync-write permission. For more information, see Request data source permissions. If you do not have an openGauss data source, click Create Data Source to create one. For more information, see Create an openGauss data source.
	Schema	Select the schema that contains the target table. A schema is a logical grouping of tables in a data source.
	Table	Select the target table to write data to. Note: Views are not supported in Copy mode. If the target table does not exist in the openGauss data source, you can create it with the one-click target table generation feature: Click Generate Target Table. Dataphin automatically generates the SQL script that creates the target table, including the table name (defaults to the source table name) and field types (converted from Dataphin field types). Modify the generated SQL script as needed, then click Create. After the table is created, Dataphin uses it as the output target table. Note: If a table with the same name already exists in the development environment, clicking Create returns an error stating that the table already exists.
	Production Table Missing Policy	Specifies how Dataphin handles a target table that does not exist in the production environment. Valid values: No Action and Auto-create. Default: Auto-create. No Action: Dataphin does not create the production table when you publish the task. If the target table does not exist, Dataphin reports an error at submission but still lets you publish the task. You must create the target table in the production environment manually before the task runs. Auto-create: Dataphin creates a table with the same name in the target environment when you publish the task. You must edit the DDL statement, which is pre-filled with the DDL statement of the selected table and can be adjusted. The table name in the DDL statement uses the placeholder `${table_name}`. This is the only supported placeholder, and it is replaced with the actual table name at execution time. If the target table does not exist, Dataphin first runs the CREATE TABLE statement. If table creation fails, the publish check fails; fix the CREATE TABLE statement based on the error message, then republish. If the target table already exists, Dataphin skips table creation. Note: This setting is available only for projects in Dev-Prod mode.
	Loading Policy	The policy for writing data to the target table. Valid values: Overwrite: Replaces the existing data in the target table with data from the current source table. Append: Adds new data to the target table without changing existing data. Copy: Copies data between tables and files. If a conflict occurs, the Conflict Resolution Policy determines the behavior: Fail on Conflict or Overwrite on Conflict.
	Bulk Write Size	The maximum amount of data written in a single batch. This limit works together with Bulk Write Count: the system writes a batch as soon as either limit is reached. Default: 32 MB.
	Bulk Write Count	The maximum number of rows written in a single batch. Default: 2,048 rows. During data sync, Dataphin buffers records and writes them in batches. As soon as the buffered data reaches either the Bulk Write Size limit or the Bulk Write Count limit, Dataphin considers the batch full and writes it to the destination in a single operation. We recommend setting Bulk Write Size to 32 MB and setting Bulk Write Count based on the average row size, to a value larger than Bulk Write Size ÷ average row size, so that the size limit is reached first and each batch is as large as possible. For example, if each row is about 1 KB and Bulk Write Size is 16 MB, set Bulk Write Count to more than 16,384 rows (16 MB ÷ 1 KB), such as 20,000 rows. With this setting, Dataphin writes a batch each time the accumulated data reaches 16 MB.
Field Mapping	Input Fields	Lists input fields from upstream components.
	Output Fields	Lists the output fields. Supported actions: Field Management: Click Field Management to select output fields. Click the icon to move fields from Selected Input Fields to Unselected Input Fields, or click the icon to move fields from Unselected Input Fields to Selected Input Fields. Batch Add: Click Batch Add to configure fields in JSON, TEXT, or DDL format. Batch configuration in JSON format, such as: `// Example: [{ "name": "user_id", "type": "String" }, { "name": "user_name", "type": "String" }]` Note: `name` specifies the name of the imported field, and `type` specifies its data type. For example, `"name":"user_id","type":"String"` imports a field named `user_id` and sets its data type to `String`. Batch configuration in TEXT format, one field per line, such as `user_id,String` followed by `user_name,String` on the next line. The row delimiter separates field entries. Default: line feed (\n). Supported delimiters: \n, semicolon (;), and period (.). The column delimiter separates the field name from the field type. Default: comma (,). Batch configuration in DDL format, such as: `CREATE TABLE tablename ( id INT PRIMARY KEY, name VARCHAR(50), age INT );` Create Output Field: Click + Create Output Field, enter the Column name, and select the Type. Click the icon to save the row.
	Mapping	You can manually map fields based on the upstream input fields and the fields of the target table. Quick Mapping provides two modes: Map By Name: Maps fields that have identical names. Map By Row: Maps fields in the same row, for use when source and target field names differ.

Click OK to complete the configuration of the openGauss output component.
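The Step Name naming rule in the parameter table can be expressed as a simple validation check. The following is a minimal Python sketch, not Dataphin's own validator. It assumes that "Chinese characters" means the CJK Unified Ideographs block (U+4E00 to U+9FFF), that "letters" means ASCII letters, and that an empty name is rejected; the function name `is_valid_step_name` is hypothetical.

```python
import re

# Hypothetical validator for the Step Name rule (not Dataphin's implementation).
# Assumptions: "Chinese characters" = CJK Unified Ideographs (U+4E00 to U+9FFF),
# "letters" = ASCII letters, and the name must be 1 to 64 characters long.
_STEP_NAME = re.compile(r"[\u4e00-\u9fffA-Za-z0-9_]{1,64}")


def is_valid_step_name(name: str) -> bool:
    """Return True if the whole name matches the documented naming rules."""
    return _STEP_NAME.fullmatch(name) is not None


print(is_valid_step_name("openGauss_output_1"))  # True
print(is_valid_step_name("订单_写入"))            # True
print(is_valid_step_name("output-1"))            # False: hyphen is not allowed
print(is_valid_step_name("a" * 65))              # False: longer than 64 characters
```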
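The Auto-create option of Production Table Missing Policy combines placeholder substitution with a create-if-missing check. The Python sketch below simulates that flow under stated assumptions and is not Dataphin's implementation: the set `existing_tables` stands in for the production catalog, `run_sql` is a hypothetical callback standing in for executing SQL against openGauss, and the function names `render_ddl` and `ensure_production_table` are hypothetical.

```python
from typing import Callable, Set

PLACEHOLDER = "${table_name}"  # the only placeholder the DDL statement supports


def render_ddl(ddl_template: str, table_name: str) -> str:
    """Replace every ${table_name} occurrence; all other text is left as-is."""
    return ddl_template.replace(PLACEHOLDER, table_name)


def ensure_production_table(
    ddl_template: str,
    table_name: str,
    existing_tables: Set[str],
    run_sql: Callable[[str], None],
) -> bool:
    """Hypothetical simulation: create the table only if it is missing.

    Returns True if the CREATE TABLE statement was run, False if skipped.
    In this sketch, an exception raised by run_sql models a failed publish check.
    """
    if table_name in existing_tables:
        return False  # table already exists: skip table creation
    run_sql(render_ddl(ddl_template, table_name))
    existing_tables.add(table_name)
    return True


executed = []
tables = {"dim_user"}
ddl = "CREATE TABLE ${table_name} (id INT PRIMARY KEY, name VARCHAR(50));"
ensure_production_table(ddl, "ods_orders", tables, executed.append)  # runs DDL
ensure_production_table(ddl, "dim_user", tables, executed.append)    # skipped
print(executed)
# ['CREATE TABLE ods_orders (id INT PRIMARY KEY, name VARCHAR(50));']
```

A plain string replacement is used rather than a general template engine, so any other `$` text in the DDL is left untouched, matching the statement that `${table_name}` is the only supported placeholder.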
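The interaction between Bulk Write Size and Bulk Write Count can be illustrated with a small buffer that flushes as soon as either limit is reached. This is a minimal Python sketch, not Dataphin's writer: the class `BulkWriter` and its `sink` callback are hypothetical, and row size is approximated by the length of each encoded row in bytes. The usage replays the 1 KB-row example: with a 16 MB size limit and a 20,000-row count limit, every full batch holds 16,384 rows, so the size limit always triggers first.

```python
from typing import Callable, List

KB = 1024
MB = 1024 * KB


class BulkWriter:
    """Hypothetical batch buffer: flushes when bytes OR rows reach their limit."""

    def __init__(self, max_bytes: int, max_rows: int,
                 sink: Callable[[List[bytes]], None]) -> None:
        self.max_bytes = max_bytes  # plays the role of Bulk Write Size
        self.max_rows = max_rows    # plays the role of Bulk Write Count
        self.sink = sink            # receives each full batch in a single call
        self._rows: List[bytes] = []
        self._bytes = 0

    def write(self, row: bytes) -> None:
        self._rows.append(row)
        self._bytes += len(row)
        # Whichever limit is reached first marks the batch as full.
        if self._bytes >= self.max_bytes or len(self._rows) >= self.max_rows:
            self.flush()

    def flush(self) -> None:
        """Write any buffered rows as one batch, then reset the buffer."""
        if self._rows:
            self.sink(self._rows)
            self._rows = []
            self._bytes = 0


# 1 KB rows, Bulk Write Size = 16 MB, Bulk Write Count = 20,000 rows.
batch_sizes: List[int] = []
writer = BulkWriter(16 * MB, 20_000, lambda batch: batch_sizes.append(len(batch)))
row = b"x" * KB
for _ in range(40_000):
    writer.write(row)
writer.flush()  # write the final, partial batch

print(16 * MB // KB)  # 16384 rows fit in 16 MB
print(batch_sizes)    # [16384, 16384, 7232]
```

If Bulk Write Count were set below 16,384 in this scenario, the count limit would trigger first and each batch would carry less than 16 MB, which is why the recommendation is to set the count above Bulk Write Size ÷ average row size.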
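To show how the TEXT format of Batch Add (under Output Fields) relates to the JSON format, the following Python sketch splits the text on the row delimiter and then on the column delimiter, producing the same `name`/`type` records as the JSON example. It is an illustrative parser only: `parse_text_fields` is a hypothetical name, and Dataphin's own parsing rules (for example, whitespace handling or error reporting) may differ.

```python
import json
from typing import Dict, List


def parse_text_fields(text: str, row_delim: str = "\n",
                      col_delim: str = ",") -> List[Dict[str, str]]:
    """Hypothetical TEXT-format parser: one 'name<col_delim>type' entry per row.

    Raises ValueError if an entry does not contain the column delimiter.
    """
    fields = []
    for entry in text.split(row_delim):
        entry = entry.strip()
        if not entry:
            continue  # ignore blank entries, such as a trailing delimiter
        name, field_type = entry.split(col_delim, 1)
        fields.append({"name": name.strip(), "type": field_type.strip()})
    return fields


json_fields = json.loads(
    '[{ "name": "user_id", "type": "String" }, { "name": "user_name", "type": "String" }]'
)
print(parse_text_fields("user_id,String\nuser_name,String") == json_fields)  # True
print(parse_text_fields("user_id,String;user_name,String", row_delim=";") == json_fields)  # True
```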
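The two Quick Mapping modes in the Mapping row differ only in how they pair fields. The following Python sketch shows the difference under stated assumptions: Map By Name is modeled as exact, case-sensitive name matching, and Map By Row pairs fields by position and stops at the shorter list. The function names are hypothetical, and Dataphin's actual matching rules (for example, case sensitivity) may differ.

```python
from typing import List, Tuple

FieldPairs = List[Tuple[str, str]]  # (input field, output field) pairs


def map_by_name(inputs: List[str], outputs: List[str]) -> FieldPairs:
    """Pair fields whose names are identical (assumed case-sensitive)."""
    output_set = set(outputs)
    return [(name, name) for name in inputs if name in output_set]


def map_by_row(inputs: List[str], outputs: List[str]) -> FieldPairs:
    """Pair the n-th input field with the n-th output field."""
    return list(zip(inputs, outputs))


upstream = ["id", "name", "age"]
target = ["id", "user_name", "user_age"]
print(map_by_name(upstream, target))  # [('id', 'id')]
print(map_by_row(upstream, target))   # [('id', 'id'), ('name', 'user_name'), ('age', 'user_age')]
```

This is why Map By Row suits tables whose columns are in the same order but named differently, while Map By Name only links fields that share a name.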