The Greenplum output component writes data to a Greenplum database. You can use this component to copy and push data from a storage system connected to the big data platform to Greenplum for integration and reprocessing. This topic describes how to configure the Greenplum output component.
Prerequisites
A Greenplum data source has been created. For more information, see Create a Greenplum data source.
The account that you use to configure the Greenplum output component must have write-through permission for the data source. If you do not have this permission, request it for the data source. For more information, see Request data source permissions.
Procedure
On the Dataphin home page, choose Development > Data Integration from the top menu bar.
From the top menu bar of the integration page, select a Project. If you are in Dev-Prod mode, select an Environment.
In the navigation pane on the left, click Batch Pipeline. In the Batch Pipeline list, click the desired offline pipeline to open its configuration page.
In the upper-right corner of the page, click Component Library to open the Component Library panel.
In the navigation pane on the left of the Component Library panel, select Output. Find the Greenplum component in the output component list on the right and drag it to the canvas.
Click and drag the icon of the source input, transform, or flow component to connect it to the Greenplum output component.
Click the icon on the Greenplum output component card to open the Greenplum Output Configuration dialog box.
In the Greenplum Output Configuration dialog box, configure the parameters.
Parameter
Description
Basic Settings
Step Name
The name of the Greenplum output component. Dataphin automatically generates a name. You can also change it as needed. The naming conventions are as follows:
Can contain only Chinese characters, letters, underscores (_), and digits.
Cannot exceed 64 characters in length.
Datasource
The drop-down list displays all Greenplum data sources. This includes data sources for which you have write-through permission and those for which you do not.
For data sources without write-through permission, click Request next to the data source to request the permission. For more information, see Request data source permissions.
If you do not have a Greenplum data source, click the icon to create one. For more information, see Create a Greenplum data source.
Schema
Select a schema from the database. This parameter is required. If the data source connection string already contains schema information, the schema is used by default. You can also select another schema for which you have permission.
Table
Select the destination table for the output data.
If a destination table for data synchronization does not exist in the Greenplum data source, you can use the one-click table creation feature to quickly create one. The procedure is as follows:
Click One-click Table Creation. Dataphin automatically generates the table creation code for you, including the destination table name (the source table name by default), field types (initially mapped from Dataphin field types), and other information, as shown in the following figure:

Modify the SQL script to create the destination table as needed, and then click Create.
After the destination table is created, Dataphin automatically sets it as the destination table for the output data. The one-click table creation feature creates destination tables for data synchronization in both the development and production environments. By default, the option to create the table in the production environment is selected. If a table with the same name and structure already exists in the production environment, you do not need to select this option.
Note: If a table with the same name exists in the development or production environment, Dataphin reports an error after you click Create.
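A generated table-creation script might look like the following sketch. The table name and columns are illustrative placeholders, not values that Dataphin necessarily produces; the mapping of field types shown in the comments is an assumption.

```sql
-- Illustrative destination table generated from the source table structure.
-- Field types are initially mapped from Dataphin field types and can be
-- edited in the SQL script before you click Create.
CREATE TABLE ods_user_info (
    user_id    BIGINT,
    user_name  VARCHAR(256),
    gmt_create TIMESTAMP
);
```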
Loading Policy
Select a loading policy. Valid values: Append Data and Copy.
Append Data: If a primary key conflict or constraint violation occurs, a dirty data error is reported.
Copy: Data is written based on the selected conflict resolution policy. This policy is supported only for tables, not for views.
Conflict Resolution Policy
This parameter is required when you set Loading Policy to Copy. Greenplum supports only Error On Conflict.
Batch Write Data Volume (Optional)
The amount of data to write in a single batch. You can also set Batch Write Count. The system writes data when the specified data amount or number of rows is reached, whichever comes first. Default value: 32 MB.
Batch Write Count (Optional)
Default value: 2048. During data synchronization, data is written in batches, controlled by the Batch Write Count and Batch Write Data Volume parameters.
When the accumulated data reaches either limit (data volume or row count), the system considers the batch full and immediately writes it to the destination.
Adjust the maximum number of rows per batch based on the size of a single record, and set it large enough that the data-volume limit, rather than the row-count limit, triggers the write. For example, if a single record is about 1 KB and the batch write data volume is set to 16 MB, set the batch write count to a value greater than 16 MB ÷ 1 KB = 16,384, such as 20,000. With this configuration, each batch write is triggered by the data volume: a write is performed each time the accumulated data reaches 16 MB.
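The trigger rule above can be sketched as follows. This is a minimal illustration of the "whichever limit is reached first" behavior, not Dataphin's actual implementation; the limit values match the 1 KB-record example.

```python
# Sketch of the batch-flush rule: a batch is written as soon as either the
# row-count limit or the data-volume limit is reached, whichever comes first.
BATCH_COUNT_LIMIT = 20_000             # Batch Write Count (rows)
BATCH_VOLUME_LIMIT = 16 * 1024 * 1024  # Batch Write Data Volume (16 MB)

def flush_points(record_sizes):
    """Return the indices after which a batch write would be triggered."""
    flushes = []
    rows = 0
    volume = 0
    for i, size in enumerate(record_sizes):
        rows += 1
        volume += size
        if rows >= BATCH_COUNT_LIMIT or volume >= BATCH_VOLUME_LIMIT:
            flushes.append(i)
            rows = 0
            volume = 0
    return flushes

# With ~1 KB records, the 16 MB volume limit fires first, after the
# 16,384th record (index 16383), well before the 20,000-row limit.
print(flush_points([1024] * 20_000)[0])  # → 16383
```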
Preparation Statement (Optional)
The SQL script to execute on the database before the data import.
For example, to keep the service continuously available, you can use the preparation statement to create a staging table named Target_A before the current step writes data, and write the data to Target_A. After the current step finishes writing, rename the serving table Service_B to Temp_C, rename Target_A to Service_B, and then delete Temp_C.
End Statement (Optional)
The SQL script to execute on the database after the data import.
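The table-swap pattern from the example above could be expressed as the following preparation and end statements. This is a sketch: Target_A, Service_B, and Temp_C are the placeholder table names from the example, and the exact DDL (for example, whether to copy constraints with the LIKE clause) depends on your schema.

```sql
-- Preparation statement (runs before the import): create the staging table
-- Target_A with the same column structure as the serving table Service_B.
CREATE TABLE Target_A (LIKE Service_B);

-- End statement (runs after the import): swap the staging table into place,
-- then drop the old serving table.
ALTER TABLE Service_B RENAME TO Temp_C;
ALTER TABLE Target_A RENAME TO Service_B;
DROP TABLE Temp_C;
```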
Field Mapping
Input Field
Displays the input fields based on the output of the upstream component.
Output Field
Displays the output fields. You can perform the following operations:
Field Management: Click Field Management to select output fields.
Click the icon to move fields from Selected Input Fields to Unselected Input Fields.
Click the icon to move fields from Unselected Input Fields to Selected Input Fields.
Batch Add: Click Batch Add to configure fields in batches. The JSON, TEXT, and DDL formats are supported.
To configure fields in JSON format:
Example: [{"name":"user_id","type":"String"},{"name":"user_name","type":"String"}]
Note: The `name` parameter specifies the name of the imported field. The `type` parameter specifies the data type of the field after it is imported. For example, "name":"user_id","type":"String" imports the field named user_id with the String data type.
To configure fields in TEXT format:
Example:
user_id,String
user_name,String
The row delimiter separates the information of each field. The default delimiter is a line feed (`\n`). You can also use a semicolon (`;`) or a period (`.`).
The column delimiter separates the field name and the field type. The default delimiter is a comma (`,`).
To configure fields in DDL format:
CREATE TABLE tablename ( id INT PRIMARY KEY, name VARCHAR(50), age INT );
Create New Output Field: Click +Create New Output Field. Enter a Column name and select a Type. After you configure the current row, click the icon to save the settings.
Field Mapping
Manually select mappings between the upstream input fields and the destination table fields. Quick Mapping provides the following options: Name Mapping and Row Mapping.
Name Mapping: Maps fields that have the same name.
Row Mapping: Maps fields that are in the same row. Use this option when the field names in the source and destination tables are different, but the data in the corresponding rows needs to be mapped.
Click Confirm to save the configuration.