Write Data to OceanBase with Batch Pipeline Output - Dataphin - Alibaba Cloud

The OceanBase output component writes data to an OceanBase data source. When you synchronize data from another data source to OceanBase, first configure the source data information, and then configure the target data source properties in the OceanBase output component.

Prerequisites

An OceanBase data source has been established. For more information, see Create an OceanBase Data Source.
To configure the properties of the OceanBase output component, your account must have write-through permission on the data source. If it does not, request the permission first. For more information, see Apply for, Renew, and Return Data Source Permissions.

Procedure

On the Dataphin home page, choose Development > Data Integration in the top menu bar.
In the top menu bar of the integration page, select a project (in Dev-Prod mode, you must also select an environment).
In the left-side navigation pane, click Batch Pipeline. In the Batch Pipeline list, click the batch pipeline that you want to develop to open its configuration page.
Click Component Library in the upper right corner to open the Component Library panel.
In the Component Library panel's left-side navigation pane, select Output. Then, in the output component list on the right, locate the OceanBase component and drag it onto the canvas.
Connect the upstream input, transform, or flow component to the OceanBase output component by clicking and dragging the icon.
To open the OceanBase Output Configuration dialog box, click the icon on the OceanBase output component card.

In the OceanBase Output Configuration dialog box, configure the following parameters.

Category	Parameter	Description
Basic Settings	Step Name	The name of the OceanBase output component. Dataphin generates a step name automatically, and you can change it as needed. The name can contain only Chinese characters, letters, digits, and underscores (_), and cannot exceed 64 characters.
	Datasource	The drop-down list displays all OceanBase data sources, including those with and without write-through permission. Click the icon to copy the data source name. For data sources without write-through permission, click Request next to the data source to apply for permission. For more information, see Apply for, Renew, and Return Data Source Permissions. If no OceanBase data source exists, click Create Data Source to create one. For more information, see Create an OceanBase Data Source.
	Table	Select the target table for the output data. You can enter a keyword to search, or enter the exact table name and click Precise Search. After you select a table, the system automatically checks the table status. Click the icon to copy the selected table name. The available loading policy depends on the OceanBase mode. Oracle mode: Only the Append Policy is supported. A primary key or constraint violation is reported as dirty data; otherwise, the new data is appended directly. MySQL mode: Append Policy: If no primary key or constraint violation occurs, the new data is appended directly. Overwrite Policy: If a primary key or constraint violation occurs, the new values overwrite the old values field by field, affecting only the fields being written.
	Batch Write Data Volume (optional)	The maximum amount of data written in one batch. It works together with Batch Write Count: the system writes a batch as soon as either limit is reached. Default: 32 MB.
	Batch Write Count (optional)	The maximum number of rows written in one batch. Default: 2048 rows. Data synchronization writes in batches controlled by Batch Write Count and Batch Write Data Volume: when the accumulated data reaches either limit, the batch is considered full and is written to the target immediately. The recommended Batch Write Data Volume is 32 MB. Set Batch Write Count according to the actual record size, typically to a value large enough that the volume limit is reached first, which maximizes batch write efficiency. For example, if each record is about 1 KB, you can set Batch Write Data Volume to 16 MB and Batch Write Count to a value greater than 16 MB / 1 KB = 16,384 rows, such as 20,000 rows. With this configuration, a batch write is triggered each time the accumulated data reaches 16 MB.
	Prepare Statement (optional)	An SQL script that runs on the database before data is imported. For example, to keep a service available during the load: create the table Target_A before writing and write the data to Target_A. After the write completes, rename the in-service table Service_B to Temp_C, rename Target_A to Service_B, and then drop Temp_C (these post-write steps run after the import, so they belong in the End Statement).
	End Statement (optional)	An SQL script executed on the database after data import.
Field Mapping	Input Fields	Input fields are populated based on the output of the upstream component.
	Output Fields	Configure the output fields. The following operations are available: Field Management: Click Field Management to select the output fields. Click the icon to move fields from Selected Input Fields to Unselected Input Fields, or click the icon to move fields from Unselected Input Fields to Selected Input Fields. Batch Addition: Click Batch Addition to configure fields in bulk in JSON, TEXT, or DDL format. JSON format, for example: `[{"name": "user_id", "type": "String"}, {"name": "user_name", "type": "String"}]`. Here, `name` specifies the field name and `type` specifies the data type after import; for example, `"name":"user_id","type":"String"` imports the field `user_id` with the String data type. TEXT format, for example: `user_id,String` on the first line and `user_name,String` on the second line. The row delimiter separates the definitions of individual fields; the default is a line feed (\n), and line feed (\n), semicolon (;), and period (.) are supported. The column delimiter separates the field name from the field type; the default is a comma (,). DDL format, for example: `CREATE TABLE tablename (id INT PRIMARY KEY, name VARCHAR(50), age INT);`. Create Output Field: Click +Create Output Field, enter the Column name, and select the Type as prompted on the page. After you finish configuring the row, click the icon to save it.
	Mapping	Based on the upstream input fields and the target table fields, you can map fields manually. The mapping options are Row Mapping and Same Name Mapping. Same Name Mapping: maps fields that have the same name. Row Mapping: maps fields by their row position, which is useful when the source and target field names differ.

Click Confirm to complete the property configuration of the OceanBase output component.
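The Step Name rule in the parameter table can be checked before you enter a name. The following Python sketch is a hypothetical client-side validator, not part of Dataphin. It assumes that "Chinese characters" means the CJK Unified Ideographs block (U+4E00 to U+9FFF), that "letters" means ASCII letters, that length is counted in characters, and that an empty name is invalid.

```python
import re

# Hypothetical validator for the Step Name rule. Assumptions: "Chinese
# characters" = CJK Unified Ideographs (U+4E00-U+9FFF), "letters" = ASCII
# letters, length is counted in characters, and an empty name is invalid.
STEP_NAME_PATTERN = re.compile(r"[\u4e00-\u9fffA-Za-z0-9_]{1,64}")


def is_valid_step_name(name: str) -> bool:
    """Return True if name has 1-64 characters and every character is allowed."""
    return STEP_NAME_PATTERN.fullmatch(name) is not None


for candidate in ["OceanBase_output_1", "写入_OceanBase", "bad-name", "a" * 65]:
    print(repr(candidate), is_valid_step_name(candidate))
```

The hyphen in `bad-name` and the 65-character name both fail the check. Dataphin itself remains the authority on which names it accepts.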
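The difference between the Append Policy and the Overwrite Policy described under Table is shown below. This Python sketch simulates both policies on an in-memory table keyed by primary key. It is an illustration only, not Dataphin's implementation. It models only primary key conflicts. It also assumes that under the Overwrite Policy a row without a conflict is appended, just as under the Append Policy, which the parameter description does not state explicitly.

```python
from typing import Dict, List

Row = Dict[str, object]


def write_rows(table: Dict[object, Row], rows: List[Row], pk: str, policy: str) -> List[Row]:
    """Simulate the Append and Overwrite loading policies on an in-memory table.

    `table` maps primary-key values to rows. Only primary-key conflicts are
    modeled; other constraints are ignored. Returns the rows treated as dirty data.
    """
    if policy not in ("append", "overwrite"):
        raise ValueError(f"unknown policy: {policy!r}")
    dirty: List[Row] = []
    for row in rows:
        key = row[pk]
        if key not in table:
            table[key] = dict(row)      # no conflict: the row is appended
        elif policy == "overwrite":
            table[key].update(row)      # conflict: overwrite only the written fields
        else:
            dirty.append(row)           # conflict under Append: reported as dirty data
    return dirty


existing = {1: {"id": 1, "name": "Alice", "age": 30}}
incoming = [{"id": 1, "name": "Alicia"}, {"id": 2, "name": "Bob"}]

appended = {k: dict(v) for k, v in existing.items()}
print(write_rows(appended, incoming, "id", "append"))  # the conflicting row is dirty
print(appended[1])                                     # original row is unchanged

overwritten = {k: dict(v) for k, v in existing.items()}
write_rows(overwritten, incoming, "id", "overwrite")
print(overwritten[1])  # name is replaced; age is kept because it was not written
```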
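Batch Write Count and Batch Write Data Volume together follow an either-limit-first flush rule. The following Python sketch models that rule with a hypothetical `batches` generator. It assumes that a record's size is its serialized length in bytes and that KB and MB are binary units, which matches the 16 MB / 1 KB = 16,384 rows arithmetic in the Batch Write Count description. Dataphin's internal size accounting may differ.

```python
from typing import Iterable, Iterator, List

KB = 1024
MB = 1024 * KB


def batches(records: Iterable[bytes], max_count: int = 2048,
            max_bytes: int = 32 * MB) -> Iterator[List[bytes]]:
    """Yield batches of records, flushing as soon as either limit is reached.

    max_count plays the role of Batch Write Count and max_bytes the role of
    Batch Write Data Volume; a record's size is assumed to be len(record).
    """
    batch: List[bytes] = []
    size = 0
    for record in records:
        batch.append(record)
        size += len(record)
        if len(batch) >= max_count or size >= max_bytes:
            yield batch                 # batch is full: emit it for writing
            batch, size = [], 0
    if batch:
        yield batch                     # emit the last, partially filled batch


records = [b"x" * KB] * 40_000          # 40,000 records of 1 KB each

# Documented example: 16 MB volume, 20,000 rows. The volume limit is hit first.
print([len(b) for b in batches(records, max_count=20_000, max_bytes=16 * MB)])  # [16384, 16384, 7232]

# Defaults: the 2,048-row limit is hit after about 2 MB, long before 32 MB.
print(len(next(batches(records))))  # 2048
```

The second call shows why a larger Batch Write Count is usually recommended. With the defaults and 1 KB records, every batch is cut at 2,048 rows, about 2 MB, and the 32 MB volume limit is never reached.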
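The table-swap example under Prepare Statement splits into two scripts: one that runs before the import and one that runs after it. The following Python sketch is a hypothetical helper that assembles both scripts as text, which you can then paste into Prepare Statement and End Statement. The column list for Target_A is made up. The sketch assumes that the Table parameter points at Target_A, that both fields accept several semicolon-separated statements, and that `ALTER TABLE ... RENAME TO ...` is accepted in both MySQL mode and Oracle mode.

```python
from typing import Tuple


def swap_scripts(target: str, service: str, temp: str, create_ddl: str) -> Tuple[str, str]:
    """Build a Prepare Statement and an End Statement for a table swap.

    Table names are interpolated as-is, so they must come from a trusted source.
    ALTER TABLE ... RENAME TO ... is assumed to work in both MySQL mode and
    Oracle mode; check the OceanBase SQL reference for your version.
    """
    prepare = create_ddl.rstrip().rstrip(";") + ";"
    end = ";\n".join([
        f"ALTER TABLE {service} RENAME TO {temp}",    # retire the in-service table
        f"ALTER TABLE {target} RENAME TO {service}",  # promote the freshly loaded table
        f"DROP TABLE {temp}",                         # remove the retired copy
    ]) + ";"
    return prepare, end


# Hypothetical column definitions for Target_A.
prepare_sql, end_sql = swap_scripts(
    "Target_A", "Service_B", "Temp_C",
    "CREATE TABLE Target_A (id INT PRIMARY KEY, name VARCHAR(50))",
)
print(prepare_sql)
print(end_sql)
```

The swap statements belong in End Statement, not Prepare Statement, because renaming Service_B before the load finishes would take the service table offline for the whole import.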
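The TEXT and JSON batch formats described under Output Fields carry the same information. The following Python sketch is a hypothetical helper, not a Dataphin feature. It converts TEXT-format definitions into the JSON-format list, using the default line-feed row delimiter and comma column delimiter unless other delimiters are passed in.

```python
import json
from typing import Dict, List


def parse_text_fields(text: str, row_delim: str = "\n", col_delim: str = ",") -> List[Dict[str, str]]:
    """Convert TEXT-format field definitions into the JSON-format structure."""
    fields: List[Dict[str, str]] = []
    for entry in text.split(row_delim):
        entry = entry.strip()
        if not entry:
            continue                      # ignore blank entries, e.g. a trailing delimiter
        name, sep, ftype = entry.partition(col_delim)
        if not sep:
            raise ValueError(f"missing column delimiter in {entry!r}")
        fields.append({"name": name.strip(), "type": ftype.strip()})
    return fields


print(json.dumps(parse_text_fields("user_id,String\nuser_name,String")))
# [{"name": "user_id", "type": "String"}, {"name": "user_name", "type": "String"}]

# The same fields, using a semicolon as the row delimiter.
print(parse_text_fields("user_id,String;user_name,String", row_delim=";"))
```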
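The two options under Mapping differ only in how they pair fields. The following Python sketch illustrates both on plain lists of field names. It is not Dataphin code. It assumes that Same Name Mapping is case-sensitive and that Row Mapping leaves extra fields unmapped when the two lists differ in length.

```python
from typing import List, Tuple


def same_name_mapping(inputs: List[str], outputs: List[str]) -> List[Tuple[str, str]]:
    """Pair each input field with the output field that has exactly the same name."""
    targets = set(outputs)
    return [(field, field) for field in inputs if field in targets]


def row_mapping(inputs: List[str], outputs: List[str]) -> List[Tuple[str, str]]:
    """Pair fields by position; fields beyond the shorter list stay unmapped."""
    return list(zip(inputs, outputs))


inputs = ["id", "name", "age"]
outputs = ["id", "user_name", "age"]
print(same_name_mapping(inputs, outputs))  # [('id', 'id'), ('age', 'age')]
print(row_mapping(inputs, outputs))        # [('id', 'id'), ('name', 'user_name'), ('age', 'age')]
```

Same Name Mapping skips `name` because the target calls it `user_name`. Row Mapping pairs them by position, which is why it suits tables whose field names differ.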