Sync Data to TiDB with Accurate Output Component Config - Dataphin

The TiDB output component writes data to a TiDB data source. When you synchronize data from another data source to TiDB, configure the TiDB output component as the target after you configure the source data source. This topic describes how to configure the TiDB output component.

Prerequisites

You have created a TiDB data source. For more information, see Create a TiDB data source.
The account used to configure the TiDB output component properties has write-through permission for the data source. If you do not have the permission, you need to request it. For more information, see Request, renew, and return data source permissions.

Procedure

In the top navigation bar of the Dataphin homepage, choose Development > Data Integration.
In the top navigation bar of the integration page, select Project. In Dev-Prod mode, you also need to select Environment.
In the left navigation pane, click Batch Pipeline, and then click the offline pipeline that you want to develop in the Batch Pipeline list to open the configuration page of the offline pipeline.
Click Component Library in the upper-right corner of the page to open the Component Library panel.
In the left navigation pane of the Component Library panel, select Outputs, find the TiDB component in the output component list on the right, and drag the component to the canvas.
Click and drag the icon of the target input, transform, or flow component to connect it to the current TiDB output component.
Click the icon on the TiDB output component to open the TiDB Output Configuration dialog box.

In the TiDB Output Configuration dialog box, configure the parameters.

Parameter		Description
Basic Settings	Step Name	The name of the TiDB output component. Dataphin automatically generates a step name, which you can modify based on your business scenario. The name must meet the following requirements: It can contain only Chinese characters, letters, underscores (_), and digits. It cannot exceed 64 characters in length.
	Datasource	The data source dropdown list displays all TiDB data sources, including those for which you have the write-through permission and those for which you do not. Click the icon to copy the current data source name. For data sources for which you do not have the write-through permission, you can click Request next to the data source to request the write-through permission. For more information, see Request data source permissions. If you do not have a TiDB data source, click Create Data Source to create one. For more information, see Create a TiDB data source.
	Table	Select the target table for output data. You can enter a keyword to search for tables, or enter the exact table name and click Exact Match. After you select a table, the system automatically checks the table status. Click the icon to copy the name of the selected table.
	Loading Policy	Select the policy for writing data to the target table. The following policies are available: Append Data (INSERT INTO): Appends data to the existing data in the target table without modifying historical data. If a primary key or constraint violation occurs, a dirty data error is reported. Overwrite On Primary Key Conflict (REPLACE INTO): If a primary key or constraint violation occurs, the system first deletes the entire existing row that has the duplicate primary key and then inserts the new data. Update On Primary Key Conflict (ON DUPLICATE KEY UPDATE): If a primary key or constraint violation occurs, the system updates the mapped fields on the existing record.
	Batch Write Data Size (optional)	The maximum amount of data written in one batch. You can also set Batch Write Count. The system writes a batch when either of the two limits is reached. The default value is 32 MB.
	Batch Write Count (optional)	The maximum number of records written in one batch. The default value is 2048 records. Data is written in batches, controlled by two parameters: Batch Write Count and Batch Write Data Size. When the accumulated data reaches either limit, the system considers the batch full and immediately writes it to the target in a single operation. A batch write data size of 32 MB is recommended. Adjust the batch write count based on the actual size of a single record, usually to a larger value so that the benefits of batch writing are fully used. For example, if a single record is about 1 KB and you set the batch write data size to 16 MB, set the batch write count to a value greater than 16 MB divided by 1 KB (that is, greater than 16384 records), such as 20000 records. With this configuration, the data size limit triggers each write: the system writes a batch whenever the accumulated data reaches 16 MB.
	Prepare Statement (optional)	The SQL script to be executed on the database before data import. For example, to keep a service continuously available, a target table Target_A is created before the current step writes data, and the step then writes to Target_A. After the step finishes writing data, the table that is serving requests, Service_B, is renamed to Temp_C, Target_A is renamed to Service_B, and finally Temp_C is deleted.
	End Statement (optional)	The SQL script to be executed on the database after data import.
Field Mapping	Input Fields	Displays the input fields based on the output of the upstream component.
	Output Fields	Displays the output fields. You can perform the following operations: Field Management: Click Field Management to select output fields. Click the icon to move Selected Input Fields to Unselected Input Fields. Click the icon to move Unselected Input Fields to Selected Input Fields. Batch Add: Click Batch Add to configure fields in JSON, TEXT, or DDL format. JSON format, for example: `// Example: [{ "name": "user_id", "type": "String" }, { "name": "user_name", "type": "String" }]` Note: name specifies the name of the field to import, and type specifies the data type of the field after it is imported. For example, `"name":"user_id","type":"String"` imports the field named user_id and sets its data type to String. TEXT format, for example: `// Example: user_id,String user_name,String` The row delimiter separates the entries for individual fields. The default is a line feed (\n); line feed (\n), semicolon (;), and period (.) are supported. The column delimiter separates the field name from the field type. The default is a comma (,). DDL format, for example: `CREATE TABLE tablename ( id INT PRIMARY KEY, name VARCHAR(50), age INT );` Create New Output Field: Click +Create New Output Field, enter the Column, and select the Type as prompted. After you complete the configuration for the current row, click the icon to save.
	Mapping	Based on the upstream input and the fields of the target table, you can manually select field mappings. Quick Mapping includes Same Row Mapping and Same Name Mapping. Same Name Mapping: Maps fields that have the same name. Same Row Mapping: Maps fields by row position. Use it when the field names in the source and target tables differ but the fields in corresponding rows need to be mapped to each other.
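
The batch write rule described for Batch Write Count and Batch Write Data Size can be sketched in Python as follows. This is only an illustration of the "either limit triggers a write" behavior, not Dataphin's implementation: the `BatchBuffer` class and the `flush_sizes` helper are hypothetical names, and the size of a record is approximated by its byte length.

```python
from dataclasses import dataclass, field


@dataclass
class BatchBuffer:
    """Hypothetical buffer that reports a full batch when either limit is hit."""
    max_bytes: int = 32 * 1024 * 1024  # Batch Write Data Size (default 32 MB)
    max_count: int = 2048              # Batch Write Count (default 2048 records)
    _records: list = field(default_factory=list)
    _bytes: int = 0

    def add(self, record: bytes):
        """Buffer one record; return the batch if it is now full, else None."""
        self._records.append(record)
        self._bytes += len(record)
        if self._bytes >= self.max_bytes or len(self._records) >= self.max_count:
            # Hand the full batch to the caller and start a new, empty one.
            batch, self._records, self._bytes = self._records, [], 0
            return batch
        return None


def flush_sizes(n_records, record_size, max_bytes, max_count):
    """Feed n_records of record_size bytes; return the record count of each full batch."""
    buf = BatchBuffer(max_bytes=max_bytes, max_count=max_count)
    record = b"x" * record_size
    sizes = []
    for _ in range(n_records):
        batch = buf.add(record)
        if batch is not None:
            sizes.append(len(batch))
    return sizes
```

With 1 KB records, a 16 MB size limit, and a count limit of 20000, `flush_sizes(40000, 1024, 16 * 1024 * 1024, 20000)` returns `[16384, 16384]`: each batch is triggered by the size limit, and the remaining 7232 records are still buffered in this sketch. With the default limits and the same record size, the count limit of 2048 is reached first.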

Click OK to complete the property configuration of the TiDB output component.
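
The three Loading Policy options correspond to MySQL-compatible write statements, which TiDB accepts. As a rough illustration, the Python sketch below builds a parameterized statement for each policy. The function name `build_write_sql`, the policy keys (`append`, `overwrite`, `update`), and the `%s` placeholder style are assumptions made for this example; the SQL that Dataphin actually generates may differ.

```python
def build_write_sql(table, columns, policy):
    """Return a parameterized MySQL-style statement for a loading policy.

    policy: "append" (INSERT INTO), "overwrite" (REPLACE INTO), or
    "update" (INSERT ... ON DUPLICATE KEY UPDATE). These keys are
    hypothetical labels chosen for this sketch.
    """
    cols = ", ".join(f"`{c}`" for c in columns)
    marks = ", ".join(["%s"] * len(columns))
    if policy == "append":
        # A key conflict makes the insert fail; the row would count as dirty data.
        return f"INSERT INTO `{table}` ({cols}) VALUES ({marks})"
    if policy == "overwrite":
        # On a key conflict, the old row is deleted and the new row is inserted.
        return f"REPLACE INTO `{table}` ({cols}) VALUES ({marks})"
    if policy == "update":
        # On a key conflict, only the mapped columns of the existing row change.
        updates = ", ".join(f"`{c}` = VALUES(`{c}`)" for c in columns)
        return (
            f"INSERT INTO `{table}` ({cols}) VALUES ({marks}) "
            f"ON DUPLICATE KEY UPDATE {updates}"
        )
    raise ValueError(f"unknown loading policy: {policy}")
```

One practical difference: because the overwrite policy deletes the whole old row, target columns that are not mapped fall back to their default values, whereas the update policy leaves unmapped columns untouched.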