The StarRocks output component enables data writing to a StarRocks data source. When synchronizing data from various sources to StarRocks, configure the source information first, followed by the StarRocks output component to ensure data is written to the intended target. This topic describes the configuration process for the StarRocks output component.
Prerequisites
You have successfully created a StarRocks data source. For more information, see Create a StarRocks Data Source.
To configure the properties of the StarRocks output component, your account must have write-through permission on the data source. If you lack this permission, request it from the data source owner. For more information, see Request, Renew, and Return Data Source Permissions.
Stream Load Data Latency Description
A Stream Load import into StarRocks may return one of several statuses, including Publish Timeout. A Publish Timeout means the task succeeded, but the imported data may be briefly delayed before it becomes queryable. Check the running log for the following statuses:
Success: The import completed successfully, and the data is immediately visible.
Publish Timeout: The import job committed successfully, but the data may not be visible immediately. Do not retry; the job is considered successful.
Label Already Exists: The label has already been used by another job, which may have succeeded or may still be in progress.
Fail: The import failed. You can retry the job with the same label.
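The four statuses above imply different follow-up actions. The following sketch shows one way to map a parsed Stream Load result to an action; it assumes the JSON response body has already been parsed into a dict whose "Status" field carries the statuses described above (the function name is hypothetical):

```python
# Sketch: decide how to handle a Stream Load result status.
# Assumes `result` is the parsed JSON body of a Stream Load response.

def handle_stream_load_status(result: dict) -> str:
    """Return an action for the job: 'done', 'wait', or 'retry'."""
    status = result.get("Status", "")
    if status == "Success":
        return "done"   # data is immediately visible
    if status == "Publish Timeout":
        return "done"   # committed; do NOT retry, data appears shortly
    if status == "Label Already Exists":
        return "wait"   # inspect the existing job before acting
    return "retry"      # "Fail" (or unknown): retry with the same label

print(handle_stream_load_status({"Status": "Publish Timeout"}))  # -> done
```

Note in particular that Publish Timeout is treated the same as Success for retry purposes, matching the guidance above.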
Procedure
On the Dataphin home page, select Development > Data Integration from the top menu bar.
In the top menu bar of the integration page, select a Project (in Dev-Prod mode, you must also select an environment).
In the navigation pane on the left, click on the Batch Pipeline option. From the Batch Pipeline list, select the offline pipeline you want to develop to access its configuration page.
To open the Component Library panel, click Component Library at the top right corner of the page.
In the Component Library panel's left-side navigation pane, select Output. Then, find the StarRocks component in the right-hand list and drag it onto the canvas.
Connect the upstream input component to the StarRocks output component by clicking and dragging the connection icon.
Click the settings icon on the StarRocks output component card to open the StarRocks Output Configuration dialog box.
Configure the necessary parameters in the StarRocks Output Configuration dialog box.
Parameter
Description
Basic Settings
Step Name
The name of the StarRocks output component. Dataphin generates the step name automatically; you can modify it to match your business scenario. The naming conventions are as follows:
The name can contain only Chinese characters, letters, underscores (_), and digits.
The name can be up to 64 characters in length.
Datasource
The drop-down list displays all StarRocks data sources, including those for which you have write-through permission and those for which you do not. Click the copy icon to copy the current data source name.
For data sources without write-through permission, click Request next to the data source to request write-through permission. For more information, see Request Data Source Permissions.
If you do not have a StarRocks-type data source, click Create Data Source to create a data source. For more information, see Create a StarRocks Data Source.
Table
Select the target table for the output data. You can enter a keyword to search for a table name, or enter the exact table name and click Exact Search. After you select a table, the system automatically checks the table status. Click the copy icon to copy the name of the currently selected table.
If the StarRocks data source does not contain a target table for data synchronization, you can use the one-click table creation feature to quickly generate one. Perform the following steps:
Click One-click Table Creation. Dataphin automatically generates the table creation code, including the target table name (the source table name by default), the field types (based on an initial conversion of the Dataphin field types), and other information.
Modify the SQL script for creating the target table as needed, and then click Create.
After the target table is created, Dataphin automatically sets it as the target table for the output data. One-click table creation creates target tables for data synchronization in both the development and production environments, and Dataphin selects the production environment by default. If the production environment already contains a table with the same name and structure, you do not need to select the production environment for table creation.
Note: If a table with the same name already exists in the development or production environment, Dataphin reports an error indicating that the table already exists when you click Create.
If no matching table is found, you can also enter a table name manually for integration.
View selection is not supported in copy mode.
Data Format
You can select CSV or JSON.
If you select CSV, you also need to configure the CSV Import Column Delimiter and CSV Import Row Delimiter.
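To illustrate the CSV format, the sketch below assembles a Stream Load CSV payload using Dataphin's default delimiters, _@dp@_ for columns and _#dp#_ for rows (the row data and function name are hypothetical):

```python
# Sketch: assemble a CSV payload using Dataphin's default Stream Load
# delimiters. If a value contains the delimiter string, a different
# delimiter must be chosen instead.

COL_SEP = "_@dp@_"   # default CSV column delimiter
ROW_SEP = "_#dp#_"   # default CSV row delimiter

def build_csv_payload(rows):
    for row in rows:
        for value in row:
            if COL_SEP in value or ROW_SEP in value:
                raise ValueError(f"value {value!r} clashes with a delimiter")
    return ROW_SEP.join(COL_SEP.join(row) for row in rows)

payload = build_csv_payload([["1", "alice"], ["2", "bob"]])
print(payload)  # -> 1_@dp@_alice_#dp#_2_@dp@_bob
```

The explicit clash check mirrors the rule below: data that contains the default delimiter must be imported with a different delimiter.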
CSV Import Column Delimiter (optional)
The column delimiter used when Stream Load imports CSV data. The default value is _@dp@_. If you use the default value, you do not need to specify it. If your data contains _@dp@_, you must use a different character as the delimiter.
CSV Import Row Delimiter (optional)
The row delimiter used when Stream Load imports CSV data. The default value is _#dp#_. If you use the default value, you do not need to specify it. If your data contains _#dp#_, you must use a different character as the delimiter.
Batch Write Data Volume (optional)
The maximum amount of data written in a single batch. The default is 32 MB. You can also set Batch Write Count; the system writes a batch when either limit is reached first.
Batch Write Count (optional)
The default is 2,048 entries. Data is written synchronously using a batch write strategy controlled by two parameters: Batch Write Count and Batch Write Data Volume.
When the accumulated data reaches either limit (the batch data volume or the batch count), the system considers the batch full and immediately writes it to the target.
We recommend setting the batch write data volume to 32 MB. Adjust the batch write count according to the actual size of a single record; in general, set it to a larger value to take full advantage of batch writing. For example, if a single record is about 1 KB and the batch write data volume is set to 16 MB, set the batch write count to more than 16 MB / 1 KB = 16,384 entries, for example, 20,000. With this configuration, the system triggers batch writes based on the data volume limit: each time the accumulated data reaches 16 MB, a write is executed.
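The whichever-limit-fires-first behavior can be sketched as follows, using the example limits above (16 MB volume, 20,000-entry count); the record data and function name are illustrative only:

```python
# Sketch: the batch-write trigger described above. A batch is flushed
# when EITHER the accumulated byte size or the record count reaches
# its limit, whichever comes first.

MAX_BYTES = 16 * 1024 * 1024   # Batch Write Data Volume (16 MB)
MAX_COUNT = 20_000             # Batch Write Count

def flush_batches(records):
    """Group records (byte strings) into batches under the two limits."""
    batches, current, size = [], [], 0
    for rec in records:
        current.append(rec)
        size += len(rec)
        if size >= MAX_BYTES or len(current) >= MAX_COUNT:
            batches.append(current)
            current, size = [], 0
    if current:
        batches.append(current)   # flush the remainder at end of input
    return batches

# With 1 KB records, the 16 MB volume limit fires before the count limit.
batches = flush_batches([b"x" * 1024] * 20_000)
print(len(batches[0]))  # -> 16384
```

This reproduces the arithmetic in the recommendation: at 1 KB per record, the 16 MB volume limit triggers a write after 16,384 records, before the 20,000-entry count limit is reached.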
Prepare Statement (optional)
The SQL script executed on the database before data import.
For example, to keep the service continuously available: before the current step writes data, create the target table Target_A and write the data to it. After the write completes, rename the table Service_B (which continuously serves queries) to Temp_C, rename Target_A to Service_B, and finally drop Temp_C.
End Statement (optional)
The SQL script executed on the database after data import.
Field Mapping
Input Field
Displays the input fields based on the output of the upstream component.
Output Field
Displays the output fields. You can perform the following operations:
Field Management: Click Field Management to select output fields. Click the corresponding icon to move a field from Selected Input Fields to Unselected Input Fields, or from Unselected Input Fields to Selected Input Fields.
Batch addition: Click Batch Addition. JSON, TEXT, and DDL formats are supported for batch configuration.
Batch configuration in JSON format, for example:
[{ "name": "user_id", "type": "String" }, { "name": "user_name", "type": "String" }]
Note: name specifies the name of the imported field, and type specifies the data type of the field after import. For example, "name": "user_id", "type": "String" imports the field named user_id and sets its data type to String.
Batch configuration in TEXT format, for example:
user_id,String
user_name,String
The row delimiter separates each field's information. The default is a line feed (\n); line feed (\n), semicolon (;), and period (.) are supported.
The column delimiter separates the field name from the field type. The default is a comma (,).
Batch configuration in DDL format, for example:
CREATE TABLE tablename ( id INT PRIMARY KEY, name VARCHAR(50), age INT );
Create New Output Field: Click + Create New Output Field, enter the Column name, and select the Type as prompted on the page. After completing the configuration of the current row, click the save icon to save it.
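The TEXT batch-configuration format above can be parsed into field definitions with a few lines of code. This is a sketch using the default delimiters (line feed between fields, comma between name and type); the function name is hypothetical:

```python
# Sketch: parse the TEXT batch-configuration format into
# {"name": ..., "type": ...} field definitions.
# Defaults: line feed as the row delimiter, comma as the column delimiter.

def parse_text_config(text, row_sep="\n", col_sep=","):
    fields = []
    for line in text.strip().split(row_sep):
        if not line.strip():
            continue  # skip blank lines
        name, ftype = line.split(col_sep, 1)
        fields.append({"name": name.strip(), "type": ftype.strip()})
    return fields

print(parse_text_config("user_id,String\nuser_name,String"))
# -> [{'name': 'user_id', 'type': 'String'}, {'name': 'user_name', 'type': 'String'}]
```

The output matches the structure of the JSON batch-configuration format, which is why the two formats are interchangeable for batch addition.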
Mapping
Based on the upstream input fields and the target table's fields, you can manually configure field mapping. Two modes are available: Name Mapping and Row Mapping.
Name Mapping: Maps fields that have the same field name.
Row Mapping: Maps fields by row position when the source and target field names differ; only fields in the same row are mapped to each other.
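The difference between the two mapping modes can be sketched over two hypothetical field lists (field names and function names here are illustrative, not Dataphin APIs):

```python
# Sketch: the two field-mapping modes described above, applied to
# lists of input and output field names.

def name_mapping(inputs, outputs):
    """Pair fields that share the same name, regardless of position."""
    out = set(outputs)
    return [(f, f) for f in inputs if f in out]

def row_mapping(inputs, outputs):
    """Pair fields by row position; only same-row fields are mapped."""
    return list(zip(inputs, outputs))

src = ["uid", "uname"]
dst = ["user_id", "uname"]
print(name_mapping(src, dst))  # -> [('uname', 'uname')]
print(row_mapping(src, dst))   # -> [('uid', 'user_id'), ('uname', 'uname')]
```

As the example shows, Name Mapping skips fields whose names do not match, while Row Mapping maps every pair of same-row fields even when the names differ.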
Click Confirm to complete the property configuration of the StarRocks output component.