The StarRocks input component reads data from a StarRocks data source. When you synchronize data from StarRocks to another destination, you must configure the StarRocks input component as the source before you configure the destination data source. This topic describes how to configure the StarRocks input component.
Prerequisites
A StarRocks data source has been created. For more information, see Create a StarRocks data source.
To configure the properties of the StarRocks input component, your account must have read-through permission on the data source. If you lack this permission, request it from the data source owner first. For more information, see How to request, renew, and return data source permissions.
Procedure
On the Dataphin home page, navigate to the top menu bar and select Development > Data Integration.
At the top of the integration page, select Project (Dev-Prod mode requires selecting an environment).
In the left-side navigation pane, click on the Batch Pipeline. From the Batch Pipeline list, select the offline pipeline you want to develop to access its configuration page.
Click Component Library in the upper right corner to open the Component Library panel.
In the Component Library panel's left-side navigation pane, select Input. Locate the StarRocks component in the list on the right and drag it onto the canvas.
To configure the component, click the settings icon on the StarRocks input component card to open the StarRocks Input Configuration dialog box. In the dialog box, set the following parameters.
Parameter
Description
Step Name
The name of the StarRocks input component. Dataphin generates a step name automatically, but you can modify it to match your business scenario. The naming conventions are as follows:
It can only contain Chinese characters, letters, underscores (_), and numbers.
It cannot exceed 64 characters.
Datasource
The data source drop-down list displays all StarRocks data sources in the current Dataphin environment, including those you have read-through permission for and those you do not. Click the copy icon to copy the current data source name. For a data source without read-through permission, click Request next to the data source to request read-through permission. For specific operations, see Request data source permissions.
If you do not have a StarRocks type data source, click Create Data Source to create a data source. For more information, see Create a StarRocks data source.
Source Table Volume
Select the source table volume. The source table volume includes Single Table and Multiple Tables:
Single Table: Suitable for scenarios where business data from one table is synchronized to one target table.
Multiple Tables: Suitable for scenarios where business data from multiple tables is synchronized to the same target table. When data from multiple tables is written to the same data table, the union algorithm is used.
Table
Select the source table:
If Source Table Volume is set to Single Table, you can enter a table name keyword to search, or enter the exact table name and then click Precise Search. After you select a table, the system automatically checks the table status. Click the copy icon to copy the name of the selected table.
If Source Table Volume is set to Multiple Tables, perform the following operations to add tables.
In the input box, enter a table expression to filter tables that share the same schema.
The system supports enumeration, regex-like patterns, and a mix of the two. For example, table_[001-100];table_102.
Click Precise Search to view the list of matched tables in the Confirm Match Details dialog box.
Click Confirm.
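The mixed enumeration expression above can be illustrated with a small sketch. The bracket-range grammar (prefix[start-end]suffix, entries separated by semicolons) is inferred from the example in this topic; Dataphin's actual matching rules may differ.

```python
import re

def expand_table_expression(expr):
    """Expand an expression like 'table_[001-100];table_102' into
    concrete table names. Illustrative only; not Dataphin's parser."""
    tables = []
    for part in expr.split(";"):
        m = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", part)
        if m:
            prefix, start, end, suffix = m.groups()
            width = len(start)  # preserve zero padding, e.g. 001
            for i in range(int(start), int(end) + 1):
                tables.append(f"{prefix}{str(i).zfill(width)}{suffix}")
        else:
            tables.append(part)  # plain enumeration entry
    return tables

print(expand_table_expression("table_[001-003];table_102"))
# ['table_001', 'table_002', 'table_003', 'table_102']
```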
Shard Key (optional)
You can use an integer-type column in the source table as the shard key. We recommend using the primary key or an indexed column. When reading data, the data is sharded on the configured shard key field so that shards can be read concurrently, which improves synchronization efficiency.
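Conceptually, sharding on an integer key splits its value range into predicates, one per concurrent reader. The following sketch shows the idea; it is not Dataphin's internal splitting logic, and the key range and shard count are made-up illustration values.

```python
def shard_predicates(shard_key, min_val, max_val, num_shards):
    """Split [min_val, max_val] of an integer shard key into
    num_shards WHERE predicates, one per concurrent reader."""
    span = max_val - min_val + 1
    step = -(-span // num_shards)  # ceiling division
    preds = []
    lo = min_val
    while lo <= max_val:
        hi = min(lo + step - 1, max_val)
        preds.append(f"{shard_key} BETWEEN {lo} AND {hi}")
        lo = hi + 1
    return preds

print(shard_predicates("id", 1, 100, 4))
# ['id BETWEEN 1 AND 25', 'id BETWEEN 26 AND 50',
#  'id BETWEEN 51 AND 75', 'id BETWEEN 76 AND 100']
```

Each predicate can then back one reader's SELECT statement, so the four ranges are scanned in parallel.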
Batch Read Count (optional)
The number of data entries read at one time. When reading data from the source database, you can configure a specific batch read count (such as 1024 records) instead of reading one by one to reduce the number of interactions with the data source, improve I/O efficiency, and reduce network latency.
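The effect of a batch read count can be demonstrated with the standard DB-API fetchmany pattern. SQLite stands in for the source database here purely to keep the sketch self-contained; the same pattern applies to any DB-API connection.

```python
import sqlite3

# Stand-in source table with 3000 rows (sqlite3 simulates the source DB).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(3000)])

BATCH_READ_COUNT = 1024  # rows fetched per round trip
cur = conn.execute("SELECT id FROM t")
batches = total = 0
while True:
    rows = cur.fetchmany(BATCH_READ_COUNT)
    if not rows:
        break
    batches += 1
    total += len(rows)

print(batches, total)  # 3 round trips instead of 3000 single-row reads
```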
Input Filter (optional)
Enter the filter condition for the input fields, for example, ds=${bizdate}. Input Filter applies to the following two scenarios:
Filtering for a fixed portion of the data.
Filtering based on parameters.
Output Fields
The output fields area displays all fields of the selected table and the fields hit by the filter conditions. You can create new output fields or add output fields in batches. If you do not need to output certain fields to downstream components, you can also delete the corresponding fields.
Batch Add: Click Batch Add to configure fields in bulk in JSON, TEXT, or DDL format.
Note: After batch addition is completed, clicking Confirm overwrites the previously configured field information.
Batch configuration in JSON format, for example:
[{ "name": "user_id", "type": "String" }, { "name": "user_name", "type": "String" }]
Note: name is the name of the imported field, and type is the field type after import. For example, "name": "user_id", "type": "String" indicates that the field named user_id is imported with its type set to String.
Batch configuration in TEXT format, for example:
user_id,String
user_name,String
The row delimiter separates the entries for each field. The default is a line feed (\n); line feed (\n), semicolon (;), and period (.) are supported.
The column delimiter separates the field name from the field type. The default is a comma (,).
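The TEXT-format rules above can be sketched as a small parser. This is only an illustration of the delimiter semantics, not Dataphin's implementation.

```python
def parse_text_fields(text, row_delim="\n", col_delim=","):
    """Parse TEXT-format batch field configuration, e.g.
    'user_id,String\nuser_name,String', into field dicts."""
    fields = []
    for row in text.split(row_delim):
        row = row.strip()
        if not row:
            continue  # skip blank rows
        name, ftype = (p.strip() for p in row.split(col_delim, 1))
        fields.append({"name": name, "type": ftype})
    return fields

print(parse_text_fields("user_id,String\nuser_name,String"))
# [{'name': 'user_id', 'type': 'String'}, {'name': 'user_name', 'type': 'String'}]
```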
Batch configuration in DDL format, for example:
CREATE TABLE tablename ( id INT PRIMARY KEY, name VARCHAR(50), age INT );
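For the DDL format, the column list of the CREATE TABLE statement supplies the field names and types. The rough sketch below extracts them from flat statements like the example above; it is not a full SQL parser and does not reflect how Dataphin parses DDL.

```python
import re

def fields_from_ddl(ddl):
    """Extract (name, type) pairs from a simple CREATE TABLE statement.
    Handles flat column lists with sized types like VARCHAR(50); it does
    not handle table-level constraints or full SQL grammar."""
    body = re.search(r"\((.*)\)\s*;?\s*$", ddl, re.S).group(1)
    cols = []
    # Split on commas that are not inside parentheses (e.g. DECIMAL(10,2)).
    for item in re.split(r",(?![^(]*\))", body):
        parts = item.strip().split()
        if len(parts) >= 2:
            cols.append({"name": parts[0], "type": parts[1]})
    return cols

ddl = "CREATE TABLE tablename ( id INT PRIMARY KEY, name VARCHAR(50), age INT );"
print(fields_from_ddl(ddl))
# [{'name': 'id', 'type': 'INT'}, {'name': 'name', 'type': 'VARCHAR(50)'},
#  {'name': 'age', 'type': 'INT'}]
```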
Create New Output Field: Click + Create New Output Field, then enter the Column and select the Type as prompted on the page.
Delete Single Field: To delete a small number of fields, find the target field in the output field list and click the delete icon in the Actions column.
Note: When the compute engine is StarRocks, the output fields of the StarRocks input component support viewing field classification and grading. Non-StarRocks compute engines do not support this.
Batch Delete Fields: To delete many fields, click Field Management, select multiple fields in the Field Management dialog box, click the left-move icon to move the selected input fields to the unselected list, and then click Confirm to complete the batch deletion.
Click Confirm to complete the property configuration of the StarRocks input component.