
Dataphin: Configure the StarRocks input component

Last Updated: May 28, 2025

The StarRocks input component reads data from a StarRocks data source. When you synchronize data from StarRocks to another destination, you must configure the StarRocks input component as the source before you configure the synchronization target. This topic describes how to configure the StarRocks input component.

Prerequisites

Procedure

  1. On the Dataphin home page, navigate to the top menu bar and select Development > Data Integration.

  2. At the top of the integration page, select the project (in Dev-Prod mode, you must also select an environment).

  3. In the left-side navigation pane, click Batch Pipeline. In the Batch Pipeline list, select the offline pipeline that you want to develop to open its configuration page.

  4. Click Component Library in the upper right corner to open the Component Library panel.

  5. In the Component Library panel's left-side navigation pane, select Input. Locate the StarRocks component in the list on the right and drag it onto the canvas.

  6. To configure the component, click the configuration icon on the StarRocks input component card to open the StarRocks Input Configuration dialog box.

  7. In the StarRocks Input Configuration dialog box, set the following parameters.


    Step Name

    This is the name of the StarRocks input component. Dataphin automatically generates the step name, but you can modify it according to the business scenario. The naming convention is as follows:

    • It can only contain Chinese characters, letters, underscores (_), and numbers.

    • It cannot exceed 64 characters.
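The naming rules above can be expressed as a simple validation check. The following sketch is illustrative and not part of Dataphin; in particular, the Unicode range used for Chinese characters (U+4E00 to U+9FA5) is an assumption about what Dataphin counts as a Chinese character.

```python
import re

# Assumed character set: Chinese characters (U+4E00-U+9FA5), ASCII
# letters, digits, and underscores, 1 to 64 characters total.
STEP_NAME_RE = re.compile(r"[\u4e00-\u9fa5A-Za-z0-9_]{1,64}")

def is_valid_step_name(name):
    """Return True if the step name satisfies the documented rules."""
    return bool(STEP_NAME_RE.fullmatch(name))

print(is_valid_step_name("starrocks_input_01"))  # True
print(is_valid_step_name("name with spaces"))    # False
```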

    Datasource

    The data source drop-down list displays all StarRocks data sources in the current Dataphin tenant, including data sources that you have read permission on and those that you do not. Click the copy icon to copy the name of the current data source.

    • For data sources on which you do not have read permission, click Request next to the data source to request read permission. For specific operations, see Request data source permissions.

    • If you do not have a StarRocks type data source, click Create Data Source to create a data source. For more information, see Create a StarRocks data source.

    Source Table Volume

    Select the source table volume. The source table volume includes Single Table and Multiple Tables:

    • Single Table: Suitable for scenarios where business data from one table is synchronized to one target table.

    • Multiple Tables: Suitable for scenarios where business data from multiple tables is synchronized to the same target table. When data from multiple tables is written to the same data table, the union algorithm is used.
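The union merge described for Multiple Tables can be sketched as concatenating rows from identically structured source tables into one stream for the single target table. The table names and data below are hypothetical, not part of Dataphin.

```python
def union_tables(*tables):
    """Concatenate rows from source tables that share the same
    column layout, mimicking the union merge for Multiple Tables."""
    merged = []
    for rows in tables:
        merged.extend(rows)
    return merged

# Hypothetical identically structured source tables.
orders_2023 = [("o1", 100), ("o2", 250)]
orders_2024 = [("o3", 80)]

# All three rows flow to the one target table.
print(union_tables(orders_2023, orders_2024))
```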

    Table

    Select the source table:

    • If Source Table Volume is set to Single Table, you can enter a keyword of the table name to search, or enter the exact table name and then click Precise Search. After you select a table, the system automatically checks the table status. Click the copy icon to copy the name of the currently selected table.

    • If Source Table Volume is set to Multiple Tables, perform the following operations to add tables.

      1. In the input box, enter the expression of the table to filter tables with the same structure.

        The system supports enumeration form, regular-expression-like form, and a mixture of the two. For example, table_[001-100];table_102.

      2. Click Precise Search to view the list of matched tables in the Confirm Match Details dialog box.

      3. Click Confirm.
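The matching behaviour of such an expression can be sketched as follows: semicolons separate terms, and [start-end] expands to a zero-padded numeric range. This is an illustrative re-implementation, not Dataphin's actual matcher.

```python
import re

def expand_tables(expr):
    """Expand an expression such as 'table_[001-003];table_102'
    into concrete table names. [start-end] denotes a zero-padded
    numeric range; terms without a range are taken literally."""
    names = []
    for term in expr.split(";"):
        m = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", term)
        if m:
            prefix, start, end, suffix = m.groups()
            width = len(start)  # preserve zero padding, e.g. 001
            for i in range(int(start), int(end) + 1):
                names.append(f"{prefix}{str(i).zfill(width)}{suffix}")
        else:
            names.append(term)
    return names

print(expand_tables("table_[001-003];table_102"))
# ['table_001', 'table_002', 'table_003', 'table_102']
```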

    Shard Key (optional)

    You can use an integer-type column in the source table as the shard key. The primary key or an indexed column is recommended. When reading data, Dataphin shards the data based on the configured shard key field so that it can be read concurrently, which improves data synchronization efficiency.
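A sketch of how an integer shard key enables concurrent reads: the key's value range is split into contiguous slices, one per parallel reader, and each reader issues a range-bounded query. The even-split strategy and the column name `id` below are assumptions; Dataphin's actual splitting may differ.

```python
def shard_ranges(min_id, max_id, parallelism):
    """Split [min_id, max_id] into contiguous ranges, one per
    concurrent reader of the shard key column."""
    total = max_id - min_id + 1
    step = -(-total // parallelism)  # ceiling division
    ranges = []
    lo = min_id
    while lo <= max_id:
        hi = min(lo + step - 1, max_id)
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges

# Each range would back one reader's range-bounded query.
for lo, hi in shard_ranges(1, 100, 4):
    print(f"reader range: id BETWEEN {lo} AND {hi}")
```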

    Batch Read Count (optional)

    The number of data entries read per request. When reading from the source database, configuring a batch read count (for example, 1024 records) instead of reading row by row reduces the number of interactions with the data source, improves I/O efficiency, and lowers network latency.
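The effect of a batch read count can be sketched with the DB-API `cursor.fetchmany()` pattern: rows arrive in fixed-size batches rather than one per round trip. The `FakeCursor` below is a stand-in for a real database cursor, used only so the sketch is self-contained.

```python
class FakeCursor:
    """Stand-in for a DB-API cursor, for illustration only."""
    def __init__(self, rows):
        self._rows = rows
    def fetchmany(self, size):
        batch, self._rows = self._rows[:size], self._rows[size:]
        return batch

def read_in_batches(cursor, batch_size=1024):
    """Yield rows in batches of batch_size, cutting round trips
    to the source compared with fetching one row per call."""
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        yield rows

cur = FakeCursor(list(range(10)))
batches = list(read_in_batches(cur, batch_size=4))
print([len(b) for b in batches])  # [4, 4, 2]
```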

    Input Filter (optional)

    Fill in the filter condition for the input data, such as ds=${bizdate}. Input Filter applies to the following two scenarios:

    • Filtering a fixed portion of the data.

    • Filtering by parameters, such as the business date.
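A sketch of how a filter such as ds=${bizdate} might take effect: the ${name} placeholder is substituted with the scheduling parameter's value, and the result is applied as a filter condition on the read. The substitution logic, parameter value, and table name are illustrative, not Dataphin internals.

```python
def render_filter(template, params):
    """Substitute ${name} placeholders in an input filter string,
    following the ds=${bizdate} placeholder style shown above."""
    out = template
    for key, value in params.items():
        out = out.replace("${" + key + "}", value)
    return out

# Hypothetical scheduling parameter value for the business date.
where = render_filter("ds=${bizdate}", {"bizdate": "20250527"})
print(f"read condition: {where}")  # read condition: ds=20250527
```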

    Output Fields

    The output fields area displays all fields of the selected table and the fields hit by the filter conditions. You can create new output fields or add output fields in batches. If you do not need to output certain fields to downstream components, you can also delete the corresponding fields.

    • Batch Add: Click Batch Add to configure fields in batches in JSON, TEXT, or DDL format.

      Note

      After batch addition is completed, clicking Confirm overwrites the previously configured field information.

      • Batch configuration in JSON format, for example:

        // Example:
        [{
          "name": "user_id",
          "type": "String"
         },
         {
          "name": "user_name",
          "type": "String"
         }]
        Note

        name represents the name of the introduced field, and type represents the type of the field after introduction. For example, "name":"user_id","type":"String" indicates that the field named user_id is introduced, and the field type is set to String.

      • Batch configuration in TEXT format, for example:

        // Example:
        user_id,String
        user_name,String
        • The row delimiter separates the entries for different fields. The default is a line feed (\n); line feed (\n), semicolon (;), and period (.) are supported.

        • The column delimiter separates the field name from the field type. The default is a comma (,).

      • Batch configuration in DDL format, for example:

        CREATE TABLE tablename (
            id INT PRIMARY KEY,
            name VARCHAR(50),
            age INT
        );
    • Create New Output Field: Click + Create New Output Field, then fill in Column and select Type as prompted on the page.

    • Delete Single Field: If you only need to delete a few fields, click the delete icon in the Actions column of the target field in the output field list to remove the redundant fields.

      Note

      When the compute engine is StarRocks, the output fields of the StarRocks input component support viewing the classification and grading of fields. Non-StarRocks compute engines do not support this.

    • Batch Delete Fields: If you need to delete many fields, click Field Management, select multiple fields in the Field Management dialog box, click the left-move icon to move the selected input fields to the unselected list, and then click Confirm to complete the batch deletion.

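The JSON and TEXT batch formats shown above can be normalized to the same list of (name, type) pairs. The parser below is a sketch mirroring the documented delimiters, not Dataphin's implementation; DDL parsing is omitted.

```python
import json

def parse_fields(config, fmt="TEXT", row_delim="\n", col_delim=","):
    """Parse a batch field configuration into (name, type) pairs.
    Supports the JSON and TEXT formats described above; the default
    delimiters match the documented defaults."""
    if fmt == "JSON":
        return [(f["name"], f["type"]) for f in json.loads(config)]
    pairs = []
    for row in config.strip().split(row_delim):
        name, ftype = row.split(col_delim)
        pairs.append((name.strip(), ftype.strip()))
    return pairs

text_cfg = "user_id,String\nuser_name,String"
print(parse_fields(text_cfg))
# [('user_id', 'String'), ('user_name', 'String')]
```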

  8. Click Confirm to finalize the property configuration for the StarRocks input component.