Dataphin: Configure StarRocks Input Component

Last Updated: Mar 09, 2026

The StarRocks input component reads data from a StarRocks data source. To synchronize data from a StarRocks data source to another data source, first configure the StarRocks input component to read from the source data source, and then configure the destination data source for the synchronization. This topic describes how to configure the StarRocks input component.

Prerequisites

Procedure

  1. On the Dataphin homepage, in the top menu bar, choose Development > Data Integration.

  2. On the Integration page, in the top menu bar, select a project. If your project is in Dev-Prod mode, also select an environment.

  3. In the navigation pane on the left, click Offline Integration. In the Offline Integration list, click the offline pipeline that you want to develop to open its configuration page.

  4. Click Component Library in the upper-right corner of the page to open the Component Library panel.

  5. In the left navigation pane of the Component Library panel, select Input. In the input component list on the right, locate the StarRocks component and drag it to the canvas.

  6. Click the configuration icon on the StarRocks input component card to open the StarRocks Input Configuration dialog box.

  7. In the StarRocks Input Configuration dialog box, configure the following parameters.

    Parameter

    Description

    Step Name

    The name of the StarRocks input component. Dataphin automatically generates a step name, which you can modify as needed. The name must meet the following requirements:

    • Can contain only Chinese characters, letters, underscores (_), and numbers.

    • Cannot exceed 64 characters.
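    The naming rules above can be checked locally before you enter a name in the dialog box. The following Python sketch is a hypothetical helper, not part of Dataphin. It assumes that "Chinese characters" means the CJK Unified Ideographs block (U+4E00 to U+9FFF) and that "numbers" means the ASCII digits 0 to 9, so Dataphin's own validation may accept or reject slightly different names.

```python
import re

# Assumption: "Chinese characters" ~ CJK Unified Ideographs (U+4E00..U+9FFF)
# and "numbers" = ASCII digits 0-9. Dataphin's real character set may differ.
STEP_NAME_PATTERN = re.compile(r"[\u4e00-\u9fffA-Za-z0-9_]+")
MAX_STEP_NAME_LENGTH = 64


def is_valid_step_name(name: str) -> bool:
    """Return True if `name` is non-empty, uses only allowed characters,
    and is at most 64 characters long (hypothetical helper)."""
    if len(name) > MAX_STEP_NAME_LENGTH:
        return False
    # fullmatch with "+" also rejects the empty string.
    return STEP_NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["StarRocks_input_1", "输入组件_01", "bad-name", "x" * 65]:
        print(candidate[:20], is_valid_step_name(candidate))
```

    Names such as bad-name fail because the hyphen is not in the allowed set, and a 65-character name fails the length check even though every character is allowed.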

    Datasource

    The drop-down list displays all StarRocks data sources in Dataphin, including those on which you have read-through permissions and those on which you do not. Click the copy icon to copy the name of the current data source.

    • For a data source on which you do not have read-through permissions, click Request next to the data source to request the permissions. For more information, see Request Data Source Permissions.

    • If no StarRocks data source exists, click Create Data Source to create one. For more information, see Create a StarRocks Data Source.

    Source Table Quantity

    Select the number of source tables: Single Table or Multiple Tables.

    • Single Table: Synchronizes business data from a single table to a single target table.

    • Multiple Tables: Synchronizes business data from multiple tables to the same destination table. Data from the multiple tables is combined by using a union before it is written to the destination table.

    Table Matching Method

    Select General Rules or Database Regular Expression.

    Note

    This option is configurable only when **Source Table Quantity** is set to **Multiple Tables**.

    Table

    Select the source table:

    • If **Source Table Quantity** is set to **Single Table**, enter a keyword of the table name to search for the table, or enter the exact table name and click Precise Search. After you select a table, the system automatically checks the table status. Click the copy icon to copy the name of the selected table.

    • If **Source Table Quantity** is set to **Multiple Tables**, add tables by entering an expression that corresponds to the selected table matching method.

      • If **Table Matching Method** is set to **General Rules**: In the input box, enter a table expression that matches **tables with the same structure**. The expression can use enumeration, a regular-expression-like range syntax, or a mix of both. For example: table_[001-100];table_102;

      • If **Table Matching Method** is set to **Database Regular Expression**: In the input box, enter a regular expression that the current database supports. The system matches tables in the selected data source against this expression. At runtime, the task evaluates the expression again, so newly created tables that match it are included in the synchronization.

      After entering the expression, click Precise Search to view the list of matched tables in the Confirm Match Details dialog box.
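    To preview which tables an expression might select, the following Python sketch expands the documented example table_[001-100];table_102; into concrete table names and filters a list of table names with a regular expression. Both helpers are hypothetical. The range semantics (inclusive bounds, zero-padding to the width of the start value, at most one range per entry) are inferred from the example, and Python's re module stands in for the database's own regular-expression dialect, which may differ. Matching the whole table name is also an assumption.

```python
import re

# Hypothetical: one "[start-end]" range per ";"-separated entry, inclusive,
# zero-padded to the width of the start value (inferred from the example).
_RANGE = re.compile(r"^(.*)\[(\d+)-(\d+)\](.*)$")


def expand_general_rule(expression: str) -> list[str]:
    """Expand an expression such as "table_[001-100];table_102;" into table names."""
    names: list[str] = []
    for part in expression.split(";"):
        part = part.strip()
        if not part:
            continue  # a trailing ";" produces an empty entry
        m = _RANGE.match(part)
        if m is None:
            names.append(part)  # plain enumerated table name
            continue
        prefix, start, end, suffix = m.groups()
        width = len(start)
        for i in range(int(start), int(end) + 1):
            names.append(f"{prefix}{i:0{width}d}{suffix}")
    return names


def match_database_regex(pattern: str, table_names: list[str]) -> list[str]:
    """Keep table names that fully match `pattern` (Python re as a stand-in)."""
    compiled = re.compile(pattern)
    return [name for name in table_names if compiled.fullmatch(name)]


if __name__ == "__main__":
    names = expand_general_rule("table_[001-100];table_102;")
    print(len(names), names[:2], names[-1])
    print(match_database_regex(r"order_\d{4}", ["order_2024", "order_x", "order_20245"]))
```

    The expansion of the example yields 101 names, from table_001 through table_100 plus table_102. The regular-expression helper returns only order_2024 in the usage example, because order_20245 has one digit too many for a full match.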

    Shard Key (Optional)

    You can use a column of an **integer** type in the source table as the shard key. We recommend that you use the **primary key** or an **indexed column** as the shard key. During reads, the data is split into shards based on the shard key so that the shards can be read concurrently, which improves synchronization efficiency.
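    One common way to turn an integer shard key into concurrent reads is to split the key's value range into contiguous, non-overlapping ranges and read each range in parallel. The following Python sketch shows that idea under stated assumptions; it is not Dataphin's actual splitting algorithm. The column name id and the key bounds are hypothetical inputs; in practice the bounds might come from a query such as SELECT MIN(id), MAX(id) on the source table.

```python
def split_shard_ranges(min_key: int, max_key: int, shards: int) -> list[tuple[int, int]]:
    """Split the inclusive range [min_key, max_key] into at most `shards`
    contiguous, non-overlapping ranges of nearly equal size."""
    if shards < 1:
        raise ValueError("shards must be at least 1")
    total = max_key - min_key + 1
    if total <= 0:
        return []
    shards = min(shards, total)  # never create empty ranges
    size, remainder = divmod(total, shards)
    ranges = []
    start = min_key
    for i in range(shards):
        # The first `remainder` ranges each take one extra key.
        end = start + size - 1 + (1 if i < remainder else 0)
        ranges.append((start, end))
        start = end + 1
    return ranges


def shard_predicates(column: str, ranges: list[tuple[int, int]]) -> list[str]:
    """Build one WHERE predicate per range; each could be read by a separate worker."""
    return [f"{column} >= {lo} AND {column} <= {hi}" for lo, hi in ranges]


if __name__ == "__main__":
    for predicate in shard_predicates("id", split_shard_ranges(1, 10, 3)):
        print(predicate)
```

    Evenly sized value ranges only balance the work when key values are spread evenly, and rows whose shard key is NULL are not covered by these predicates, so a real reader would need to handle them separately. An indexed shard key lets the database serve each range predicate efficiently.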

    Batch Read Count (Optional)

    The number of records to read in each batch. Instead of reading records one at a time from the source database, you can set a batch read count (for example, 1024 records). This reduces the number of interactions with the data source, improves I/O efficiency, and reduces the overhead of network round trips.
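    The effect of a batch read count can be illustrated with the Python DB-API fetchmany() method, which returns up to a given number of rows per call. The following sketch uses an in-memory sqlite3 database purely as a stand-in for StarRocks, and the table name src is hypothetical. Dataphin's reader is not necessarily implemented this way, but the sketch shows 2,500 rows being retrieved in three fetch calls with a batch size of 1024 instead of 2,500 single-row fetches.

```python
import sqlite3
from typing import Iterator


def read_in_batches(cursor, query: str, batch_size: int = 1024) -> Iterator[list]:
    """Execute `query` and yield rows in lists of at most `batch_size` rows."""
    cursor.execute(query)
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:  # an empty list means the result set is exhausted
            break
        yield rows


def demo_batch_sizes(total_rows: int = 2500, batch_size: int = 1024) -> list[int]:
    """Load `total_rows` rows into a throwaway sqlite3 table and report batch sizes."""
    conn = sqlite3.connect(":memory:")  # stand-in for a StarRocks connection
    try:
        cur = conn.cursor()
        cur.execute("CREATE TABLE src (id INTEGER)")
        cur.executemany("INSERT INTO src VALUES (?)", ((i,) for i in range(total_rows)))
        return [len(batch) for batch in read_in_batches(cur, "SELECT id FROM src", batch_size)]
    finally:
        conn.close()


if __name__ == "__main__":
    print(demo_batch_sizes())  # [1024, 1024, 452]
```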

    Input Filter (Optional)

    Enter a filter condition on the input fields, for example ds=${bizdate}. **Input Filter** is suitable for the following two scenarios:

    • Reading a fixed subset of the data.

    • Filtering data by parameters, such as ${bizdate} in the example above.
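    The following Python sketch shows how a parameterized filter such as ds=${bizdate} could be resolved into a WHERE clause. It uses string.Template, whose ${name} placeholder syntax happens to match the example. The helpers, the table name orders, and its columns are hypothetical, and the rule that bizdate is the day before the run date in yyyyMMdd format is an assumption for illustration; check how your Dataphin project defines the parameter.

```python
from datetime import date, timedelta
from string import Template


def default_bizdate(run_date: date) -> str:
    """Assumption: bizdate is the day before the run date, formatted as yyyyMMdd."""
    return (run_date - timedelta(days=1)).strftime("%Y%m%d")


def render_filter(filter_expr: str, params: dict[str, str]) -> str:
    """Replace ${name} placeholders; raises KeyError if a parameter is missing."""
    return Template(filter_expr).substitute(params)


def build_query(table: str, columns: list[str], filter_expr: str, params: dict[str, str]) -> str:
    """Assemble a SELECT statement whose WHERE clause is the rendered filter."""
    where = render_filter(filter_expr, params)
    return f"SELECT {', '.join(columns)} FROM {table} WHERE {where}"


if __name__ == "__main__":
    params = {"bizdate": default_bizdate(date(2026, 3, 9))}
    print(build_query("orders", ["id", "amount"], "ds=${bizdate}", params))
    # SELECT id, amount FROM orders WHERE ds=20260308
```

    If ds is a string column, the filter would typically be written with quotes, for example ds='${bizdate}', which the same substitution handles unchanged.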

    Output Fields

    The output fields area displays all fields from the selected table and those matched by the filter conditions. You can create new output fields or add them in batches. If you do not need to output certain fields to downstream components, you can delete them.

    • Batch Add: Click Batch Add to configure fields in batches in JSON, TEXT, or DDL format.

      Note

      After you finish batch adding fields and click **OK**, the existing field configuration is overwritten.

      • Configure in batches using JSON format, for example:

        [
          {
            "name": "user_id",
            "type": "String"
          },
          {
            "name": "user_name",
            "type": "String"
          }
        ]
        Note

        In each object, name specifies the name of the field to import and type specifies the field type after import. For example, "name":"user_id","type":"String" imports the field user_id with the type String.

      • Configure in batches using TEXT format, for example:

        user_id,String
        user_name,String

        • The row delimiter separates one field from the next. The default is a line feed (\n). Supported row delimiters are line feed (\n), semicolon (;), and period (.).

        • The column delimiter separates the field name and field type. The default is a comma (,).

      • Configure in batches using DDL format, for example:

        CREATE TABLE tablename (
            id INT PRIMARY KEY,
            name VARCHAR(50),
            age INT
        );
    • Create Output Field: Click +Create Output Field, enter a Column name, and select a Type as prompted on the page.

    • Delete a Field Individually: To delete a small number of fields, click the delete icon in the Actions column of each field that you want to remove from the output field list.

      Note

      If the compute engine is StarRocks, you can view the **classification and grading** of the output fields of the StarRocks input component. This is not supported for other compute engines.

    • Batch Delete Fields: To delete many fields, click Field Management. In the Field Management dialog box, select the fields to delete, click the left-arrow icon to move them from the selected input fields to the unselected input fields, and then click Confirm.

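    To check a field definition before pasting it into the Batch Add dialog box, each of the three formats can be parsed into (name, type) pairs. The following Python sketch is hypothetical and is not Dataphin's parser. The JSON and TEXT parsers follow the formats and delimiters described above, while the DDL parser is deliberately naive: it reads the first two tokens of each column definition inside the first pair of parentheses, strips backticks, skips table-level constraints, and does not map DDL types such as INT to Dataphin field types.

```python
import json
import re

Field = tuple[str, str]  # (field name, field type)


def parse_json_fields(text: str) -> list[Field]:
    """Parse a JSON array of {"name": ..., "type": ...} objects."""
    return [(item["name"], item["type"]) for item in json.loads(text)]


def parse_text_fields(text: str, row_delim: str = "\n", col_delim: str = ",") -> list[Field]:
    """Parse one field per row; name and type are separated by col_delim."""
    fields = []
    for row in text.split(row_delim):
        row = row.strip()
        if not row:
            continue  # tolerate blank rows and a trailing row delimiter
        name, field_type = row.split(col_delim, 1)
        fields.append((name.strip(), field_type.strip()))
    return fields


# A comma followed by ")" before any "(" is inside a type such as DECIMAL(10,2).
_TOP_LEVEL_COMMA = re.compile(r",(?![^()]*\))")
_CONSTRAINT_WORDS = {"PRIMARY", "KEY", "INDEX", "UNIQUE", "CONSTRAINT"}


def _column_block(ddl: str) -> str:
    """Return the text between the first "(" and its matching ")"."""
    start = ddl.index("(")
    depth = 0
    for pos in range(start, len(ddl)):
        if ddl[pos] == "(":
            depth += 1
        elif ddl[pos] == ")":
            depth -= 1
            if depth == 0:
                return ddl[start + 1 : pos]
    raise ValueError("unbalanced parentheses in DDL")


def parse_ddl_fields(ddl: str) -> list[Field]:
    """Naively take the first two tokens of each column definition."""
    fields = []
    for part in _TOP_LEVEL_COMMA.split(_column_block(ddl)):
        tokens = part.split()
        if len(tokens) < 2 or tokens[0].upper() in _CONSTRAINT_WORDS:
            continue  # skip empty parts and table-level constraints
        fields.append((tokens[0].strip("`"), tokens[1]))
    return fields


if __name__ == "__main__":
    print(parse_text_fields("user_id,String\nuser_name,String"))
    # [('user_id', 'String'), ('user_name', 'String')]
```

    Scanning for the matching parenthesis, rather than the last ")" in the statement, keeps trailing clauses such as a StarRocks DISTRIBUTED BY HASH(...) clause out of the column list.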

  8. Click Confirm to complete the property configuration for the StarRocks input component.