
Dataphin: Configure a PolarDB input component

Last Updated: Mar 05, 2026

The PolarDB input component reads data from a PolarDB data source. You must configure this component to synchronize data from PolarDB to another data source. This topic describes how to configure the component.

Prerequisites

Procedure

  1. On the Dataphin home page, choose Develop > Data Integration from the top menu bar.

  2. In the top menu bar of the integration page, select a project. In Dev-Prod mode, also select an environment.

  3. In the left navigation pane, click Offline Integration. In the Offline Integration list, click the offline pipeline that you want to develop to open its configuration page.

  4. In the upper-right corner of the page, click Component Library to open the Component Library panel.

  5. In the Component Library panel, select Input from the navigation pane on the left. Find the PolarDB component in the list of input components on the right and drag it to the canvas.

  6. On the PolarDB input component card, click the configuration icon to open the PolarDB Input Configuration dialog box.

  7. In the PolarDB Input Configuration dialog box, configure the parameters.

    Parameter

    Description

    Step Name

    The name of the PolarDB input component. Dataphin automatically generates a step name. You can also change it as needed. The naming convention is as follows:

    • The name can contain only Chinese characters, letters, digits, and underscores (_).

    • The name cannot exceed 64 characters in length.
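As a hedged illustration, the naming rule above can be expressed as a single regular expression check. The Unicode range used for "Chinese characters" here is an assumption covering the common CJK block; Dataphin's own validation may differ.

```python
import re

# Allowed: Chinese characters (common CJK block, an assumption), letters,
# digits, and underscores; length 1-64 characters.
STEP_NAME_RE = re.compile(r"^[\u4e00-\u9fffA-Za-z0-9_]{1,64}$")

def is_valid_step_name(name: str) -> bool:
    """Return True if the step name satisfies the documented naming rule."""
    return bool(STEP_NAME_RE.match(name))
```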

    Datasource

    The drop-down list displays all PolarDB data sources, including those that you have permission to read and those that you do not. Click the copy icon to copy the name of the current data source.

    Time Zone

    Time-formatted data is processed based on the current time zone. By default, this is the time zone configured in the selected data source and cannot be changed.

    Note

    For nodes created before version V5.1.2, you can select Data Source Default Configuration or Channel Configuration Time Zone. The default selection is Channel Configuration Time Zone.

    • Data Source Default Configuration: The default time zone of the selected data source.

    • Channel Configuration Time Zone: The time zone configured in Properties > Channel Configuration for the current integration node.

    Number of source tables

    Select the number of source tables for data synchronization. Options include Single Table and Multiple Tables:

    • Single Table: Use this for scenarios where business data from one table is synchronized to one target table.

    • Multiple Tables: Use this for scenarios where business data from multiple tables is synchronized to the same target table. When data from multiple tables is written to a single data table, the union algorithm is used.

    Table matching method

    Currently, you can only select General Rule.

    Note

    This parameter is available only when you select Multiple Tables for Number of source tables.

    Table

    Select the source table:

    • If you selected Single Table for Number of source tables, you can enter a keyword to search for the table, or enter the exact table name and click Exact Match. After you select a table, the system automatically checks its status. Click the copy icon to copy the name of the selected table.

    • If you selected Multiple Tables for Number of source tables, add tables as follows:

      1. In the input box, enter an expression to filter for tables with the same structure.

        The system supports enumeration, regular expression-like patterns, and a mix of both. For example, table_[001-100];table_102.

      2. Click Exact Match. In the Confirm Match Details dialog box, view the list of matched tables.

      3. Click Confirm.
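To make the matching expression concrete, the following sketch expands an expression such as table_[001-100];table_102 into a table list. It reflects one plausible interpretation of the enumeration and range syntax shown above, not Dataphin's actual matcher.

```python
import re

def expand_tables(expr: str) -> list[str]:
    """Expand a table-matching expression such as 'table_[001-100];table_102'.

    Semicolons separate enumerated entries; a '[start-end]' range expands to
    zero-padded numeric suffixes. A sketch only, not the real matcher.
    """
    tables = []
    for part in expr.split(";"):
        m = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", part)
        if m:
            prefix, start, end, suffix = m.groups()
            width = len(start)  # preserve zero padding, e.g. 001
            for n in range(int(start), int(end) + 1):
                tables.append(f"{prefix}{str(n).zfill(width)}{suffix}")
        else:
            tables.append(part)
    return tables
```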

    Split Key (Optional)

    The system partitions data based on the configured split key. You can use this with the concurrency setting to read data concurrently. A column from the source table can be used as the split key. Use the primary key or an indexed column as the split key to ensure high performance.

    Important

    If you select a date and time type, the system identifies the minimum and maximum values and divides the total time range by the concurrency. Because data may be unevenly distributed over time, the resulting splits are not guaranteed to contain equal numbers of records.
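The idea behind the split key can be sketched as follows: given the minimum and maximum key values and a concurrency, the value range is divided into contiguous sub-ranges, one per concurrent reader. This is an illustration of the partitioning principle, not Dataphin's actual splitter.

```python
def split_ranges(min_val: int, max_val: int, concurrency: int) -> list[tuple[int, int]]:
    """Divide the inclusive range [min_val, max_val] into `concurrency`
    contiguous sub-ranges, one per concurrent reader."""
    span = max_val - min_val + 1
    size, extra = divmod(span, concurrency)
    ranges, lo = [], min_val
    for i in range(concurrency):
        # the first `extra` ranges absorb the remainder
        hi = lo + size - 1 + (1 if i < extra else 0)
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges
```

If the key values are unevenly distributed, the sub-ranges cover equal value spans but may hold very different row counts, which is why an indexed, evenly distributed column makes the best split key.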

    Batch Read Size (Optional)

    The number of data records to read at one time. When reading data from the source database, you can configure a specific batch read size, such as 1024 records, instead of reading one record at a time. This reduces the number of interactions with the data source, improves I/O efficiency, and lowers network latency.
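The batch-read pattern can be sketched with the standard DB-API fetchmany() call; sqlite3 stands in for the PolarDB driver here, and the function names are illustrative.

```python
import sqlite3

def read_in_batches(conn, query, batch_size=1024):
    """Yield rows in batches of up to `batch_size`, reducing the number of
    round trips to the database compared with row-at-a-time reads."""
    cur = conn.execute(query)
    while True:
        rows = cur.fetchmany(batch_size)
        if not rows:
            break
        yield rows
```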

    Input Filter (Optional)

    Enter filter conditions for the input data, for example, ds=${bizdate}. An input filter is suitable for the following two scenarios:

    • Synchronizing only a fixed portion of the data.

    • Filtering with parameters, for example, by using a scheduling variable such as ${bizdate}.
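The parameter-based case can be sketched as a simple placeholder substitution: a ${name} token in the filter condition is replaced by the value of a scheduling parameter before the query runs. The substitution syntax Dataphin actually uses may differ from this sketch.

```python
import re

def render_filter(condition: str, params: dict) -> str:
    """Replace ${name} placeholders in a filter condition with parameter
    values, e.g. 'ds=${bizdate}' -> 'ds=20240601'."""
    return re.sub(r"\$\{(\w+)\}", lambda m: str(params[m.group(1)]), condition)
```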

    Output Fields

    The Output Fields section displays all fields from the selected tables that match the filter criteria. You can perform the following operations:

    • Field Management: If you do not need to output certain fields to downstream components, you can delete them:

      • Deleting a single field: To delete a small number of fields, click the delete icon in the Actions column to remove the unwanted fields.

      • Deleting fields in batch: To delete many fields, click Field Management. In the Field Management dialog box, select multiple fields, click the left arrow icon to move the selected input fields to the unselected input fields list, and then click OK to delete the fields in batch.

    • Batch Add: Click Batch Add to configure fields in batch using JSON, TEXT, or DDL format.

      Note

      After you add fields in batch and click OK, the existing field information is overwritten.

      • To configure in batch using JSON format, for example:

        // Example:
        [
          {
            "index": 1,
            "name": "id",
            "type": "int(10)",
            "mapType": "Long",
            "comment": "comment1"
          },
          {
            "index": 2,
            "name": "user_name",
            "type": "varchar(255)",
            "mapType": "String",
            "comment": "comment2"
          }
        ]
        Note

        The index field specifies the column number of the object. The name field specifies the name of the imported field. The type field specifies the data type of the imported field. For example, "index": 3, "name": "user_id", "type": "String" indicates that the fourth column in the file is imported as the field user_id of type String.

      • To configure in batch using TEXT format, for example:

        // Example:
        1,id,int(10),Long,comment1
        2,user_name,varchar(255),String,comment2
        • The row delimiter separates the information for each field. The default is a line feed (\n). Semicolons (;) and periods (.) are also supported.

        • The column delimiter separates the field name from the field type. The default is a half-width comma (,). The field type is optional.

      • To configure in batch using DDL format, for example:

        CREATE TABLE tablename (
        	user_id serial,
        	username VARCHAR(50),
        	password VARCHAR(50),
        	email VARCHAR(255),
        	created_on TIMESTAMP
        );
    • Create Output Field: Click +Create Output Field. Follow the prompts to enter the Column, Type, and Comment, and select the Mapping Type. After you finish configuring the current row, click the save icon to save it.
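The batch-added field configurations above can be checked programmatically. The following sketch parses the JSON format and verifies that each entry carries the keys shown in the documentation's example; Dataphin's own validation is likely stricter, and the key set here is taken only from that example.

```python
import json

# Keys taken from the JSON example above; mapType and comment are treated
# as optional here, which is an assumption.
REQUIRED_KEYS = {"index", "name", "type"}

def parse_field_config(text: str) -> list[dict]:
    """Parse a JSON batch field configuration and sanity-check each entry."""
    fields = json.loads(text)
    for field in fields:
        missing = REQUIRED_KEYS - field.keys()
        if missing:
            raise ValueError(
                f"field {field.get('name')!r} missing keys: {sorted(missing)}"
            )
    return fields
```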

  8. Click Confirm to complete the configuration of the PolarDB input component.