Dataphin: Configure DataHub Input Components

Last Updated: May 28, 2025

The DataHub input component reads data from a DataHub data source into the storage system connected to the big data platform, where the data can be integrated and further processed. This topic describes how to configure the DataHub input component.

Prerequisites

  • A DataHub data source has been created. For more information, see Create a DataHub Data Source.

  • The account used to configure the DataHub input component has read-through permission on the data source. If the account lacks this permission, request it first. For more information, see Request Data Source Permission.

Procedure

  1. On the Dataphin home page, select Development > Data Integration from the top menu bar.

  2. In the top menu bar of the integration page, select a project. In Dev-Prod mode, you must also select an environment.

  3. In the left-side navigation pane, click Batch Pipeline. In the Batch Pipeline list, click the batch pipeline that you want to develop to open its configuration page.

  4. Click Component Library in the upper-right corner of the page to open the Component Library panel.

  5. In the left-side navigation pane of the Component Library panel, select Input, find the DataHub component in the input component list on the right, and drag the component to the canvas.

  6. Click the icon on the DataHub input component card to open the DataHub Input Configuration dialog box.

  7. In the DataHub Input Configuration dialog box, configure the parameters according to the following table.

    Parameter

    Description

    Step Name

    The name of the DataHub input component. Dataphin generates a step name automatically, and you can change it to suit your business scenario. The name must meet the following requirements:

    • It can contain only Chinese characters, letters, digits, and underscores (_).

    • It cannot exceed 64 characters.

    Datasource

    The drop-down list displays all DataHub data sources in the current Dataphin instance, both those for which you have read-through permission and those for which you do not. Click the copy icon to copy the name of the current data source.

    Subject

    The name of the DataHub topic. From the drop-down list, select the DataHub topic that you want to read.

    Consumption Start Time

    The point in time from which data consumption starts, that is, the left boundary of the time range. Only a specific time is supported, given as a string in the yyyyMMddHHmmss format. The value must be used together with schedule parameters. For example, if the schedule parameter is configured as startTime=${20220101000000}, set Consumption Start Time to ${startTime}.

    Consumption End Time

    The point in time at which data consumption ends, that is, the right boundary of the time range. Only a specific time is supported, given as a string in the yyyyMMddHHmmss format. The value must be used together with schedule parameters. For example, if the schedule parameter is configured as endTime=${20220101000000}, set Consumption End Time to ${endTime}.

    Batch Read Count

    The number of records read in a single batch. Instead of reading records from the data source one at a time, you can configure a batch read count (for example, 1024 records). This reduces the number of interactions with the data source, improves I/O efficiency, and lowers network overhead.

    Output Fields

    The Output Fields area displays all fields matched by the selected table and filter criteria. If you do not need to output certain fields to downstream components, you can delete them:

    • Single Field Deletion Scenario: To delete a small number of fields, click the delete icon in the Operation column for each field that you do not need.

    • Batch Field Deletion Scenario: To delete many fields, click Field Management, select the fields in the Field Management dialog box, click the shift-left icon to move them from the selected input fields to the unselected input fields, and then click OK.

  8. Click OK to complete the DataHub input component configuration.
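
The Step Name rules and the yyyyMMddHHmmss time format described in step 7 can be checked before you enter values in the dialog box. The following Python sketch is illustrative only and is not part of Dataphin: the function names are hypothetical, the Unicode range used to match Chinese characters (U+4E00 to U+9FFF) is an assumption, and rejecting an empty step name is also an assumption.

```python
import re
from datetime import datetime, timedelta
from typing import Tuple

# Assumption: "Chinese characters" is approximated by the CJK Unified
# Ideographs block U+4E00..U+9FFF. Empty names are rejected (assumption).
STEP_NAME_PATTERN = re.compile(r"[\u4e00-\u9fffA-Za-z0-9_]{1,64}")

# strftime/strptime equivalent of the yyyyMMddHHmmss format in the table.
TIME_FORMAT = "%Y%m%d%H%M%S"


def is_valid_step_name(name: str) -> bool:
    """Check a step name against the documented naming rules."""
    return STEP_NAME_PATTERN.fullmatch(name) is not None


def parse_consumption_time(value: str) -> datetime:
    """Parse a yyyyMMddHHmmss string, rejecting anything but 14 digits."""
    if len(value) != 14 or not value.isdigit():
        raise ValueError(f"expected 14 digits (yyyyMMddHHmmss), got {value!r}")
    return datetime.strptime(value, TIME_FORMAT)


def consumption_window(start: datetime, length: timedelta) -> Tuple[str, str]:
    """Return (start, end) strings suitable as schedule parameter values."""
    end = start + length
    return start.strftime(TIME_FORMAT), end.strftime(TIME_FORMAT)


if __name__ == "__main__":
    start_time, end_time = consumption_window(datetime(2022, 1, 1), timedelta(days=1))
    # Hypothetical schedule parameter values for startTime and endTime.
    print(start_time, end_time)  # prints: 20220101000000 20220102000000
    print(is_valid_step_name("datahub_input_1"))  # prints: True
```

The two strings returned by consumption_window are the kind of values that the startTime and endTime schedule parameters would carry, while the Consumption Start Time and Consumption End Time fields themselves still reference ${startTime} and ${endTime}.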