After you configure the DataHub input component, you can read data from a DataHub data source into the storage system connected to the big data platform for data integration and further processing. This topic describes how to configure the DataHub input component.
Prerequisites
A DataHub data source has been created. For more information, see Create a DataHub Data Source.
To configure the properties of the DataHub input component, your account must have read-through permission on the data source. If it does not, request this permission first. For more information, see Request Data Source Permission.
Procedure
On the Dataphin home page, select Development > Data Integration from the top menu bar.
In the top menu bar of the Data Integration page, select the project. In Dev-Prod mode, you also need to select an environment.
In the left-side navigation pane, click Batch Pipeline. In the Batch Pipeline list, click the offline pipeline you want to develop to open its configuration page.
Click Component Library in the upper-right corner of the page to open the Component Library panel.
In the left-side navigation pane of the Component Library panel, select Input, find the DataHub component in the input component list on the right, and drag the component to the canvas.
Click the icon in the DataHub input component card to open the DataHub Input Configuration dialog box.
In the DataHub Input Configuration dialog box, configure the parameters described in the following table.
Parameter
Description
Step Name
The name of the DataHub input component. Dataphin generates a step name automatically, and you can change it to suit your business scenario. The naming rules are as follows:
Can contain only Chinese characters, letters, digits, and underscores (_).
Cannot exceed 64 characters.
Datasource
The drop-down list displays all DataHub data sources in the current Dataphin instance, including those for which you have read-through permission and those for which you do not. Click the icon to copy the name of the current data source.
For a data source without read-through permission, click Request next to the data source to request read-through permission. For more information, see Request, Renew, and Return Data Source Permission.
If you do not have a DataHub-type data source yet, click Create to create a data source. For more information, see Create a DataHub Data Source.
Subject
The name of the DataHub topic. Select the topic to read from the drop-down list.
Consumption Start Time
The position at which data consumption starts, that is, the left boundary of the time range. Only a specific time, given as a time string in the yyyyMMddHHmmss format, is supported. Use this parameter together with schedule parameters. For example, if the schedule parameter is configured as startTime=${20220101000000}, set Consumption Start Time to ${startTime}.
Consumption End Time
The position at which data consumption ends, that is, the right boundary of the time range. Only a specific time, given as a time string in the yyyyMMddHHmmss format, is supported. Use this parameter together with schedule parameters. For example, if the schedule parameter is configured as endTime=${20220101000000}, set Consumption End Time to ${endTime}.
Batch Read Count
The number of records read in each request. Instead of reading records one at a time, you can set a batch read count (for example, 1024 records) to reduce the number of interactions with the data source, improve I/O efficiency, and reduce network latency.
Output Fields
The Output Fields area displays all fields that match the selected table and filter conditions. If you do not need to pass certain fields to downstream components, delete them:
Single field deletion: To delete a few fields, click the icon in the Operation column for each field you do not need.
Batch field deletion: To delete many fields, click Field Management. In the Field Management dialog box, select the fields, click the left-shift icon to move them from the selected input fields to the unselected input fields, and then click OK.
Click OK to complete the DataHub input component configuration.
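The Consumption Start Time, Consumption End Time, and Batch Read Count parameters described above can be illustrated with a short sketch. It assumes that a schedule parameter reference such as ${startTime} is replaced with its value before the time string is checked, that both boundaries must be 14-digit yyyyMMddHHmmss strings, and that the start must not be later than the end. The helper names (resolve_time, consumption_window, read_in_batches) and the schedule_params dictionary are hypothetical and only model the behavior conceptually; this is not how Dataphin implements the component.

```python
import re
from datetime import datetime
from typing import Dict, Iterable, Iterator, List, Tuple

# Python equivalent of the yyyyMMddHHmmss format used by the time boundaries.
TIME_FORMAT = "%Y%m%d%H%M%S"

# Matches a whole-value reference to a schedule parameter, e.g. "${startTime}".
PARAM_REF = re.compile(r"^\$\{(\w+)\}$")


def resolve_time(value: str, schedule_params: Dict[str, str]) -> datetime:
    """Substitute a ${name} reference from a hypothetical schedule-parameter
    map, then parse the result as a yyyyMMddHHmmss time string."""
    match = PARAM_REF.match(value)
    raw = schedule_params[match.group(1)] if match else value
    # Require exactly 14 digits before handing the string to strptime.
    if not re.fullmatch(r"\d{14}", raw):
        raise ValueError(f"not a yyyyMMddHHmmss time string: {raw!r}")
    return datetime.strptime(raw, TIME_FORMAT)


def consumption_window(start: str, end: str,
                       schedule_params: Dict[str, str]) -> Tuple[datetime, datetime]:
    """Return the (left boundary, right boundary) of the consumption range."""
    left = resolve_time(start, schedule_params)
    right = resolve_time(end, schedule_params)
    if left > right:
        raise ValueError("Consumption Start Time is later than Consumption End Time")
    return left, right


def read_in_batches(records: Iterable[dict],
                    batch_read_count: int) -> Iterator[List[dict]]:
    """Group records into lists of at most batch_read_count items, so that
    each round trip carries a batch instead of a single record."""
    if batch_read_count < 1:
        raise ValueError("batch_read_count must be a positive integer")
    batch: List[dict] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_read_count:
            yield batch
            batch = []
    if batch:
        # Emit the final, partially filled batch.
        yield batch
```

For example, with the hypothetical values schedule_params = {"startTime": "20220101000000", "endTime": "20220102000000"}, calling consumption_window("${startTime}", "${endTime}", schedule_params) returns midnight on January 1 and January 2, 2022, and splitting 2500 records with a batch read count of 1024 yields batches of 1024, 1024, and 452 records. Fewer, larger batches mean fewer interactions with the data source, which is the trade-off the Batch Read Count parameter controls.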