Alibaba Cloud DataHub is a streaming data processing platform. You can publish and subscribe to streaming data in DataHub and distribute the data to other platforms. This allows you to analyze streaming data and build applications based on the streaming data.

DataHub Reader reads data from DataHub by using the following version of the DataHub SDK for Java:
<dependency>
    <groupId>com.aliyun.datahub</groupId>
    <artifactId>aliyun-sdk-datahub</artifactId>
    <version>2.9.1</version>
</dependency>

Parameters

Parameter Description Required
endpoint The endpoint of DataHub. Yes
accessId The AccessKey ID that you can use to connect to DataHub. Yes
accessKey The AccessKey secret that you can use to connect to DataHub. Yes
project The name of the project in DataHub. A project is the resource management unit in DataHub for resource isolation and control. Yes
topic The name of the topic in DataHub. Yes
batchSize The number of data records to be read at a time. Default value: 1024. No
beginDateTime The start time of data consumption. This parameter defines the left boundary of a left-closed, right-open interval and is specified in the yyyyMMddHHmmss format. It can be used together with the scheduling time parameter in DataWorks. Note: Specify the beginDateTime and endDateTime parameters together. Yes
endDateTime The end time of data consumption. This parameter defines the right boundary of a left-closed, right-open interval and is specified in the yyyyMMddHHmmss format. It can be used together with the scheduling time parameter in DataWorks. Note: Specify the beginDateTime and endDateTime parameters together. Yes
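
The left-closed, right-open interval described for beginDateTime and endDateTime can be illustrated with a minimal Java sketch. The class name ConsumptionWindow and its methods are hypothetical and are not part of DataHub Reader or the DataHub SDK. The sketch only shows how timestamps in the yyyyMMddHHmmss format relate to the two boundaries: the start time is included and the end time is excluded.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for illustration only; not part of DataHub Reader.
public class ConsumptionWindow {
    // The timestamp format used by beginDateTime and endDateTime.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    // Parses a yyyyMMddHHmmss string; throws DateTimeParseException if it is malformed.
    static LocalDateTime parse(String value) {
        return LocalDateTime.parse(value, FORMAT);
    }

    // Returns true if recordTime falls in [beginDateTime, endDateTime).
    static boolean contains(String beginDateTime, String endDateTime, String recordTime) {
        LocalDateTime begin = parse(beginDateTime);
        LocalDateTime end = parse(endDateTime);
        LocalDateTime time = parse(recordTime);
        // Left-closed: time may equal begin. Right-open: time must be before end.
        return !time.isBefore(begin) && time.isBefore(end);
    }

    public static void main(String[] args) {
        String begin = "20180910111214";
        String end = "20180910111614";
        System.out.println(contains(begin, end, "20180910111214")); // start boundary is included
        System.out.println(contains(begin, end, "20180910111614")); // end boundary is excluded
    }
}
```

Because the end boundary is excluded, two consecutive runs can use the end time of the first run as the start time of the second run without covering the same instant twice.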

Codeless UI mode

The codeless user interface (UI) mode is not supported.

Code editor mode

The following example shows how to configure a sync node to read data from DataHub. For more information, see Create a sync node by using the code editor.
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "datahubreader",
                    "parameter": {
                        "endpoint": "xxx", // The endpoint of DataHub.
                        "accessId": "xxx", // The AccessKey ID that you can use to connect to DataHub.
                        "accessKey": "xxx", // The AccessKey secret that you can use to connect to DataHub.
                        "project": "xxx", // The name of the project in DataHub.
                        "topic": "xxx", // The name of the topic in DataHub.
                        "batchSize": 1000, // The number of data records to be read at a time.
                        "beginDateTime": "20180910111214", // The start time of data consumption.
                        "endDateTime": "20180910111614", // The end time of data consumption.
                        "column": [
                            "col0",
                            "col1",
                            "col2",
                            "col3",
                            "col4"
                        ]
                    }
                },
                "writer": {
                    "name": "streamwriter",
                    "parameter": {
                        "print": false
                    }
                }
            }
        ]
    }
}