This topic describes the parameters that are supported by DataHub Writer and how to configure DataHub Writer by using the codeless user interface (UI) and code editor.

DataHub is a real-time data distribution platform that is designed to process streaming data. You can publish and subscribe to streaming data in DataHub and distribute the data to other platforms. This allows you to analyze streaming data and build applications based on the streaming data.

DataHub is built on top of the Apsara distributed operating system, and features high availability, low latency, high scalability, and high throughput. DataHub is seamlessly integrated with Realtime Compute for Apache Flink, and allows you to use SQL statements to analyze streaming data. DataHub can also distribute streaming data to Alibaba Cloud services, such as MaxCompute and Object Storage Service (OSS).
Notice: Strings must be encoded in UTF-8, and the size of each string must not exceed 1 MB.
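The 1 MB limit applies to the UTF-8 encoded size, not the character count, so multi-byte characters reach the limit sooner. The following Python sketch (the function name is illustrative, not part of any DataHub SDK) shows a client-side pre-flight check:

```python
MAX_STRING_BYTES = 1024 * 1024  # DataHub limit: 1 MB per string, UTF-8 encoded

def fits_datahub_limit(value: str) -> bool:
    """Return True if the UTF-8 encoding of value is at most 1 MB."""
    return len(value.encode("utf-8")) <= MAX_STRING_BYTES

print(fits_datahub_limit("hello"))        # small ASCII string, well under the limit
print(fits_datahub_limit("é" * 600_000))  # ~1.2 MB once UTF-8 encoded, over the limit
```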

Channel types

The source is connected to the sink over a single channel, so the channel type configured for the writer must match the channel type configured for the reader. Channels are categorized into two types: memory and file. In the following configuration, the channel type is set to file:
"agent.sinks.dataXSinkWrapper.channel": "file"
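The other supported channel type is memory; switching types is a matter of changing the value of the same key:

```json
"agent.sinks.dataXSinkWrapper.channel": "memory"
```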

Parameters

accessId
  The AccessKey ID of the account that you use to connect to DataHub. Required: Yes. No default value.
accessKey
  The AccessKey secret of the account that you use to connect to DataHub. Required: Yes. No default value.
endPoint
  The endpoint of DataHub. Required: Yes. No default value.
maxRetryCount
  The maximum number of retries if the synchronization node fails. Required: No. No default value.
mode
  The mode for writing strings. Required: Yes. No default value.
parseContent
  The data to be parsed. Required: Yes. No default value.
project
  The basic organizational unit of data in DataHub. Each project has one or more topics. Required: Yes. No default value.
  Note: DataHub projects are independent of MaxCompute projects. You cannot use MaxCompute projects as DataHub projects.
topic
  The minimum unit for data subscription and publishing. You can use topics to distinguish different types of streaming data. Required: Yes. No default value.
maxCommitSize
  The maximum amount of buffered data that Data Integration can accumulate before it commits the data to the destination. You can specify this parameter to improve write efficiency. Required: No. Default value: 1,048,576 bytes (1 MB). DataHub allows a maximum of 10,000 data records in a single write request; if that number is exceeded, the synchronization node fails. Keep this parameter below the total amount calculated by using the following formula: average size of a single data record × 10,000.
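The record-count constraint above reduces to a simple bound: maxCommitSize should stay below the average record size multiplied by 10,000. A minimal Python sketch of that arithmetic (the function name is illustrative):

```python
MAX_RECORDS_PER_REQUEST = 10_000  # DataHub rejects larger write requests

def max_commit_size_bound(avg_record_bytes: int) -> int:
    """Upper bound (in bytes) for maxCommitSize so that one commit
    never buffers more than MAX_RECORDS_PER_REQUEST records."""
    return avg_record_bytes * MAX_RECORDS_PER_REQUEST

# With 10 KB records, maxCommitSize must stay below about 100 MB:
print(max_commit_size_bound(10 * 1024))  # 102400000
```

The default of 1,048,576 bytes is comfortably below this bound for all but very small records.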

Configure DataHub Writer by using the codeless UI

  1. Configure data sources.
    Configure Source and Target for the synchronization node in the Connections section.
    Parameter Description
    Connection The name of the data source to which you want to write data.
    Topic This parameter is equivalent to the topic parameter that is described in the preceding section.
    maxCommitSize The maximum amount of buffered data that Data Integration can accumulate before it commits the data to DataHub. Unit: bytes.
    maxRetryCount This parameter is equivalent to the maxRetryCount parameter that is described in the preceding section.
  2. Configure field mappings. This operation is equivalent to setting the column parameter that is described in the preceding section. Fields in the source on the left have a one-to-one mapping with fields in the destination on the right.
    Operation Description
    Map Fields with the Same Name Click Map Fields with the Same Name to establish mappings between fields with the same name. The data types of the fields must match.
    Map Fields in the Same Line Click Map Fields in the Same Line to establish mappings between fields in the same row. The data types of the fields must match.
    Delete All Mappings Click Delete All Mappings to remove the mappings that are established.
    Auto Layout Click Auto Layout. Then, the system automatically sorts the fields based on specific rules.

Configure DataHub Writer by using the code editor

In the following code, a synchronization node is configured to write data from memory to DataHub by using the code editor. For more information, see Create a synchronization node by using the code editor.
{
    "type": "job",
    "version": "2.0",// The version number. 
    "steps": [
        { 
            "stepType": "stream",
            "parameter": {},
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "datahub",// The writer type. 
            "parameter": {
                "datasource": "",// The name of the data source to which you want to write data. 
                "topic": "",// The minimum unit for data subscription and publishing. You can use topics to distinguish different types of streaming data. 
                "maxRetryCount": 500,// The maximum number of retries if the synchronization node fails. 
                "maxCommitSize": 1048576// The maximum amount of the buffered data that Data Integration can accumulate before it commits the data to the destination. 
                 // DataHub allows for a maximum of 10,000 data records to be written in a single write request. If the number of data records exceeds 10,000, the synchronization node fails. You can control the number of data records to be written in a single write request based on the total amount of data that is calculated by using the following formula: Average amount of data in a single data record × 10,000. For example, if the data size of a single data record is 10 KB, the value of this parameter must be less than the result of 10 multiplied by 10,000. 
            },
            "name": "Writer",
            "category": "writer"
        }
    ],
    "setting": {
        "errorLimit": {
            "record": ""// The maximum number of dirty data records allowed. 
        },
        "speed": {
            "throttle":true,// Specifies whether to enable bandwidth throttling. The value false indicates that bandwidth throttling is disabled, and the value true indicates that bandwidth throttling is enabled. The mbps parameter takes effect only when the throttle parameter is set to true. 
            "concurrent":20, // The maximum number of parallel threads. 
            "mbps":"12"// The maximum transmission rate.
        }
    },
    "order": {
        "hops": [
            {
                "from": "Reader",
                "to": "Writer"
            }
        ]
    }
}
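Before submitting a job like the one above, it can help to sanity-check the wiring: every hop in order.hops should reference a step that is defined in steps. The following Python sketch is a hypothetical pre-flight helper, not part of DataWorks:

```python
import json

def hops_are_wired(job: dict) -> bool:
    """Check that every hop endpoint names a step defined in the job."""
    step_names = {step["name"] for step in job["steps"]}
    return all(hop["from"] in step_names and hop["to"] in step_names
               for hop in job["order"]["hops"])

# A trimmed-down version of the job configuration shown above:
job = json.loads("""
{
  "steps": [
    {"stepType": "stream", "name": "Reader", "category": "reader"},
    {"stepType": "datahub", "name": "Writer", "category": "writer"}
  ],
  "order": {"hops": [{"from": "Reader", "to": "Writer"}]}
}
""")
print(hops_are_wired(job))  # True
```

Note that the full example in this topic includes inline comments for readability; those must be removed before the JSON can be parsed by a standard parser.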