DataWorks: Configure LogHub Reader

Last Updated: Dec 29, 2023

LogHub Reader uses the LogHub SDK to read data from LogHub topics in real time and supports shard merging and splitting. If shards are merged or split, duplicate data records may be read, but no data is lost.
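For context, the following is a minimal sketch of what reading a shard through the LogHub SDK involves, assuming the aliyun-log Python SDK and placeholder endpoint, credential, project, and Logstore names. The real-time synchronization node performs this polling, checkpointing, and shard management for you; the sketch only illustrates the underlying read loop and is not part of the node configuration.

```python
# Illustrative sketch only: the DataWorks node handles shard reads, checkpoints,
# and shard splits/merges automatically. Assumes the aliyun-log Python SDK;
# the endpoint, credentials, project, and Logstore names are placeholders.
import time
from aliyun.log import LogClient

client = LogClient("cn-hangzhou.log.aliyuncs.com",
                   "<access_key_id>", "<access_key_secret>")
project, logstore, shard_id = "my_project", "my_logstore", 0

# Obtain a cursor for this shard, starting from logs written 5 minutes ago.
cursor = client.get_cursor(project, logstore, shard_id,
                           int(time.time()) - 300).get_cursor()

# Poll the shard: each pull returns a batch of log groups and the next cursor.
while True:
    resp = client.pull_logs(project, logstore, shard_id, cursor, count=100)
    print("pulled", resp.get_loggroup_count(), "log groups from shard", shard_id)
    cursor = resp.get_next_cursor()
    time.sleep(1)
```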

Background information

The following table describes the metadata fields that LogHub Reader for real-time synchronization provides.

| Field provided by LogHub Reader for real-time synchronization | Data type | Description |
| --- | --- | --- |
| __time__ | STRING | A reserved field of Simple Log Service. The field specifies the time when logs are written to Simple Log Service. The value is a UNIX timestamp in seconds. |
| __source__ | STRING | A reserved field of Simple Log Service. The field specifies the source device from which logs are collected. |
| __topic__ | STRING | A reserved field of Simple Log Service. The field specifies the name of the topic for logs. |
| __tag__:__receive_time__ | STRING | The time when logs arrive at the server. If you enable the public IP address recording feature, this field is added to each raw log when the server receives the logs. The value is a UNIX timestamp in seconds. |
| __tag__:__client_ip__ | STRING | The public IP address of the source device. If you enable the public IP address recording feature, this field is added to each raw log when the server receives the logs. |
| __tag__:__path__ | STRING | The path of the log file from which Logtail collects the logs. Logtail automatically adds this field to logs. |
| __tag__:__hostname__ | STRING | The hostname of the device from which Logtail collects logs. Logtail automatically adds this field to logs. |
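To make the table concrete, the snippet below builds a hypothetical record carrying these metadata fields and converts the two UNIX-second timestamps to readable datetimes. All values are invented; only the field names, the STRING data type, and the seconds-based timestamp convention come from the table above.

```python
from datetime import datetime, timezone

# Hypothetical record as it might arrive from LogHub Reader; values are invented.
record = {
    "__time__": "1703808000",                 # write time, UNIX seconds (STRING)
    "__source__": "192.168.1.10",             # source device
    "__topic__": "nginx-access",              # log topic
    "__tag__:__receive_time__": "1703808002", # server receive time, UNIX seconds
    "__tag__:__client_ip__": "203.0.113.7",   # public IP of the source device
    "__tag__:__path__": "/var/log/nginx/access.log",  # file path collected by Logtail
    "__tag__:__hostname__": "web-01",         # host Logtail collected from
}

# All values are delivered as STRING; timestamps must be cast before conversion.
for key in ("__time__", "__tag__:__receive_time__"):
    ts = int(record[key])
    print(key, "->", datetime.fromtimestamp(ts, tz=timezone.utc).isoformat())
```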

Procedure

  1. Go to the DataStudio page.

    1. Log on to the DataWorks console.

    2. In the left-side navigation pane, click Workspaces.

    3. In the top navigation bar, select the region in which the workspace that you want to manage resides. On the Workspaces page, find the workspace and click Shortcuts > Data Development in the Actions column.

  2. In the Scheduled Workflow pane, move the pointer over the Create icon and choose Create Node > Data Integration > Real-time synchronization.

    Alternatively, right-click the required workflow, and then choose Create Node > Data Integration > Real-time synchronization.

  3. In the Create Node dialog box, set the Sync Method parameter to End-to-end ETL and configure the Name and Path parameters.

    Important

    The node name cannot exceed 128 characters in length and can contain letters, digits, underscores (_), and periods (.).

  4. Click Confirm.

  5. On the configuration tab of the real-time synchronization node, drag LogHub from the Input section to the canvas on the right.

  6. Click the LogHub node. In the panel that appears, configure the parameters.

    LogHub

    | Parameter | Description |
    | --- | --- |
    | Data source | The LogHub data source that you have configured. You can select only a LogHub data source. If no data source is available, click New data source on the right to add one on the Data Source page. For more information, see Add a LogHub (SLS) data source. |
    | Logstore | The name of the Logstore from which you want to read data. You can click Preview Data to preview data in the selected Logstore. |
    | Advanced configuration | Specifies whether to split data in the Logstore. If you select Split for Split tasks, you must specify Split rules in the shardId % X = Y format. The rule takes the remainder of a shard ID divided by X: shardId is the ID of a shard in the source Logstore, X is the divisor, and Y is the remainder, which ranges from 0 to X-1. For example, shardId % 5 = 3 matches the shards whose IDs leave a remainder of 3 when divided by 5, and the split task configured with this rule reads only those shards. A worked example is provided after this procedure. |
    | Output field | The fields that you want to synchronize. For descriptions of the fields, see Background information. |

  7. Click the Save icon in the top toolbar.
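The Split rules format described in step 6 is plain modular arithmetic. The sketch below, using hypothetical shard IDs, shows how a rule such as shardId % 5 = 3 selects the shards whose IDs leave a remainder of 3 when divided by 5, so each split task reads a disjoint subset of shards.

```python
# Worked example of the split rule "shardId % X = Y" from step 6.
# X split tasks divide the shards among themselves; the task configured with
# remainder Y reads only the shards where shard_id % X == Y.
# The shard IDs below are hypothetical.
X = 5
shard_ids = list(range(10))  # e.g. shards 0..9 in the source Logstore

assignment = {y: [s for s in shard_ids if s % X == y] for y in range(X)}
for y, shards in assignment.items():
    print(f"shardId % {X} = {y}  ->  shards {shards}")

# The rule "shardId % 5 = 3" matches shards 3 and 8 here, so the split task
# configured with that rule reads only those two shards.
```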