LogHub Reader uses the LogHub SDK to read data in real time from the LogHub topics that you specify. It supports shard merging and splitting. After shards are merged or split, duplicate data records may exist, but no data is lost.
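Because duplicate records may appear after a shard merge or split, a downstream consumer may want to filter them out. The following is a minimal sketch of one possible approach, not part of LogHub Reader or the LogHub SDK: it fingerprints each record by hashing its content and drops records whose fingerprint was seen recently. The names `RecentRecordFilter` and `deduplicate`, the dictionary record shape, and the `capacity` parameter are all hypothetical.

```python
import hashlib
import json
from collections import OrderedDict


class RecentRecordFilter:
    """Hypothetical helper: remembers the fingerprints of recent records."""

    def __init__(self, capacity=10000):
        # Assumption: duplicates arrive close together, so a bounded
        # window of recent fingerprints is enough to catch them.
        self.capacity = capacity
        self._seen = OrderedDict()

    @staticmethod
    def fingerprint(record):
        # Hash a canonical JSON form so that field order does not matter.
        payload = json.dumps(record, sort_keys=True, ensure_ascii=False)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def is_new(self, record):
        key = self.fingerprint(record)
        if key in self._seen:
            # Seen before: refresh its position and report a duplicate.
            self._seen.move_to_end(key)
            return False
        self._seen[key] = True
        if len(self._seen) > self.capacity:
            # Evict the least recently seen fingerprint.
            self._seen.popitem(last=False)
        return True


def deduplicate(records, capacity=10000):
    """Return the records in order, with recently seen duplicates removed."""
    recent = RecentRecordFilter(capacity)
    return [record for record in records if recent.is_new(record)]
```

Note that content hashing also drops legitimate records that happen to be identical in every field. If the records carry a unique business key, fingerprinting that key instead is likely the safer choice.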

Procedure

  1. Go to the DataStudio page.
    1. Log on to the DataWorks console.
    2. In the left-side navigation pane, click Workspaces.
    3. Select the region where the required workspace resides, find the workspace, and then click Data Analytics.
  2. Move the pointer over the Create icon and choose Data Integration > Real-time synchronization.
    Alternatively, you can click the required workflow, right-click Data Integration, and then choose Create > Real-time synchronization.
  3. In the Create Node dialog box, set the Node Name and Location parameters.
    Notice: The node name must be 1 to 128 characters in length and can contain letters, digits, underscores (_), and periods (.).
  4. Click Commit.
  5. On the configuration tab of the real-time sync node, drag LogHub under Input to the canvas on the right.
  6. Click the new LogHub node. In the configuration pane that appears, set the required parameters in the Node configuration section.
    LogHub
    Parameter | Description
    Data source | The connection to the LogHub data store. In this example, you can select only a LogHub connection. If no connection is available, click New data source on the right to create one on the Data Source page. For more information, see Configure a LogHub connection.
    Logstore | The name of the Logstore from which data is read in LogHub. You can click Data preview on the right to preview the selected Logstore.
    Advanced Configuration | Specifies whether to split data in the Logstore. If you select Split for Split tasks, you must set Split rules.
    Output field | The fields from which data is read.
  7. Click the Save icon in the toolbar.
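
The node naming rule in step 3 can be checked before you open the Create Node dialog box. The following is a minimal sketch that assumes "letters" means ASCII letters only; the console may accept other letters, so treat the pattern as an approximation. The function name `is_valid_node_name` is hypothetical and is not part of any DataWorks API.

```python
import re

# Assumption: only ASCII letters count as "letters".
# Allowed: letters, digits, underscores (_), and periods (.); length 1 to 128.
NODE_NAME_PATTERN = re.compile(r"[A-Za-z0-9_.]{1,128}")


def is_valid_node_name(name):
    """Return True if the name satisfies the documented length and character rules."""
    return NODE_NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["loghub_sync.v1", "bad-name", ""]:
        print(repr(candidate), is_valid_node_name(candidate))
```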