A Datahub reader reads data from Datahub in real time by using the Datahub SDK.

After it is started, the reader keeps running and reads new data as soon as Datahub stores it. A Datahub reader has the following two features:
  • Reads data in real time.
  • Reads data concurrently based on the number of shards in Datahub.
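
The shard-based concurrency described above can be pictured with a minimal sketch. It does not call the Datahub SDK: the SHARDS dictionary and the read_shard and read_topic functions are hypothetical stand-ins, and the sketch reads a fixed set of records instead of running continuously. It only illustrates the idea of one worker per shard.

```python
import queue
import threading

# Hypothetical stand-in for a Datahub topic: shard ID -> records already stored.
# A real reader would fetch these records through the Datahub SDK instead.
SHARDS = {
    "0": ["a0", "a1"],
    "1": ["b0"],
    "2": ["c0", "c1", "c2"],
}


def read_shard(shard_id, out):
    """Read every record of one shard and put it on the shared output queue."""
    for record in SHARDS[shard_id]:
        out.put((shard_id, record))


def read_topic():
    """Start one worker thread per shard, so parallelism equals the shard count."""
    out = queue.Queue()
    workers = [
        threading.Thread(target=read_shard, args=(shard_id, out))
        for shard_id in SHARDS
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()

    # All workers have finished, so the queue can be drained safely.
    records = []
    while not out.empty():
        records.append(out.get())
    return records


if __name__ == "__main__":
    for shard_id, record in sorted(read_topic()):
        print(shard_id, record)
```

Because each shard has its own worker, records from different shards may arrive in any order. Only the order within a single shard is preserved in this sketch.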

Create a Datahub reader

  1. Log on to the DataWorks console. In the left-side navigation pane, click Workspaces. On the Workspaces page, find the target workspace and click Data Analytics in the Actions column.
  2. On the Data Analytics tab, move the pointer over the Create a sync node icon and choose Data Integration > Real-Time Sync.

    You can also find the target workflow, right-click Data Integration, and choose Create > Real-Time Sync.

  3. In the Create Node dialog box that appears, set Node Name and Location, and then click Commit.
  4. On the configuration tab of the real-time sync node, drag DataHub under Reader to the editing panel.
  5. Click the Datahub reader node and set parameters in the Node Settings section.
    Parameter       Description
    Connection      The connection to Datahub. In this example, you can only select a Datahub connection.
                    If no connection is available, click Add Connection on the right to create one on the Workspace Manage > Data Source page.
    Topic           The name of the topic from which data is read in Datahub. You can click Preview on the right to preview the selected topic.
    Start Offset    The start time of the sync node.
    Time Zone       The time zone where Datahub resides.
    Output Fields   The fields from which data is read.
  6. In the toolbar, click Save to save the settings.
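
The Start Offset and Time Zone parameters work together: the same wall-clock start time refers to different instants in different time zones. The following minimal sketch shows that relationship. It assumes that the start time is entered as a "YYYY-MM-DD HH:MM:SS" string and that the time zone can be expressed as a whole-hour UTC offset. The start_offset_to_epoch_ms function is a hypothetical helper for illustration only and is not part of DataWorks or the Datahub SDK.

```python
from datetime import datetime, timedelta, timezone


def start_offset_to_epoch_ms(start_offset, utc_offset_hours):
    """Interpret a wall-clock start time in the given UTC offset.

    Returns the corresponding instant as UTC epoch milliseconds.
    The input format and the whole-hour offset are assumptions of this sketch.
    """
    tz = timezone(timedelta(hours=utc_offset_hours))
    local = datetime.strptime(start_offset, "%Y-%m-%d %H:%M:%S").replace(tzinfo=tz)
    return int(local.timestamp() * 1000)


if __name__ == "__main__":
    # Midnight on January 1, 2021 in UTC+8 is eight hours earlier than in UTC.
    print(start_offset_to_epoch_ms("2021-01-01 00:00:00", 8))  # 1609430400000
    print(start_offset_to_epoch_ms("2021-01-01 00:00:00", 0))  # 1609459200000
```

If the time zone does not match the one where Datahub resides, the node would start reading from a point that is hours earlier or later than intended.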