DataWorks supports real-time synchronization. This topic describes how to create, configure, commit, and manage real-time sync nodes.

Prerequisites

The real-time synchronization feature is in public preview. This feature is available in the following regions: China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Shenzhen), and China (Chengdu).

Create a real-time sync node

  1. Log on to the DataWorks console.
  2. In the left-side navigation pane, click Workspaces.
  3. In the top navigation bar, select the region where the target workspace resides. Find the target workspace and click Data Analytics in the Actions column.
  4. On the Data Development tab, move the pointer over the New icon and choose Data integration > Real-time synchronization.
    Alternatively, you can click a workflow in the Business process section, right-click Data Integration, and then choose New > Real-time synchronization. For more information about the data stores that the real-time synchronization feature supports, see Supported data stores.
  5. In the New node dialog box, set the parameters as required.
    Parameter | Description
    Node type | The type of the node. Default value: Real-time synchronization.
    Sync Method | The method for synchronizing data. Valid values:
      • End-to-end ETL: synchronizes data in one table to one or more tables. Data transformation is supported during the synchronization process.
      • Migration to Hologres: synchronizes all or some tables in a database to Hologres. Destination tables can be automatically created in Hologres.
      • Migration to MaxCompute: synchronizes all or some tables in a database to MaxCompute.
      • Migration to Datahub: synchronizes all or some tables in a database to DataHub topics.
    Node name | The name of the node. The name must be 1 to 128 characters in length and can contain letters, digits, underscores (_), and periods (.).
    Destination folder | The folder where the node resides.
  6. Click Submit.
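
The node name rule in the preceding table can be checked before you create nodes in bulk. The following Python snippet is a minimal sketch of such a check. It is not part of DataWorks: the function name is_valid_node_name is hypothetical, and the sketch assumes that "letters" means ASCII letters, which the console may interpret more broadly.

```python
import re

# Assumption: "letters" is read as ASCII letters only. The documented rule is
# 1 to 128 characters drawn from letters, digits, underscores, and periods.
NODE_NAME_PATTERN = re.compile(r"[A-Za-z0-9_.]{1,128}")


def is_valid_node_name(name: str) -> bool:
    """Return True if name satisfies the documented length and character rules."""
    return NODE_NAME_PATTERN.fullmatch(name) is not None
```

For example, is_valid_node_name("orders_sync.v2") returns True, whereas an empty name, a name that contains a hyphen, or a name longer than 128 characters returns False.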

Configure the real-time sync node

The operations that you can perform on the configuration tab of the real-time sync node vary based on the synchronization method you selected.
  • To configure the real-time sync node for which Sync Method is set to End-to-end ETL, perform the following steps:
    1. Double-click the real-time sync node. On the node configuration tab that appears, click the Basic configuration tab in the right-side navigation pane. On the Basic configuration tab, select the desired resource group from the Resource Group drop-down list.
      No. | Description
      1 | The left-side navigation tree. This pane consists of the Input, Output, and Conversion sections.
      2 | The configuration canvas of the real-time sync node. You can drag components from the navigation tree to the canvas.
      3 | The property configuration pane of the real-time sync node. This pane appears after you click a node on the canvas or click the Basic configuration tab in the right-side navigation pane.
      Notice You must select a resource group before you commit the node. Otherwise, the system returns an error when you commit the node. Real-time sync nodes can be run only on an exclusive resource group for Data Integration. For more information, see Use exclusive resource groups for data integration.
    2. Drag components from the navigation tree to the canvas, and drag directed lines to connect the nodes on the canvas. Data is synchronized from upstream nodes to downstream nodes based on these connections.
    3. Click each node. In the configuration pane that appears, set the required parameters in the Node configuration section. For more information, see Supported data stores.
    4. Click the Save icon in the toolbar.
  • To configure the real-time sync node for which Sync Method is set to Migration to Hologres, perform the following steps:
    1. Double-click the real-time sync node. On the node configuration tab that appears, click the Basic configuration tab in the right-side navigation pane. On the Basic configuration tab, select the desired resource group from the Resource Group drop-down list.
      Notice You must select a resource group before you commit the node. Otherwise, the system returns an error when you commit the node. Real-time sync nodes can be run only on an exclusive resource group for Data Integration. For more information, see Use exclusive resource groups for data integration.
    2. In the Data source section, set the Type and Data source parameters.
    3. In the Select the source table for synchronization section, select the tables to be synchronized in the SOURCE Table list and click the Left arrow icon to move the tables to the Selected Source table list.
      The SOURCE Table list displays all the tables in the source database. You can select all or some of the tables and synchronize them at the same time.
      Notice If a selected table does not have a primary key, the table cannot be synchronized in real time.
    4. Optional: In the Set synchronization rules section, click Add rule and select an option to configure naming rules for destination tables.
      Supported options include Table name conversion rules and Target table name rule.
      • Table name conversion rules: the rules for converting the names of source tables to those of destination tables.
      • Target table name rule: the rule for adding a prefix and suffix to the converted names of destination tables.
    5. Click Next Step.
    6. In the Set target table step, set the Target Hologres data source and Schema parameters.
    7. Click Reload source table and Hologres Table mapping to configure the mappings between the source tables and destination Hologres tables.
    8. Check the source and destination tables after the mappings are created, and click Next Step.
      No. | Description
      1 | The mapping progress between the source and destination tables.
        Note: The mapping may take a long time if a large number of source tables are to be synchronized.
      2 | The destination tables to which data is written. The tables can be existing ones or ones that are automatically created.
        Note: An error message appears if a selected source table does not have a primary key. The synchronization can still be performed if at least one of the selected source tables has a primary key. Source tables without primary keys are ignored during the synchronization.
      3 | The method of creating a destination table. The message that appears in the Hologres Table name column varies depending on the method that you select.
        • If you select Create tables automatically, the Create tables automatically dialog box appears after you click Next Step. Click Start table building in the dialog box, and then click Close after the table is created. You can click the table name to view and modify the table creation statements.
        • If you select Use existing Table, you must select a table from the drop-down list in the Hologres Table name column.
    9. In the Run resource settings step, set the Maximum number of connections supported by source read and Number of concurrent writes on the target side parameters and then click the Save icon in the toolbar.
  • To configure the real-time sync node for which Sync Method is set to Migration to MaxCompute, perform the following steps:
    1. Double-click the real-time sync node. On the node configuration tab that appears, click the Basic configuration tab in the right-side navigation pane. On the Basic configuration tab, select the desired resource group from the Resource Group drop-down list.
    2. In the Data source section, set the Type and Data source parameters.
    3. In the Select the source table for synchronization section, select the tables to be synchronized in the SOURCE Table list and click the Left arrow icon to move the tables to the Selected Source table list.
      The SOURCE Table list displays all the tables in the source database. You can select all or some of the tables and synchronize them at the same time.
      Notice If a selected table does not have a primary key, the table cannot be synchronized in real time.
    4. Optional: In the Set synchronization rules section, click Add rule and select an option to configure naming rules for destination tables.
      Supported options include Table name conversion rules and Target table name rule.
      • Table name conversion rules: the rules for converting the names of source tables to those of destination tables.
      • Target table name rule: the rule for adding a prefix and suffix to the converted names of destination tables.
    5. Click Next Step.
    6. In the Set target table step, select a connection from the Target MaxCompute (ODPS) data source drop-down list and click the Edit icon next to MaxCompute (ODPS) time automatic partition settings. In the Edit dialog box, set the partition interval of tables in MaxCompute to day or hour.
    7. Click Reload source table and MaxCompute (ODPS) Table mapping to configure the mappings between the source tables and destination MaxCompute tables.
    8. Check the source and destination tables after the mappings are created, and click Next Step.
      No. | Description
      1 | The mapping progress between the source and destination tables.
        Note: The mapping may take a long time if a large number of source tables are to be synchronized.
      2 | The destination tables to which data is written. The tables can be existing ones or ones that are automatically created.
        Note: An error message appears if a selected source table does not have a primary key. The synchronization can still be performed if at least one of the selected source tables has a primary key. Source tables without primary keys are ignored during the synchronization.
      3 | The method of creating a destination table. The message that appears in the MaxCompute (ODPS) Table name column varies depending on the method that you select.
        • If you select Create tables automatically, the Create tables automatically dialog box appears after you click Next Step. Click Start table building in the dialog box, and then click Close after the table is created. You can click the table name to view and modify the table creation statements.
        • If you select Use existing Table, you must select a table from the drop-down list in the MaxCompute (ODPS) Table name column.
    9. In the Run resource settings step, set the Maximum number of connections supported by source read and Number of concurrent writes on the target side parameters and then click the Save icon in the toolbar.
  • To configure the real-time sync node for which Sync Method is set to Migration to Datahub, perform the following steps:
    1. Double-click the real-time sync node. On the node configuration tab that appears, click the Basic configuration tab in the right-side navigation pane. On the Basic configuration tab, select the desired resource group from the Resource Group drop-down list.
    2. In the Data source section, set the Type and Data source parameters.
    3. In the Select the source table for synchronization section, select the tables to be synchronized in the SOURCE Table list and click the Left arrow icon to move the tables to the Selected Source table list.
      The SOURCE Table list displays all the tables in the source database. You can select all or some of the tables and synchronize them at the same time.
      Notice If a selected table does not have a primary key, the table cannot be synchronized in real time.
    4. In the Set synchronization rules section, click Add rule and then select an option to configure naming rules for destination topics.
      Supported options:
      • SOURCE table name and Topic conversion rules
      • Target Topic rules
    5. Click Next Step.
    6. In the Set target table step, select a connection from the Target DataHub data source drop-down list and then click Reload source table and DataHub Topic mapping to configure the mappings between the source tables and destination DataHub topics.
    7. Check the source tables and destination topics after the mappings are created, and click Next Step.
      No. | Description
      1 | The mapping progress between the source tables and destination topics.
        Note: The mapping may take a long time if a large number of source tables are to be synchronized.
      2 | The destination topics to which data is written. The topics can be existing ones or ones that are automatically created.
      3 | The method of creating a destination topic. The message that appears in the Topic column varies depending on the method that you select.
        • If you select Create tables automatically, the Create tables automatically dialog box appears after you click Next Step. Click Start table building in the dialog box, and then click Close after the topic is created.
        • If you select Use existing Topic, you must select a topic from the drop-down list in the Topic column.
    8. In the Run resource settings step, set the Maximum number of connections supported by source read and Number of concurrent writes on the target side parameters and then click the Save icon in the toolbar.
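
The migration procedures above skip source tables that have no primary key and derive destination names by applying table name conversion rules and then a prefix and suffix. The following Python snippet is a rough sketch of that planning logic under stated assumptions. It does not reflect how DataWorks implements the feature: the NamingRules class and the plan_mappings function are hypothetical, and modeling a conversion rule as a regular-expression substitution is an assumption made for illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class NamingRules:
    # Assumption: each conversion rule is a (regex pattern, replacement) pair.
    conversions: list[tuple[str, str]] = field(default_factory=list)
    prefix: str = ""
    suffix: str = ""

    def destination_name(self, source_table: str) -> str:
        """Apply the conversion rules in order, then add the prefix and suffix."""
        name = source_table
        for pattern, replacement in self.conversions:
            name = re.sub(pattern, replacement, name)
        return f"{self.prefix}{name}{self.suffix}"


def plan_mappings(tables: dict[str, bool], rules: NamingRules):
    """Split tables into planned mappings and skipped tables.

    tables maps each source table name to whether it has a primary key.
    Tables without a primary key are skipped, as described in the procedures.
    """
    mappings: dict[str, str] = {}
    skipped: list[str] = []
    for table, has_primary_key in tables.items():
        if has_primary_key:
            mappings[table] = rules.destination_name(table)
        else:
            skipped.append(table)
    return mappings, skipped
```

For example, with a conversion rule that strips a leading tbl_, the prefix ods_, and the suffix _rt, a source table named tbl_orders that has a primary key maps to ods_orders_rt, and a table without a primary key is placed in the skipped list.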

Commit the real-time sync node

  1. On the configuration tab of the real-time sync node, click the Commit icon in the toolbar.
  2. In the Submit New version dialog box, enter your comments in the Change description field.
  3. Click OK.
    In a workspace in standard mode, you must click Publish in the upper-right corner after you commit the real-time sync node. For more information, see Deploy a node.

Manage the real-time sync node

  1. After you commit or deploy the real-time sync node, click Operation & Maintenance (O & M) in the upper-right corner of the node configuration tab to manage the node on the Real Time DI page.
  2. On the Real Time DI page, find the real-time sync node, click the node name, and then view the O&M details about the node.
    On this page, you can start, stop, or undeploy the real-time sync node, or configure alert settings for it.
    • To start a node that is not running, perform the following steps:
      1. Find the node and click Start in the Operation column.
      2. In the Start dialog box, set the parameters as required.
        Parameter | Description
        Whether to reset the site | Specifies whether to reset the time point for the next startup. If you select Reset site, the Start time point and Time zone parameters are required.
        Start time point | The date and time for starting the real-time sync node.
        Time zone | The time zone where the source data store resides. Select a time zone from the Time zone drop-down list.
        Task automatically ends | The conditions for automatically terminating the real-time sync node.
          • You can specify the maximum number of dirty data records allowed. If you set the value to 0, no dirty data records are allowed. If you leave the value empty, the node continues to run regardless of whether dirty data records exist.
          • You can also specify the maximum number of failovers. If you leave the value empty, the node is automatically terminated after it fails 100 times within 5 minutes. This prevents frequent restarts from occupying resources.
      3. Click OK.
    • To stop a running node, perform the following steps:
      1. Find the node and click Stop in the Operation column.
      2. In the message that appears, click Stop.
    • To undeploy a node that is not running, perform the following steps:
      1. Find the node and click Offline in the Operation column.
      2. In the message that appears, click Offline.
    • To view alert settings for a node, find the node and click Alarm settings in the Operation column. On the page that appears, you can view alert events and alert rules on the Alert event and Alarm rules tabs.
    • To configure alert settings for a node, perform the following steps:
      1. Select the node and click New Alarm in the lower part of the page.
      2. In the New rule dialog box, set the parameters as required.
        Parameter | Description
        Name | Required. The name of the rule to create.
        Description | The description of the rule.
        Indicators | The indicator that the rule monitors. Valid values: Task Status, Business latency, Failover, Dirty Data, and DDL error.
        Threshold | The threshold for reporting an alert. The default value is 5 minutes for both WARNING and CRITICAL alerts.
        Alarm interval | The interval for reporting alerts. The default value is 5 minutes.
        WARNING, CRITICAL | The methods for sending WARNING and CRITICAL alerts. Valid values: Mail, SMS, Telephone, and DingTalk.
        Receiver | The person who receives alerts. Select a receiver from the Receiver drop-down list.
      3. Click OK.
    • To modify alert settings for a node, perform the following steps:
      1. Select the node and click Operation alarm in the lower part of the page.
      2. In the Operation alarm dialog box, set the Operation type and Alarm indicators parameters.
        DataWorks modifies all the rules for the selected alert types at the same time.
      3. Click OK.
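
The automatic termination conditions described for the Start dialog box combine a dirty data limit with a failover limit. The following Python snippet is a minimal sketch of how such a decision could be evaluated. It is not the DataWorks implementation: the function should_terminate and its parameters are hypothetical, and the sketch assumes that a user-specified failover limit applies to the same 5-minute window as the default and that the node is terminated as soon as the count reaches the limit.

```python
from typing import Optional

DEFAULT_FAILOVER_LIMIT = 100      # documented default when the value is empty
FAILOVER_WINDOW_SECONDS = 5 * 60  # documented 5-minute window


def should_terminate(dirty_records: int,
                     max_dirty_records: Optional[int],
                     failover_times: list[float],
                     now: float,
                     max_failovers: Optional[int] = None) -> bool:
    """Decide whether the node should be terminated automatically.

    max_dirty_records: None models an empty value (dirty data never stops
    the node); 0 means that no dirty data records are allowed.
    failover_times: timestamps, in seconds, of past failovers.
    """
    # Dirty data check: exceeding the allowed count terminates the node.
    if max_dirty_records is not None and dirty_records > max_dirty_records:
        return True

    # Failover check. Assumption: a custom limit uses the same 5-minute window.
    limit = DEFAULT_FAILOVER_LIMIT if max_failovers is None else max_failovers
    recent = [t for t in failover_times if now - t <= FAILOVER_WINDOW_SECONDS]
    return len(recent) >= limit
```

For example, with the dirty data limit set to 0, a single dirty record causes should_terminate to return True, whereas an empty limit (None) lets the node continue regardless of the number of dirty records.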