After you configure network environments and resources and add data sources to DataWorks, you can create and run a real-time data sync node to synchronize data between the data sources. This topic describes how to create a real-time data sync node and view the status of the node.

Prerequisites

Before you create a real-time data sync node, make sure that the network environments and resources are configured and that the required data sources are added to DataWorks.

Limits

  • DataWorks allows you to use only exclusive resource groups for Data Integration to run real-time data sync nodes.

  • Real-time data sync nodes support the following data sources and data conversion methods:
    • Sources: MySQL Binlog, DataHub, LogHub, Kafka, PolarDB, and SQL Server
    • Destinations: MaxCompute, Hologres, Elasticsearch, DataHub, and Kafka
    • Data conversion methods: data filtering, string replacement, and data de-identification
  • The following rules must be observed when you configure a real-time data sync node:
    • You can synchronize data from one or more source tables to a single destination table. If you want to synchronize data to multiple destination tables, you must create a real-time data sync node for each destination table.
    • You can synchronize data from multiple tables to a single destination table only when the source is MySQL Binlog or SQL Server. The types and schemas of the source tables must be the same. For example, all the source tables are MySQL Binlog tables.
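
The rules above can be sketched as a small pre-check that you might run before you configure a node. This is a minimal illustration, not part of DataWorks: the function name `check_sync_plan` and the representation of a source table as a `(source_type, schema)` tuple are assumptions made for this example. The supported types are copied from the lists in this section.

```python
# Illustrative only: these sets mirror the lists in the Limits section.
SUPPORTED_SOURCES = {"MySQL Binlog", "DataHub", "LogHub", "Kafka", "PolarDB", "SQL Server"}
SUPPORTED_DESTINATIONS = {"MaxCompute", "Hologres", "Elasticsearch", "DataHub", "Kafka"}
MULTI_TABLE_SOURCES = {"MySQL Binlog", "SQL Server"}


def check_sync_plan(source_tables, destination_type):
    """Return a list of rule violations for one real-time data sync node.

    source_tables is a list of (source_type, schema) tuples, where schema is
    a tuple of column names. This representation is hypothetical.
    """
    problems = []
    if not source_tables:
        problems.append("at least one source table is required")
        return problems
    if destination_type not in SUPPORTED_DESTINATIONS:
        problems.append(f"unsupported destination: {destination_type}")
    types = {source_type for source_type, _ in source_tables}
    for source_type in sorted(types - SUPPORTED_SOURCES):
        problems.append(f"unsupported source: {source_type}")
    if len(source_tables) > 1:
        # Multiple source tables are allowed only for two source types,
        # and the tables must share one type and one schema.
        if len(types) > 1:
            problems.append("all source tables must be of the same type")
        elif not types <= MULTI_TABLE_SOURCES:
            problems.append("multiple source tables require MySQL Binlog or SQL Server")
        schemas = {schema for _, schema in source_tables}
        if len(schemas) > 1:
            problems.append("all source tables must have the same schema")
    return problems
```

An empty list means that the plan satisfies the rules in this section. Synchronizing to several destination tables corresponds to calling the check once per destination, because each destination table needs its own node.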

Create a real-time data sync node

  1. Log on to the DataWorks console.
  2. In the left-side navigation pane, click Workspaces.
  3. Select the region in which the workspace that you want to manage resides, find the workspace, and click Data Analytics in the Actions column.
  4. Create a workflow.
    If you have a workflow, skip this step.
    1. Move the pointer over the Create icon and select Workflow.
    2. In the Create Workflow dialog box, set the Workflow Name parameter.
    3. Click Create.
  5. Create a real-time data sync node.
    1. On the DataStudio page, move the pointer over the Create icon and choose Data Integration > Real-time synchronization.
      Alternatively, find the workflow in which you want to create a real-time data sync node and right-click Data Integration. From the shortcut menu, choose Create > Real-time synchronization.
    2. In the Create Node dialog box, set the parameters.
      Parameter | Description
      Node Type | The type of the node. Default value: Real-time synchronization.
      Sync Method | Set this parameter to End-to-end ETL. This method is used to synchronize data from one or more source tables to a destination table in real time. You can convert data during synchronization.
        Note
        • You can synchronize data from one or more source tables to a single destination table. If you want to synchronize data to multiple destination tables, you must create a real-time data sync node for each destination table.
        • You can synchronize data from multiple tables to a single destination table only when the source is MySQL Binlog or SQL Server. The types and schemas of the source tables must be the same. For example, all the source tables are MySQL Binlog tables.
      Node Name | The name of the node. The name must be 1 to 128 characters in length and can contain letters, digits, underscores (_), and periods (.).
      Location | The folder in which the real-time data sync node is stored.
    3. Click Commit. You are navigated to the configuration tab of the real-time data sync node.
  6. Select a resource group.
    1. On the right side of the configuration tab, click the Basic Configuration tab.
    2. In the panel that appears, select the resource group that you want to use from the Resource Group drop-down list.
      Note

      DataWorks allows you to use only exclusive resource groups for Data Integration to run real-time data sync nodes.

      If no exclusive resource group for Data Integration exists, click Create Exclusive Resource Group for Data Integration to create a resource group. For more information, see Exclusive resource groups for Data Integration.
  7. Configure the source data source.
    1. From the left-side Input section of the configuration tab of the real-time data sync node, drag the required source component to the configuration canvas on the right.
    2. Click the component and set the parameters in the panel that appears.
      The data sources that can be used as sources for real-time single-table synchronization are listed in the Limits section of this topic. For the required configurations, see the topic for the specific data source.
  8. Optional: Configure a data conversion method.
    If you want to convert data during synchronization, for example, to filter or de-identify data, you can configure a data conversion method.
    1. From the left-side Conversion section of the configuration tab of the real-time data sync node, drag the required data conversion method component to the configuration canvas on the right.
    2. Click the component and set the parameters in the panel that appears.
      The supported data conversion methods are listed in the Limits section of this topic. For the required configurations, see the topic for the specific data conversion method.
  9. Configure the destination data source.
    1. From the left-side Output section of the configuration tab of the real-time data sync node, drag the required destination component to the configuration canvas on the right.
    2. Click the component and set the parameters in the panel that appears.
      The data sources that can be used as destinations for real-time single-table synchronization are listed in the Limits section of this topic. For the required configurations, see the topic for the specific data source.
  10. Connect the source component to the destination component.
    After the source and destination components are added, you can connect the components by drawing lines. This way, the components can synchronize data based on the connection.
    • Example 1: The following figure shows how data is synchronized from MySQL Binlog to MaxCompute in real time. No data conversion is performed during the synchronization.
      • Source: MySQL Binlog
      • Destination: MaxCompute
      • Direction of data synchronization: The source component is connected to the destination component. Data is synchronized from MySQL Binlog to MaxCompute.
      Figure: Real-time synchronization
    • Example 2: The following figure shows how data is synchronized from MySQL Binlog to MaxCompute in real time. During the synchronization, data is filtered.
      • Source: MySQL Binlog
      • Data conversion method: data filtering. The data in the source is filtered by using this method.
      • Destination: MaxCompute
      • Direction of data synchronization: The source component is connected to the data filtering component, and the data filtering component is connected to the destination component. Data that is read from MySQL Binlog is filtered. Then, the filtered data is synchronized to MaxCompute.
      Figure: Real-time synchronization with data conversion
    • Example 3: The following figure shows how data is synchronized from MySQL Binlog to MaxCompute in real time. During the synchronization, data is de-identified.
      • Source: MySQL Binlog
      • Data conversion method: data de-identification. The data in the source is de-identified by using this method.
      • Destination: MaxCompute
      • Direction of data synchronization: The source component is connected to the data de-identification component, and the data de-identification component is connected to the destination component. Data that is read from MySQL Binlog is de-identified. Then, the de-identified data is synchronized to MaxCompute.
      Figure: Data de-identification
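
The three connection patterns in the examples can be pictured as a chain of steps that each record passes through in order. The following sketch is a conceptual model only and does not use any DataWorks API. The names `data_filter`, `de_identify`, and `run_chain`, the sample records, and the choice of hashing as the de-identification strategy are all assumptions made for illustration.

```python
import hashlib


def data_filter(predicate):
    """Build a conversion step that keeps only the records matching predicate."""
    def step(records):
        return (r for r in records if predicate(r))
    return step


def de_identify(field):
    """Build a conversion step that replaces one field with a short SHA-256 digest.

    Hashing is an assumed masking strategy chosen for this sketch only.
    """
    def step(records):
        for r in records:
            masked = dict(r)  # copy so that the source record is not modified
            digest = hashlib.sha256(str(r[field]).encode("utf-8")).hexdigest()
            masked[field] = digest[:12]
            yield masked
    return step


def run_chain(source, conversions, destination):
    """Pass records from source through each conversion, in order, into destination."""
    stream = iter(source)
    for convert in conversions:
        stream = convert(stream)
    for record in stream:
        destination.append(record)
    return destination


# Stand-in for the change records read by a source component.
source_rows = [
    {"id": 1, "status": "paid", "phone": "555-0100"},
    {"id": 2, "status": "cancelled", "phone": "555-0101"},
    {"id": 3, "status": "paid", "phone": "555-0102"},
]

# Example 1: the source is connected directly to the destination.
direct = run_chain(source_rows, [], [])

# Example 2: source -> data filtering -> destination.
filtered = run_chain(source_rows, [data_filter(lambda r: r["status"] == "paid")], [])

# Example 3: source -> data de-identification -> destination.
masked = run_chain(source_rows, [de_identify("phone")], [])
```

Each line drawn on the canvas corresponds to one hop in the `conversions` list: an empty list models a direct connection, and each conversion component adds one step between the source and the destination.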

Commit and deploy the real-time data sync node

Commit and deploy the real-time data sync node.
  1. Click the Save icon in the top toolbar to save the node.
  2. Click the Submit icon in the top toolbar to commit the node.
  3. In the Commit Node dialog box, enter your comments in the Change description field.
  4. Click OK.
If you use a workspace in standard mode, you must deploy the node in the production environment after you commit the node. Click Deploy in the upper-right corner. For more information, see Deploy nodes.

Start the real-time data sync node

  1. Go to the Operation Center page.
    After you commit and deploy the real-time data sync node, click Operation Center in the upper-right corner of the DataStudio page to manage the node on the Real Time DI page.
  2. View the details of a real-time data sync node.
    On the Real Time DI page, find the real-time data sync node that you want to view and click the node name.
  3. Start the real-time data sync node.
    1. Go back to the previous page and click Start in the Operation column of the node that you want to start.
    2. In the Start dialog box, set the parameters as required.
      Parameter | Description
      Whether to reset the site | Specifies whether to set the point in time for the next startup. If you select Reset site, the Start time point and Time zone parameters are required.
      Start time point | The date and time for starting the real-time data sync node.
      Time zone | The time zone in which the real-time data sync node is run. You can select a time zone from the Time zone drop-down list.
      Failover | The maximum number of failovers allowed within the specified time range.
        Note If this parameter is not specified, the system automatically stops the node if the number of failovers exceeds 100 within 5 minutes. This prevents excessive resource consumption caused by frequent restarts of the node.
      Dirty data policy | The policy for handling dirty data. Valid values:
        • Zero tolerance, not allowed: The real-time data sync node is automatically stopped if dirty data is generated.
        • No limit: The real-time data sync node continues to run regardless of whether dirty data is generated.
        • Limited control: The real-time data sync node is automatically stopped if the amount of dirty data exceeds a specified value.
    3. Click Confirm.
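
The Failover and Dirty data policy parameters both describe conditions under which a running node is stopped. The following sketch models those conditions to make the semantics concrete. It is an assumption-based illustration and not how DataWorks implements them: the class `FailoverGuard`, the function `should_stop_for_dirty_data`, and the policy identifiers `zero_tolerance`, `no_limit`, and `limited` are hypothetical names. The defaults of 100 failovers and 300 seconds come from the note on the Failover parameter.

```python
from collections import deque


class FailoverGuard:
    """Signal a stop when failovers within a sliding time window exceed a limit."""

    def __init__(self, max_failovers=100, window_seconds=300):
        self.max_failovers = max_failovers
        self.window_seconds = window_seconds
        self._events = deque()

    def record_failover(self, now):
        """Record a failover at time now (in seconds).

        Returns True if the node should be stopped.
        """
        self._events.append(now)
        # Drop failovers that fall outside the window ending at now.
        while self._events and now - self._events[0] >= self.window_seconds:
            self._events.popleft()
        return len(self._events) > self.max_failovers


def should_stop_for_dirty_data(policy, dirty_count, limit=0):
    """Apply one of the three dirty data policies to a running dirty-record count."""
    if policy == "zero_tolerance":
        return dirty_count > 0
    if policy == "no_limit":
        return False
    if policy == "limited":
        return dirty_count > limit
    raise ValueError(f"unknown policy: {policy}")
```

With the default settings, `record_failover` returns True on the 101st failover that occurs within any 5-minute window, which matches the documented behavior of stopping the node when the number of failovers exceeds 100 within 5 minutes.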

Manage the real-time data sync node

  • Stop a real-time data sync node that is running.

    Find the real-time data sync node that you want to stop and click Stop in the Operation column. In the message that appears, click Stop.

  • Undeploy a real-time data sync node that is not running.

    Find the real-time data sync node that you want to undeploy and click Undeploy in the Operation column. In the message that appears, click Undeploy.

  • View the alert information of a real-time data sync node.

    Find the real-time data sync node that you want to view and click Alert settings in the Actions column. In the Alert settings dialog box, view the alert events and alert rules.

  • Configure alert rules for a real-time data sync node.
    1. Find the real-time data sync node for which you want to configure alert rules and click Configure Alert Rule in the lower part of the Real Time DI page.
    2. In the New rule dialog box, set the parameters that are described in the following table.
      Parameter | Description
      Name | The name of the alert rule.
      Description | The description of the alert rule.
      Indicators | The metric for which an alert is reported. Valid values:
        • Status
        • Business delay
        • Failover
        • Dirty Data
        • Not Supported by DDL Statement
      Threshold | The threshold for reporting an alert. Specify the WARNING In and CRITICAL In parameters. The default value of each parameter is 5 minutes.
      Alarm interval | The interval at which an alert is reported. The default value is 5 minutes.
      WARNING, CRITICAL | The method that is used to send alert notifications of each severity level. You can specify one or more methods. Valid values: Mail, SMS, and DingTalk.
        Note Only the Singapore, Malaysia (Kuala Lumpur), and Germany (Frankfurt) regions support SMS notifications. To use SMS notifications in other regions, submit a ticket to contact DataWorks technical support.
      Receiver (Non-DingTalk) | The recipient of alert notifications.
    3. Click Confirm.
  • Modify the alert rules of multiple real-time data sync nodes at a time.
    1. Select one or more real-time data sync nodes for which you want to modify alert rules and click Operation alarm in the lower part of the Real Time DI page.
    2. In the Operation alarm dialog box, modify the values of the Type and Indicators parameters.
    3. Click Confirm.
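
To show how the alert rule parameters relate to each other, the following sketch models a rule for the Business delay indicator. It is a rough illustration under stated assumptions, not the DataWorks alerting logic: it assumes that the WARNING In and CRITICAL In thresholds are delays in minutes at or above which an alert of that severity is raised, and that the alarm interval suppresses repeated notifications. The names `AlertRule`, `severity`, and `AlertThrottle` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AlertRule:
    """Hypothetical container for the parameters in the New rule dialog box."""
    name: str
    indicator: str = "Business delay"
    warning_minutes: float = 5.0         # default taken from the Threshold parameter
    critical_minutes: float = 5.0        # default taken from the Threshold parameter
    alarm_interval_minutes: float = 5.0  # default taken from the Alarm interval parameter
    warning_methods: tuple = ("Mail",)
    critical_methods: tuple = ("Mail",)


def severity(rule, delay_minutes):
    """Return 'CRITICAL', 'WARNING', or None for an observed delay (assumed semantics)."""
    if delay_minutes >= rule.critical_minutes:
        return "CRITICAL"
    if delay_minutes >= rule.warning_minutes:
        return "WARNING"
    return None


class AlertThrottle:
    """Send at most one notification per alarm interval."""

    def __init__(self, rule):
        self.rule = rule
        self._last_sent = None

    def notify(self, now_minutes, delay_minutes):
        """Return (level, methods) if a notification should be sent, else None."""
        level = severity(self.rule, delay_minutes)
        if level is None:
            return None
        interval = self.rule.alarm_interval_minutes
        if self._last_sent is not None and now_minutes - self._last_sent < interval:
            return None
        self._last_sent = now_minutes
        if level == "CRITICAL":
            return level, self.rule.critical_methods
        return level, self.rule.warning_methods
```

Under these assumptions, leaving both thresholds at the default of 5 minutes means that every alert is reported as CRITICAL. Setting a higher CRITICAL In value than WARNING In value is what produces two distinct severity levels.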