After you prepare data sources, network environments, and resources, you can create a real-time sync node to synchronize data to Hologres. This topic describes how to create a real-time sync node and view the status of the node.

Prerequisites

Before you create a real-time sync node, read the following topics to make sure that the required operations are performed:

Limits

  • DataWorks allows you to use only exclusive resource groups for Data Integration to run real-time data sync nodes.

  • You can run real-time sync nodes to synchronize data only from PolarDB, Oracle, MySQL, or SQL Server data sources to Hologres.
  • A real-time data sync node cannot be used to synchronize data in a table that has no primary key.
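
The limits above can be checked before you configure a node. The following Python sketch is illustrative only: the function name `check_sync_limits`, the lowercase source type strings, and the dictionary layout of `tables` are assumptions made for this example and are not part of DataWorks.

```python
# Hypothetical pre-check for the limits listed above; not a DataWorks API.
SUPPORTED_SOURCE_TYPES = {"polardb", "oracle", "mysql", "sqlserver"}


def check_sync_limits(source_type, tables):
    """Return a list of problems that would block real-time sync to Hologres.

    tables maps each table name to the list of its primary key columns.
    """
    problems = []
    if source_type.lower() not in SUPPORTED_SOURCE_TYPES:
        problems.append(f"unsupported source type: {source_type}")
    for name, pk_columns in tables.items():
        # A table without a primary key cannot be synchronized in real time.
        if not pk_columns:
            problems.append(f"table {name} has no primary key")
    return problems
```

An empty list means that the source type and all listed tables satisfy the limits described in this section.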

Create a real-time sync node

  1. Log on to the DataWorks console.
  2. In the left-side navigation pane, click Workspaces.
  3. After you select the region in which the workspace that you want to manage resides, find the workspace and click Data Analytics in the Actions column.
  4. Create a workflow.
    If you have a workflow, skip this step.
    1. Move the pointer over the Create icon and select Workflow.
    2. In the Create Workflow dialog box, set the Workflow Name parameter.
    3. Click Create.
  5. Create a real-time sync node.
    1. On the DataStudio page, move the pointer over the Create icon and choose Data Integration > Real-time synchronization.
      Alternatively, find the workflow in which you want to create a real-time data sync node and right-click Data Integration. From the shortcut menu, choose Create > Real-time synchronization.
    2. In the Create Node dialog box, set the following parameters.
      • Node Type: the type of the node. Default value: Real-time synchronization.
      • Sync Method: set this parameter to Migration to Hologres. In this case, some or all tables in the source database are migrated to Hologres.
      • Node Name: the name of the node. The name must be 1 to 128 characters in length and can contain letters, digits, underscores (_), and periods (.).
      • Location: the directory in which the real-time sync node is stored.
    3. Click Commit. You are navigated to the configuration tab of the real-time data sync node.
  6. Select a resource group.
    1. On the right side of the configuration tab, click the Basic Configuration tab.
    2. In the panel that appears, select the resource group that you want to use from the Resource Group drop-down list.
      Note

      DataWorks allows you to use only exclusive resource groups for Data Integration to run real-time data sync nodes.

      If no exclusive resource group for Data Integration exists, click Create Exclusive Resource Group for Data Integration to create a resource group. For more information, see Exclusive resource groups for Data Integration.
  7. Select a source and configure synchronization rules.
    1. In the Data Source section, set the Type and Data source parameters.
      Note You can set the Type parameter only to MySQL, SQL Server, Oracle, or PolarDB.
    2. In the Source Table section, select the tables whose data you want to synchronize from the Source Table list. Then, click the Move icon to add the tables to the Selected Tables list.
      The Source Table section displays all the tables in the source. You can select all or specific tables.
      Notice If a selected table does not have a primary key, the table cannot be synchronized in real time.
    3. In the Set Mapping Rules for Table/Database Names section, configure a rule as needed.
      Supported options are Conversion Rule for Table Name and Rule for Destination Table name.
      • Conversion Rule for Table Name: the rule that is used to convert the names of source tables to those of destination tables.
      • Rule for Destination Table name: the rule that is used to add a prefix or a suffix to the converted names of destination tables.
    4. Click Next Step.
  8. Select a data source as the destination and configure the formats for the destination tables.
    1. In the Basic Configurations of Destination Table section of the Set Destination Table step, specify the Destination and Write Hologres policy parameters.
    2. Click Refresh source table and Hologres Table mapping to configure the mappings between the source tables and destination Hologres tables.
    3. View the mapping progress, source tables, and mapped destination tables.
      No. Description
      1 The progress of mapping the source tables to the destination tables.
        Note The mapping may require a long period of time if you want to synchronize data from a large number of tables.
      2
        • If the tables in the source database contain primary keys, the system removes duplicate data based on the primary keys during the synchronization.
        • If the tables in the source database do not contain primary keys, you can click the Edit icon to customize primary keys. You can use one field or a combination of several fields as the primary keys of the tables. This way, the system removes duplicate data based on the primary keys during the synchronization.
      3 The method that is used to create a table. Valid values: Create Table and Use existing Table.
    4. Click Next Step.
      If you set the Table creation method to Create Table, you must click Start table building in the Create tables automatically dialog box to create destination Hologres tables.
  9. Configure rules for processing DDL messages.
    DDL statements exist in the source. Before you synchronize data, you can configure synchronization rules for different DDL statements based on your business requirements.
    Note The rules apply when a real-time sync node is run for the first time. If you want to modify the rules in subsequent operations, go to the Real Time DI page. For more information, see the "Start the real-time sync node" section of this topic.
    1. In the Set Processing Policy for DDL Messages step, configure rules to process DDL messages during data synchronization.
      You can configure a rule for each of the following DDL messages: CreateTable, DropTable, AddColumn, DropColumn, RenameTable, RenameColumn, ChangeColumn, and TruncateTable. After DataWorks receives a DDL message of one of these types, it processes the message based on the rule that you select:
      • Normal: sends the message to the destination. Then, the destination processes the message. Each destination may process DDL messages based on its own business logic. For example, if you select Normal for CreateTable, DataWorks only forwards the messages.
      • Ignore: ignores the message and does not send it to the destination.
      • Alert: ignores the message and records an alert in the real-time synchronization logs. The alert states that the message was ignored because of a running error.
      • Error: returns an error when the real-time sync solution is running and terminates the real-time sync solution.
    2. Click Next Step.
  10. Configure the resources required by the data sync node.
    1. In the Set Resources for Solution Running step, set the following parameters.
      • Maximum number of connections supported by source read: the maximum number of Java Database Connectivity (JDBC) connections that are allowed for the source. Specify an appropriate number based on the resources of the source. Default value: 15.
      • Maximum number of parallel threads allowed to read by destination: the maximum number of parallel threads that the sync node uses to read data from the source table or write data to the destination. Maximum value: 32. Specify an appropriate number based on the resources of the source and the destination.
    2. Click Complete Configuration.
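
The constraints stated in the preceding steps (the node name format, the DDL messages and their rules, the default of 15 source connections, and the maximum of 32 parallel threads) can be summarized as a validation sketch. The class `SyncNodeSettings`, the function `validate`, and all field names are hypothetical and exist only for this example. The ASCII-only reading of "letters" and the lower bound of 1 for connections and threads are also assumptions.

```python
import re
from dataclasses import dataclass, field

# Assumption: "letters" is read as ASCII letters only.
NODE_NAME_PATTERN = re.compile(r"[A-Za-z0-9_.]{1,128}")
DDL_MESSAGES = ("CreateTable", "DropTable", "AddColumn", "DropColumn",
                "RenameTable", "RenameColumn", "ChangeColumn", "TruncateTable")
DDL_RULES = ("Normal", "Ignore", "Alert", "Error")
MAX_PARALLEL_THREADS = 32


@dataclass
class SyncNodeSettings:
    """Hypothetical container for the values entered in the wizard."""
    node_name: str
    parallel_threads: int
    max_source_connections: int = 15  # default value stated in step 10
    ddl_rules: dict = field(default_factory=dict)  # DDL message -> rule


def validate(settings):
    """Return a list of problems; an empty list means the settings pass."""
    problems = []
    if not NODE_NAME_PATTERN.fullmatch(settings.node_name):
        problems.append("invalid node name")
    if settings.max_source_connections < 1:  # lower bound is an assumption
        problems.append("max_source_connections must be at least 1")
    if not 1 <= settings.parallel_threads <= MAX_PARALLEL_THREADS:
        problems.append("parallel_threads must be between 1 and 32")
    for message, rule in settings.ddl_rules.items():
        if message not in DDL_MESSAGES:
            problems.append(f"unknown DDL message: {message}")
        elif rule not in DDL_RULES:
            problems.append(f"unknown rule for {message}: {rule}")
    return problems
```

For example, `validate(SyncNodeSettings("mysql_to_holo.v1", 8, ddl_rules={"DropTable": "Alert"}))` returns an empty list, whereas a node name that contains a space or a thread count of 64 is reported as a problem.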

Commit and deploy the real-time data sync node

Commit and deploy the real-time sync node.
  1. Click the Save icon in the top toolbar to save the node.
  2. Click the Submit icon in the top toolbar to commit the node.
  3. In the Commit Node dialog box, enter your comments in the Change description field.
  4. Click OK.
If you use a workspace in standard mode, you must deploy the node in the production environment after you commit the node. Click Deploy in the upper-right corner. For more information, see Deploy nodes.

Start the real-time sync node

  1. Go to the Operation Center page.
    After you commit and deploy the real-time data sync node, click Operation Center in the upper-right corner of the DataStudio page to manage the node on the Real Time DI page.
  2. View the details of a real-time data sync node.
    On the Real Time DI page, find the real-time data sync node that you want to view and click the node name.
  3. Start the real-time sync node.
    1. Go back to the previous page and click Start in the Operation column that corresponds to your desired node.
    2. In the Start dialog box, set the following parameters as required.
      • Whether to reset the site: specifies whether to set the point in time for the next startup. If you select Reset site, the Start time point and Time zone parameters are required.
      • Start time point: the date and time for starting the real-time sync node.
      • Time zone: the time zone in which the real-time sync node is run. You can select a time zone from the Time zone drop-down list.
      • Failover: the maximum number of failovers allowed within the specified time range.
        Note If this parameter is not specified, the system automatically stops the node if the number of failovers exceeds 100 within 5 minutes. This avoids excessive resource consumption caused by the frequent starting of the node.
      • Dirty data policy: the policy for handling dirty data. Valid values:
        • Zero tolerance, not allowed: The real-time sync node is automatically stopped if dirty data is generated.
        • No limit: The real-time sync node continues to run regardless of whether dirty data is generated.
        • Limited control: The real-time sync node is automatically stopped if the amount of dirty data exceeds a specified value.
      • Processing Policy for DDL Messages in Real-time Sync: You can modify the configured rules that are used to process DDL messages based on your business requirements. For more information, see Step 9 in the "Create a real-time sync node" section of this topic.
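
The default behavior of the Failover parameter (stop the node if the number of failovers exceeds 100 within 5 minutes) can be illustrated with a sliding-window counter. This is a sketch under assumptions: the class name `FailoverGuard` is hypothetical, and whether DataWorks counts failovers in a sliding window or in fixed intervals is not stated in this topic.

```python
from collections import deque


class FailoverGuard:
    """Hypothetical sliding-window counter for failover events."""

    def __init__(self, max_failovers=100, window_seconds=300):
        self.max_failovers = max_failovers
        self.window_seconds = window_seconds
        self._events = deque()  # timestamps of recent failovers, in seconds

    def record(self, timestamp):
        """Record a failover; return True if the node should be stopped."""
        self._events.append(timestamp)
        # Drop failovers that fall outside the window ending at `timestamp`.
        while timestamp - self._events[0] >= self.window_seconds:
            self._events.popleft()
        # Stop only when the count exceeds the limit, not when it equals it.
        return len(self._events) > self.max_failovers
```

With the default values, the 101st failover recorded within 300 seconds makes `record` return `True`.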

Manage the real-time data sync node

  • Stop a real-time data sync node that is running.

    Find the real-time data sync node that you want to stop and click Stop in the Operation column. In the message that appears, click Stop.

  • Undeploy a real-time data sync node that is not running.

    Find the real-time data sync node that you want to undeploy and click Undeploy in the Operation column. In the message that appears, click Undeploy.

  • View the alert information of a real-time data sync node.

    Find the real-time data sync node that you want to view and click Alert settings in the Actions column. In the Alert settings dialog box, view the alert events and alert rules.

  • Configure alert rules for a real-time data sync node.
    1. Find the real-time data sync node for which you want to configure alert rules and click Configure Alert Rule in the lower part of the Real Time DI page.
    2. In the New rule dialog box, set the following parameters.
      • Name: the name of the alert rule.
      • Description: the description of the alert rule.
      • Indicators: the metric for which an alert is reported. Valid values:
        • Status
        • Business delay
        • Failover
        • Dirty Data
        • Not Supported by DDL Statement
      • Threshold: the threshold for reporting an alert. Specify the WARNING In and CRITICAL In parameters. The default value of each parameter is 5 minutes.
      • Alarm interval: the interval at which an alert is reported. The default value is 5 minutes.
      • WARNING and CRITICAL: the methods that are used to send alert notifications of each severity. You can specify one or more methods. Valid values: Mail, SMS, and DingTalk.
        Note Only the Singapore, Malaysia (Kuala Lumpur), and Germany (Frankfurt) regions support SMS notifications. To use SMS notifications in other regions, submit a ticket to contact DataWorks technical support.
      • Receiver (Non-DingTalk): the recipient of alert notifications.
    3. Click Confirm.
  • Modify alert rules for multiple real-time data sync nodes at a time.
    1. Select one or more real-time data sync nodes for which you want to modify alert rules and click Operation alarm in the lower part of the Real Time DI page.
    2. In the Operation alarm dialog box, modify the values of the Type and Indicators parameters.
    3. Click Confirm.
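
One plausible reading of the Alarm interval parameter is that, while an alert condition persists, a notification is sent at most once per interval. The following sketch illustrates that reading only. The class name `AlertThrottle` is hypothetical, and the actual notification behavior of DataWorks may differ.

```python
class AlertThrottle:
    """Hypothetical throttle: at most one notification per alarm interval."""

    def __init__(self, interval_minutes=5):  # 5 minutes is the stated default
        self.interval_seconds = interval_minutes * 60
        self._last_sent = None  # time of the last notification, in seconds

    def should_send(self, now):
        """Return True if a notification may be sent at time `now`."""
        if self._last_sent is None or now - self._last_sent >= self.interval_seconds:
            self._last_sent = now
            return True
        return False
```

With the default interval, a notification sent at second 0 suppresses further notifications until second 300.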