After you prepare data sources, network environments, and resources, you can create a real-time data sync node to synchronize data to MaxCompute. This topic describes how to create a real-time data sync node and view the status of the node.

Limits

  • DataWorks allows you to use only exclusive resource groups for Data Integration to run real-time data sync nodes.

  • You can run real-time data sync nodes to synchronize data only from PolarDB, Oracle, or MySQL to MaxCompute.
  • A real-time data sync node cannot be used to synchronize data in a table that has no primary key.
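
The source-side limits above can be checked before you configure a node. The following Python sketch is illustrative only: SourceTable and check_table are hypothetical names and not part of any DataWorks API. It assumes that you already know each table's source type and primary key columns.

```python
from dataclasses import dataclass, field

# Source types listed in the Limits section of this topic.
SUPPORTED_SOURCE_TYPES = {"PolarDB", "Oracle", "MySQL"}


@dataclass
class SourceTable:
    """Hypothetical description of a source table (not a DataWorks type)."""
    name: str
    source_type: str
    primary_key: list = field(default_factory=list)


def check_table(table):
    """Return the reasons why a table cannot be synchronized in real time."""
    problems = []
    if table.source_type not in SUPPORTED_SOURCE_TYPES:
        problems.append(f"unsupported source type: {table.source_type}")
    if not table.primary_key:
        problems.append("table has no primary key")
    return problems


if __name__ == "__main__":
    for t in (SourceTable("orders", "MySQL", ["id"]),
              SourceTable("audit_log", "MySQL")):
        print(t.name, check_table(t) or "ok")
```

An empty list means that the table passes both checks in this sketch. The resource group limit is not modeled here.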

Create a real-time data sync node

  1. Log on to the DataWorks console.
  2. In the left-side navigation pane, click Workspaces.
  3. Select the region in which the workspace that you want to manage resides, find the workspace, and click Data Analytics in the Actions column.
  4. Create a workflow.
    If you have a workflow, skip this step.
    1. Move the pointer over the Create icon and select Workflow.
    2. In the Create Workflow dialog box, set the Workflow Name parameter.
    3. Click Create.
  5. Create a real-time data sync node.
    1. On the DataStudio page, move the pointer over the Create icon and choose Data Integration > Real-time synchronization.
      Alternatively, find the workflow in which you want to create a real-time data sync node and right-click Data Integration. From the shortcut menu, choose Create > Real-time synchronization.
    2. In the Create Node dialog box, set the parameters that are described in the following table.
      Parameter Description
      Node Type The type of the node. Default value: Real-time synchronization.
      Sync Method Set the value to Migration to MaxCompute. In this case, some or all of the tables in your database are migrated to MaxCompute.
      Node Name The name of the node. The name must be 1 to 128 characters in length and can contain letters, digits, underscores (_), and periods (.).
      Location The directory in which the real-time data sync node is stored.
    3. Click Commit. You are navigated to the configuration tab of the real-time data sync node.
  6. Select a resource group.
    1. On the right side of the configuration tab, click the Basic Configuration tab.
    2. In the panel that appears, select the resource group that you want to use from the Resource Group drop-down list.
      Note

      DataWorks allows you to use only exclusive resource groups for Data Integration to run real-time data sync nodes.

      If no exclusive resource group for Data Integration exists, click Create Exclusive Resource Group for Data Integration to create a resource group. For more information, see Exclusive resource groups for Data Integration.
  7. Select a data source as the source and configure synchronization rules.
    1. In the Data Source section, specify the Type and Data source parameters.
      Note You can set the Type parameter only to MySQL, Oracle, or PolarDB.
    2. In the Source Table section, select the tables whose data you want to synchronize from the Source Table list. Then, click the Move icon to add the tables to the Selected Tables list.
      The Source Table section displays all the tables in the source. You can select all or specific tables.
      Notice If a selected table does not have a primary key, the table cannot be synchronized in real time.
    3. In the Set Mapping Rules for Table/Database Names section, configure a rule as needed.
      Supported options are Conversion Rule for Table Name and Rule for Destination Table name.
      • Conversion Rule for Table Name: the rule that is used to convert the names of source tables to those of destination tables.
      • Rule for Destination Table name: the rule that is used to add a prefix or a suffix to the converted names of destination tables.
    4. Click Next Step.
  8. Select a data source as the destination and configure the formats for the destination tables.
    1. In the Set Destination Table step, set the Destination and Write Mode parameters.
    2. Click the Edit icon next to Time automatic partition setting. In the Edit dialog box, modify the partition settings for the destination tables. You can configure daily partitions.
    3. Optional: Add fields to the destination tables.
      If you want to add fields to all the tables to be synchronized, click New field in the Fields In Destination Table section.
    4. Click Refresh source table and MaxCompute Table mapping to create the mappings between the source tables and destination MaxCompute tables.
    5. View the mapping progress, source tables, and mapped destination tables.
      Area No. Description
      1
      The progress of mapping the source tables to the destination tables.
      Note The mapping may require a long period of time if you want to synchronize data from a large number of tables.
      2
      • If the tables in the source database contain primary keys, the system removes duplicate data based on the primary keys during the synchronization.
      • If the tables in the source database do not contain primary keys, you can click the Edit icon to customize primary keys. You can use one field or a combination of several fields as the primary keys of the tables. This way, the system removes duplicate data based on the primary keys during the synchronization.
      Note

      A real-time data sync node cannot be used to synchronize data in a table that has no primary key.

      3
      The name of the destination table. The table name that appears varies based on the value that you selected from the drop-down list in the Table creation method column.
      • If you set the Table creation method parameter to Create Table, the name of the destination table that is automatically created appears. You can click the table name to view and modify the table creation statements.
      • If you set the Table creation method parameter to Use Existing Table, you must select a table name from the drop-down list in the MaxComputeBase Table name column.
    6. Click Next Step.
      If you set the Table creation method to Create Table, you must click Start table building in the Create tables automatically dialog box to create destination MaxCompute tables.
  9. Configure rules for processing DDL messages.
    DDL statements exist in the source. Before you synchronize data, you can configure synchronization rules for different DDL statements based on your business requirements.
    Note The rules apply when a real-time data sync node is run for the first time. If you want to modify the rules in subsequent operations, go to the Real Time DI page of Operation Center. For more information, see Manage the real-time data sync node.
    1. In the Set Processing Policy for DDL Messages step, configure rules to process DDL messages during data synchronization.
      The following table describes the processing rules for different DDL messages.
      DDL message Rule
      CreateTable DataWorks processes a DDL message of the related type based on the following rules after it receives the message:
      • Normal: sends the message to the destination. Then, the destination processes the message. Each destination may process DDL messages based on its own business logic. If you select Normal for CreateTable, DataWorks only forwards the messages.
      • Ignore: ignores the message and does not send it to the destination.
      • Alert: ignores the message and records an alert in the real-time synchronization logs. The alert states that the message was ignored because of a running error.
      • Error: returns an error when the real-time sync solution is running and terminates the real-time sync solution.
      DropTable
      AddColumn
      DropColumn
      RenameTable
      RenameColumn
      ChangeColumn
      TruncateTable
    2. Click Next Step.
  10. Configure the resources required by the data sync node.
    1. In the Set Resources for Solution Running step, set the parameters that are described in the following table.
      Parameter Description
      Maximum number of connections supported by source read The maximum number of Java Database Connectivity (JDBC) connections that are allowed for the source. Specify an appropriate number based on the resources of the source. Default value: 15.
      Maximum number of parallel threads allowed to read by destination The maximum number of parallel threads that the sync node uses to read data from the source table or write data to the destination. Maximum value: 32. Specify an appropriate number based on the resources of the source and the destination.
    2. Click Complete Configuration.
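
The four processing policies for DDL messages described in the preceding steps can be pictured as a simple dispatch. The following Python sketch only illustrates the documented behavior and is not how DataWorks implements it. handle_ddl, DdlPolicyError, and the forward callback are hypothetical names introduced for this example.

```python
import logging
from enum import Enum

logger = logging.getLogger("realtime_sync_sketch")


class Policy(Enum):
    NORMAL = "Normal"
    IGNORE = "Ignore"
    ALERT = "Alert"
    ERROR = "Error"


# DDL message types listed in this topic.
DDL_TYPES = ("CreateTable", "DropTable", "AddColumn", "DropColumn",
             "RenameTable", "RenameColumn", "ChangeColumn", "TruncateTable")


class DdlPolicyError(RuntimeError):
    """Raised for the Error policy; stands in for terminating the sync."""


def handle_ddl(ddl_type, policies, forward):
    """Apply the configured policy. Return True if the message was forwarded."""
    if ddl_type not in DDL_TYPES:
        raise ValueError(f"unknown DDL message type: {ddl_type}")
    policy = policies[ddl_type]
    if policy is Policy.NORMAL:
        forward(ddl_type)  # the destination decides how to process it
        return True
    if policy is Policy.IGNORE:
        return False  # dropped silently
    if policy is Policy.ALERT:
        logger.warning("DDL message %s was ignored", ddl_type)
        return False  # dropped, but recorded in the logs
    raise DdlPolicyError(f"DDL message {ddl_type} is not allowed")


if __name__ == "__main__":
    policies = {t: Policy.NORMAL for t in DDL_TYPES}
    policies["TruncateTable"] = Policy.ALERT
    print(handle_ddl("AddColumn", policies, print))
    print(handle_ddl("TruncateTable", policies, print))
```

Keeping one policy per DDL message type in a dictionary mirrors the table in the Set Processing Policy for DDL Messages step, where each message type is configured separately.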

Commit and deploy the real-time data sync node

Commit and deploy the real-time data sync node.
  1. Click the Save icon in the top toolbar to save the node.
  2. Click the Submit icon in the top toolbar to commit the node.
  3. In the Commit Node dialog box, enter your comments in the Change description field.
  4. Click OK.
If you use a workspace in standard mode, you must deploy the node in the production environment after you commit the node. Click Deploy in the upper-right corner. For more information, see Deploy nodes.

Start the real-time data sync node

  1. Go to the Operation Center page.
    After you commit and deploy the real-time data sync node, click Operation Center in the upper-right corner of the DataStudio page to manage the node on the Real Time DI page.
  2. View the details of a real-time data sync node.
    On the Real Time DI page, find the real-time data sync node that you want to view and click the node name.
  3. Start the real-time data sync node.
    1. Go back to the previous page, find the real-time data sync node that you want to start, and click Start in the Operation column.
    2. In the Start dialog box, set the parameters that are described in the following table.
      Parameter Description
      Whether to reset the site Specifies whether to set the point in time for the next startup. If you select the Reset site parameter, the Start time point and Time zone parameters are required.
      Start time point The date and time for starting the real-time data sync node.
      Time zone The time zone in which the real-time data sync node is run. You can select a time zone from the Time zone drop-down list.
      Failover The maximum number of failovers allowed within the specified time range.
      Note If this parameter is not specified, the system automatically stops the node if the number of failovers exceeds 100 within 5 minutes. This prevents excessive resource consumption caused by the frequent starting of the node.
      Dirty data policy
      • Zero tolerance, not allowed: The real-time sync node is automatically stopped if the node contains dirty data.
      • No limit: The real-time data sync node can normally run regardless of whether the node contains dirty data.
      • Limited control: The real-time data sync node is automatically stopped if the amount of dirty data contained in the node exceeds a specified value.
      Processing Policy for DDL Messages in Real-time Sync You can modify the configured rules that are used to process DDL messages based on your business requirements. For more information, see Step 9 in the Create a real-time data sync node section of this topic.
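
The default failover limit described above (the node is stopped if the number of failovers exceeds 100 within 5 minutes) behaves like a sliding-window counter. The following Python sketch shows one way to model that rule. FailoverLimiter is a hypothetical class written for illustration, and the exact windowing that DataWorks uses is an assumption.

```python
from collections import deque


class FailoverLimiter:
    """Sliding-window counter for failovers (illustrative, not a DataWorks API)."""

    def __init__(self, max_failovers=100, window_seconds=300):
        self.max_failovers = max_failovers
        self.window_seconds = window_seconds
        self._events = deque()

    def record(self, timestamp):
        """Record a failover at `timestamp` (in seconds).

        Return True if the node should be stopped.
        """
        self._events.append(timestamp)
        # Discard failovers that are older than the window.
        while self._events and self._events[0] <= timestamp - self.window_seconds:
            self._events.popleft()
        return len(self._events) > self.max_failovers


if __name__ == "__main__":
    limiter = FailoverLimiter(max_failovers=2, window_seconds=10)
    for ts in (0, 1, 2):
        print(ts, limiter.record(ts))
```

With the defaults, record returns True on the 101st failover that falls inside the same 5-minute window. Failovers that are spread out over a longer period never trigger a stop.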

Manage the real-time data sync node

  • Stop a real-time data sync node that is running.

    Find the real-time data sync node that you want to stop and click Stop in the Operation column. In the message that appears, click Stop.

  • Undeploy a real-time data sync node that is not running.

    Find the real-time data sync node that you want to undeploy and click Undeploy in the Operation column. In the message that appears, click Undeploy.

  • View the alert information of a real-time data sync node.

    Find the real-time data sync node that you want to view and click Alert settings in the Actions column. In the Alert settings dialog box, view the alert events and alert rules.

  • Configure alert rules for a real-time data sync node.
    1. Find the real-time data sync node for which you want to configure alert rules and click Configure Alert Rule in the lower part of the Real Time DI page.
    2. In the New rule dialog box, set the parameters that are described in the following table.
      Parameter Description
      Name The name of the alert rule.
      Description The description of the alert rule.
      Indicators The metric for which an alert is reported. Valid values:
      • Status
      • Business delay
      • Failover
      • Dirty Data
      • Not Supported by DDL Statement
      Threshold The threshold for reporting an alert. Specify the WARNING In and CRITICAL In parameters. The default values of the parameters are 5 minutes.
      Alarm interval The interval at which an alert is reported. The default value is 5 minutes.
      WARNING The method that is used to send alert notifications. You can specify one or more methods. Valid values: Mail, SMS, and DingTalk.
      Note Only the Singapore, Malaysia (Kuala Lumpur), and Germany (Frankfurt) regions support the SMS notification method. To use the SMS notification method in other regions, submit a ticket to contact DataWorks technical support.
      CRITICAL
      Receiver (Non-DingTalk) The recipient of alert notifications.
    3. Click Confirm.
  • Modify alert rules for multiple real-time data sync nodes at a time.
    1. Select one or more real-time data sync nodes for which you want to modify alert rules and click Operation alarm in the lower part of the Real Time DI page.
    2. In the Operation alarm dialog box, modify the values of the Type and Indicators parameters.
    3. Click Confirm.
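
Before you fill in the New rule dialog box, it can help to write down the rule and check it against the valid values listed above. The following Python sketch is a hypothetical local check, not a DataWorks API. AlertRule and validate_rule are names invented for this example, and the region strings are assumed to match the names used in this topic.

```python
from dataclasses import dataclass

# Valid values taken from the parameter table in this topic.
INDICATORS = {"Status", "Business delay", "Failover", "Dirty Data",
              "Not Supported by DDL Statement"}
METHODS = {"Mail", "SMS", "DingTalk"}
# Regions in which SMS is available without submitting a ticket.
SMS_REGIONS = {"Singapore", "Malaysia (Kuala Lumpur)", "Germany (Frankfurt)"}


@dataclass
class AlertRule:
    """Hypothetical representation of an alert rule."""
    name: str
    indicator: str
    warning_methods: tuple
    critical_methods: tuple
    alarm_interval_minutes: int = 5  # default interval stated in this topic


def validate_rule(rule, region):
    """Return a list of problems found in the rule for the given region."""
    errors = []
    if rule.indicator not in INDICATORS:
        errors.append(f"unknown indicator: {rule.indicator}")
    methods = set(rule.warning_methods) | set(rule.critical_methods)
    for method in sorted(methods):
        if method not in METHODS:
            errors.append(f"unknown notification method: {method}")
        elif method == "SMS" and region not in SMS_REGIONS:
            errors.append("SMS requires a ticket in this region")
    return errors


if __name__ == "__main__":
    rule = AlertRule("delay-alert", "Business delay", ("Mail",), ("SMS",))
    print(validate_rule(rule, "Singapore"))
```

An empty list means that the rule uses only values that this topic lists as valid for the selected region.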