After you run a sync solution to synchronize data to Kafka, you can add source tables to the solution or remove source tables from it with a few clicks. This topic describes how to add and remove source tables while the sync solution is running.

Prerequisites

A sync solution used to synchronize data to Kafka is created and running. For more information, see Configure and manage a data synchronization solution.

Add source tables to a sync solution

  1. Go to the Data Integration page and choose Sync Solutions > Tasks to go to the Tasks page.
    For more information, see Select a synchronization solution.
  2. On the Tasks tab, find the sync solution and choose More > Modify Configuration to go to the solution configuration page.
  3. Add source tables to the sync solution and update the mappings between the source tables and destination topics.
    1. In the Source Table section of the Set Synchronization Sources and Rules step, select the source tables that you want to add to the sync solution from the Source Table list and click the More icon to move the tables to the Selected Tables list.
    2. Click Next Step.
    3. In the Set Destination Topic step, click Refresh source table and Kafka Topic mapping to configure the mappings between the source tables and destination Kafka topics.
    4. View the mapping progress, source tables, and mapped destination topics.
      Serial number  Description
      1  The progress of mapping the source tables to destination topics.
         Note: The mapping may take a long time if you synchronize data from a large number of tables.
      2  • If you select Source tables without primary keys can be synchronized, a source table that does not contain a primary key can be synchronized to the destination. However, duplicate data may be written to the destination during the synchronization.
         • If you select Send heartbeat record, the real-time sync node writes a record that contains the current timestamp to Kafka every 5 seconds. This way, even if no new records are written to Kafka, you can check the progress of the data synchronization by viewing the timestamp of the latest record.
      3  • If the tables in the source database contain primary keys, the system removes duplicate data based on the primary keys during the synchronization.
         • If you select Source tables without primary keys can be synchronized and the source table does not contain a primary key, click the Edit icon to specify a primary key. You can select one or more columns to serve as the primary key. The values of these columns are used to remove duplicate data during the synchronization.
      4  The method that is used to create a destination topic. Valid values: Use Existing Topic and Create Topic.
      5  The value in the Kafka Topic column varies with the value that you set for Topic creation method.
         • If you set the Topic creation method parameter to Use Existing Topic, you can select the destination topic from the drop-down list in the Kafka Topic column.
         • If you set the Topic creation method parameter to Create Topic, the name of the topic that is automatically created appears in the Kafka Topic column. You can click the automatically created topic to view and modify the name and description of the topic.
      6  You can click Batch Edit Additional Fields in Destination Topic and add fields for multiple Kafka topics in the dialog box that appears. You can also click Edit additional fields in the Actions column to add additional fields for a single Kafka topic.
         Note: The Batch Edit Additional Fields in Destination Topic feature takes effect only if you select Create Topic for the Topic creation method parameter.
  4. Click Next Step.
  5. Configure the resources required by the sync solution.
    In the Set Resources for Solution Running step, set the parameters as required.
    • Offline Sync
      Offline task name rules: The name of the batch sync node that is used to synchronize the full data of the source. After a sync solution is created, DataWorks first generates a batch sync node to synchronize full data, and then generates real-time sync nodes to synchronize incremental data.
      Resource Groups for Full Batch Sync Nodes: The exclusive resource group for Data Integration that is used to run the batch sync node.
        Only exclusive resource groups for Data Integration can be used to run solutions. You can set this parameter to the name of the exclusive resource group for Data Integration that you purchased. For more information, see Plan and configure resources.
        Note: If you do not have an exclusive resource group, click Create a new exclusive Resource Group to create one.
    • Scheduling Settings
      Select scheduling Resource Group: The resource group for scheduling that is used to run the nodes.
        Only exclusive resource groups for Data Integration can be used to run sync solutions. You can set this parameter to the name of the exclusive resource group for Data Integration that you purchased. For more information, see Plan and configure resources.
        Note: If you do not have an exclusive resource group, click Create a new exclusive Resource Group to create one.
    • Incremental Sync
      Resource Groups for Incremental Batch Sync Nodes: The exclusive resource group that is used to run the real-time sync nodes.
        Only exclusive resource groups for Data Integration can be used to run solutions. You can set this parameter to the name of the exclusive resource group for Data Integration that you purchased. For more information, see Plan and configure resources.
        Note: If you do not have an exclusive resource group, click Create a new exclusive Resource Group to create one.
    • Channel Settings
      Maximum number of connections supported by source read: The maximum number of Java Database Connectivity (JDBC) connections that are allowed for the source. Specify an appropriate number based on the resources of the source. Default value: 20.
  6. Click Complete Configuration to return to the Tasks tab.
  7. Find the sync solution to which you added source tables and choose More > Submit and Run in the Operation column. In the Submit and Run message, click OK to run the solution.
    After you submit and run the sync solution to which you added source tables, the system compares the source tables in the original sync solution with the source tables in the new sync solution. If new source tables are detected, the system adds them to the solution.
    Note: The system starts to load the data of newly added source tables at the point in time when you add them. After the loading ends, the system starts to synchronize the data in these tables to the destination. For example, your sync solution starts to run at 08:00 and is still running at 09:00, when you add a source table. The system loads the data of the table from 09:00 to 10:00. It then stops the real-time sync nodes that are running and starts to synchronize the data that was generated in the newly added source table from 09:00 to 10:00 to the destination Kafka topic. Adding source tables to a sync solution that is running ensures only that the data is consistent before and after the synchronization.
  8. View the addition details of the source tables.
    1. On the Tasks tab, find the sync solution to which you added source tables and click Execution details in the Operation column to go to the details page of the sync solution.
    2. In the steps section, find the Display the increased/decreased table node and click Execution details in the Status column.
      If the status of the Display the increased/decreased table node is Succeeded, the new source tables are added to the sync solution.
    3. View the new source tables that are added to the sync solution.
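
The comparison that the system performs when you submit and run the modified solution can be pictured as a set difference between the old and new table lists. The following Python code is a minimal sketch of that idea only, not DataWorks code: the diff_source_tables function and the table names are hypothetical and exist purely for illustration.

```python
def diff_source_tables(original, updated):
    """Return (added, removed) table names, each sorted for stable output."""
    original_set, updated_set = set(original), set(updated)
    added = sorted(updated_set - original_set)    # in the new solution only
    removed = sorted(original_set - updated_set)  # in the original solution only
    return added, removed


# Hypothetical table lists before and after Modify Configuration.
added, removed = diff_source_tables(
    original=["customers", "orders"],
    updated=["customers", "orders", "payments"],
)
print("added:", added)      # added: ['payments']
print("removed:", removed)  # removed: []
```

The same comparison covers removal: a table that appears only in the original list ends up in the removed result.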

Remove source tables from the sync solution

  1. Go to the Data Integration page and choose Sync Solutions > Tasks to go to the Tasks page.
    For more information, see Select a synchronization solution.
  2. On the Tasks tab, find the sync solution and choose More > Modify Configuration to go to the solution configuration page.
  3. Remove source tables from the sync solution and update the mappings between the remaining source tables and destination topics.
    1. In the Source Table section of the Set Synchronization Sources and Rules step, select the source tables that you want to remove from the sync solution in the Selected Tables list and click the icon to move the tables back to the Source Table list.
    2. Click Next Step.
    3. In the Set Destination Topic step, click Refresh source table and Kafka Topic mapping to configure the mappings between the source tables and destination Kafka topics.
    4. View the mapping progress, source tables, and mapped destination topics.
      Serial number  Description
      1  The progress of mapping the source tables to destination topics.
         Note: The mapping may take a long time if you synchronize data from a large number of tables.
      2  • If you select Source tables without primary keys can be synchronized, a source table that does not contain a primary key can be synchronized to the destination. However, duplicate data may be written to the destination during the synchronization.
         • If you select Send heartbeat record, the real-time sync node writes a record that contains the current timestamp to Kafka every 5 seconds. This way, even if no new records are written to Kafka, you can check the progress of the data synchronization by viewing the timestamp of the latest record.
      3  • If the tables in the source database contain primary keys, the system removes duplicate data based on the primary keys during the synchronization.
         • If you select Source tables without primary keys can be synchronized and the source table does not contain a primary key, click the Edit icon to specify a primary key. You can select one or more columns to serve as the primary key. The values of these columns are used to remove duplicate data during the synchronization.
      4  The method that is used to create a destination topic. Valid values: Use Existing Topic and Create Topic.
      5  The value in the Kafka Topic column varies with the value that you set for Topic creation method.
         • If you set the Topic creation method parameter to Use Existing Topic, you can select the destination topic from the drop-down list in the Kafka Topic column.
         • If you set the Topic creation method parameter to Create Topic, the name of the topic that is automatically created appears in the Kafka Topic column. You can click the automatically created topic to view and modify the name and description of the topic.
      6  You can click Batch Edit Additional Fields in Destination Topic and add fields for multiple Kafka topics in the dialog box that appears. You can also click Edit additional fields in the Actions column to add additional fields for a single Kafka topic.
         Note: The Batch Edit Additional Fields in Destination Topic feature takes effect only if you select Create Topic for the Topic creation method parameter.
  4. Click Next Step.
  5. Configure the resources required by the sync solution.
    In the Set Resources for Solution Running step, set the parameters as required.
    • Offline Sync
      Offline task name rules: The name of the batch sync node that is used to synchronize the full data of the source. After a sync solution is created, DataWorks first generates a batch sync node to synchronize full data, and then generates real-time sync nodes to synchronize incremental data.
      Resource Groups for Full Batch Sync Nodes: The exclusive resource group for Data Integration that is used to run the batch sync node.
        Only exclusive resource groups for Data Integration can be used to run solutions. You can set this parameter to the name of the exclusive resource group for Data Integration that you purchased. For more information, see Plan and configure resources.
        Note: If you do not have an exclusive resource group, click Create a new exclusive Resource Group to create one.
    • Scheduling Settings
      Select scheduling Resource Group: The resource group for scheduling that is used to run the nodes.
        Only exclusive resource groups for Data Integration can be used to run sync solutions. You can set this parameter to the name of the exclusive resource group for Data Integration that you purchased. For more information, see Plan and configure resources.
        Note: If you do not have an exclusive resource group, click Create a new exclusive Resource Group to create one.
    • Incremental Sync
      Resource Groups for Incremental Batch Sync Nodes: The exclusive resource group that is used to run the real-time sync nodes.
        Only exclusive resource groups for Data Integration can be used to run solutions. You can set this parameter to the name of the exclusive resource group for Data Integration that you purchased. For more information, see Plan and configure resources.
        Note: If you do not have an exclusive resource group, click Create a new exclusive Resource Group to create one.
    • Channel Settings
      Maximum number of connections supported by source read: The maximum number of Java Database Connectivity (JDBC) connections that are allowed for the source. Specify an appropriate number based on the resources of the source. Default value: 20.
  6. Click Complete Configuration to return to the Tasks tab.
  7. Find the sync solution from which you removed source tables and choose More > Submit and Run in the Operation column. In the Submit and Run message, click OK to run the solution.
    If you remove source tables from a sync solution that is running, the source tables are also removed from the real-time sync nodes generated by the sync solution. After you submit and run the sync solution from which you removed source tables, the system continues to synchronize data from the point in time at which the sync solution is rerun.
  8. View the removal details of the source tables.
    1. In the steps section, find the Display the increased/decreased table node and click Execution details in the Status column.
      If the status of the Display the increased/decreased table node is Succeeded, the source tables are removed from the sync solution.
    2. View the source tables that are removed from the sync solution.
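
The Set Destination Topic step in both procedures relies on primary keys, or on columns that you specify as a primary key, to remove duplicate data. How DataWorks implements this is not described in this topic. As a rough illustration of the idea only, the following Python sketch keeps the last record seen for each primary key value. The dedupe_by_key function, the record layout, and the column names are hypothetical.

```python
def dedupe_by_key(records, key_columns):
    """Keep the last record for each combination of key column values."""
    latest = {}
    for record in records:
        # One or more columns can serve as the primary key.
        key = tuple(record[column] for column in key_columns)
        latest[key] = record  # a later record replaces an earlier one
    return list(latest.values())


# Hypothetical records; the first two share the key (1, "eu").
rows = [
    {"id": 1, "region": "eu", "amount": 10},
    {"id": 1, "region": "eu", "amount": 15},
    {"id": 2, "region": "us", "amount": 7},
]
unique_rows = dedupe_by_key(rows, ["id", "region"])
print(len(unique_rows))  # 2
```

Without key columns to group on, the two records for id 1 cannot be recognized as the same row, which is why a table without a primary key can produce duplicate data in the destination.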