After you configure a data synchronization solution, you can manage the solution and view its running details. This topic describes the common O&M operations that you can perform on a data synchronization solution.

Background information

This topic describes only common O&M operations that can be performed on a data synchronization solution. For information about how to perform O&M operations on a real-time synchronization node that is generated by a data synchronization solution, see O&M for real-time synchronization nodes. For information about how to perform O&M operations on a batch synchronization node that is generated by a data synchronization solution, see O&M for batch synchronization nodes.

Manage a data synchronization solution

After a data synchronization solution is configured, you can go to the Tasks page in Data Integration in the DataWorks console to view the data synchronization solution. This page displays all created data synchronization solutions. You can specify filter conditions to search for the desired data synchronization solution. Then, you can perform the operations that are described in the following table on the data synchronization solution.
Operation: Description
Start: You can click Submit and Run in the Actions column of the data synchronization solution to start the solution.
Edit: As your business in the production environment evolves, the number of business tables may increase or decrease, and you may need to adjust the tables from which data is synchronized. Data Integration allows you to adjust the source tables that are specified in a data synchronization solution.
To adjust the source tables, click More in the Actions column of the data synchronization solution and select Modify Configuration to go to the configuration page of the solution. On the configuration page, add or remove source tables based on your business requirements. After the adjustment is complete, go back to the Tasks page, find the data synchronization solution, and then click Submit and Run in the Actions column to run the solution again.
When you rerun the solution, the system compares the source tables specified for this run with the source tables specified for the previous run. If new tables are detected, the system synchronizes data from the new tables. For more information, see Add or remove source tables to or from a synchronization solution that is running.
If you run a one-click real-time synchronization solution, the solution synchronizes full data from the newly added tables. After the full data is synchronized, the system runs the real-time synchronization node generated by the solution to synchronize incremental data from the newly added tables in real time.
Note
  • After you add source tables to the data synchronization solution and start the solution, the system first synchronizes full data from the added source tables and then synchronizes incremental data from the added source tables, starting from the point in time at which the full synchronization started. For example, your data synchronization solution starts to run at 08:00 and is still running at 09:00. You add a source table to the solution at 09:00. The system starts to synchronize full data from the table at 09:00, and the full synchronization ends at 10:00. The system then stops the running real-time synchronization node and starts to synchronize the incremental data that was generated in the table from 09:00 to 10:00 to the destination table. If you add tables to a data synchronization solution after the solution is run, the system ensures only that the data is consistent before and after the synchronization. Data may be inconsistent while the synchronization is in progress.
  • If you want to synchronize full data from all source tables specified in a data synchronization solution, you must forcefully rerun the solution.
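The table comparison and the 09:00 to 10:00 example described for the Edit operation can be sketched in a few lines of Python. This is only an illustrative model, not DataWorks code: the function names `diff_source_tables` and `incremental_catch_up_window` and the table names are hypothetical.

```python
from datetime import datetime


def diff_source_tables(previous_run, current_run):
    """Return (added, removed) table names between two runs (hypothetical helper)."""
    previous, current = set(previous_run), set(current_run)
    return sorted(current - previous), sorted(previous - current)


def incremental_catch_up_window(full_sync_start, full_sync_end):
    """Incremental data for an added table is synchronized from the time
    its full synchronization started, so the backlog spans start..end."""
    if full_sync_end < full_sync_start:
        raise ValueError("full synchronization cannot end before it starts")
    return full_sync_start, full_sync_end


# Hypothetical table names: "payments" is added before the rerun.
added, removed = diff_source_tables(
    ["orders", "customers"], ["orders", "customers", "payments"]
)
print(added, removed)  # ['payments'] []

# Full synchronization of the added table runs from 09:00 to 10:00.
start, end = incremental_catch_up_window(
    datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 10, 0)
)
print(start.strftime("%H:%M"), end.strftime("%H:%M"))  # 09:00 10:00
```

Only the tables in `added` trigger a new full synchronization in this model. Tables that are already synchronized are left alone unless the solution is forcefully rerun.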
Forcefully rerun: If necessary, for example, if data in the source is contaminated or errors occur on data links, you can click More in the Actions column of the solution and select Force Rerun to rerun the solution. After you forcefully rerun the solution, the system synchronizes full data and incremental data from the source to the destination again.
Note
  • Only data synchronization solutions that are used to synchronize data to Hologres and MaxCompute can be forcefully rerun.
  • A data synchronization solution that is used to synchronize data from tables in sharded databases cannot be forcefully rerun.
In the following scenarios, a one-click real-time synchronization solution used to synchronize data to MaxCompute needs to be rerun to restore data:
  • The source of the solution is a MySQL data source, and the real-time synchronization node generated by the solution fails for a long period of time. As a result, the binary logs are deleted, and incremental data in the MySQL data source cannot be synchronized.
  • Destination tables do not contain the fields that are newly added to source tables due to various reasons.
  • Data accuracy issues such as data loss occur on data synchronized to destination tables due to various reasons.
Important
  • If you forcefully rerun a data synchronization solution, the solution synchronizes data from the fields in source tables to the fields in destination tables again. If some fields in source tables do not exist in destination tables, the system automatically adds the same fields in the destination tables to ensure field consistency.
  • Before you forcefully rerun a data synchronization solution, you must check whether the rerun will cause a conflict between the merge node instance generated before the forceful rerun and the merge node instance generated during the forceful rerun. When you forcefully rerun the solution, the merge node instance generated before the forceful rerun may be running or about to run. If the two instances have the same data timestamp and run at the same time, they may overwrite each other's data in destination partitions or tables.
    You can go to the Cycle Instance page in Operation Center to view the running status of the merge node instance that was generated before the forceful rerun. If the rerun will cause a conflict between the two instances, you can perform one of the following operations to resolve the issue:
    • If the instance of the merge node that is generated by the data synchronization solution before the forceful rerun is running, rerun the data synchronization solution after the instance finishes running.
    • If the instance of the merge node that is generated by the data synchronization solution before the forceful rerun has not started to run, freeze the instance. Unfreeze the instance after the rerun operation is complete.
  • If data is not generated or the automatic scheduling of the merge node is not resumed on the next day after you forcefully rerun a data synchronization solution, you must check whether the following issues exist and manually resume the scheduling of the instance of the merge node:
    • If latency occurs on the data synchronization solution, resolve the latency issue. For more information, see Solutions to latency on a real-time synchronization node.
    • If the instance of the merge node in the previous cycle is not run or failed to be run, you can remove the dependency of the instance of the merge node in the current cycle on the instance of the merge node in the previous cycle. For information about how to view the information of an auto triggered instance, see View auto triggered node instances.
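The pre-rerun conflict check described above is a small decision rule. The sketch below models it in Python under stated assumptions: the `MergeInstance` class, the status strings, and the returned action names are hypothetical and do not correspond to a DataWorks API.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class MergeInstance:
    data_timestamp: date
    status: str  # hypothetical values: "not_started", "running", "succeeded", "failed"


def action_before_rerun(existing: MergeInstance, rerun_data_timestamp: date) -> str:
    """Decide what to do with an existing merge node instance before a forceful rerun."""
    # Different data timestamps write to different partitions: no conflict.
    if existing.data_timestamp != rerun_data_timestamp:
        return "proceed"
    # Same data timestamp and still running: rerun after the instance finishes.
    if existing.status == "running":
        return "wait_until_finished"
    # Same data timestamp and not started yet: freeze now, unfreeze after the rerun.
    if existing.status == "not_started":
        return "freeze_then_unfreeze_after_rerun"
    # Assumption: a finished instance can no longer run at the same time as the rerun.
    return "proceed"


today = date(2024, 1, 1)
print(action_before_rerun(MergeInstance(today, "running"), today))
print(action_before_rerun(MergeInstance(today, "not_started"), today))
```

The same rule applies to full data backfill, with the data timestamp of the data backfill instance in place of the rerun data timestamp.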
Backfill full data: You can perform this operation if you need to synchronize full data from the source again to resolve data accuracy issues, such as data loss, that occur on the data synchronized to MaxCompute tables in the data synchronization solution.
Note
  • Only one-click real-time synchronization solutions used to synchronize data to MaxCompute support full data backfill.
  • Data synchronization solutions that are used to synchronize data from tables in sharded databases do not support full data backfill.
To backfill full data for a one-click real-time synchronization solution used to synchronize data to MaxCompute, find the solution on the Tasks page in Data Integration, click More in the Actions column, and then select Backfill Data for All Data. Then, perform the following steps:
  1. Select the data timestamp of the data backfill instance.

    If destination MaxCompute tables are partitioned tables, the solution synchronizes full data from the source to the date partitions that are specified by the data timestamp.

  2. Select source tables based on which you want to backfill full data.

    In the list on the left, select the tables from which you want to synchronize full data. Click the More icon to move the selected tables to the list on the right.

  3. Click Confirmation.
Important
  • You can select only a single day as a data timestamp. If you want to backfill full data for multiple days, you must perform the full data backfill operation multiple times.
  • A one-click full synchronization solution synchronizes data from the source fields whose names are the same as destination fields and the additional source fields defined in the solution.
  • Before you backfill full data for a one-click real-time synchronization solution used to synchronize data to MaxCompute, you must check the data timestamp of the data backfill instance and make sure that the data backfill instance does not conflict with the merge node instance generated before the full data backfill operation. If the two instances have the same data timestamp and run at the same time, they may overwrite each other's data in destination partitions or tables.
    You can go to the Cycle Instance page in Operation Center to view the running status of the merge node instance. If the data backfill instance conflicts with the merge node instance, you can perform one of the following operations to resolve the issue:
    • If the instance of the merge node is running, backfill full data for the solution after the instance finishes running.
    • If the instance of the merge node has not started to run, freeze the instance. Unfreeze the instance after the full data backfill operation is complete.
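Because each full data backfill operation accepts only a single day as the data timestamp, covering a date range means repeating the operation once per day. The following sketch lists the data timestamps to use. The function name `backfill_data_timestamps` is hypothetical, and the sketch only plans the operations. It does not call DataWorks.

```python
from datetime import date, timedelta


def backfill_data_timestamps(first_day: date, last_day: date) -> list:
    """Return one data timestamp per day in the inclusive range first_day..last_day."""
    if last_day < first_day:
        raise ValueError("last_day must not be earlier than first_day")
    day_count = (last_day - first_day).days + 1
    return [first_day + timedelta(days=offset) for offset in range(day_count)]


# Three days of missing data require three separate backfill operations.
for data_timestamp in backfill_data_timestamps(date(2024, 3, 1), date(2024, 3, 3)):
    print(data_timestamp.isoformat())
# 2024-03-01
# 2024-03-02
# 2024-03-03
```

Before each operation, check the chosen data timestamp against the merge node instances on the Cycle Instance page, as described above.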
Stop: If the data synchronization solution is running and you want to stop it, click Stop in the Actions column of the solution.

View the status overview of data synchronization solutions

You can go to the Running Status Overview page in Data Integration and specify a period of time to view the status overview of data synchronization solutions. The Running Status Overview page contains the following sections:
  • Solution Status Distribution: displays the total number of data synchronization solutions and displays the status distribution of the solutions in a pie chart. The statistical data about the status distribution shows the number of solutions that are successfully run and the number of solutions that fail to be run. The statistical data is collected in the specified period of time. You can click a sector in the pie chart to go to the solution list page. On this page, you can view the solutions that are successfully run or fail to be run, and the running details of a solution. For more information about the running details of a data synchronization solution, see View the running details of a data synchronization solution.
  • Usage of Resources in Resource Groups: displays the specifications and resource usage of the resource groups that are used within the current Alibaba Cloud account. You can click the name of a resource group to go to the details page of the resource group. On the details page, you can view the basic information and resource usage of the resource group. For information about resource groups, see View the resource usage of an exclusive resource group.
  • Batch Synchronization Nodes: displays the number of batch synchronization nodes generated by specific data synchronization solutions, the data synchronization speed, the status distribution of the batch synchronization nodes, and the details of the synchronized data. The statistical data is collected in the specified period of time.
    • The statistical data about the status distribution shows the number of the batch synchronization nodes that are successfully run and the number of the batch synchronization nodes that fail to be run.
    • The Synchronization Data subsection displays the following items:
      • Number of synchronization nodes: the number of batch synchronization nodes that are successfully run
      • Amount of data synchronized: the amount of data synchronized by batch synchronization nodes that are successfully run or running
      • Number of data records synchronized: the number of data records that are synchronized by batch synchronization nodes
    Note: The statistical data in the Batch Synchronization Nodes section is updated every hour.
  • Real-time Synchronization Nodes: displays the number of real-time synchronization nodes generated by specific data synchronization solutions, the data synchronization speed, the status distribution of the real-time synchronization nodes, and the top 10 nodes with the highest latency. You can click the name of a node to go to the Real Time DI page and view the details of the node.
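The items in the Batch Synchronization Nodes section count different sets of nodes, which is easy to misread. The sketch below restates those definitions in Python. The `BatchNodeRun` record, its field names, and the status strings are hypothetical and serve only to illustrate how each figure is scoped.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class BatchNodeRun:
    name: str
    status: str  # hypothetical values: "succeeded", "failed", "running"
    bytes_synced: int
    records_synced: int


def summarize(runs):
    """Aggregate batch node runs the way the section describes its figures."""
    counts = Counter(run.status for run in runs)
    return {
        # Status distribution: nodes that succeeded and nodes that failed.
        "status_distribution": {
            "succeeded": counts["succeeded"],
            "failed": counts["failed"],
        },
        # Number of synchronization nodes: successful runs only.
        "nodes_succeeded": counts["succeeded"],
        # Amount of data synchronized: nodes that succeeded or are still running.
        "bytes_synced": sum(
            run.bytes_synced for run in runs if run.status in ("succeeded", "running")
        ),
        # Number of data records synchronized: all batch synchronization nodes.
        "records_synced": sum(run.records_synced for run in runs),
    }


runs = [
    BatchNodeRun("node_a", "succeeded", 100, 10),
    BatchNodeRun("node_b", "failed", 5, 1),
    BatchNodeRun("node_c", "running", 50, 5),
]
summary = summarize(runs)
print(summary["nodes_succeeded"], summary["bytes_synced"], summary["records_synced"])
# 1 150 16
```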

View the running details of a data synchronization solution

You can click Data Synchronization Node in the left-side navigation pane of the Data Integration page to go to the Tasks page.

On the Tasks page, you can view information, such as the type and name, of a data synchronization solution and the operations that you can perform on the solution. You can also click Execution details in the Actions column of a data synchronization solution to view the running details of the solution. The Execution details page contains the following sections:
  • Upper part of the Running Data section: displays information such as the status of environment preparation, batch synchronization nodes, and the real-time synchronization node. You can check whether the nodes are run as expected based on their status. This way, you can troubleshoot issues that occur on the data synchronization solution at the earliest opportunity. The following icons are used to indicate different states:
    • If the Succeeded icon is displayed, the node is successfully run.
    • If the Exception icon is displayed, the node failed to be run.
    • If the Waiting icon is displayed, the node is waiting to be run.
  • Lower part of the Running Data section: displays the information about the batch synchronization nodes and the real-time synchronization node generated by the solution. The information includes the source name, data synchronization speed, synchronized data, resource group that is used, and data synchronization latency.
  • Steps section: displays all steps that are required to complete the data synchronization solution from node creation to running of batch synchronization nodes and the real-time synchronization node. You can view the start time, end time, and status of each step in this section.

View the running details of a node

After you configure and run a data synchronization solution, nodes are generated by the solution in DataStudio and are automatically deployed to the production environment. You can obtain the name of a node that is generated by the data synchronization solution in Data Integration and go to Operation Center to view the running details of the node.
  1. Obtain the name of a node generated by a data synchronization solution

    On the Tasks page in Data Integration, find the desired data synchronization solution and click Execution details in the Actions column. In the Steps section of the Execution details page, find the step for generating a node for the solution and click Execution details in the Status column to obtain the name of the node that is generated.

  2. View the running details of the node
    • Batch synchronization of all data in a database

      If you run a batch synchronization solution that is used to synchronize all data from a database and configured with scheduling settings, multiple auto triggered nodes are generated by the solution. You can go to the Cycle Instance page in Operation Center to view the running details of the instances generated for the desired auto triggered node.

    • One-click real-time synchronization (one-time full synchronization and real-time incremental synchronization)

      If you run a one-click real-time synchronization solution to synchronize full data from a source at a time and incremental data from the source in real time, batch and real-time synchronization nodes are generated by the solution. You can go to the Patch Data page in Operation Center to view the running details of a batch synchronization node and go to the Real Time DI page in Operation Center to view the running details of a real-time synchronization node.