DataWorks provides a real-time data synchronization feature. This feature lets you synchronize data from a source database to a destination database in real time for a single table or an entire database. You can create a real-time sync task with various combinations of input and output data sources to perform real-time incremental synchronization. This topic describes how to create a real-time sync task for incremental data from a single table or an entire database and how to check the running status of the task after it is created.
Prerequisites
Data sources are configured. Before you configure a real-time sync task, make sure that the source and destination data sources are created. This lets you select the data sources by name in the sync task to control read and write operations. For more information, see Supported data sources and synchronization solutions.
A serverless resource group is used and attached to the workspace.
A network connection is established between the resource group and the data sources. For more information, see Network connectivity solutions.
Step 1: Create a real-time synchronization node
Follow these steps to create a real-time synchronization node.
Go to the Workspaces page in the DataWorks console. In the top navigation bar, select the desired region. Find the desired workspace and open it by using the shortcut in the Actions column.
Create a real-time synchronization node.
In the navigation pane on the left, click the directory tree icon. In the directory tree, find Workspace Directories, click the icon next to it, and choose the option to open the Create Node dialog box.
In the Create Node dialog box, select a Synchronization Method. The available methods are as follows:
Sync task configuration type and synchronization method:
Single table:
ETL Between Single Table (Topic) and Single Table (Topic)
Entire database:
Synchronization of Data Changes from Database to MaxCompute
Synchronization of Data Changes from Database to Hologres
Synchronization of Data Changes from Database to AnalyticDB for MySQL V3.0
Synchronization of Data Changes from Database to DataHub
Synchronization of Data Changes from Database to Kafka
In the Create Node dialog box, enter a name for the real-time synchronization node and click OK to proceed to the node configuration page.
Step 2: Configure the real-time sync task
The parameters that you need to configure for the real-time synchronization node depend on the synchronization method that you selected when you created the node.
Configure a real-time sync task for a single table
On the node configuration page, follow these steps to configure the input data source, data transformation, and output data source for a single-table real-time sync task:
Configure the input data source.
In the Input section on the left side of the node configuration page, drag the data source component to the canvas.

Click the input component to configure its parameters in the Node Configuration dialog box on the right.
The following input data source types are supported for single-table synchronization. For more information about the configuration of each data source type, see the linked topics.
(Optional) Configure data transformation.
In the Conversion section on the left, drag a transform component onto the canvas and connect it to the upstream component.

Click the transform component, and then configure its parameters in the Node Configuration dialog box on the right.
The following transformation methods are supported for single-table synchronization. For more information about the configuration of each method, see the linked topics.
Data filtering: Filters data based on rules, such as field size. Only data that meets the specified rules is retained.
String replacement: Replaces the values of string fields.
Data masking: Masks data from a single table during real-time synchronization and stores the masked data in a specified database location.
Note: You can use multiple transform components to filter and process data.
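Conceptually, each transform component takes the output of the upstream component and passes its result downstream. The following Python sketch is hypothetical, not DataWorks code; it only illustrates data filtering followed by string replacement on a small batch of records:

```python
# Hypothetical sketch of chained single-table transforms (not DataWorks code):
# each component processes the output of the previous one, as on the canvas.
def filter_records(records, predicate):
    """Data filtering: keep only records that meet the rule."""
    return [r for r in records if predicate(r)]

def replace_string(records, field, old, new):
    """String replacement: rewrite values of a string field."""
    return [{**r, field: r[field].replace(old, new)} for r in records]

records = [{"id": 1, "city": "NY"}, {"id": 2, "city": "SF"}, {"id": -1, "city": "NY"}]
step1 = filter_records(records, lambda r: r["id"] > 0)   # drop records with invalid ids
step2 = replace_string(step1, "city", "NY", "New York")  # normalize a string field
print(step2)  # [{'id': 1, 'city': 'New York'}, {'id': 2, 'city': 'SF'}]
```

As on the canvas, the order of the components matters: filtering first reduces the number of records that later components must process.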
Configure the output data source.
In the Output section on the left, drag the target output data source component onto the canvas. Connect the component to its upstream component. Configure the destination data source, table, and field mappings. If the destination table does not exist, click Create Table to create it.

Click the output component to configure its parameters in the Node Configuration dialog box on the right.
The following output data source types are supported for single-table synchronization. For more information about the configuration of each data source type, see the linked topics.
In the toolbar above the canvas, click Save to save the task configuration.
Configure a real-time sync task for an entire database
On the node configuration page, follow these steps to configure a real-time sync task for an entire database:
Configure the synchronization source and rules.
In the Data Source section, select the Type and Data Source.
In the Select Source Table for Synchronization section, select one or more tables from the Source Tables area and click the icon to move them to the Selected Tables area. Then, set the table name mapping rules.
Note: In this step, you can select the databases and tables from the source to sync. By default, data from a source database or table is written to a destination schema or table with the same name. If a destination schema or table does not exist, the system creates it automatically. You can also use Set Mapping Rules for Table/Database Names to customize the names of the destination schemas or tables.
Conversion rules for source and destination table names: You can use a regular expression to convert source table names to destination table names.
Example 1: Synchronize data from source tables with the prefix doc_ to destination tables with the prefix pre_.
Example 2: Write data from multiple tables to a single destination table.
To synchronize data from the source tables table_01, table_02, and table_03 to the single table my_table, configure the regular expression conversion rule by setting Source to table_* and Destination to my_table.
Rule for destination table names: You can use built-in variables to generate destination table names. You can also add a prefix and a suffix to the generated names. The available built-in variables are:
${db_table_name_src_transed}: The source table name after it is converted by the table name conversion rules.
${db_name_src_transed}: The destination schema name generated by the conversion rules for source database and destination schema names.
${ds_name_src}: The name of the source data source.
For example, if you used a table name conversion rule in the previous step to rename a source table to my_table, you can perform further string concatenation on the new table name. To add a prefix and a suffix, use the ${db_table_name_src_transed} variable, which represents the result of the previous step. For example, the expression pre_${db_table_name_src_transed}_post maps the source table to a destination table named pre_my_table_post.
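The two-stage naming logic above can be sketched in Python. The helper function is hypothetical, not part of DataWorks, and the pattern table_.* stands in for the table_* rule from Example 2 so that the Python regex matches the whole table name:

```python
import re

def map_table_name(src_table, pattern, replacement, dest_rule):
    """Hypothetical sketch of the two-stage destination-name logic:
    1) regex conversion of the source table name;
    2) prefix/suffix concatenation via the built-in variable."""
    transed = re.sub(pattern, replacement, src_table)  # stage 1: regex rename
    # stage 2: substitute the built-in variable into the destination rule
    return dest_rule.replace("${db_table_name_src_transed}", transed)

# table_01, table_02, and table_03 all collapse into my_table,
# then the destination rule adds the pre_/_post affixes.
for src in ["table_01", "table_02", "table_03"]:
    print(map_table_name(src, r"table_.*", "my_table",
                         "pre_${db_table_name_src_transed}_post"))
# each line prints: pre_my_table_post
```

The same helper covers Example 1: replacing the pattern ^doc_ with pre_ maps doc_users to pre_users before any destination rule is applied.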
Click Next to continue to the Set Destination Table page.
Set the destination table.
On the Set Destination Table or Topic page, you can configure basic information for the Destination Data Source, such as the write mode and partition settings. The available settings depend on the real-time synchronization interface of each data source.
Click Refresh source and destination table mapping to create mappings between the source and destination tables.
You can use Edit Additional Fields or Batch Edit Additional Fields In Destination Table to add constants, variables, and other operations to the destination table. The configuration depends on the real-time synchronization interface for each data source.
If the destination table does not have a primary key, synchronization is not supported. As an alternative, you can set a Synchronized Primary Key.
Note: If you are synchronizing many tables, the process may be slow. Wait for the process to complete.
Click Next to proceed to the Configure Table-level Synchronization Rule page.
Note: If the system detects that some of your tables need to be automatically created, you must wait for the tables to be created before you can proceed to the next step.
(Optional) Set table-level synchronization rules.
Some synchronization solutions support custom table-level DML processing policies. This lets you define how to handle insert, update, and delete operations that occur on the source table.
Note: The supported DML operations may vary based on the data source. Whether a specific synchronization solution supports DML processing policies depends on the product interface. For more information about the DML operations that are supported by different data sources, see Supported DML and DDL operations.
Click Next to proceed to the Set Processing Policy For DDL Messages page.
Set DDL message processing rules.
On the Set Processing Policy for DDL Messages page, you can configure how to process DDL operations that occur in the source data source. During real-time synchronization, you can set processing policies for different DDL messages that are synchronized to the destination as needed. The supported DDL operations vary based on the data source. For more information, see Supported DML and DDL operations. You can set a DDL processing policy for each destination database type. The following table describes the DDL message processing policies.
DDL message types: CreateTable, DropTable, AddColumn, DropColumn, RenameTable, RenameColumn, ChangeColumn, TruncateTable.
When DataWorks receives a DDL message of one of these types, you can apply one of the following processing policies:
Normal treatment: The DDL message is passed to the destination data source for processing. Different destination data sources may process the message differently.
Ignore: The DDL message is discarded and not sent to the destination data source.
Error: The real-time sync task is immediately terminated with an error status.
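The policy settings above amount to a per-type dispatch. The following Python sketch is hypothetical, not DataWorks code, and the policy assignments in it are illustrative only, not defaults:

```python
# Hypothetical per-type DDL policy dispatch (illustrative policies, not defaults).
POLICIES = {"CreateTable": "normal", "DropTable": "ignore", "TruncateTable": "error"}

def handle_ddl(ddl_type, forward):
    policy = POLICIES.get(ddl_type, "normal")
    if policy == "normal":
        forward(ddl_type)   # pass the DDL message on to the destination
        return "forwarded"
    if policy == "ignore":
        return "discarded"  # message dropped; destination is untouched
    # policy == "error": terminate the task with an error status
    raise RuntimeError(f"task terminated on DDL: {ddl_type}")

sent = []
print(handle_ddl("CreateTable", sent.append))  # forwarded
print(handle_ddl("DropTable", sent.append))    # discarded
```

Because the Error policy stops the whole task, it is usually reserved for DDL types, such as TruncateTable, that could silently cause data loss if forwarded.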
Click Next to navigate to the Configure Resource page.
Set runtime resources.
Specify the maximum number of concurrent threads that the task uses to write to the destination database.
Specify whether the sync task tolerates dirty data.
If you do not allow dirty data, the task fails and stops if dirty data is generated during execution.
If you allow dirty data, the task ignores the dirty data and continues to run. The dirty data is not written to the destination.
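The two dirty-data settings amount to fail-fast versus skip-and-continue. The following sketch is hypothetical, not DataWorks code; write stands in for the destination writer and raises ValueError for a record the destination rejects:

```python
def write_with_policy(records, write, tolerate_dirty):
    """Hypothetical sketch of fail-fast vs. skip-and-continue dirty-data handling.
    `write` raises ValueError for a record the destination rejects."""
    written, dirty = [], []
    for r in records:
        try:
            write(r)
            written.append(r)
        except ValueError:
            if not tolerate_dirty:
                raise            # task fails and stops on the first dirty record
            dirty.append(r)      # dirty record is skipped, not written
    return written, dirty

def write(r):
    if r.get("id") is None:
        raise ValueError("missing required field: id")

ok, bad = write_with_policy([{"id": 1}, {"id": None}, {"id": 2}], write,
                            tolerate_dirty=True)
print(len(ok), len(bad))  # 2 1
```

With tolerate_dirty=False, the same input stops at the second record and nothing after it is written, which matches the fail-and-stop behavior described above.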
Complete the configuration.
Click Complete to finish the configuration of the real-time sync task for the entire database.
Step 3: Configure runtime resources for the real-time synchronization node
After you complete the configuration, follow these steps to configure the runtime resources for the real-time synchronization node.
On the node configuration page, select Basic Configurations from the toolbar on the right.
On the Basic Configurations page, select the serverless Resource Group that you have attached. Configure the number of Occupancy CUs and the Concurrency for task execution.
After configuring the runtime resources, save the real-time sync task by clicking the Save button in the toolbar at the top of the node configuration page.
Step 4: Run the real-time sync task
After you configure the real-time synchronization node, follow these steps to run the task.
Publish the node.
The task must be published to the Operation Center before it can be executed. Follow the on-screen instructions to publish the real-time synchronization node. For more information, see Publish a node or workflow.
Run the node.
After the node is published, click Perform O&M at the bottom of the Deploy to Production Environment section to open the Real-time Synchronization Nodes list page in the Operation Center.
Find the real-time sync task that you created and click Start in the Actions column. On the Start page, configure settings such as the offset reset option, and click OK to start the task.
Note: For more information about whether to manually set the offset, see Appendix: About resetting the offset.
When the task's Status changes to Running, click its Node Name to open the Running Information page.
On the Running Information page, click the Log tab to view the task execution status.
Appendix: About resetting the offset
You may need to manually set the offset for a DataWorks real-time sync task in the following situations:
Task recovery after an interruption: When you restart the task, you can manually set the offset to the time of the interruption. This ensures that synchronization resumes from the breakpoint.
Data loss or exceptions: You can reset the offset to a point in time before the data was written to ensure data integrity.
Task configuration adjustments: After you modify the destination table or field mappings, you can manually set the offset to ensure synchronization accuracy.
If you encounter an offset error or an offset that does not exist, you can resolve the issue in one of the following ways:
Reset the offset: When you start the task, select the earliest available offset from the source database.
Adjust the log retention period: If an offset has expired, you can adjust the log retention period in the database. For example, you can set the retention period to 7 days.
Data synchronization: If data is lost, you can perform a full synchronization again or configure a batch synchronization task.