After configuring data sources, networks, and resources, you can create a real-time synchronization task to synchronize single-table or whole-database data. This topic describes how to create a real-time synchronization task for single-table or whole-database incremental data and view the task status after creation.
Prerequisites
Complete the data source configuration. You must configure source and destination databases before creating a task so that you can select the required data sources for reading and writing during task configuration. For details on supported data sources and configurations for real-time synchronization, see Supported data sources and synchronization solutions.
Purchase a resource group with a suitable specification and attach it to the workspace. For more information, see Use a Serverless resource group for Data Integration and Use exclusive resource groups for Data Integration.
Establish a network connection between the resource group and the data source. For more information, see Configure network connections.
Limitations
This version of real-time synchronization is supported only in Data Studio (legacy version).
Go to Data Studio
For some channels, you must create single-table real-time synchronization tasks in Data Studio. For details on channel support, see Supported data sources.
Log on to the DataWorks Console. Select the desired region, and click in the left navigation pane. Select the desired workspace from the drop-down list and click Go to Data Studio.
Step 1: Create task
Data Studio (new version)
Create a workflow. For details, see Workflow orchestration.
Create a real-time synchronization node. You can use one of the following methods:
Method 1: Click the icon in the upper-right corner of the workflow list and select .
Method 2: Double-click the workflow name, expand the Data Integration directory in the node list on the left, and drag the Real-time Synchronization node to the canvas on the right.

In the Create Node dialog box, configure the node parameters and click OK.
Data Studio (legacy version)
Create a business flow. For details, see Create a workflow.
Create a real-time synchronization task. You can use one of the following methods:
Method 1: Expand the business flow, right-click .
Method 2: Double-click the business flow name, click Create Node, and then drag the Real-time Synchronization node from the Data Integration directory to the business flow editing panel on the right.

In the Create Node dialog box, configure the parameters.
| Parameter | Description |
| --- | --- |
| Type | The default is Real-time Synchronization. |
| Synchronization Method | |
| Path | The directory where the real-time synchronization task is stored. |
| Name | The node name can contain letters, Chinese characters, digits, underscores (_), and periods (.). It cannot exceed 128 characters. |
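As an illustration only (not part of the product), the naming rule above can be expressed as a small validation helper. The CJK character range used to match Chinese characters is an assumption:

```python
import re

# Pattern per the naming rule above: letters, Chinese characters, digits,
# underscores, and periods; length 1-128. The \u4e00-\u9fff range is an
# assumption covering common Chinese characters.
NODE_NAME_RE = re.compile(r"^[A-Za-z0-9_.\u4e00-\u9fff]{1,128}$")

def is_valid_node_name(name: str) -> bool:
    """Return True if the name satisfies the documented constraints."""
    return bool(NODE_NAME_RE.fullmatch(name))
```

For example, `is_valid_node_name("rt_sync.orders_01")` passes, while a name containing spaces or exceeding 128 characters does not.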
Step 2: Configure resource group
Real-time synchronization tasks can only use Serverless resource groups or exclusive resource groups for Data Integration. You can click Basic Configuration in the right navigation bar of the real-time synchronization task editing page. In the Resource Group drop-down list, select a resource group that has network connectivity to the database.
If you created a resource group but it is not displayed, check whether the resource group is attached to the workspace. For more information, see Use a Serverless resource group for Data Integration and Use exclusive resource groups for Data Integration.
We recommend running real-time and offline synchronization tasks on different resource groups to prevent resource contention. Resource contention for CPU, memory, and network resources can cause offline tasks to slow down, real-time tasks to be delayed, or tasks to be terminated by the OOM (Out of Memory) killer in extreme cases.
Serverless resource groups support specifying a CU limit for synchronization tasks. If your synchronization task experiences OOM issues due to insufficient resources, adjust the CU usage value of the resource group appropriately.
Step 3: Configure task
Configure single-table synchronization
Configure the input data source.
Currently, the supported input data source types and configurations for single-table data synchronization are as follows:
Drag the input data source component from the Input list on the left to the canvas on the right.

Click the input component and configure the relevant information in the Node Configuration dialog box on the right.
Optional: Configure data transformation.
You can configure data transformations to modify input data.
Currently, the supported transformation methods for single-table data synchronization are as follows:
Configure a data filtering transform: Filters data based on rules. Only matching data is retained.
Configure String Replace: Allows replacement of string-type fields.
Configure data masking: Masks sensitive data during single-table real-time synchronization and writes the masked data to the specified database location.
In the Conversion area on the left side of the real-time synchronization task editing page, drag the required data transformation component to the panel on the right. Hover over the input component to reveal its connection points. Connect the bottom connection point of the input component to the top connection point of the transformation component. After connecting, you can perform Node Configuration for the transformation component.

Click the transformation component and configure the relevant information in the Node Configuration dialog box on the right.
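Conceptually, the three transformation types act as a chain of functions over incoming rows, wired the same way the components are connected on the canvas. The sketch below is purely illustrative; the helper names and the hash-based masking strategy are assumptions, not the product's implementation:

```python
import hashlib

def filter_rows(rows, predicate):
    """Data filtering: keep only rows that match the rule."""
    return [r for r in rows if predicate(r)]

def replace_string(rows, field, old, new):
    """String replacement on a string-type field."""
    return [{**r, field: r[field].replace(old, new)} for r in rows]

def mask_field(rows, field):
    """Masking: replace a sensitive field with a truncated hash digest.
    (Hashing is one illustrative masking strategy, not DataWorks' own.)"""
    return [{**r, field: hashlib.sha256(r[field].encode()).hexdigest()[:8]}
            for r in rows]

# Chain filter -> string replace -> masking over sample rows:
rows = [{"city": "HZ", "phone": "13800000000"},
        {"city": "", "phone": "13900000000"}]
out = mask_field(
    replace_string(filter_rows(rows, lambda r: r["city"]), "city", "HZ", "Hangzhou"),
    "phone")
```

The empty-city row is dropped by the filter, the city value is rewritten, and the phone number is masked before reaching the output component.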
Configure the output data source.
Currently, the supported output data source types and configurations for single-table data synchronization are as follows:
In the Output area on the left side of the real-time synchronization task editing page, drag the destination output data source component to the panel on the right and connect it to the upstream component. Configure the destination data source, table, and field mapping relationships. If the destination table does not exist, click Create Table to quickly create the table.

Click the output component and configure the relevant information in the Node Configuration dialog box on the right.
Click Save in the toolbar above the canvas to complete the task configuration.
Configure whole-database synchronization
DataWorks recommends using Real-time database synchronization tasks in Data Integration.
Set the synchronization source and rules.
In the Data Source area, select the Type and Data Source name to be synchronized.
Select the tables to synchronize.
In the Select Source Table for Synchronization section, select the tables to synchronize from the Source Table list and click the icon to move them to the Selected Table list.
Important: Real-time synchronization cannot be performed if a selected table does not have a primary key.
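Because tables without a primary key cannot be synchronized in real time, it can help to pre-check the source before selecting tables. Below is a minimal sketch using SQLite purely for illustration; a MySQL-like source would query information_schema instead, and the helper name is hypothetical:

```python
import sqlite3

def tables_without_primary_key(conn: sqlite3.Connection) -> list:
    """Return user tables that lack a primary key (SQLite illustration)."""
    names = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    missing = []
    for name in names:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt, pk);
        # a pk value of 0 on every column means no primary key.
        cols = conn.execute(f"PRAGMA table_info({name})").fetchall()
        if not any(col[5] for col in cols):
            missing.append(name)
    return missing
```

Any table this reports would need a primary key added (or be excluded) before it can be selected for real-time synchronization.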
Set mapping rules for table names.
In this step, you select the databases and tables to synchronize from the source data source. By default, the synchronization solution writes source database and table data to a schema or table with the same name in the destination. If the schema or table does not exist in the destination, it is created automatically. You can also define the final schema or table name written to the destination by using Mapping Rules for Table Names. This allows you to write data from multiple tables into a single table, or to apply a uniform prefix change to source database or table names when writing to the destination.
Source table name and destination table name conversion rules: Supports converting source table names to final destination table names using regular expressions.
Example 1: Write data from tables with the "doc_" prefix at the source to tables with the "pre_" prefix at the destination.

Example 2: Write data from multiple tables to a single destination table.
Synchronize tables named "table_01", "table_02", and "table_03" at the source to a table named "my_table". Configure the regex table name conversion rule as: Source: table.*, Destination: my_table.

Rule for Destination Table Name: Supports generating destination table names using a combination of built-in variables. You can also add prefixes and suffixes to the converted destination table name. The available built-in variables are:
${db_table_name_src_transed}: The table name after conversion in Source table name and destination table name conversion rules.
${db_name_src_transed}: The destination schema name after conversion in the Rule for Conversion Between Source Database Name and Destination Schema Name.
${ds_name_src}: The source data source name.
Example: Further process the table name converted in the previous step by string concatenation. Use ${db_table_name_src_transed} to represent the result "my_table" from the previous step, and add a prefix and suffix to this built-in variable, for example, pre_${db_table_name_src_transed}_post. This maps to the destination table named "pre_my_table_post".
Rule for Conversion Between Source Database Name and Destination Schema Name: Supports converting source schema names to final destination schema names using regular expressions.
Example: Replace the "doc_" prefix of the source database name with the "pre_" prefix.
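The conversion rules above amount to a regex substitution followed by optional prefix/suffix concatenation (the `${db_table_name_src_transed}` step). A minimal sketch reproducing the documented examples; the helper is hypothetical, not a DataWorks API:

```python
import re

def map_table_name(src_table: str, pattern: str, replacement: str,
                   prefix: str = "", suffix: str = "") -> str:
    """Apply a source->destination regex rule, then add an optional
    prefix/suffix around the converted name."""
    transed = re.sub(pattern, replacement, src_table)
    return f"{prefix}{transed}{suffix}"

# Example 1: "doc_" prefix at the source becomes "pre_" at the destination.
map_table_name("doc_users", r"^doc_", "pre_")       # -> "pre_users"

# Example 2: many-to-one, table_01/02/03 all map to my_table.
map_table_name("table_02", r"table.*", "my_table")  # -> "my_table"

# Variable example: pre_${db_table_name_src_transed}_post.
map_table_name("table_02", r"table.*", "my_table",
               prefix="pre_", suffix="_post")       # -> "pre_my_table_post"
```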

Select the destination data source and configure the destination table or topic.
On the Configure Destination Table page, configure the basic information of the destination. For example, write mode and partition settings. The specific configuration depends on the real-time synchronization interface of each data source.
Click Refresh Source and Destination Table Mapping to create mapping relationships between the source tables and destination tables to be synchronized.
You can customize the destination schema and destination table name, and add constants or variables to the destination table via Edit Additional Fields. The specific configuration depends on the real-time synchronization interface of each data source.
Note: If there are many tables to synchronize, this process may take a while.
Optional: Set table granularity synchronization rules.
Some synchronization solutions support custom table-level DML processing policies. You can define processing policies for insert, update, and delete operations on the source table.
Note: Supported DML operations may vary by data source. Check the product interface to confirm whether a specific synchronization solution supports DML processing policies. For current DML support by data source, see Supported DML and DDL operations.
Set DDL message processing rules.
You can configure policies to handle DDL operations during synchronization. Supported DDL operations may vary by data source. For details, see Supported DML and DDL operations. You can configure DDL processing policies for each destination database type on the page. Different DDL message processing policies are shown in the following table.
DDL message types: Create Table, Drop Table, Add Column, Drop Column, Rename Table, Rename Column, Modify Column Type, Truncate Table.
When DataWorks receives a DDL message of one of these types, you can apply one of the following processing policies:
Normal Processing: Forwards the message to the destination data source.
Ignore: Discards the message directly and does not send it to the destination data source.
Alert: Discards the message and records an alert in the real-time synchronization log, indicating that the message was discarded due to an execution error.
Error: The real-time synchronization task enters an error state and terminates execution.
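The policies can be read as a simple dispatch on the incoming DDL message type. The sketch below is a hedged illustration; the enum values and function names are assumptions, not the DataWorks implementation:

```python
from enum import Enum

class DdlPolicy(Enum):
    NORMAL = "normal"   # forward the DDL to the destination
    IGNORE = "ignore"   # discard silently
    ALERT = "alert"     # discard and record an alert in the log
    ERROR = "error"     # fail the sync task

def handle_ddl(ddl_type, policies, apply, log):
    """Dispatch one incoming DDL message per the configured policy.
    Unconfigured types default to ERROR in this sketch."""
    policy = policies.get(ddl_type, DdlPolicy.ERROR)
    if policy is DdlPolicy.NORMAL:
        apply(ddl_type)
    elif policy is DdlPolicy.ALERT:
        log(f"discarded DDL: {ddl_type}")
    elif policy is DdlPolicy.ERROR:
        raise RuntimeError(f"unsupported DDL: {ddl_type}")
    # IGNORE: drop the message without side effects
```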
Configure run resources.
Concurrency control lets you limit the maximum number of concurrent reads and writes for Data Integration.
Supports controlling whether the synchronization task tolerates dirty data.
When dirty data is not allowed: If dirty data is generated during task execution, the task will fail and exit.
When dirty data is allowed: The synchronization task will ignore dirty data (it will not be written to the destination) and continue execution normally.
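The two dirty-data policies behave like the per-record sketch below; the helper is illustrative, not the Data Integration implementation:

```python
def write_with_dirty_policy(records, write_one, allow_dirty):
    """Write records one by one. On a per-record failure ("dirty data"),
    either skip the record (allowed) or abort the whole task (not allowed).
    Returns the number of skipped records."""
    skipped = 0
    for rec in records:
        try:
            write_one(rec)
        except Exception:
            if not allow_dirty:
                raise          # task fails and exits
            skipped += 1       # ignored: not written to the destination
    return skipped
```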
Click Complete Configuration.
Step 4: Submit and deploy nodes
Click the icon in the toolbar to submit the node. In the Submit dialog box, enter the Change Description.
Click OK.
If you are using a workspace in standard mode, you need to deploy the task to the production environment after successful submission. Click Deploy on the left side of the top menu bar. For details, see Publish tasks.
Step 5: Run task
Real-time synchronization tasks cannot run in Data Studio. Publish the task to the Operation Center to start and view it.
After the task configuration is complete, you can start and manage the task in the panel. For details, see O&M for real-time sync tasks.
Next steps
After the task starts, you can click the task name to view the running details and perform Task O&M and tuning.
FAQ
For common questions about real-time synchronization tasks, see Real-time synchronization.
Appendix: Task migration
You can migrate single-table real-time integration tasks from Data Studio to the Data Integration page by clicking Migrate to Data Integration.
Currently supported real-time integration tasks:
Kafka to MaxCompute single-table real-time integration tasks.
Kafka to Hologres single-table real-time integration tasks.
Double-click the single-table real-time integration task to be migrated to enter the task editing page, and click Migrate to Data Integration to migrate the task.

Click the icon in the upper-left corner and select . Then go to the Synchronization Task page to view the successfully migrated single-table real-time integration tasks in the task list.
Migrated tasks can be maintained directly on the Data Integration main site without jumping to the Operation Center. Migration does not affect saved task configurations or running tasks.
After migration, the original task will be moved to the Data Studio Recycle Bin. Subsequent editing and maintenance operations can only be performed on the Data Integration main site task list page.