After you configure the data source, network, and resources, you can create a real-time sync task. You can combine various source and destination data sources to perform real-time incremental synchronization for a single table or an entire database. This topic describes how to create these tasks and check their status.
Preparations
Configure the data sources. Before you configure a Data Integration sync task, you must add the source and destination databases as data sources. This lets you control read and write operations by selecting a data source name during task configuration. For more information, see Data Source Configuration. For the data sources that support real-time synchronization and their configuration details, see Supported data sources and synchronization solutions.
Purchase a resource group with a suitable specification and attach it to the workspace. For more information, see Use a Serverless resource group for Data Integration and Use an exclusive resource group for Data Integration.
Establish a network connection between the resource group and the data source. For more information, see Configure network connections.
Go to Data Studio
For some synchronization channels, the configuration entry for single-table real-time sync tasks is in the Data Studio module, so you must go to Data Studio to create these tasks. For more information about the supported channels, see Supported data sources.
Log on to the DataWorks console and switch to the target region. In the left-side navigation pane, choose . Select the desired workspace from the drop-down list and click Go To Data Studio.
Step 1: Create a real-time sync task
Data Studio (new version)
Create a workflow. For more information, see Workflow orchestration.
Create a real-time sync node. You can create the node in one of the following two ways:
Method 1: In the upper-right corner of the workflow list, click the
icon and choose .
Method 2: Double-click the workflow name. From the node list on the left side of the workflow canvas, drag the Real-time Synchronization node from Data Integration to the canvas on the right.

In the Create Node dialog box, configure the node parameters and click OK.
DataStudio (legacy version)
Create a business process. For more information, see Create a business process.
Create a real-time sync task. You can create the task in one of the following two ways.
Method 1: Expand the business process, right-click .
Method 2: Double-click the business process name and click New Node. Then, from the Data Integration folder, drag the Real-time Synchronization node to the business process editing panel on the right.

In the New Node dialog box, configure the parameters.
Parameter | Description
--- | ---
Type | The default value is Real-time Synchronization.
Synchronization Method | 
Path | The folder where the real-time sync task is stored.
Name | The node name can contain uppercase letters, lowercase letters, Chinese characters, digits, underscores (_), and periods (.). The name cannot exceed 128 characters in length.
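The naming rule can be expressed as a pattern. The following is a minimal Python sketch, assuming a common Unicode range for Chinese characters; it only illustrates the rule described above and is not an official DataWorks validator.

```python
import re

# Hypothetical validator for the node naming rule: letters, Chinese characters,
# digits, underscores (_), and periods (.), at most 128 characters.
# The CJK range below is an assumption, not an official character set.
NAME_PATTERN = re.compile(r"^[A-Za-z0-9_.\u4e00-\u9fff]{1,128}$")

def is_valid_node_name(name: str) -> bool:
    """Return True if the name satisfies the documented constraints."""
    return bool(NAME_PATTERN.match(name))

print(is_valid_node_name("realtime_sync.orders_01"))  # True
print(is_valid_node_name("bad name!"))                # False: spaces and '!' are not allowed
```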
Step 2: Configure a resource group
Real-time sync tasks support only Serverless resource groups or exclusive resource groups for Data Integration. On the configuration page of the real-time sync task, click Basic Configuration in the right-side navigation bar. From the Resource Group drop-down list, select a resource group that is connected to the database network.
If you created a resource group but it is not displayed, check whether the resource group is attached to the workspace. For more information, see Use a Serverless resource group for Data Integration and Use an exclusive resource group for Data Integration.
Run real-time sync tasks and offline sync tasks in different resource groups. This prevents resource preemption and interference between running tasks. For example, contention for CPU, memory, and network resources can slow down offline sync tasks, delay real-time sync tasks, or even cause tasks to be terminated by the out-of-memory (OOM) killer if resources are insufficient.
Serverless resource groups let you specify the maximum number of compute units (CUs) for a sync task. If your sync task encounters an OOM error because of insufficient resources, increase the maximum CU value that the task can use.
Step 3: Configure the real-time sync task
Configure a real-time sync task for a single table
Configure the source data source.
The following source data source types and their configurations are supported for single-table data synchronization:
In the Input section on the left side of the real-time sync task configuration page, drag the desired source data source component to the panel on the right.

Click the source component and configure its parameters in the Node Configuration panel on the right.
Optional: Configure a data transform method.
If you want to transform the source data into the required output format during real-time synchronization, you can configure a data transform method.
The following transform methods are supported for single-table data synchronization:
Configure a data filtering transform: You can filter data based on rules, such as the size of a field. Only data that meets the rules is retained.
Configure string replacement: You can replace string fields.
Configure data masking: You can mask single-table data that is synchronized in real time and then store it in a specified database location.
In the Conversion section on the left side of the configuration page, drag the required data transform component to the panel on the right. Hover over the source component to display its connection points. Connect the lower point of the source component to the upper point of the transform component. After they are connected, you can configure the transform component in the Node Configuration panel.

Click the transform component and configure its parameters in the Node Configuration panel on the right.
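To make the effect of these transforms concrete, here is a minimal Python sketch that applies a filter rule, a string replacement, and a simple mask to one record. The field names and rules are hypothetical; DataWorks applies whichever transforms you configure in the Node Configuration panel.

```python
import re

def transform(record: dict):
    """Illustrative single-record pipeline: filtering -> string replacement -> masking."""
    # Data filtering: keep only records that meet the rule (here: amount must be positive).
    if record.get("amount", 0) <= 0:
        return None  # the record is dropped; only data that meets the rule is retained

    # String replacement: replace a substring in a string field.
    record["status"] = record.get("status", "").replace("OLD_", "NEW_")

    # Data masking: hide the middle digits of a phone number before it is written downstream.
    record["phone"] = re.sub(r"(\d{3})\d{4}(\d{4})", r"\1****\2", record.get("phone", ""))
    return record

print(transform({"amount": 10, "status": "OLD_PAID", "phone": "13812345678"}))
# {'amount': 10, 'status': 'NEW_PAID', 'phone': '138****5678'}
print(transform({"amount": 0}))  # None: filtered out
```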
Configure the destination data source.
The following destination data source types and their configurations are supported for single-table data synchronization:
In the Output section on the left side of the configuration page, drag the desired destination data source component to the panel on the right and connect it to the upstream component. Configure the destination data source, table, and field mappings. If the destination table does not exist, you can click Create Table to quickly create it.

Click the destination component and configure its parameters in the Node Configuration panel on the right.
In the toolbar above the canvas, click Save to complete the task configuration.
Configure a real-time sync task for an entire database
To synchronize an entire database in real time, we recommend that you use the entire-database real-time sync task provided by Data Integration.
Set the source and synchronization rules.
In the Data Source section, select the Type and Data Source name of the data source to synchronize.
Select the tables to synchronize.
In the Select Source Table for Synchronization section, all tables in the selected data source are displayed. In the Source Databases And Tables section, select all or some of the tables from the database that you want to synchronize. Then, click the
icon to move them to the Selected Tables list.
Important: If a selected table does not have a primary key, it cannot be synchronized in real time.
Set the mapping rule for table names.
In this step, you can select the source databases and tables to synchronize. By default, the synchronization solution writes data from the source databases and tables to a destination schema or table with the same name. If the destination schema or table does not exist, it is automatically created. You can also use Set Mapping Rules for Table/Database Names to define the names of the destination schemas or tables. This lets you write data from multiple tables to a single table, or replace a fixed prefix in source database or table names with another prefix when writing to the destination.
Source and destination table name conversion rule: You can use a regular expression to convert the source table name to the final destination table name.
Example 1: Write data from source tables with the `doc_` prefix to destination tables with the `pre_` prefix.

Example 2: Write data from multiple tables to a single destination table.
To synchronize source tables named "table_01", "table_02", and "table_03" to a single destination table named "my_table", configure the regular expression conversion rule for table names as follows: Set Source to `table.*` and Destination to `my_table`.

Rule for Destination Table Name: You can use a combination of built-in variables to generate the destination table name. You can also add a prefix and a suffix to the converted destination table name. The available built-in variables are:
${db_table_name_src_transed}: The table name after the conversion in the "Source and destination table name conversion rule".
${db_name_src_transed}: The destination schema name after the conversion in the "Source database and destination schema name conversion rule".
${ds_name_src}: The source data source name.
Example: Further process the table name that was converted by the source and destination table name conversion rule in the previous step. Use `${db_table_name_src_transed}` to represent the result of the previous step, which is "my_table". Then, add a prefix and a suffix to this built-in variable, such as `pre_${db_table_name_src_transed}_post`. The final destination table name is "pre_my_table_post".
Source Database And Destination Schema Name Conversion Rule: You can use a regular expression to convert the source schema name to the final destination schema name.
Example: Replace the `doc_` prefix of the source database name with the `pre_` prefix.

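The following minimal Python sketch mimics the regex-based conversions in the examples above. It is only an illustration of how the rules behave; DataWorks evaluates the rules internally when it generates destination table names.

```python
import re

# Example 1: tables with the "doc_" prefix map to tables with the "pre_" prefix.
print(re.sub(r"^doc_", "pre_", "doc_orders"))      # pre_orders

# Example 2: several source tables are routed to one destination table.
for src in ("table_01", "table_02", "table_03"):
    dest = "my_table" if re.fullmatch(r"table.*", src) else src
    print(src, "->", dest)                          # every table maps to my_table

# Rule for Destination Table Name: wrap the converted name in a prefix and a suffix,
# which corresponds to pre_${db_table_name_src_transed}_post in the console.
db_table_name_src_transed = "my_table"
print(f"pre_{db_table_name_src_transed}_post")      # pre_my_table_post
```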
Select the destination data source and configure the destination table or topic.
On the Set Destination Table Or Topic page, configure basic information for the Destination Data Source. This includes parameters such as the write mode and partition settings. The specific configuration varies based on the real-time synchronization interface of each data source.
Click Refresh Source Table And Destination Table Mapping to create mappings between the source tables and the destination tables to synchronize. The actual button name includes the destination data source type.
You can customize the destination schema and table names and add constants and variables to the destination table by clicking Edit additional fields. The specific configuration varies based on the real-time synchronization interface of each data source.
Note: If you synchronize a large number of tables, this process may take a while. Wait until it is complete.
Optional: Set table-level synchronization rules.
Some synchronization solutions support custom table-level DML processing policies. This means you can define corresponding processing policies for insert, update, or delete operations that occur in the source table.
Note: The supported DML operations may vary between data sources. Whether a synchronization solution supports DML processing policies depends on the product interface. For information about the DML operations supported by data sources, see Supported DML and DDL operations.
Set DDL message processing rules.
The source data source may involve many DDL operations. During real-time synchronization, you can set processing policies for the different DDL messages that are sent to the destination based on your business needs. The supported DDL operations may vary between data sources. For more information, see Supported DML and DDL operations. On the page, you can set DDL processing policies for each destination database type. The following DDL message types all support the same set of processing policies: CreateTable, DropTable, AddColumn, DropColumn, RenameTable, RenameColumn, ChangeColumn, and TruncateTable. When DataWorks receives a DDL message of one of these types, it handles the message according to the policy that you set:
Normal Processing: Forwards the message to the destination data source for processing. Because different destination data sources may have different policies for handling DDL messages, DataWorks only forwards the message.
Ignore: Discards the message and does not send it to the destination data source.
Alert: Discards the message and records an alert in the real-time synchronization log, indicating that the message was discarded because of an execution error.
Error: The real-time sync task immediately enters an error state and stops running.
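If it helps to see how the four policies differ in effect, the following minimal Python sketch models the dispatch logic, assuming that forwarding and logging are abstract callbacks. It is a conceptual model, not DataWorks code.

```python
class SyncTaskError(Exception):
    """Models the Error policy: the task stops with an error."""

def handle_ddl(message: str, policy: str, forward, log) -> None:
    """Conceptual model of the four DDL processing policies described above."""
    if policy == "normal":
        forward(message)                          # Normal Processing: forward the DDL to the destination
    elif policy == "ignore":
        pass                                      # Ignore: discard the message silently
    elif policy == "alert":
        log(f"DDL message discarded: {message}")  # Alert: discard and record an alert in the sync log
    elif policy == "error":
        raise SyncTaskError(message)              # Error: the task enters an error state and stops

# Hypothetical usage with stub callbacks:
handle_ddl("ALTER TABLE t ADD COLUMN c INT", "alert", forward=print, log=print)
```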
Set runtime resources.
The task concurrency control feature limits the maximum number of concurrent connections for reading from and writing to the database.
You can control whether the sync task tolerates dirty data.
If dirty data is not allowed, the task fails and exits if dirty data is generated during execution.
If dirty data is allowed, the task ignores the dirty data (it is not written to the destination) and continues to run normally.
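As a rough mental model of the two behaviors, consider the following Python sketch. It is illustrative only; the record handling inside DataWorks is not exposed as code, and the write callback and error type are assumptions.

```python
def write_batch(records, write, tolerate_dirty: bool) -> None:
    """Conceptual model of dirty-data handling during a sync run."""
    for record in records:
        try:
            write(record)            # attempt to write the record to the destination
        except ValueError:           # a dirty record that the destination rejects
            if not tolerate_dirty:
                raise                # dirty data not allowed: the task fails and exits
            continue                 # dirty data allowed: skip the record and keep running

def strict_write(record):
    """Stub writer that rejects anything that is not a dict."""
    if not isinstance(record, dict):
        raise ValueError("dirty record")
    print("written:", record)

write_batch([{"id": 1}, "broken", {"id": 2}], strict_write, tolerate_dirty=True)
# written: {'id': 1}
# written: {'id': 2}
```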
Click Complete.
Step 4: Commit and publish the real-time sync task
In the toolbar, click the
icon to commit the node.
In the Commit dialog box, enter a Change description.
Click Confirm.
If you are using a workspace in standard mode, you must publish the task to the production environment after you commit it. To do so, click Deploy on the left side of the top menu bar. For more information, see Publish tasks.
Step 5: Run the real-time sync task
You cannot run a real-time sync task directly in Data Studio. You must publish the task, and then start it and view its running status in the Operation Center.
After the task is configured, you can start and manage it on the page. For more information, see Real-time sync task O&M.
What to do next
After the task starts, you can click the task name to view its running details and perform task O&M and tuning.
FAQ
For answers to frequently asked questions about real-time sync tasks, see FAQ about real-time synchronization.
Appendix: Task migration
You can migrate a real-time integration task for a single table that is configured in DataStudio to the Data Integration page by clicking Migrate to Data Integration.
Currently, only the following real-time integration tasks are supported:
Real-time integration tasks for a single table from Kafka to MaxCompute.
Real-time integration tasks for a single table from Kafka to Hologres.
Double-click the real-time integration task for a single table that you want to migrate. On the task configuration page, click Migrate to Data Integration to migrate the task.

In the upper-left corner, click the
icon and choose . On the Synchronization Task page, you can view the successfully migrated real-time integration task for a single table in the task list.
After migration, you can perform O&M operations directly on the Data Integration primary site without navigating to the Operation Center. The task is no longer visible in the Operation Center. The migration does not affect saved task configurations or running tasks.
After migration, the original task is moved to the recycle bin in Data Studio. All subsequent editing and O&M actions can be performed only on the task list page of the Data Integration primary site.