DataWorks Data Integration provides single-table real-time synchronization tasks designed for low-latency, high-throughput data replication and transfer between different data sources. This feature uses an advanced real-time computing engine to capture real-time data changes, such as inserts, deletes, and updates, at the source and quickly apply them to the destination. This topic uses the synchronization of a single table from Kafka to MaxCompute as an example to show you how to configure a single-table real-time synchronization task.
Preparations
Prepare data sources
Create source and destination data sources. For more information about data source configuration, see Data Source Management.
Ensure that the data sources support real-time synchronization. For more information, see Supported data sources and synchronization solutions.
Some data sources, such as Hologres and Oracle, require you to enable logging. The method for enabling logs varies by data source. For more information, see Data Source List.
Resource group: Purchase and configure a Serverless resource group.
Network connectivity: Establish a network connection between the resource group and the data sources.
Accessing the feature
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Integration. On the page that appears, select the desired workspace from the drop-down list and click Go to Data Integration.
Configure the task
1. Create a sync task
You can create a sync task in one of the following ways:
Method 1: On the sync task page, select a Source and a Destination, and then click Create Sync Task. In this example, Kafka is the source and MaxCompute is the destination. You can select the source and destination as needed.
Method 2: On the sync task page, if the task list is empty, click Create.

2. Configure basic information
Configure basic information, such as the task name, description, and owner.
Select a synchronization type. Data Integration displays the supported Synchronization Types based on the source and destination database types. In this topic, Single-table Real-time is selected.
Synchronization steps: Single-table real-time synchronization tasks support only incremental synchronization. The steps are typically Schema Migration and Incremental Synchronization. This process first initializes the source table schema to the destination. After the task starts, it automatically captures data changes from the source and writes them to the destination table.
If the source is Hologres, full synchronization is also supported. This process first fully synchronizes existing data to the destination table. Then, incremental data synchronization starts automatically.
For more information about supported data sources and synchronization solutions, see Supported data sources and synchronization solutions.
3. Configure network and resources
In this step, select the Resource Group for the sync task. Also select the Source Data Source and Destination Data Source. Then, test the network connectivity.
For a Serverless resource group, you can specify the maximum number of compute units (CUs) that a sync task can use. If your sync task fails due to an out-of-memory (OOM) error, increase the CU limit for the resource group.
If you have not created a data source, click Add Data Source to create one. For more information, see Data Source Configuration.
4. Configure the synchronization channel
1. Configure the source
At the top of the page, click the Kafka data source. Then, edit the Kafka Source Information.

In the Kafka Source Information section, select the topic to synchronize from the Kafka data source.
You can use the default values for other configurations or modify them as needed. For more information about the parameters, see the official Kafka documentation.
In the upper-right corner, click Data Sampling.
In the dialog box that appears, set the Start Time and Number Of Samples, and then click Start Sampling. This action samples data from the specified Kafka topic. You can preview the data in the topic. This preview provides input for the data preview and visualization configurations of subsequent data processing nodes.
In the Output Field Configuration section, select the fields to synchronize as needed.
By default, Kafka provides six fields.
__key__: The key of the Kafka record.
__value__: The value of the Kafka record.
__partition__: The partition in which the Kafka record is located. Partition numbers are integers that start from 0.
__headers__: The headers of the Kafka record.
__offset__: The offset of the Kafka record within its partition. Offsets are integers that start from 0.
__timestamp__: The 13-digit UNIX timestamp of the Kafka record, in milliseconds.
You can also perform more field transformations in subsequent data processing nodes.
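For reference, these six fields correspond to the per-record attributes that a standard Kafka consumer exposes. The following minimal sketch uses the open source kafka-python library to print them; the topic name and broker address are placeholders, and the sketch only shows where each field comes from.

```python
# Minimal sketch: print the six metadata attributes that DataWorks exposes for
# each Kafka record. Assumes the kafka-python library, a reachable broker, and
# a topic named "demo_topic" (all placeholders).
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "demo_topic",
    bootstrap_servers="broker-host:9092",
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,   # stop iterating if no new records arrive
)

for record in consumer:
    print({
        "__key__": record.key,              # key of the record
        "__value__": record.value,          # raw payload bytes
        "__partition__": record.partition,  # partition number, starts from 0
        "__headers__": record.headers,      # list of (key, value) header tuples
        "__offset__": record.offset,        # offset within the partition, starts from 0
        "__timestamp__": record.timestamp,  # 13-digit UNIX timestamp in milliseconds
    })
```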
2. Edit the data processing node
Click the icon to add a data processing method. Five methods are available: Data Masking, String Replace, Data Filtering, JSON Parsing, and Edit and Assign Fields. You can arrange these methods in the desired order. At runtime, the data processing methods are executed in the specified order.
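To make the execution order concrete, the following sketch models two of these methods (String Replace and Data Filtering) as plain functions applied in sequence. It is a conceptual illustration only, not the DataWorks implementation, and the replacement rule and filter condition are assumptions.

```python
# Conceptual illustration (not the DataWorks implementation) of data processing
# methods running in the configured order. Each function stands in for one node.
def string_replace(record):
    # String Replace: normalize a legacy status value in the raw payload.
    record["__value__"] = record["__value__"].replace("STATUS_OK", "ok")
    return record

def data_filtering(record):
    # Data Filtering: keep only records whose payload contains "order".
    return record if "order" in record["__value__"] else None

PIPELINE = [string_replace, data_filtering]  # runs top to bottom, as configured

def process(record):
    for step in PIPELINE:
        record = step(record)
        if record is None:  # a filtering step removed the record
            return None
    return record

print(process({"__value__": 'order STATUS_OK {"id": 1}'}))
```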

After you configure a data processing node, you can click Preview Data Output in the upper-right corner:
In the table below the input data, you can view the results from the previous Data Sampling step. Click Re-fetch Upstream Output to refresh the results.
If there is no output from the upstream node, you can also click Manually Construct Data to simulate the previous output.
Click Preview to view the output data from the upstream step after it is processed by the data processing component.

The data output preview and data processing features depend on the Data Sampling from the Kafka source. Before you process data, you must complete data sampling in the Kafka source settings.
3. Configure the destination
At the top of the page, click the MaxCompute data destination. Then, edit the MaxCompute destination information.

In the MaxCompute Destination Information section, select a Tunnel resource group. By default, the public data transmission resource, which is the free quota provided by MaxCompute, is used.
Specify whether to Auto-create Table or Use Existing Table for the destination table.
If you choose to auto-create a table, a table with the same name as the source table is created by default. You can manually change the destination table name.
If you choose to use an existing table, select the destination table from the drop-down list.
(Optional) Edit the table schema.
When you select Auto-create Table, click Edit Table Schema. In the dialog box that appears, edit the destination table schema. You can also click Regenerate Table Schema Based On Upstream Output Columns to automatically generate the table schema based on the output columns of the upstream node. You can select a column in the auto-generated schema and set it as the primary key.
Configure field mapping.
The system automatically maps upstream columns to destination table columns based on the Same Name Mapping principle. You can adjust the mappings as needed. An upstream column can be mapped to multiple destination columns, but multiple upstream columns cannot be mapped to a single destination column. If an upstream column is not mapped to a destination column, its data is not written to the destination table.
For Kafka fields, you can configure custom JSON parsing. Use the data processing component to retrieve the content of the value field. This allows for more fine-grained field configuration.
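As an illustration of this approach, the sketch below parses a JSON-formatted __value__ into individual columns that could then be mapped to destination columns. The field names are assumptions, and the code stands in for the JSON Parsing node rather than reproducing it.

```python
# Conceptual sketch (assumed field names, not the actual JSON Parsing component):
# extract individual columns from a JSON-formatted __value__ so that each one
# can be mapped to a MaxCompute column instead of writing the whole payload.
import json

raw_value = '{"order_id": 1001, "user_id": "u-42", "amount": 19.9}'

parsed = json.loads(raw_value)
columns = {
    "order_id": parsed.get("order_id"),
    "user_id": parsed.get("user_id"),
    "amount": parsed.get("amount"),
}
print(columns)  # each key can be mapped to a destination column
```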

(Optional) Configure partitions.
Automatic Time-based Partitioning creates partitions based on the business time (in this case, the __timestamp__ field). The first-level partition is by year, the second-level partition is by month, and so on. For a concrete illustration, see the sketch after these options.
Dynamic Partitioning by Field Content maps a field from the source table to a partition field in the destination MaxCompute table. This ensures that rows containing specific data in the source field are written to the corresponding partition in the MaxCompute table.
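The sketch below shows how year, month, and day partition values can be derived from the 13-digit millisecond __timestamp__ field, as a conceptual illustration of automatic time-based partitioning; the partition column names are placeholders, not the exact names that DataWorks generates.

```python
# Conceptual sketch: derive year/month/day partition values from the 13-digit
# millisecond __timestamp__ field. Partition column names are placeholders.
from datetime import datetime, timezone

timestamp_ms = 1700000000000  # sample __timestamp__ value
dt = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)

partitions = {
    "pt_year": dt.strftime("%Y"),
    "pt_month": dt.strftime("%m"),
    "pt_day": dt.strftime("%d"),
}
print(partitions)  # e.g. {'pt_year': '2023', 'pt_month': '11', 'pt_day': '14'}
```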
5. Other configurations
Alert configuration
To prevent data synchronization latency caused by task errors, you can set an alert policy for the single-table real-time synchronization task.
In the upper-right corner of the page, click Alert Settings to open the alert settings page for the task.
Click Add Alert to configure an alert rule. You can set alert triggers to monitor metrics such as data latency, failover events, task status, Data Definition Language (DDL) changes, and task resource utilization. You can set CRITICAL or WARNING alert levels based on specified thresholds.
Manage alert rules.
For existing alert rules, you can use the alert switch to enable or disable them. You can also send alerts to different personnel based on the alert level.
Advanced parameter configuration
The sync task provides advanced parameters for fine-grained configuration. The system provides default values, which you do not need to change in most cases. To modify them:
In the upper-right corner of the page, click Advanced Parameters to open the advanced parameter configuration page.
Set Auto-configure Runtime Settings to false.
Modify the parameter values based on the tooltips. The description for each parameter is displayed next to its name.
Modify these parameters only after you fully understand their purpose and potential consequences. Incorrect settings can cause unexpected errors or data quality issues.
Resource group configuration
In the upper-right corner of the page, you can click Resource Group Configuration to view and switch the resource group used by the current task.
6. Test run
After you complete all task configurations, click Test Run in the upper-right corner to debug the task. This simulates how the entire task processes a small amount of sample data. You can then preview the results that would be written to the destination table. If there are configuration errors, exceptions during the test run, or dirty data, the system provides real-time feedback. This helps you quickly assess the correctness of your task configuration and whether it produces the expected results.
In the dialog box that appears, set the sampling parameters (Start Time and Number Of Samples).
Click Start Sampling to retrieve the sample data.
Click Preview to simulate the task run and view the output.
The output of the test run is for preview only. It is not written to the destination data source and does not affect production data.
7. Start the task
After you complete all configurations, click Complete Configuration at the bottom of the page.
In the task list, find the sync task that you created. In the Actions column, click Publish. If you select the Start Running Immediately After Publishing checkbox, the task runs immediately after it is published. Otherwise, you must manually start the task.
Note: Data Integration tasks must be published to the production environment to run. Therefore, new or edited tasks take effect only after you perform the Publish operation.
In the Task List, click the Name/ID of the task to view its detailed execution process.
What to do next
After the task starts, you can click the task name to view its running details and perform task operations and maintenance (O&M) and tuning.
FAQ
For answers to frequently asked questions about real-time synchronization tasks, see Real-time synchronization FAQ.
More examples
Real-time synchronization of a single table from Kafka to ApsaraDB for OceanBase
Real-time ingestion of a single table from LogHub (SLS) to Data Lake Formation
Real-time synchronization of a single table from Hologres to Doris
Real-time synchronization of a single table from Hologres to Hologres
Real-time synchronization of a single table from Kafka to Hologres
Real-time synchronization of a single table from LogHub (SLS) to Hologres
Real-time synchronization of a single table from Hologres to Kafka
Real-time synchronization of a single table from LogHub (SLS) to MaxCompute
Real-time synchronization of a single table from Kafka to an OSS data lake
Real-time synchronization of a single table from Kafka to StarRocks
Real-time synchronization of a single table from Oracle to Tablestore