DataWorks Data Integration provides single-table real-time sync tasks to enable low-latency, high-throughput data replication and transfer between different data sources. This feature uses an advanced Real-time Compute Engine to capture real-time data changes (inserts, updates, and deletes) from the source and quickly apply them to the destination. This topic uses the synchronization of data from Kafka to MaxCompute as an example to demonstrate how to configure a single-table real-time sync task.
Prerequisites
Data source preparation
You have created the source and destination data sources. For more information, see Data Source Management.
Ensure that the data sources support real-time synchronization. For more information, see Supported data sources and synchronization solutions.
Some data sources require logging to be enabled. The method varies by data source. For details, see the configuration guide for each source in Data source list.
Resource Group: You have purchased and configured a Serverless Resource Group.
Network Connectivity: You have established the network connection between the resource group and the data sources.
Step 1: Create a sync task
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Integration. On the page that appears, select the desired workspace from the drop-down list and click Go to Data Integration.
In the left-side navigation pane, click Sync Tasks. On the page that appears, click Create Sync Task and configure the task. This topic uses the real-time synchronization of data from Kafka to MaxCompute as an example:
Source: Kafka.
Destination: MaxCompute.
Synchronization Type: Single-table Real-time.
Synchronization Steps:
Schema Migration: Automatically creates database objects (such as tables, fields, and data types) in the destination that match the source. This step does not include data.
Incremental Synchronization (Optional): After a full synchronization is complete, this step continuously captures data changes (inserts, updates, and deletes) from the source and synchronizes them to the destination.
If the source is Hologres, Full Synchronization is also supported. The task first synchronizes all existing data to the destination table, and then automatically starts Incremental Synchronization.
For more information about supported data sources and synchronization solutions, see Supported data sources and synchronization solutions.
Step 2: Configure data sources and runtime resources
For Source Data Source, select your Kafka data source. For Destination Data Source, select your MaxCompute data source.
In the Runtime Resources section, select the Resource Group for the sync task and allocate CUs for the task. You can set CUs separately for Full Synchronization and Incremental Synchronization to precisely control resources and prevent waste. If your sync task encounters an out-of-memory (OOM) error due to insufficient resources, increase the CU allocation for the task.
Ensure that both the source and destination data sources pass the Network Connectivity check.
Step 3: Configure the sync solution
1. Configure the source
On the Configuration tab, select the Kafka topic that you want to synchronize.
Use the default settings or modify them as needed. For more information about the parameters, see the official Kafka documentation.
In the upper-right corner, click Data Sampling.
In the dialog box that appears, specify the Start time and Sampled Data Records, and then click Start Collection. This samples data from the specified Kafka topic. You can preview the data in the topic, which provides input for the data preview and visual configuration in subsequent data processing nodes.
On the Configure Output Field tab, select the fields that you want to synchronize.
Kafka provides six default fields.
__key__: The key of the Kafka record.
__value__: The value of the Kafka record.
__partition__: The partition that stores the Kafka record. Partition numbers are integers that start from 0.
__headers__: The headers of the Kafka record.
__offset__: The offset of the Kafka record in its partition. Offsets are integers that start from 0.
__timestamp__: The timestamp of the Kafka record, a 13-digit integer in milliseconds.
You can also perform more field transformations in subsequent data processing steps.
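For reference, the following minimal sketch shows the kind of record this walkthrough assumes exists in the Kafka topic. It uses the kafka-python client with a hypothetical broker address, topic name, and payload (none of these come from the console flow above); after Data Sampling, a record like this surfaces with the six default fields listed above.

```python
import json

from kafka import KafkaProducer  # pip install kafka-python

# Hypothetical broker address and serializers for a string key and JSON value.
producer = KafkaProducer(
    bootstrap_servers="broker-host:9092",
    key_serializer=lambda k: k.encode("utf-8"),
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# A sample business payload; its JSON keys are extracted later in the
# Data Processing step.
producer.send(
    "orders_topic",  # hypothetical topic name
    key="order-1001",
    value={"order_id": 1001, "user_id": "u-42", "amount": 19.99,
           "gmt_create": "2024-06-01 00:00:00"},
)
producer.flush()

# After Data Sampling, this record would appear with the six default fields:
#   __key__       -> "order-1001"
#   __value__     -> the JSON payload above, as a string
#   __partition__ -> the partition the broker assigned (for example, 0)
#   __headers__   -> empty, because no headers were sent
#   __offset__    -> the record's offset within that partition
#   __timestamp__ -> the produce time as a 13-digit millisecond integer
```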
2. Process data
Turn on Data Processing. Five processing methods are available: Data Masking, String Replace, Data Filtering, JSON Parsing, and Edit and assign fields. Arrange these methods in your desired order of execution.

After you configure a data processing step, you can click Preview Data Output in the upper-right corner:
The input data table shows the results of the previous Data Sampling step. Click Re-obtain Output of Ancestor Node to refresh the results.
If there is no output from the upstream step, you can use Manually Construct Data to simulate the upstream output.
Click Preview to view the output data from the upstream step after it has been processed by the data processing component.

The data output preview and data processing features rely on the Data Sampling results from the Kafka source. Before configuring data processing, first perform data sampling on the Kafka source configuration page.
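As a rough illustration of what the JSON Parsing step does with the Kafka __value__ field, the sketch below flattens the hypothetical payload from the producer example above. This is conceptual Python only, not DataWorks code; the actual parsing is configured visually in the console.

```python
import json

# Hypothetical __value__ content sampled from the Kafka topic.
raw_value = (
    '{"order_id": 1001, "user_id": "u-42", '
    '"amount": 19.99, "gmt_create": "2024-06-01 00:00:00"}'
)

# Conceptually, JSON Parsing turns top-level keys into individual output
# fields that later steps (filtering, field mapping) can reference by name.
parsed = json.loads(raw_value)
output_fields = {key: parsed.get(key)
                 for key in ("order_id", "user_id", "amount", "gmt_create")}
print(output_fields)
# {'order_id': 1001, 'user_id': 'u-42', 'amount': 19.99,
#  'gmt_create': '2024-06-01 00:00:00'}
```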
3. Configure the destination
In the Destination section, select a Tunnel resource group. By default, Public Transfer Resource is selected, which uses the free quota of MaxCompute.
Choose whether to write data to a new table or an existing table.
If you choose to create a new table, select Create from the drop-down list. By default, a table with the same schema as the data source is created. You can manually change the destination table name and schema.
If you choose to use an existing table, select the target table from the drop-down list.
(Optional) Edit the Table Schema.
Click the edit icon next to the table name to edit the Table Schema. You can click Re-generate Table Schema Based on Output Column of Ancestor Node to automatically generate a Table Schema based on the output columns from the upstream node. You can select a column in the auto-generated schema to be the primary key.
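If you want to double-check the destination table outside the console, a minimal sketch with the PyODPS SDK might look like the following. The credentials, project, endpoint, and table name are placeholders, not values from this walkthrough, and a recent PyODPS version is assumed.

```python
from odps import ODPS  # pip install pyodps

# Placeholder credentials and connection details.
o = ODPS(
    "<access_key_id>",
    "<access_key_secret>",
    project="<your_project>",
    endpoint="<your_maxcompute_endpoint>",
)

# Hypothetical destination table name; print its columns and partition fields
# to confirm they match the field mapping you configured.
table = o.get_table("<destination_table>")
print(table.table_schema)
```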
4. Configure field mapping
After you select the source and destination, you must specify the mapping between source fields and destination columns. The task writes data from source fields to the corresponding destination columns based on the configured field mapping.
The system automatically generates a mapping between the upstream fields and the destination table columns based on the Same Name Mapping principle. You can adjust the mapping as needed. You can map one upstream field to multiple destination table columns, but you cannot map multiple upstream fields to a single destination table column. If an upstream field is not mapped to a destination table column, its data is not written to the destination table.
You can configure custom JSON Parsing for Kafka fields: use the JSON Parsing data processing component to extract the content of the __value__ field for finer-grained field configuration.

(Optional) Configure partitions.
Automatic Time-based Partitioning partitions data based on a business time field (in this example, __timestamp__). The first-level partition is by year, the second is by month, and so on.
Dynamic Partitioning by Field Content maps a field from the source table to a partition field in the destination MaxCompute table. This ensures that rows containing specific field values are written to the corresponding partitions in the MaxCompute table.
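For intuition about Automatic Time-based Partitioning, the sketch below derives year, month, day, and hour values from a 13-digit millisecond timestamp such as __timestamp__. This is a conceptual illustration only, not DataWorks code; the actual partition levels depend on your configuration.

```python
from datetime import datetime, timezone

# Hypothetical __timestamp__ value (13-digit milliseconds).
timestamp_ms = 1717200000000

# Time-based partitioning conceptually splits the same business time into
# one partition level per unit: year, then month, then day, and so on.
dt = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)
partitions = {
    "year": dt.strftime("%Y"),
    "month": dt.strftime("%m"),
    "day": dt.strftime("%d"),
    "hour": dt.strftime("%H"),
}
print(partitions)
# {'year': '2024', 'month': '06', 'day': '01', 'hour': '00'}
```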
Step 4: Configure advanced settings
Synchronization tasks provide advanced parameters for fine-grained control. In most cases, you do not need to change the default values. If necessary, you can:
Click Advanced Parameters in the upper-right corner to go to the Advanced Parameters configuration page.
Note: The advanced parameters are on a tab on the right side of the task configuration page.
You can set parameters separately for the reader and writer of the sync task. To customize the Runtime Configuration, disable Auto-configure Runtime Settings.
Modify parameter values as described in the tooltips. For parameter descriptions, refer to the explanations next to the parameter names. For configuration recommendations for some parameters, see Advanced parameters for real-time synchronization.
Modify these parameters only if you fully understand their purpose and potential impact. Incorrect settings can lead to unexpected errors or data quality issues.
Step 5: Test run
After configuring the task, you can click Perform Simulated Running in the lower-left corner to debug the task. A Test Run simulates the full task flow on a small set of sample data and previews the results as they would be written to the destination table. If there are configuration errors, exceptions, or Dirty Data during the Test Run, the system provides real-time feedback. This helps you quickly verify that the task is configured correctly and produces the expected results.
In the dialog box that appears, set the sampling parameters: Start time and Sampled Data Records.
Click Start Collection to get the sample data.
Click Preview Result to simulate the task run and view the output results.
The output of a Test Run is for preview only and is not written to the destination data source. It does not affect your production data.
Step 6: Publish and run the task
After you complete all configurations, click Complete Configuration at the bottom of the page.
Data Integration tasks must be published to the production environment to run. After creating or editing a task, click Deploy to apply your changes. When publishing, you can select an option to start the task immediately. If you do not select this option, you must go to the Sync Tasks page after publishing and manually start the task.
In the task list, click the Name/ID of the task to view the detailed execution process.
Step 7: Configure alert rules
After the task is published and running, you can configure Alert Rules to be notified immediately of any exceptions. This helps keep your production environment stable and your data up to date. On the Sync Tasks page, open the alert configuration for the target task from the Actions column.
1. Add an alert

(1) Click Create Rule to configure the Alert Rule.
You can set Alert Reason to monitor metrics such as Business delay, failover, Task status, DDL Notification, and Task Resource Utilization. You can set CRITICAL or WARNING alert levels based on specified thresholds.
After setting the alert method, you can use Configure Advanced Parameters to control the interval for sending alert messages. This prevents sending too many messages at once, which can cause waste and message backlog.
If you select Business delay, Task status, or Task Resource Utilization as the alert trigger, you can also enable recovery notifications to inform recipients when the task returns to normal.
(2) Manage Alert Rules.
For existing Alert Rules, you can use the alert switch to enable or disable them. You can also notify different contacts based on the alert level.
2. View alerts
You can also view a history of alert events for the task.
Next steps
After the task starts, you can click the task name to view its run details and perform Task Operations and Maintenance (O&M) and Tuning.
FAQ
For common issues related to real-time sync tasks, see Real-time synchronization FAQ.
More examples
Real-time single-table synchronization from Kafka to ApsaraDB for OceanBase
Real-time single-table ingestion from LogHub (SLS) to Data Lake Formation
Real-time single-table synchronization from Hologres to Doris
Real-time single-table synchronization from Hologres to Hologres
Real-time single-table synchronization from Kafka to Hologres
Real-time single-table synchronization from LogHub (SLS) to Hologres
Real-time single-table synchronization from Hologres to Kafka
Real-time single-table synchronization from LogHub (SLS) to MaxCompute
Real-time single-table synchronization from Kafka to an OSS data lake
Real-time single-table synchronization from Kafka to StarRocks
Real-time single-table synchronization from Oracle to Tablestore