This guide walks you through configuring a Data Integration task that continuously synchronizes an entire MySQL database to LogHub (SLS) in real time using incremental (CDC) mode. The task reads binlog data from MySQL as changes occur and streams them to SLS as log entries — capturing inserts, updates, deletes, and optionally DDL operations, without requiring a full database scan after the initial setup.
What you'll do:
- Select the sync task type (MySQL to LogHub, real-time migration of entire database)
- Configure network and resource settings
- Select the source databases and tables to synchronize
- Map source tables to destination Logstores
- Configure alert rules
- Configure advanced parameters (data expansion format, DDL passthrough)
- Set the resource group
- Deploy and run the task
Prerequisites
Before you begin, ensure that you have:
- A serverless resource group. See Serverless resource groups.
- A MySQL data source and a LogHub (SLS) data source configured in Data Integration. See Configure a data source.
- Network connectivity established between the resource group and both data sources. See Overview of network connection solutions.
Step 1: Select the sync task type
- Log on to the DataWorks console. In the top navigation bar, select the region. In the left-side navigation pane, choose Data Integration > Data Integration. Select the workspace from the drop-down list and click Go to Data Integration.
- In the left-side navigation pane, click Synchronization Task, then click Create Synchronization Task. In the dialog box, set the following parameters:

| Parameter | Value |
|---|---|
| Source Type | MySQL |
| Destination Type | LogHub |
| Specific Type | Real-time migration of entire database |
| Synchronization Mode | Incremental synchronization: continuously reads binlog data from the source database and writes it to LogHub (SLS) |
Step 2: Configure network and resource settings
- In the Network and Resource Configuration section, select a Resource Group for the sync task and specify the compute units (CUs) for Task Resource Usage.
- Under Source Information, select your MySQL data source. Under Destination, select your LogHub data source. Click Test Connectivity for both.
- After both connectivity tests pass, click Next.
Step 3: Select source databases and tables
In the Source Table area, select the tables to synchronize. Click the icon to move them to the Selected Tables area.
Two selection methods are available:
- Select specific databases and tables: Use the Database Filter and Table Filter fields to search. Click the icon to add selections to Selected Databases/Tables. To exclude items, find them in Selected Databases/Tables and click the icon to move them back.
- Use regular expressions: Enter regular expressions in Database Filter and Table Filter, then click Confirm. This method supports adding and removing tables dynamically at runtime. For example, enter `a.*` in Database Filter to match databases prefixed with `a`, and `order.*` in Table Filter to match tables prefixed with `order`.
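To preview which names a pair of filter expressions would pick up, the matching can be sketched in Python. This is an illustration only, not the product's implementation: it assumes each expression must match the whole database or table name (`re.fullmatch`), which this guide does not state. The helper `select_tables` and the sample catalog are hypothetical.

```python
import re

def select_tables(catalog, db_pattern, table_pattern):
    """Return (database, table) pairs whose names match both patterns.

    Assumption: a pattern must match the entire name (re.fullmatch).
    """
    db_re = re.compile(db_pattern)
    table_re = re.compile(table_pattern)
    return [
        (db, table)
        for db, tables in catalog.items()
        if db_re.fullmatch(db)
        for table in tables
        if table_re.fullmatch(table)
    ]

# Hypothetical catalog: database name -> list of table names.
catalog = {
    "app_main": ["order_2024", "order_items", "users"],
    "billing": ["order_history"],
    "archive": ["orders_old", "audit_log"],
}

selected = select_tables(catalog, r"a.*", r"order.*")

# A table created later that matches the expressions is picked up
# the next time the filters are evaluated.
catalog["app_main"].append("order_refunds")
selected_later = select_tables(catalog, r"a.*", r"order.*")
```

In this sketch, `billing.order_history` is skipped because the database name does not start with `a`, while `archive.orders_old` is selected because `order.*` also matches names that merely begin with `order`. Use a tighter expression such as `order_.*` if that is not intended.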
Step 4: Map source tables to destination Logstores
After you select tables, the destination properties are in a pending-mapping state. You must define the mappings manually before the task can run.
Select the destination Logstore
From the drop-down in the Destination Logstore column, select a Logstore for each source table. To configure multiple tables at once, select them and click Batch Modify > Destination Logstore.
To refresh the mappings between source tables and destination tables, click Refresh in the Actions column. You can refresh right away or after you finish configuring the destination settings.
Configure DML rules
Data Integration provides a default Data Manipulation Language (DML) processing rule. Customize rules for specific tables based on your business needs.
- For a single table: click Configuration in the DML Rule Configuration column for that table.
- For multiple tables: select the tables and click Batch Modify > DML Rule Configuration.
Step 5: Configure alert rules
For a continuously running real-time sync task, configuring alerts is important — task failures can cause data synchronization delays that are difficult to detect without monitoring.
- In the upper-right corner, click Configure Alert Rule to open the Alert Rule Configurations for Real-time Synchronization Subnode panel.
- Click Add Alert Rule and configure the parameters.
- Enable or disable alert rules as needed, and assign different alert recipients based on severity level.
Alert rules configured here apply to the real-time synchronization subtask generated by this sync task. After the task is fully configured and deployed, you can also manage alert rules from the Real-time Synchronization Task page. See Run and manage real-time synchronization tasks.
Step 6: Configure advanced parameters
Click Configure Advanced Parameters in the upper-right corner to open the advanced settings. These parameters control how binlog data is structured when written to SLS.
| Parameter | Description | Default |
|---|---|---|
| Whether to expand the collected data | Controls whether business data fields from the source table are flattened to the top level of the log. Set to True to expand; set to False to encapsulate all business data in a single JSON field. | False |
| Whether to pass through DDL information | Specifies whether DDL operations (such as CREATE TABLE and ALTER TABLE) are captured from the source and written to SLS as log entries. | True |
| Null value handling strategy from source | The value written to the destination field when the source field is NULL. Leave blank to retain the NULL value. | Blank (NULL retained) |
| Data Expansion Format | Defines how business data is structured when written to SLS. Only visible when Whether to expand the collected data is set to True. The format you choose directly affects how downstream systems consume the logs. | Partial expansion |
Data Expansion Format options:
- Partial expansion (default): Writes pre-change data to a top-level `old_data` object and post-change data to a top-level `data` object. Compatible with the Logtail format for MySQL binlog collection.
- Full expansion: Flattens all fields, including both pre-change and post-change data, into independent top-level key-value pairs.
For examples of all three output structures (no expansion, partial expansion, and full expansion), see Appendix: Data expansion format examples.
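How the three output structures relate to each other can be sketched in Python. The functions below are hypothetical helpers that only mirror the shapes shown in the appendix for a `row_update` event. They are not how Data Integration produces the logs, and other event types may be laid out differently.

```python
def to_partial(record):
    """Reshape a no-expansion record into the partial-expansion layout:
    pre-change data under `old_data`, post-change data under `data`."""
    out = {k: v for k, v in record.items() if k != "data"}
    payload = record.get("data", {})
    out["old_data"] = dict(payload.get("updateBefore", {}))
    out["data"] = dict(payload.get("updateAfter", {}))
    return out

def to_full(record):
    """Reshape a no-expansion record into the full-expansion layout:
    every pre-change and post-change field becomes a top-level key."""
    out = {k: v for k, v in record.items() if k != "data"}
    payload = record.get("data", {})
    out.update(payload.get("updateBefore", {}))  # keys carry the _old_ prefix
    out.update(payload.get("updateAfter", {}))
    return out

# Trimmed-down record in the no-expansion layout (metadata shortened).
record = {
    "_event_": "row_update",
    "_table_": "t_parameter",
    "data": {
        "updateBefore": {"_old_id": "3", "_old_name": "a"},
        "updateAfter": {"id": "3", "name": "a-tagd"},
    },
}

partial = to_partial(record)
full = to_full(record)
```

Because pre-change field names already carry the `_old_` prefix, flattening both sub-objects into one level does not cause key collisions in this sketch.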
Step 7: Configure the resource group
Click Configure Resource Group in the upper-right corner to view or change the resource groups assigned to this sync task.
Step 8: Deploy and run the task
- Click Save to save the task configuration.
- On the Data Integration > Synchronization Task page, find the task and click Deploy in the Operation column. During deployment, select Start Immediately After Deployment to run the task as soon as deployment completes. Otherwise, start it manually after deployment.
  Note: Data Integration tasks must be deployed to the production environment to take effect. Always deploy after creating or editing a task.
- Click the Name/ID of the task to view its detailed execution progress.
Manage the sync task
View task status
After creating the task, view its status on the Synchronization Task page.
- In the Actions column, click Start or Stop to control the task. Click More for additional options such as Edit and View.
- For a running task, check the Execution Overview for real-time statistics including progress, DDL records, DML records, and alert information. Click any area of the overview to drill into details.
Rerun the synchronization task
Use the following table to determine the correct rerun method for your scenario:
| Scenario | Action |
|---|---|
| Rerun without any configuration changes. Full synchronization and incremental synchronization are performed again. | Click More > Rerun in the Actions column. |
| Tables were added to or removed from the source after the task was configured, or the schema or name of a destination table was changed. | Click More > Rerun in the Actions column. Data is synchronized only from the newly added tables, or only from the source table mapped to the destination table whose schema or name changed. |
| Tables were added to or removed from the task configuration. | Click Complete, then click Apply Updates in the Actions column. Data is synchronized from the newly added tables only; the original tables are not re-synchronized. |
Appendix: Data expansion format examples
All three examples show the output for a row_update event. The common metadata fields — ExecutionTime, _db_, _event_, _event_time_, _file_name_, _gtid_, _host_, _id_, _offset_, and _table_ — appear in every format.
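As a small illustration of working with the common metadata fields, the sketch below converts the two timestamps to UTC datetimes and computes the gap between them. It assumes, based only on the magnitudes of the sample values, that ExecutionTime is in epoch milliseconds and _event_time_ is in epoch seconds. This guide does not document the units or the exact meaning of ExecutionTime, and `parse_times` and `event_lag_seconds` are hypothetical helpers.

```python
from datetime import datetime, timezone

def parse_times(record):
    """Return (execution_time, event_time) as timezone-aware UTC datetimes.

    Assumption: ExecutionTime is epoch milliseconds and _event_time_ is
    epoch seconds, inferred from the sample values only.
    """
    executed = datetime.fromtimestamp(record["ExecutionTime"] / 1000, tz=timezone.utc)
    event = datetime.fromtimestamp(record["_event_time_"], tz=timezone.utc)
    return executed, event

def event_lag_seconds(record):
    """Seconds between the two timestamps under the same unit assumption."""
    executed, event = parse_times(record)
    return (executed - event).total_seconds()

# Timestamp values copied from the examples in this appendix.
sample = {"ExecutionTime": 1761017850000, "_event_time_": 1761017850}

executed, event = parse_times(sample)
lag = event_lag_seconds(sample)
```

Under these assumptions the two sample timestamps refer to the same instant, so `lag` is zero for the example records.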
No expansion (default)
When Whether to expand the collected data is False, the business data is encapsulated in a single data field containing updateBefore and updateAfter sub-objects.
{
"ExecutionTime": 1761017850000,
"_db_": "*****",
"_event_": "row_update",
"_event_time_": 1761017850,
"_file_name_": "mysql-bin.*****",
"_gtid_": "4a21a3ce-ad7a-11f0-a8f3**********",
"_host_": "rm-*********.mysql.rds.aliyuncs.com",
"_id_": "176101777********",
"_offset_": "265*****",
"_table_": "t_parameter",
"data": {
"updateBefore": {
"_old_id": "3",
"_old_name": "82174b93-b810-4030-8652-e5c1667d3f72",
"_old_value": "+@}8-/XC",
"_old_status": "kBdO",
"_old_description": "a?!L7{jaH+",
"_old_create_time": "2023-12-28 19:03:43",
"_old_create_user": "+Zs",
"_old_modify_time": "2006-11-26 20:42:31",
"_old_modify_user": "brTYGI?jLL"
},
"updateAfter": {
"id": "3",
"name": "82174b93-b810-4030-8652-e5c1667d3f72-tagd",
"value": "+@}8-/XC",
"status": "kBdO",
"description": "a?!L7{jaH+",
"create_time": "2023-12-28 19:03:43",
"create_user": "+Zs",
"modify_time": "2006-11-26 20:42:31",
"modify_user": "brTYGI?jLL"
}
}
}
Partial expansion (default when expansion is enabled)
Pre-change data is in a top-level old_data object; post-change data is in a top-level data object. This format is compatible with the Logtail format for MySQL binlog collection.
{
"ExecutionTime": 1761017850000,
"_db_": "*****",
"_event_": "row_update",
"_event_time_": 1761017850,
"_file_name_": "mysql-bin.*****",
"_gtid_": "4a21a3ce-ad7a-11f0-a8f3**********",
"_host_": "rm-*********.mysql.rds.aliyuncs.com",
"_id_": "176101777********",
"_offset_": "265*****",
"_table_": "t_parameter",
"old_data": {
"_old_id": "1",
"_old_name": "0e459c1a-c6ce-459b-b374-a161b095c8e9",
"_old_value": "Hello",
"_old_status": "b",
"_old_description": "cw",
"_old_create_time": "2007-08-06 16:19:03",
"_old_create_user": "!wW4",
"_old_modify_time": "2017-04-21 18:21:58",
"_old_modify_user": "s"
},
"data": {
"id": "1",
"name": "0e459c1a-c6ce-459b-b374-a161b095c8e9-dsg",
"value": "Hello",
"status": "b",
"description": "cw",
"create_time": "2007-08-06 16:19:03",
"create_user": "!wW4",
"modify_time": "2017-04-21 18:21:58",
"modify_user": "s"
}
}
Full expansion
All fields — including pre-change and post-change data — are flattened to the top level. Pre-change field names are prefixed with _old_.
{
"ExecutionTime": 1761017850000,
"_db_": "****",
"_event_": "row_update",
"_event_time_": 1761017850,
"_file_name_": "mysql-bin.*****",
"_gtid_": "4a21a3ce-ad7a-11f0-a8f3**********",
"_host_": "rm-*********.mysql.rds.aliyuncs.com",
"_id_": "176101777********",
"_offset_": "265*****",
"_table_": "t_parameter",
"_old_create_time": "2024-09-27 15:27:10",
"_old_create_user": "o",
"_old_description": "LZ[1HsTE",
"_old_id": "6",
"_old_modify_time": "2008-03-15 08:05:53",
"_old_modify_user": "/{=>7_d@0Q",
"_old_name": "cf8a671c-4414-45f5-a22c-62c353a6f1ef",
"_old_status": "K:HQOX-?gK",
"_old_value": "23]sn<t",
"create_time": "2024-09-27 15:27:10",
"create_user": "o",
"description": "LZ[1HsTE",
"id": "6",
"modify_time": "2008-03-15 08:05:53",
"modify_user": "/{=>7_d@0Q",
"name": "cf8a671c-4414-45f5-a22c-62c353a6f1efgsdsa",
"status": "K:HQOX-?gK",
"value": "23]sn<t"
}
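A downstream consumer that should work regardless of the configured format can normalize all three layouts into a pair of before and after dictionaries. The following Python sketch is one hypothetical way to do that. It assumes the log entry has already been parsed into a dictionary with nested objects, uses the metadata field names listed at the start of this appendix, and handles only the `row_update` shape shown here. A source column whose name starts with `_old_` or equals a metadata field name would be misclassified.

```python
# Common metadata fields listed at the start of this appendix.
METADATA_KEYS = {
    "ExecutionTime", "_db_", "_event_", "_event_time_", "_file_name_",
    "_gtid_", "_host_", "_id_", "_offset_", "_table_",
}
OLD_PREFIX = "_old_"

def _strip_old(fields):
    """Remove the _old_ prefix so before/after share column names."""
    return {
        (k[len(OLD_PREFIX):] if k.startswith(OLD_PREFIX) else k): v
        for k, v in fields.items()
    }

def split_change(record):
    """Return (before, after) dicts for any of the three layouts."""
    data = record.get("data")
    if "old_data" in record:  # partial expansion
        return _strip_old(record["old_data"]), dict(data or {})
    if isinstance(data, dict) and ("updateBefore" in data or "updateAfter" in data):
        # no expansion
        return _strip_old(data.get("updateBefore", {})), dict(data.get("updateAfter", {}))
    # full expansion: separate business fields from metadata at the top level
    before = {k[len(OLD_PREFIX):]: v for k, v in record.items() if k.startswith(OLD_PREFIX)}
    after = {
        k: v for k, v in record.items()
        if k not in METADATA_KEYS and not k.startswith(OLD_PREFIX)
    }
    return before, after

def changed_columns(record):
    """Names of columns whose value differs between before and after."""
    before, after = split_change(record)
    return sorted(k for k in after if before.get(k) != after[k])

# Trimmed-down samples, one per layout (metadata shortened).
no_expansion = {"_event_": "row_update", "data": {
    "updateBefore": {"_old_id": "3", "_old_name": "a"},
    "updateAfter": {"id": "3", "name": "a-tagd"}}}
partial = {"_event_": "row_update",
           "old_data": {"_old_id": "1", "_old_name": "b"},
           "data": {"id": "1", "name": "b-dsg"}}
full = {"_event_": "row_update", "_table_": "t_parameter",
        "_old_id": "6", "_old_name": "c", "id": "6", "name": "c-x"}

results = [changed_columns(r) for r in (no_expansion, partial, full)]
```

Each of the three samples changes only the `name` column, so the helper reports the same result for every layout.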