Real-Time MySQL Database Sync to LogHub SLS for Streaming Analytics - DataWorks

This guide walks you through configuring a Data Integration task that continuously synchronizes an entire MySQL database to LogHub (SLS) in real time using incremental (CDC) mode. The task reads binlog data from MySQL as changes occur and streams them to SLS as log entries — capturing inserts, updates, deletes, and optionally DDL operations, without requiring a full database scan after the initial setup.

What you'll do:

Select the sync task type (MySQL to LogHub, real-time migration of entire database)
Configure network and resource settings
Select the source databases and tables to synchronize
Map source tables to destination Logstores
Configure alert rules
Configure advanced parameters (data expansion format, DDL passthrough)
Set the resource group
Deploy and run the task

Prerequisites

Before you begin, ensure that you have:

A serverless resource group. See Serverless resource groups
A MySQL data source and a LogHub (SLS) data source configured in Data Integration. See Configure a data source
Network connectivity established between the resource group and both data sources. See Overview of network connection solutions

Step 1: Select the sync task type

Log on to the DataWorks console. In the top navigation bar, select the region. In the left-side navigation pane, choose Data Integration > Data Integration. Select the workspace from the drop-down list and click Go to Data Integration.

In the left-side navigation pane, click Synchronization Task, then click Create Synchronization Task. In the dialog box, set the following parameters:

Parameter	Value
Source Type	`MySQL`
Destination Type	`LogHub`
Specific Type	`Real-time migration of entire database`
Synchronization Mode	Incremental synchronization — continuously reads binlog data from the source database and writes it to LogHub (SLS)

Step 2: Configure network and resource settings

In the Network and Resource Configuration section, select a Resource Group for the sync task and specify the compute units (CUs) for Task Resource Usage.
Under Source Information, select your MySQL data source. Under Destination, select your LogHub data source. Click Test Connectivity for both.
After both connectivity tests pass, click Next.

Step 3: Select source databases and tables

In the Source Table area, select the tables to synchronize. Click the icon to move them to the Selected Tables area.

Two selection methods are available:

Select specific databases and tables: Use the Database Filter and Table Filter fields to search. Click the icon to add selections to Selected Databases/Tables. To exclude items, find them in Selected Databases/Tables and click the icon to move them back.
Use regular expressions: Enter regular expressions in Database Filter and Table Filter, then click Confirm. This method supports adding and removing tables dynamically at runtime.

For example, enter a.* in Database Filter to match databases prefixed with a, and order.* in Table Filter to match tables prefixed with order.

Step 4: Map source tables to destination Logstores

After selecting tables, the destination properties are in a pending-mapping state. Define the mappings manually before the task can run.

Select the destination Logstore

From the drop-down in the Destination Logstore column, select a Logstore for each source table. To configure multiple tables at once, select them and click Batch Modify > Destination Logstore.

You can click Refresh in the Actions column to refresh mappings between source tables and destination tables. You can do this directly, or after you configure settings related to destination tables.

Configure DML rules

Data Integration provides a default Data Manipulation Language (DML) processing rule. Customize rules for specific tables based on your business needs.

For a single table: click Configuration in the DML Rule Configuration column for that table.
For multiple tables: select the tables and click Batch Modify > DML Rule Configuration.

Step 5: Configure alert rules

For a continuously running real-time sync task, configuring alerts is important — task failures can cause data synchronization delays that are difficult to detect without monitoring.

In the upper-right corner, click Configure Alert Rule to open the Alert Rule Configurations for Real-time Synchronization Subnode panel.
Click Add Alert Rule and configure the parameters.
Enable or disable alert rules as needed, and assign different alert recipients based on severity level.

Alert rules configured here apply to the real-time synchronization subtask generated by this sync task. After the task is fully configured and deployed, you can also manage alert rules from the Real-time Synchronization Task page. See Run and manage real-time synchronization tasks.

Step 6: Configure advanced parameters

Click Configure Advanced Parameters in the upper-right corner to open the advanced settings. These parameters control how binlog data is structured when written to SLS.

Parameter	Description	Default
Whether to expand the collected data	Controls whether business data fields from the source table are flattened to the top level of the log. Set to `True` to expand; set to `False` to encapsulate all business data in a single JSON field.	`False`
Whether to pass through DDL information	Specifies whether DDL operations (such as `CREATE TABLE` and `ALTER TABLE`) are captured from the source and written to SLS as log entries.	`True`
Null value handling strategy from source	The value written to the destination field when the source field is NULL. Leave blank to retain the NULL value.	Blank (NULL retained)
Data Expansion Format	Defines how business data is structured when written to SLS. Only visible when Whether to expand the collected data is set to `True`. The format you choose directly affects how downstream systems consume the logs.	Partial expansion

Data Expansion Format options:

Partial expansion (default): Writes pre-change data to a top-level old_data object and post-change data to a top-level data object. Compatible with the Logtail format for MySQL binlog collection.
Full expansion: Flattens all fields — including both pre-change and post-change data — into independent top-level key-value pairs.

For examples of all three output structures (no expansion, partial expansion, and full expansion), see Appendix: Data expansion format examples.

Step 7: Configure the resource group

Click Configure Resource Group in the upper-right corner to view or change the resource groups assigned to this sync task.

Step 8: Deploy and run the task

Click Save to save the task configuration.
On the Data Integration > Synchronization Task page, find the task and click Deploy in the Operation column. During deployment, select Start Immediately After Deployment to run the task as soon as deployment completes. Otherwise, start it manually after deployment.

Data integration tasks must be deployed to the production environment to take effect. Always deploy after creating or editing a task.
Click the Name/ID of the task to view its detailed execution progress.

Manage the sync task

View task status

After creating the task, view its status on the Synchronization Task page.

In the Actions column, click Start or Stop to control the task. Click More for additional options such as Edit and View.
For a running task, check the Execution Overview for real-time statistics including progress, DDL records, DML records, and alert information. Click any area of the overview to drill into details.

Rerun the synchronization task

Use the following table to determine the correct rerun method for your scenario:

Scenario	Action
Rerun without any configuration changes — performs full synchronization and incremental synchronization again	Click More > Rerun in the Actions column
Tables were added to or removed from the source after the task was configured, or the schema or name of a destination table was changed	Click More > Rerun — syncs data only from newly added tables or only from the mapped source table to the destination table whose schema or name changed
Tables were added to or removed from the task configuration	Click Complete, then click Apply Updates in the Actions column — syncs data from newly added tables only; original tables are not re-synchronized

Appendix: Data expansion format examples

All three examples show the output for a row_update event. The common metadata fields — ExecutionTime, _db_, _event_, _event_time_, _file_name_, _gtid_, _host_, _id_, _offset_, and _table_ — appear in every format.

No expansion (default)

When Whether to expand the collected data is False, the business data is encapsulated in a single data field containing updateBefore and updateAfter sub-objects.

{
  "ExecutionTime": 1761017850000,
  "_db_": "*****",
  "_event_": "row_update",
  "_event_time_": 1761017850,
  "_file_name_": "mysql-bin.*****",
  "_gtid_": "4a21a3ce-ad7a-11f0-a8f3**********",
  "_host_": "rm-*********.mysql.rds.aliyuncs.com",
  "_id_": "176101777********",
  "_offset_": "265*****",
  "_table_": "t_parameter",
  "data": {
    "updateBefore": {
      "_old_id": "3",
      "_old_name": "82174b93-b810-4030-8652-e5c1667d3f72",
      "_old_value": "+@}8-/XC",
      "_old_status": "kBdO",
      "_old_description": "a?!L7{jaH+",
      "_old_create_time": "2023-12-28 19:03:43",
      "_old_create_user": "+Zs",
      "_old_modify_time": "2006-11-26 20:42:31",
      "_old_modify_user": "brTYGI?jLL"
    },
    "updateAfter": {
      "id": "3",
      "name": "82174b93-b810-4030-8652-e5c1667d3f72-tagd",
      "value": "+@}8-/XC",
      "status": "kBdO",
      "description": "a?!L7{jaH+",
      "create_time": "2023-12-28 19:03:43",
      "create_user": "+Zs",
      "modify_time": "2006-11-26 20:42:31",
      "modify_user": "brTYGI?jLL"
    }
  }
}

Partial expansion (default when expansion is enabled)

Pre-change data is in a top-level old_data object; post-change data is in a top-level data object. This format is compatible with the Logtail format for MySQL binlog collection.

{
  "ExecutionTime": 1761017850000,
  "_db_": "*****",
  "_event_": "row_update",
  "_event_time_": 1761017850,
  "_file_name_": "mysql-bin.*****",
  "_gtid_": "4a21a3ce-ad7a-11f0-a8f3**********",
  "_host_": "rm-*********.mysql.rds.aliyuncs.com",
  "_id_": "176101777********",
  "_offset_": "265*****",
  "_table_": "t_parameter",
  "old_data": {
    "_old_id": "1",
    "_old_name": "0e459c1a-c6ce-459b-b374-a161b095c8e9",
    "_old_value": "Hello",
    "_old_status": "b",
    "_old_description": "cw",
    "_old_create_time": "2007-08-06 16:19:03",
    "_old_create_user": "!wW4",
    "_old_modify_time": "2017-04-21 18:21:58",
    "_old_modify_user": "s"
  },
  "data": {
    "id": "1",
    "name": "0e459c1a-c6ce-459b-b374-a161b095c8e9-dsg",
    "value": "Hello",
    "status": "b",
    "description": "cw",
    "create_time": "2007-08-06 16:19:03",
    "create_user": "!wW4",
    "modify_time": "2017-04-21 18:21:58",
    "modify_user": "s"
  }
}

Full expansion

All fields — including pre-change and post-change data — are flattened to the top level. Pre-change field names are prefixed with _old_.

{
  "ExecutionTime": 1761017850000,
  "_db_": "****",
  "_event_": "row_update",
  "_event_time_": 1761017850,
  "_file_name_": "mysql-bin.*****",
  "_gtid_": "4a21a3ce-ad7a-11f0-a8f3**********",
  "_host_": "rm-*********.mysql.rds.aliyuncs.com",
  "_id_": "176101777********",
  "_offset_": "265*****",
  "_table_": "t_parameter",
  "_old_create_time": "2024-09-27 15:27:10",
  "_old_create_user": "o",
  "_old_description": "LZ[1HsTE",
  "_old_id": "6",
  "_old_modify_time": "2008-03-15 08:05:53",
  "_old_modify_user": "/{=>7_d@0Q",
  "_old_name": "cf8a671c-4414-45f5-a22c-62c353a6f1ef",
  "_old_status": "K:HQOX-?gK",
  "_old_value": "23]sn<t",
  "create_time": "2024-09-27 15:27:10",
  "create_user": "o",
  "description": "LZ[1HsTE",
  "id": "6",
  "modify_time": "2008-03-15 08:05:53",
  "modify_user": "/{=>7_d@0Q",
  "name": "cf8a671c-4414-45f5-a22c-62c353a6f1efgsdsa",
  "status": "K:HQOX-?gK",
  "value": "23]sn<t"
}