
DataWorks:Configure a real-time full-database synchronization task

Last Updated: Mar 26, 2026

The whole database real-time synchronization feature combines a one-time full synchronization with continuous incremental capture to synchronize an entire source database, such as MySQL or Oracle, to a destination system with low latency. A whole database real-time synchronization task first performs a full synchronization of historical data, automatically initializing the schema and data in the destination. It then seamlessly switches to a real-time incremental mode, using technologies like change data capture (CDC) to continuously capture and synchronize subsequent data changes. This capability is ideal for use cases like real-time data warehousing and data lake construction. This topic uses synchronizing a MySQL database to MaxCompute as an example to explain how to configure the task.

Prerequisites

  • Data source preparation

    • You have created the source and destination data sources. For more information, see Data source management.

    • Ensure the data sources support whole database real-time synchronization. For more information, see Supported data sources and synchronization solutions.

    • Some data sources, such as MySQL, Hologres, and Oracle, require you to enable logging. The method varies by data source; a MySQL example appears in the sketch after this list. For more information, see Data source list.

    • MaxCompute: The Decimal data type is supported only in MaxCompute 2.0. Before starting synchronization, enable MaxCompute 2.0 data types (also shown in the sketch after this list). For more information, see MaxCompute 2.0 data types.

  • Resource group: Purchase and configure a serverless resource group.

  • Network connectivity: Configure network connectivity between the resource group and the data sources.
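A minimal pre-flight sketch, assuming a MySQL source and a MaxCompute destination project; exact requirements vary by data source, so treat these statements as illustrative rather than exhaustive:

```sql
-- On the MySQL source: confirm that binary logging is enabled and row-based,
-- which change data capture typically requires.
SHOW VARIABLES LIKE 'log_bin';        -- expect ON
SHOW VARIABLES LIKE 'binlog_format';  -- expect ROW

-- In the MaxCompute project: enable MaxCompute 2.0 data types so that
-- Decimal columns can be created.
SETPROJECT odps.sql.type.system.odps2 = true;
```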

Usage notes

  • DataWorks supports two types of database synchronization: whole database real-time synchronization and whole database full & incremental (near real-time). Both types perform a full synchronization of historical data and then automatically switch to an incremental mode. However, they differ in latency and destination table requirements:

    • Latency: Whole database real-time synchronization achieves second-to-minute latency. Whole database full & incremental (near real-time) provides T+1 latency.

    • Destination table (MaxCompute):

      • PK Delta Table: Supports all features of whole database real-time synchronization.

      • Standard Table and Append Delta Table: Supported only when you select the incremental-only synchronization mode and write in Incremental stream (append) mode.

      • Whole database full & incremental (near real-time): Supports all preceding table types.

  • You can configure a whole database real-time synchronization task in either Data Studio or Data Integration. The functionality is identical in both modules.

    • Consistent configuration: The configuration interface, parameter settings, and underlying functions are exactly the same, regardless of where you create the task.

    • Auto-sync between modules: Tasks created in the Data Integration module are automatically synchronized to and displayed in the data_integration_jobs directory of the Data Studio module, grouped by channel in the source type > destination type format to facilitate centralized management.

Configure the task

Step 1: Create a synchronization task

  1. Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Integration > Data Integration. On the page that appears, select the desired workspace from the drop-down list and click Go to Data Integration.

  2. In the left-side navigation pane, click Synchronization Task. Then, click Create Synchronization Task at the top of the page and configure the following task information:

    • Source Type: MySQL.

    • Destination Type: MaxCompute.

    • Specific Type: Real-time migration of entire database.

    • Synchronization Mode:

      • Schema Migration: Automatically creates destination objects, such as tables with matching fields and data types, to mirror the source schema without copying any data.

      • Full Synchronization (optional): Performs a one-time, complete copy of all historical data from specified source objects, such as tables, to the destination. This is typically used for initial data migration or initialization.

      • Incremental Sync (optional): After the full synchronization is complete, this step continuously captures and synchronizes data changes (inserts, updates, and deletes) from the source to the destination.

Step 2: Configure data sources and resources

  1. In the Source Data Source section, select your MySQL data source. In the Destination section, select your MaxCompute data source.

  2. In the Running Resources section, select the Resource Group for the synchronization task and allocate CUs for the task.

    Note

    When the task log shows a message about insufficient resources, such as Please confirm whether there are enough resources..., it means that the available Computing Units (CUs) in the current resource group are insufficient to start or run the task. You can increase the number of CUs that the task uses in the Configure Resource Group panel to allocate more computing resources.

    For recommended resource settings, see Recommended CUs for Data Integration. Adjust the settings based on your actual workload.

  3. Ensure that both the source and destination data sources pass the Connectivity Check.

Step 3: Configure the synchronization solution

1. Configure the data source

  • In this step, select the tables to synchronize from the Source Tables list and move them to the Selected Tables list. If you have many tables, you can use Database Filtering or Table Filtering to select tables by configuring regular expressions.


  • To write data from multiple sharded databases and tables that share the same schema into a single destination table, you can use a regular expression for table selection.

    Enter a regular expression in the source table configuration. DataWorks automatically identifies all matching source tables and writes their data to the destination table that corresponds to the expression. An example pattern is shown after the following note.

    Note

    This method is suitable for merging and synchronizing sharded tables, which improves configuration efficiency by eliminating the need to add multiple many-to-one synchronization rules.
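For example, assuming hypothetical shards trade.order_0 through trade.order_63 that share one schema, a single pattern can cover them all:

```sql
-- Hypothetical sharded source tables with identical schemas:
--   trade.order_0, trade.order_1, ..., trade.order_63
-- A source table regular expression that matches every shard:
--   order_[0-9]+
-- All matched shards are written to the single destination table
-- mapped to this expression.
```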

2. Configure the data destination

If you select only Incremental Sync for the whole database real-time synchronization task, you can configure the incremental synchronization mode for writing to the destination table.

  • Replay: This mode is supported only for PK Delta Tables. As in normal synchronization, only the data fields are synchronized; source changes are replayed onto the destination table.

  • Incremental stream: This mode is supported for Standard Tables and Append Delta Tables. It appends real-time data from the source table, along with metadata such as inserts, updates, and deletes, to the destination table. For the incremental stream format, see Appendix: Incremental stream table format.

3. Map destination tables

In this step, you need to define mapping rules between source and destination tables and specify rules for primary keys, dynamic partitions, and DDL/DML configurations to determine how data is written.

You can perform the following actions in this step:

Refresh

The system automatically lists the source tables you selected, but the specific attributes of the destination tables take effect only after you refresh and confirm the mapping.

  • Select the tables to be synchronized in batches and click Bulk Refresh Mapping.

  • Destination Table Name: The destination table name is automatically generated based on the Customize Mapping Rules for Destination Table Names rule. The default is ${source_database_name}_${table_name}. If a table with this name does not exist in the destination, the system automatically creates one.

Customize Mapping Rules for Destination Table Names (optional)

The system has a default table name generation rule: ${source_database_name}_${table_name}. You can also click the Edit button in the Customize Mapping Rules for Destination Table Names column to add a custom rule for destination table names.

  • Rule name: Define a name for the rule. We recommend using a descriptive name.

  • Destination table name: You can generate the destination table name by selecting Manually enter or Built-in Variable to concatenate parts of the name. Variables include the source data source name, source database name, and source table name.

  • Edit built-in variable: You can perform string conversions on existing built-in variables.

This feature supports the following scenarios (an illustrative sketch follows the list):

  1. Add prefixes or suffixes: Add a prefix or suffix to the source table name by setting a constant.


  2. Unified string replacement: Replace the string dev_ in the source table name with prd_.


  3. Write data from multiple tables to one table: Set the destination table name as a constant.

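A sketch of how such rules might resolve, using hypothetical source names:

```sql
-- Default rule ${source_database_name}_${table_name}:
--   database "trade", table "order"          -> destination "trade_order"
-- Scenario 1, append a constant suffix "_ods":
--   source table "order"                     -> destination "order_ods"
-- Scenario 2, replace "dev_" with "prd_":
--   source table "dev_order"                 -> destination "prd_order"
-- Scenario 3, constant destination name "order_merged":
--   source tables "order_0", "order_1", ...  -> destination "order_merged"
```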

Edit field type mapping (optional)

The system provides default mappings between source and destination field types. You can click Edit Mapping of Field Data Types in the upper-right corner of the table to customize the mapping relationship. After configuration, click Apply and Refresh Mapping.

When editing the field type mapping, ensure that the type conversion rules are correct. Otherwise, type conversion may fail, leading to dirty data and task interruptions.

Edit destination table structure (optional)

Based on the custom table name mapping rules, the system creates new destination tables or reuses existing ones with the same name.

DataWorks generates the destination schema based on the source schema. Manual intervention is usually not required. You can also modify the table schema in the following ways:

  • Add a field to a single table: Use the add field button in the Target Table column.

  • Add fields in batches: Select all tables to be synchronized and choose Batch Edit > Destination Table Schema - Batch Modify and Add Field at the bottom of the table.

  • Renaming columns is not supported.

For existing tables, you can only add fields. For new tables, you can add fields, partition fields, and set table types or properties. For more details, refer to the editable areas in the UI.

Value assignment

Source fields are automatically mapped to destination fields with the same name. You must manually assign values to the fields and partition fields that you added in the previous steps. Perform the following operations:

  • Assign values to a single table: Click the Configuration button in the Value assignment column to assign values to the destination table fields.

  • Assign values in batches: At the bottom of the list, choose Batch Edit > Value assignment to assign values to the same fields across multiple destination tables in batches.

You can assign constants and variables by switching the type in Value Type. The following methods are supported:

  • Table field

    • Manual Input: Enter a constant value directly, such as abc.

    • Select variable: Select a supported system variable from the drop-down list. You can view the meaning of each variable in the tooltip in the UI.

    • Function: You can use functions to perform simple transformations on the destination field. For more information, see Use function expressions to assign values to destination table fields.

  • Partition field: You can dynamically create partitions based on the enumerated values of a source field or the event time.

    • Manual input: Enter a constant value directly, such as abc.

    • Source field: Use the value of a source table field as the partition field value. The value type can be a field value or a time value.

      • Field value: The enumerated values from a source field. We recommend using a field with a limited number of distinct values to prevent creating too many partitions and dispersing the data excessively.

        • Time value: If the source field contains time values, you can process them based on different formats and specify the Target format to format the partition value. Illustrative conversions appear in the sketch after the note below.

          • Time string: A string that represents time, such as "2018-10-23 02:13:56" or "2021/05/18". You can convert it to a time value by specifying the source and destination time formats. For the preceding examples, use patterns such as yyyy-MM-dd HH:mm:ss and yyyy/MM/dd.

        • Time object: If the source value is already in a time-type format such as Date or Datetime, select this type directly.

        • Unix timestamp (seconds): A second-level timestamp. Numbers or strings that conform to the 10-digit timestamp format are also supported, such as 1610529203 and "1610529203".

        • Unix timestamp (milliseconds): A millisecond-level timestamp. Numbers or strings that conform to the 13-digit timestamp format are also supported, such as 1610529203002 and "1610529203002".

    • Select variable: You can use the source event change time, EVENT_TIME, as the source for the partition value. This is used in a similar way to the Source Field option.

    • Function: You can use functions to perform simple transformations on the source field and use the result as the partition value. For more information, see Use function expressions to assign values to destination table fields.

Note

Creating an excessive number of partitions can negatively impact synchronization efficiency. If more than 1,000 new partitions are created in a single day, partition creation will fail and the task will be terminated. Therefore, when defining the assignment method for partition fields, you must estimate the potential number of partitions. Use caution when creating partitions at the second or millisecond level.
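To make the supported value kinds concrete, the following MySQL statements show what each kind represents. Note that the task configuration itself uses Java-style patterns such as yyyy-MM-dd HH:mm:ss, not these SQL functions:

```sql
-- Time string: parsed with an explicit pattern.
SELECT STR_TO_DATE('2018-10-23 02:13:56', '%Y-%m-%d %H:%i:%s');
-- Unix timestamp (seconds): a 10-digit value.
SELECT FROM_UNIXTIME(1610529203);
-- Unix timestamp (milliseconds): a 13-digit value, scaled down to seconds.
SELECT FROM_UNIXTIME(1610529203002 / 1000);
```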

Source Split Column

You can select a field from the source table in the Source Split Column drop-down list, or select Not Split. During execution, the synchronization task is divided into multiple subtasks based on this field to enable concurrent, batched data reading.

We recommend using the table's primary key as the source split column. String, float, and date types are not supported.

Currently, source split columns are supported only for MySQL sources.
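Conceptually, splitting on an integer primary key lets subtasks read disjoint ranges in parallel. The ranges below are hypothetical; the service computes them automatically:

```sql
-- Subtask 1 reads one key range...
SELECT * FROM trade.orders WHERE id >= 1 AND id < 500000;
-- ...while subtask 2 reads the next range concurrently.
SELECT * FROM trade.orders WHERE id >= 500000 AND id < 1000000;
```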

Skip full synchronization for table

If you have configured full synchronization in the task steps, you can choose to skip it for a specific table. This is useful when the full data has already been synchronized to the destination through other means.

Full condition

This allows you to filter the source data during the full synchronization phase. You only need to write the WHERE clause content, not the WHERE keyword itself.
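For example, a hypothetical filter that limits the full phase to recent, active rows (enter only the clause body):

```sql
gmt_modified >= '2025-01-01 00:00:00' AND status = 'active'
```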

Configure DML Rule

DML message processing allows you to filter and control the captured data changes (Insert, Update, Delete) from the source before they are written to the destination. This rule is effective only during the incremental synchronization phase.

Other

Table Type: MaxCompute supports Standard Table, PK Delta Table, and Append Delta Table. If the destination table has not yet been created, you can select the table type when you edit the table structure. The type of an existing table cannot be changed.

  • The full + incremental mode of whole database real-time synchronization supports only destination tables of the PK Delta Table type.

  • In incremental-only mode, Replay supports PK Delta Tables, and Incremental stream supports Standard Tables and Append Delta Tables.

For a detailed introduction to Delta Tables, see Delta Table.
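For reference, a minimal sketch of a PK Delta Table definition in MaxCompute SQL; the table and column names are hypothetical, and the exact properties may vary by MaxCompute version:

```sql
CREATE TABLE IF NOT EXISTS trade_order (
  order_id     BIGINT NOT NULL,
  status       STRING,
  gmt_modified DATETIME,
  PRIMARY KEY (order_id)
) TBLPROPERTIES ("transactional" = "true");
```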

Step 4: Advanced settings

Advanced parameters

If you need to fine-tune the task configuration for custom synchronization requirements, you can go to the Advanced Parameters tab to modify advanced parameters.

  1. Click Advanced Settings in the upper-right corner of the UI to go to the advanced parameter configuration page.

  2. Modify the parameter values according to the tooltips. The meaning of each parameter is explained after its name.

  3. You can also use AI-powered configuration. Enter modification instructions in natural language, such as adjusting the task's concurrency. The AI model will generate recommended parameter values, which you can choose to accept based on your actual needs.


Important

Modify these parameters only if you fully understand their meaning to avoid unexpected issues such as task delays, excessive resource consumption that blocks other tasks, or data loss.

DDL configuration

Some real-time synchronization tasks can detect metadata changes in the source table structure and notify the destination. The destination can then either synchronize the update or take other actions, such as sending an alert, ignoring the change, or terminating the task.

You can click Configure DDL Capability in the upper-right corner of the UI to set a processing policy for each type of change. The supported policies vary by channel. Examples of detectable DDL changes are sketched after the note below.

  • Normal processing: The destination processes the DDL change from the source.

  • Ignore: The change message is ignored, and no modification is made at the destination.

  • Error: The whole database real-time synchronization task is terminated, and its status is set to Error.

  • Alert: An alert is sent to the user when this type of change occurs at the source. You must configure a DDL notification rule in the Configure Alert Rule settings.

Note

When DDL synchronization adds a new column to the destination, the system does not backfill data for that column in existing rows.
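The following hypothetical statements illustrate the kinds of source DDL changes these policies apply to:

```sql
ALTER TABLE trade.orders ADD COLUMN coupon_id BIGINT;  -- add a column
RENAME TABLE trade.orders TO trade.orders_v2;          -- rename a table
TRUNCATE TABLE trade.orders;                           -- truncate a table
```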

Step 5: Deploy and run the task

  1. After configuring the task, click Save.

  2. Whole database synchronization tasks cannot be debugged directly. They must be deployed to the Operation Center to run. Therefore, whether you are creating a new task or editing an existing one, you must Deploy the task for the changes to take effect.

  3. When you deploy the task, if you select Start immediately after deployment, the task is also started. Otherwise, after the deployment is complete, go to the Data Integration > Synchronization Task page and manually start the task in the Actions column of the target task.

  4. Click the Name/ID of the task in the task list to view the detailed execution process.

Step 6: Configure alerts

1. Add an alert rule

In the Data Integration > Synchronization Task list, find the corresponding real-time full-database synchronization task, and click More > Alerts in the Actions column to configure an alert policy for the task.


(1) Click Create Rule to configure an alert rule.

By setting the Alert Reason, you can monitor task metrics such as Business delay, Failover, Task status, DDL Notification, and Task Resource Utilization. You can set CRITICAL or WARNING alert levels based on specified thresholds.

  • By using Configure Advanced Parameters, you can control the interval for sending alert messages to prevent alert fatigue and message backlogs.

  • If you select Business delay, Task status, or Task Resource Utilization as the alarm trigger, you can also enable recovery notifications to inform recipients when the task returns to a normal state.

(2) Manage alert rules.

For created alert rules, you can use the alert switch to enable or disable them. You can also send alerts to different personnel based on the alert level.

2. View alerts

Click More > Configure Alert Rule for a task to go to the Alarm Events page, where you can view information about triggered alerts.

Manage the task

Edit the task

  1. On the Data Integration > Synchronization Task page, find the synchronization task that you created, click More in the Actions column, and then click Edit. You can modify the task information. The procedure is the same as that for task configuration.

  2. For tasks that are not running, you can directly modify the configuration, save it, and then deploy the task to the Operation Center for the changes to take effect.

  3. For tasks that are Running, if you edit and deploy the task without selecting Start immediately after deployment, the original operation button changes to Apply Updates. You must click this button for the changes to take effect in the Operation Center.

  4. After you click Apply Updates, the system performs a "Stop, Deploy, Restart" sequence to apply the changes.

    • If the change is to add a new table or switch to an existing table:

      You cannot select a checkpoint when applying the updates. After you confirm, the system performs schema migration and full synchronization for the new table. Once complete, incremental synchronization begins for the new table alongside the original ones.

    • If you modify other information:

      You can select a checkpoint when applying the updates. After you confirm, the task resumes from the specified checkpoint. If you do not specify one, the task resumes from the point where it last stopped.

    Unmodified tables are not affected. After the update and restart, they will resume from their last stopping point.

View the task

After you create a synchronization task, you can view the list of created tasks and their basic information on the synchronization task page.


  • In the Actions column, you can Start or Stop the synchronization task. In the More menu, you can perform other operations such as Edit and View.

  • For started tasks, you can see the basic running status in the Execution Overview. You can also click the corresponding overview area to view execution details.


Resume from a checkpoint

Use cases

Manually resetting the checkpoint is useful in the following scenarios:

  • Task recovery and data continuation: When a task is interrupted, you may need to manually specify an interruption time as the new starting checkpoint to ensure that data recovery is accurate.

  • Data issue troubleshooting and backtracking: If data is lost or abnormal, roll back the checkpoint to a time before the issue occurred to replay and repair the data.

  • Major task configuration changes: After making significant adjustments to the task configuration, such as the destination table structure or field mappings, we recommend resetting the checkpoint to start synchronization from a specific point in time to ensure data accuracy with the new configuration.

Procedure

Click Start. In the dialog box that appears, choose whether to reset the checkpoint:


  • To resume from the last stop time, leave the reset option unchecked and run the task directly. The task continues from the last recorded checkpoint.

  • To start from a specific time, select the reset option and choose a time. The task starts from the specified time checkpoint. Ensure the selected time is within the source's binlog retention period.

Important

If you encounter a "checkpoint error" or "checkpoint does not exist" message when running the synchronization task, try the following solutions (a MySQL sketch follows the list):

  • Reset checkpoint: When starting the real-time synchronization task, reset the checkpoint and select the earliest available checkpoint in the source database.

  • Adjust log retention time: If the database checkpoint has expired, consider increasing the log retention period in the database, for example, to 7 days.

  • Data synchronization: If data has been lost, perform a full synchronization again or create an offline synchronization task to manually sync the missing data.
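On a MySQL source, a sketch of the retention check and adjustment; statement names follow standard MySQL, and the values are examples to adapt:

```sql
-- Inspect the binlogs still available on the source:
SHOW BINARY LOGS;
-- MySQL 8.0: retain binlogs for 7 days (604800 seconds).
SET GLOBAL binlog_expire_logs_seconds = 604800;
-- MySQL 5.7 and earlier use expire_logs_days instead.
SET GLOBAL expire_logs_days = 7;
```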

FAQ

For frequently asked questions about whole database real-time synchronization, see FAQ about real-time synchronization and FAQ about full and incremental synchronization tasks.

Appendix: Incremental stream table format

Flattened source table fields

  • sequence_id: The record ID of the incremental event. The value is unique and incremental.

  • operation_type: The operation type (I/D/U).

  • execute_time: The timestamp corresponding to the data.

  • before_image: Indicates whether it is the image before the change (Y/N).

  • after_image: Indicates whether it is the image after the change (Y/N).

  • src_datasource: The source data source.

  • src_database: The source database.

  • src_table: The source table.

  • Field 1: Actual data field 1.

  • Field 2: Actual data field 2.

  • Field 3: Actual data field 3.

Source fields merged into JSON

  • sequence_id: The record ID of the incremental event. The value is unique and incremental.

  • operation_type: The operation type (I/D/U). For DDL operations: ALTER, TRUNCATE, or RENAME.

  • execute_time: The timestamp corresponding to the data.

  • before_image: Indicates whether it is the image before the change (Y/N).

  • after_image: Indicates whether it is the image after the change (Y/N).

  • src_datasource: The source data source.

  • src_database: The source database.

  • src_table: The source table.

  • ddl_sql: For DDL operations, this field contains the DDL statement.

  • data_columns: The actual data fields, merged into a JSON object.
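For illustration, one hypothetical record in this format might carry the following values (all values invented):

```sql
-- sequence_id:    1024
-- operation_type: U
-- execute_time:   1610529203000
-- before_image:   N
-- after_image:    Y
-- src_datasource: mysql_prod
-- src_database:   trade
-- src_table:      orders
-- ddl_sql:        NULL
-- data_columns:   {"order_id": 42, "status": "paid",
--                  "gmt_modified": "2021-01-13 09:13:23"}
```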