
DataWorks:Configure a real-time full database synchronization task

Last Updated: Nov 26, 2025

The real-time full database synchronization feature combines a one-time full migration with continuous incremental capture to sync an entire source database, such as MySQL or Oracle, to a target system with low latency. This task first performs a full synchronization of historical data from the source database and automatically initializes the table schemas and data in the target. Then, it automatically switches to a real-time incremental mode, using technologies such as Change Data Capture (CDC) to continuously capture and sync subsequent data changes. This feature is suitable for scenarios such as building real-time data warehouses and data lakes. This topic uses the real-time synchronization of a full MySQL database to MaxCompute as an example to describe how to configure the task.

Preparations

Access the feature

Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Integration > Data Integration. On the page that appears, select the desired workspace from the drop-down list and click Go to Data Integration.

Limits

DataWorks supports two types of full database synchronization: real-time and full and incremental (Near Real-Time). Both types can perform a full synchronization of historical data from the source database and then automatically switch to an incremental mode. However, they differ in terms of timeliness and requirements for the target table:

  • Timeliness: Real-time full database synchronization has a latency of seconds to minutes. Full and incremental (Near Real-Time) synchronization has a T+1 latency.

  • Target table (MaxCompute): Real-time full database synchronization only supports tables of the Delta tables type. Full and incremental (Near Real-Time) synchronization supports all table types.

Configure the task

1. Create a sync task

You can create a sync task in one of the following two ways:

  • Method 1: On the sync task page, select a Source and a Destination, and then click Create Synchronization Task. In this example, select MySQL for the source and MaxCompute for the destination.

  • Method 2: On the sync task page, if the task list is empty, click Create.


2. Basic Settings

  1. Configure basic information such as the task name, description, and owner.

  2. Select a sync type. Data Integration displays the supported Task Type options based on the source and destination database types. In this topic, select Real-time migration of entire database.

  3. Synchronization steps:

    • Structural migration: Automatically creates database objects (such as tables, fields, and data types) in the target that match the source. This step does not include data.

    • Full initialization (Optional): Copies all historical data from specified source objects (such as tables) to the target in a single operation. This is typically used for initial data migration or data initialization.

    • Incremental synchronization (Optional): After the full synchronization is complete, continuously captures and syncs subsequent data changes (inserts, updates, and deletes) from the source to the target.

3. Configure network and resources

  1. In the Network And Resource Configuration section, select the Resource Group for the sync task. You can specify the number of CUs for Task Resource Usage.

  2. Set Source to the added MySQL data source and Destination to the added MaxCompute data source. Then, click Test Connectivity.

  3. After ensuring that both the source and target data sources are successfully connected, click Next.

4. Select the databases and tables to sync

In the Source Table area, select the tables to sync from the source data source. Click the move icon to move the tables to the Selected Tables list.


If there are many databases or tables, you can use Database Filtering or Table Filtering to select the tables to sync by configuring a regular expression.
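You can sanity-check a filter pattern locally before entering it. The following is a minimal sketch with hypothetical table names; the exact regex dialect accepted by the filtering fields may differ from Python's:

```python
import re

# Hypothetical source table names and a filter pattern that selects
# only the yearly "orders" tables.
tables = ["orders_2023", "orders_2024", "users", "tmp_backup"]
pattern = re.compile(r"^orders_\d{4}$")

selected = [t for t in tables if pattern.match(t)]
print(selected)  # ['orders_2023', 'orders_2024']
```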

5. Map to target tables

In this step, you can define the mapping rules between source and target tables. You can also specify rules for primary keys, dynamic partitions, and DDL/DML configurations to determine how data is written.


Refresh

The system automatically lists the source tables you selected. However, the specific properties of the target tables take effect only after you refresh and confirm them.

  • Select the tables to sync in batches and click Batch Refresh Mapping.

  • Target Table Name: The target table name is automatically generated based on the Customize Mapping Rules rule. The default is ${Source_Database_Name}_${Table_Name}. If a table with this name does not exist in the target, the system automatically creates it for you.

Customize Target Table Name Mapping (Optional)

The system has a default table name generation rule: ${Source_Database_Name}_${Table_Name}. You can also click the Edit button in the Customize Mapping Rules column to add a custom rule for target table names.

  • Rule Name: Define a name for the rule. We recommend giving the rule a name with clear business meaning.

  • Target Table Name: You can create the target table name by combining Manual Input values and Built-in Variables. Supported variables include the source data source name, source database name, and source table name.

  • Edit Built-in Variable: You can perform string transformations on the original built-in variables.

This allows for the following scenarios:

  1. Add a prefix or suffix: Add a prefix or suffix to the source table name by setting a constant.


  2. Uniform string replacement: Replace the string "dev_" in the source table name with "prd_".


  3. Write multiple tables to a single table.

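The default rule and the scenarios above amount to simple string transforms on the source names. The following is a hedged sketch, not the actual rule engine; the function names are hypothetical:

```python
# Default rule: ${Source_Database_Name}_${Table_Name}
def default_target_name(db_name, table_name):
    return f"{db_name}_{table_name}"

# Custom rule: optional prefix/suffix plus a string replacement
# applied to the source table name.
def apply_rule(table_name, prefix="", suffix="", replace=None):
    name = table_name
    if replace:
        old, new = replace
        name = name.replace(old, new)
    return f"{prefix}{name}{suffix}"

print(default_target_name("sales", "orders"))              # sales_orders
print(apply_rule("orders", prefix="ods_"))                 # ods_orders  (scenario 1)
print(apply_rule("dev_orders", replace=("dev_", "prd_")))  # prd_orders  (scenario 2)
```

Scenario 3 (writing multiple tables to a single table) corresponds to a rule whose output is a constant name, so several source tables map to the same target table.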

Edit Mapping of Field Data Types (Optional)

The system has a default mapping between source and target field types. You can click Edit Mapping of Field Data Types in the upper-right corner of the table to customize the mapping relationship. After configuration, click Apply And Refresh Mapping.

When editing field type mappings, ensure the conversion rules are correct. Otherwise, type conversion failures can produce dirty data and affect task execution.

Edit Target Table Schema (Optional)

The system automatically creates a target table if it does not exist, or reuses an existing table with the same name, based on the custom table name mapping rule.

DataWorks automatically generates the target table schema based on the source table schema. In most scenarios, no manual intervention is needed. You can also modify the table schema in the following ways:

  • To add a field to a single table, click the add field button in the Destination Table Name column.

  • Add fields in batches: Select all tables to be synced, and in the menu at the bottom of the table, choose Batch Modify > Destination Table Schema - Batch Modify and Add Field.

  • Renaming columns is not supported.

For existing tables, you can only add fields. For new tables, you can add fields and partition fields, and set the table type or table properties. For more information, see the editable areas in the interface.

Assign Values To Target Table Fields

Native fields are automatically mapped based on matching field names in the source and target tables. You must manually assign values for the new fields and partition fields added in the previous step. Perform the following operations:

  • Assign values to a single table: Click the Configure button in the Value assignment column to assign values to the target table fields.

  • Assign values in batches: In the menu at the bottom of the list, choose Batch Modify > Value assignment to assign values to the same fields in multiple target tables in batches.

You can assign constants and variables. Switch the type in Manually Assign Value. The following methods are supported:

  • Field in Destination Table

    • Manual assignment: Directly enter a constant value, such as abc.

    • Select variable: Select a system-supported variable from the drop-down list. You can view the meaning of each variable in the tooltip on the interface.

    • Function: Use functions for simple transformations on the target field. For more information about how to use functions, see Use function expressions to assign values to target table fields.

  • Partition Fields: You can use the enumerated values of a source field or the event time as the partition value to dynamically create partitions.

    • Manually Assign Value: Directly enter a constant value, such as abc.

    • Source Field: Use the value of a source table field as the partition field value. The value type can be a field value or a time value.

      • Field Value: The enumerated values of the source field. We recommend using fields with a limited number of enumerated values to prevent creating too many partitions and dispersing data too widely.

      • Time Value: If the value in the source field is a time, you can process it according to different formats and specify a Destination Format to format the partition value.

        • Time String: A string that represents a time, such as "2018-10-23 02:13:56" or "2021/05/18". Parse it into a time value by specifying the source and target time formats. For the examples above, use the formats yyyy-MM-dd HH:mm:ss and yyyy/MM/dd for recognition.

        • Time Object: If the source value is already in a time format such as Date or Datetime, select this type directly.

        • Unix Timestamp (seconds): A timestamp in seconds. It also supports numbers or strings that match the 10-digit timestamp format, such as 1610529203 or "1610529203".

        • Unix Timestamp (milliseconds): A timestamp in milliseconds. It also supports numbers or strings that match the 13-digit timestamp format, such as 1610529203002 or "1610529203002".

    • Select Variable: You can use the source event change time, EVENT_TIME, as the source for the partition value. The usage is similar to that of a source field.

    • Function: Use functions to perform simple transformations on the source field and use the result as the partition value. For more information about how to use functions, see Use function expressions to assign values to target table fields.
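The four time-value types above map naturally onto standard datetime parsing. The following is an illustrative sketch in Python; the real configuration uses Java-style patterns such as yyyy-MM-dd HH:mm:ss, and the helper name and target format are assumptions:

```python
from datetime import datetime, timezone

def to_partition(value, src_fmt=None):
    """Convert a supported source time representation to a yyyyMMdd partition value."""
    if isinstance(value, datetime):              # Time Object
        dt = value
    elif isinstance(value, (int, float)) or (isinstance(value, str) and value.isdigit()):
        ts = int(value)
        if ts >= 10**12:                         # 13 digits: milliseconds
            ts //= 1000                          # 10 digits: seconds
        dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    else:                                        # Time String
        dt = datetime.strptime(value, src_fmt)
    return dt.strftime("%Y%m%d")

print(to_partition("2018-10-23 02:13:56", "%Y-%m-%d %H:%M:%S"))  # 20181023
print(to_partition(1610529203))                                  # 20210113 (UTC)
print(to_partition("1610529203002"))                             # 20210113 (UTC)
```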

Note

Creating too many partitions affects synchronization efficiency. If more than 1,000 new partitions are created in a single day, partition creation fails and the task is terminated. Therefore, when you define the assignment method for partition fields, estimate the number of partitions that might be generated. Use caution when creating partitions at the second or millisecond level.
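A quick back-of-the-envelope check against the 1,000-partitions-per-day limit, by time granularity:

```python
# Upper bound on new partitions per day for each time granularity.
SECONDS_PER_DAY = 24 * 60 * 60

estimates = {
    "day":    1,
    "hour":   24,
    "minute": 24 * 60,          # 1,440 -- already over the limit
    "second": SECONDS_PER_DAY,  # 86,400
}

# Only granularities that stay under 1,000 new partitions per day are safe.
safe = [g for g, n in estimates.items() if n <= 1000]
print(safe)  # ['day', 'hour']
```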

Source Split Column

You can select a field from the source table in the Source Split Column drop-down list or select Not Split. When the sync task runs, it is split into multiple subtasks based on this field to read data concurrently and in batches.

We recommend using the table's primary key as the split column. Columns of string, floating-point, date, and similar types are not supported.

Currently, the split column is supported only when the source is MySQL.
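Range-based splitting on an integer primary key is one plausible way a split column enables concurrent reads. This sketch is an assumption about the general mechanism, not DataWorks internals:

```python
# Split an integer key range [min_id, max_id] into num_splits contiguous
# sub-ranges that can be read by concurrent subtasks.
def split_ranges(min_id, max_id, num_splits):
    step = (max_id - min_id + 1) // num_splits
    ranges, lo = [], min_id
    for i in range(num_splits):
        # The last sub-range absorbs any remainder.
        hi = max_id if i == num_splits - 1 else lo + step - 1
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges

print(split_ranges(1, 100, 4))  # [(1, 25), (26, 50), (51, 75), (76, 100)]
```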

Do Execute Once Offline

If you have configured full synchronization in step 3, you can choose to cancel the full data synchronization for a specific table. This is useful in scenarios where the full data has already been synced to the target using other methods.

Full Condition

Filters the source data during the full synchronization phase. Enter only the filter condition; do not include the WHERE keyword.
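For illustration only (the table and column names are hypothetical): the configured value is the bare condition, and the sync engine conceptually supplies the WHERE keyword itself, roughly as if:

```python
# Hypothetical Full Condition value: the bare filter expression only.
full_condition = "gmt_create >= '2024-01-01' AND status = 'active'"

# The engine prepends WHERE when building the full-phase read query.
query = f"SELECT * FROM orders WHERE {full_condition}"
print(query)
```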

Configure DML Rule

DML message processing is used for fine-grained filtering and control of change data (Insert, Update, Delete) captured from the source before it is written to the target. This rule only takes effect during the incremental phase.

Other

  • Table Type: MaxCompute supports standard tables and Delta Tables. If the target table status is 'To be created', you can select the table type when editing the target table schema. The type of an existing table cannot be changed.

    Real-time full database synchronization only supports Delta Table as the target table type. For standard table types, see Full and incremental (Near Real-Time) synchronization tasks.

  • If the table type is Delta Table, you can define the Table Bucket Num and Acid Data Retain Hours.

For a detailed introduction to Delta Tables, see Delta tables.

6. Configure DDL handling

Some real-time synchronization links can detect metadata changes in the source table schema and notify the target. The target can then update accordingly or take other actions such as sending an alert, ignoring the change, or terminating the task.

Click Configure DDL Capability in the upper-right corner of the interface to set a corresponding processing policy for each type of change. The supported policies vary by channel.

  • Normal processing: The target processes the DDL change information from the source.

  • Ignore: Ignores the change message. The target is not modified.

  • Alert: Sends an alert to the user when this type of change occurs at the source. This must be used with Configure Alert Rule.

  • Error: Terminates the real-time full database synchronization task and sets its status to Error.
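Conceptually, the per-change-type policy table behaves like a dispatch map. The following is a hedged sketch; the DDL type names and the fallback choice are hypothetical:

```python
# Hypothetical mapping from DDL change type to the configured policy.
POLICIES = {
    "ADD_COLUMN":   "normal",  # apply the change to the target
    "DROP_COLUMN":  "ignore",  # skip; the target is not modified
    "RENAME_TABLE": "alert",   # notify via the configured alert rule
    "DROP_TABLE":   "error",   # terminate the task with Error status
}

def handle_ddl(ddl_type):
    # Treat unconfigured change types conservatively (assumption).
    return POLICIES.get(ddl_type, "error")

print(handle_ddl("ADD_COLUMN"))  # normal
print(handle_ddl("DROP_TABLE"))  # error
```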

Note

When a new column is added at the source and also created at the target through DDL synchronization, the system does not backfill data for the existing rows in the target table.

7. Other configurations

Alerting configuration

1. Add an alert


(1) Click Add Alert Rule to configure an alert rule.

You can set the Alert Reason to monitor metrics such as Business delay, Failover, Task Status, DDL Notification, and Task Resource Utilization. You can set CRITICAL or WARNING alert levels based on specified thresholds.

By setting Alerting Frequency Control, you can control the interval at which alert messages are sent. This prevents a flood of notifications, which wastes resources and causes message accumulation.

(2) Manage alert rules.

For created alert rules, you can use the alert switch to enable or disable them. You can also send alerts to different personnel based on the alert level.

2. View alerts

Expand the task list, choose More > Configure Alert Rule, and go to the alert events page to view alerts that have occurred.

Resource group configuration

The resource group used by the task and its configuration can be managed in the Configure Resource Group panel in the upper-right corner of the interface.

1. View and switch resource groups

  • Click Configure Resource Group to view the resource group currently bound to the task.

  • To change the resource group, you can switch to another available one in this panel.

2. Adjust resources and troubleshoot 'insufficient resources' errors

  • When the task log shows a message such as Please confirm whether there are enough resources..., it means the available compute units (CUs) in the current resource group are insufficient for the task's startup or running requirements. In the Configure Resource Group panel, you can increase the number of CUs allocated to the task to provide more computing resources.

For recommended resource settings, see Recommended CUs for Data Integration. You may need to adjust the settings based on your requirements.

Advanced parameter settings

To perform fine-grained configuration for the task to meet custom synchronization requirements, click Configure in the Advanced Settings to modify advanced parameters.

  1. Click Advanced Settings in the upper-right corner of the interface to go to the advanced parameter settings page.

  2. Modify the parameter values according to the descriptions. The meaning of each parameter is explained next to its name.

Important

Modify these parameters only if you fully understand their meaning. This helps avoid unexpected issues such as task latency, excessive resource consumption that blocks other tasks, or data loss.

8. Run the sync task

  1. After you finish the configuration, click Complete at the bottom of the page.

  2. On the Data Integration > Synchronization Task page, find the created sync task and click Start in the Actions column.

  3. In the task list, click the Name/ID of the task to view the execution details.

Edit the task

  1. On the Data Integration > Synchronization Task page, find the created sync task. In the Actions column, choose More > Edit to modify the task information. The steps are the same as for task configuration.

  2. For a task that is Not Submitted, you can directly modify the configuration and then click Complete to save it.

  3. For a task that is Submitted, after you modify the configuration, the original action button changes to Apply Updates. You must click this button for the changes to take effect in the online environment.

  4. After you click Apply Updates, the system performs three steps on the changes: Stop, Publish, and Restart.

    • If the change involves adding a new table or switching an existing table:

      You cannot select an offset when you apply the update. After you confirm, the system performs schema migration and full initialization for the new table. After the full initialization is complete, the new table starts incremental operations along with the other original tables.

    • If you modify other information:

      You can select an offset when you apply the update. After you confirm, the task continues to run from the specified offset. If you do not specify an offset, it runs from the offset where it last stopped.

    Unmodified tables are not affected. After the update and restart, they will continue to run from the offset where they last stopped.

View the task

After you create a sync task, you can view the list of created sync tasks and their basic information on the sync task page.


  • You can Start or Stop the sync task in the Actions column. In the More menu, you can perform operations such as editing and viewing the sync task.

  • For a started task, you can see the basic running status in the Execution Status. You can also click the corresponding overview area to view execution details.


Resume from a breakpoint

Scenarios

Manually resetting the offset when you start or restart a task is useful in the following scenarios:

  • Task recovery and data continuation: When a task is interrupted for any reason, you may need to manually specify an interruption time point as the new start offset to ensure the task resumes from the correct breakpoint.

  • Data issue troubleshooting and backtracking: If you find that the synced data is lost or abnormal, you can roll back the offset to a time point before the problem occurred to replay and repair the problematic data.

  • Major changes to task configuration: After you make major adjustments to the task configuration (such as target table schema or field mapping), we recommend resetting the offset to start syncing from a clear time point. This ensures data accuracy with the new configuration.

Instructions

Click Start. In the pop-up window, select an option for Whether to reset the offset:


  • Do not reset the offset and run directly: The task continues to run from the offset where it last stopped (the last checkpoint).

  • Reset the offset and select a time: The task starts running from the specified time point. Make sure the selected time is not earlier than the earliest time point supported by the source's binary logging.

Important

If you encounter an offset error or a 'does not exist' message when running the sync task, try the following solutions:

  • Reset the offset: When starting the real-time sync task, reset the offset and select the earliest available offset in the source database.

  • Adjust the log retention period: If the database offset has expired, consider adjusting the log retention period in the database, for example, to 7 days.

  • Data synchronization: If data has been lost, consider performing a full synchronization again, or configure an offline sync task to manually sync the lost data.

FAQ

For FAQs about real-time full database synchronization, see Real-time synchronization FAQ and Full and incremental synchronization task FAQ.

More examples