
DataWorks: Features of real-time database synchronization tasks

Last Updated: Dec 09, 2025

DataWorks Data Integration provides a solution for real-time database synchronization that replicates all or a subset of tables from a source database to a destination data store. It combines full and incremental synchronization for low-latency, automated replication. Running on a real-time computing engine, a task first performs a full synchronization of existing data and then automatically switches to capturing incremental changes through Change Data Capture (CDC). This makes it a one-stop solution for scenarios such as real-time database migration to the cloud and building a real-time Operational Data Store (ODS) layer.

Scenarios

  • Build a real-time ODS layer for your data warehouse

    You can synchronize data in real time from online business databases, such as MySQL and Oracle, to real-time data warehouses such as Hologres and StarRocks. The synchronized data can then serve business scenarios such as large-screen dashboards and ad hoc queries.

  • Replicate databases in real time for disaster recovery

    You can create a real-time replication task between two database instances. You can use this for read/write splitting, building read-only instances, or implementing real-time disaster recovery for homogeneous or heterogeneous databases.

  • Migrate data to the cloud in real time

    You can smoothly migrate databases from on-premises data centers to cloud database services.

  • Build a real-time data lake or data mid-end

    You can centrally collect real-time change data from multiple business databases into a data lake, such as Object Storage Service (OSS) or Data Lake Formation (DLF), or a data warehouse, such as MaxCompute or Hologres. This helps you build a unified, real-time data mid-end for your enterprise.

Core features

The core features of real-time database synchronization are as follows:


Each core feature below is followed by its specific features, where applicable, and their descriptions.

Database synchronization between disparate data sources


Database synchronization supports migrating data from on-premises data centers or other cloud platforms to data warehouses, data lakes, and message queues such as MaxCompute, Hologres, and Kafka. For more information, see Supported data sources and synchronization solutions.

Data synchronization in complex network environments


Real-time synchronization supports data from ApsaraDB databases, on-premises data centers, self-managed databases on ECS, and databases outside Alibaba Cloud. Before you configure a task, make sure the resource group can connect to both the source and the destination. For configuration details, see Configure network connections.
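Before configuring a task, a quick TCP reachability check can help confirm the connectivity requirement above. The sketch below uses only Python's standard `socket` module. It assumes you can run it from a host in the same network environment as the resource group, and it only checks that a port accepts connections, not credentials or database permissions. `can_connect` is a hypothetical helper, not part of DataWorks.

```python
import socket

def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    DNS failures, refused connections, and timeouts are all subclasses of
    OSError, so any of them yields False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `can_connect("192.0.2.10", 3306)` returns `False` if nothing at that placeholder address accepts connections within the timeout. Run the check against both the source and the destination endpoints.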

Synchronization scenarios

Full synchronization

Supports one-time synchronization of all data from the source to the destination table.

Incremental synchronization

Captures streaming data in real time, such as messages from message queues or CDC logs, and writes it to a destination table or a specified partition.

Full and incremental synchronization

  • Automatic full initialization: When a task starts for the first time, it automatically reads all historical data from all tables in the source database and writes it to the destination.

  • Seamless switch to incremental mode: After the full synchronization phase completes, the task automatically switches to CDC mode. It then continuously captures insert, update, and delete operations from the source and synchronizes them to the destination with millisecond-level latency.
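As a rough illustration of the full-then-incremental pattern, the Python sketch below models the source as an in-memory snapshot plus an ordered change log. It assumes that the log position is recorded just before the snapshot is taken, and that inserts and updates are applied as upserts keyed by primary key, so replaying a change the snapshot already contains is harmless. `ChangeEvent` and `full_then_incremental` are hypothetical names, not DataWorks APIs; the engine's actual internals are not described here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class ChangeEvent:
    op: str              # "insert", "update", or "delete"
    key: int             # primary key of the affected row
    row: Optional[dict]  # new row image; None for deletes

def apply_event(dest: Dict[int, dict], ev: ChangeEvent) -> None:
    """Apply one change; inserts and updates are upserts, so replays are idempotent."""
    if ev.op == "delete":
        dest.pop(ev.key, None)
    else:
        dest[ev.key] = dict(ev.row)

def full_then_incremental(snapshot: Dict[int, dict],
                          log: List[ChangeEvent],
                          start_pos: int) -> Dict[int, dict]:
    # Full phase: copy every existing row from the snapshot.
    dest = {k: dict(v) for k, v in snapshot.items()}
    # Incremental phase: replay every change at or after start_pos,
    # the log position recorded just before the snapshot was taken.
    for ev in log[start_pos:]:
        apply_event(dest, ev)
    return dest

if __name__ == "__main__":
    snapshot = {1: {"id": 1, "name": "a"}, 2: {"id": 2, "name": "b"}}
    log = [
        ChangeEvent("update", 1, {"id": 1, "name": "a2"}),
        ChangeEvent("delete", 2, None),
        ChangeEvent("insert", 3, {"id": 3, "name": "c"}),
    ]
    print(full_then_incremental(snapshot, log, start_pos=0))
    # {1: {'id': 1, 'name': 'a2'}, 3: {'id': 3, 'name': 'c'}}
```

Recording the position before the snapshot trades a few duplicate replays for the guarantee that no change made during the full phase is skipped.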

Task configuration

Batch table synchronization

Supports synchronizing all tables in a database. You can also synchronize only specific tables, either by selecting them manually or by configuring filter rules.
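One way to picture filter rules is as include and exclude regular expressions matched against `db.table` names. The sketch below assumes that rule shape; `select_tables` is a hypothetical helper, and the rule syntax that DataWorks actually accepts may differ.

```python
import re
from typing import Iterable, List, Sequence

def select_tables(tables: Iterable[str],
                  include: Sequence[str],
                  exclude: Sequence[str] = ()) -> List[str]:
    """Keep "db.table" names that fully match an include pattern and no exclude pattern."""
    inc = [re.compile(p) for p in include]
    exc = [re.compile(p) for p in exclude]
    return [
        name for name in tables
        if any(r.fullmatch(name) for r in inc)
        and not any(r.fullmatch(name) for r in exc)
    ]

if __name__ == "__main__":
    tables = ["shop.orders", "shop.orders_bak", "shop.users", "log.access"]
    print(select_tables(tables, include=[r"shop\..*"], exclude=[r".*_bak"]))
    # ['shop.orders', 'shop.users']
```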

Automatic table creation

A single configuration can process hundreds of tables in the source database. The system automatically creates the table schemas at the destination without manual intervention.
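To illustrate what automatic table creation involves, the sketch below turns a set of source schemas into destination `CREATE TABLE` statements in a single pass. The type mapping, the fallback to `STRING` for unmapped types, and the generic SQL dialect are all assumptions for illustration; the DDL that DataWorks generates depends on the destination and is not reproduced here.

```python
from typing import Dict, List, Tuple

# Hypothetical source-to-destination type mapping, for illustration only.
DEFAULT_TYPE_MAP: Dict[str, str] = {
    "int": "INT",
    "bigint": "BIGINT",
    "varchar": "STRING",
    "datetime": "TIMESTAMP",
}

def create_table_ddl(table: str, columns: List[Tuple[str, str]]) -> str:
    """Build a CREATE TABLE statement for one source table.

    Unmapped source types fall back to STRING (an assumption of this sketch).
    """
    cols = ",\n  ".join(
        f"{name} {DEFAULT_TYPE_MAP.get(src_type.lower(), 'STRING')}"
        for name, src_type in columns
    )
    return f"CREATE TABLE IF NOT EXISTS {table} (\n  {cols}\n);"

def create_all(schemas: Dict[str, List[Tuple[str, str]]]) -> List[str]:
    """Generate DDL for every source table from a single configuration."""
    return [create_table_ddl(t, cols) for t, cols in schemas.items()]

if __name__ == "__main__":
    schemas = {
        "orders": [("id", "bigint"), ("created_at", "datetime")],
        "users": [("id", "int"), ("name", "varchar")],
    }
    for ddl in create_all(schemas):
        print(ddl)
```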

Flexible mapping

Supports custom naming conventions for destination databases and tables. Supports custom mapping of field types between the source and destination to adapt to the destination's data structure model.
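The sketch below shows one way a naming convention and custom type mappings could fit together. It assumes a naming template with `{db}` and `{table}` placeholders (rendered with Python's `str.format`, then lowercased) and user-defined type mappings that take precedence over the defaults. The template syntax and helper names are hypothetical; they are not the variables or options that DataWorks exposes.

```python
from typing import Dict

def dest_table_name(template: str, db: str, table: str) -> str:
    """Render a destination table name from a template such as "ods_{db}_{table}"."""
    return template.format(db=db, table=table).lower()

def resolve_type_map(defaults: Dict[str, str], overrides: Dict[str, str]) -> Dict[str, str]:
    """Merge type mappings; user-defined entries win over the defaults."""
    merged = dict(defaults)
    merged.update(overrides)
    return merged

if __name__ == "__main__":
    print(dest_table_name("ods_{db}_{table}", "Shop", "Orders"))  # ods_shop_orders
    types = resolve_type_map({"tinyint": "INT", "datetime": "TIMESTAMP"},
                             {"tinyint": "BOOLEAN"})
    print(types["tinyint"], types["datetime"])  # BOOLEAN TIMESTAMP
```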

DDL change awareness (supported for some sync links)

When the source table schema changes, such as creating or deleting tables or columns, you can configure the sync task to use one of the following response policies:

  • Normal: The corresponding schema change is automatically applied to the destination.

  • Alert: The change is not applied. Instead, an alert is sent so that you can intervene manually.

  • Error: The task stops immediately and its status is set to failed.
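The three DDL policies amount to a dispatch on the configured policy. In this minimal sketch, `DdlPolicy`, `TaskFailed`, and `handle_ddl` are hypothetical names, and the `apply` and `alert` callbacks stand in for executing the DDL at the destination and sending a notification.

```python
from enum import Enum
from typing import Callable, List

class DdlPolicy(Enum):
    NORMAL = "normal"  # apply the schema change at the destination
    ALERT = "alert"    # skip the change and notify someone to intervene
    ERROR = "error"    # stop the task and mark it as failed

class TaskFailed(Exception):
    """Raised to stop the (simulated) sync task."""

def handle_ddl(ddl: str, policy: DdlPolicy,
               apply: Callable[[str], None],
               alert: Callable[[str], None]) -> None:
    if policy is DdlPolicy.NORMAL:
        apply(ddl)
    elif policy is DdlPolicy.ALERT:
        alert(f"DDL not applied, manual action required: {ddl}")
    else:
        raise TaskFailed(f"task stopped on DDL: {ddl}")

if __name__ == "__main__":
    applied: List[str] = []
    alerts: List[str] = []
    handle_ddl("ALTER TABLE t ADD COLUMN c INT", DdlPolicy.NORMAL, applied.append, alerts.append)
    handle_ddl("DROP TABLE t", DdlPolicy.ALERT, applied.append, alerts.append)
    print(applied, len(alerts))  # ['ALTER TABLE t ADD COLUMN c INT'] 1
```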

DML rule configuration

DML message processing provides fine-grained filtering and control of the change data (Insert, Update, and Delete operations) captured from the source before it is written to the destination. With these rules, you can define how each type of data change operation is ultimately handled.
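DML rules can be modeled as a per-operation decision: drop every event of a type, pass it through, or keep it only when the row satisfies a condition. The sketch below assumes that model and a simple dictionary event format with `op` and `row` keys; neither the format nor `apply_dml_rules` reflects the actual DataWorks configuration.

```python
from typing import Callable, Dict, Iterable, List, Optional

# Hypothetical rule per operation type: None drops every event of that type;
# a predicate keeps only events whose row satisfies it.
Rule = Optional[Callable[[dict], bool]]

def apply_dml_rules(events: Iterable[dict], rules: Dict[str, Rule]) -> List[dict]:
    kept = []
    for ev in events:
        # Operation types without a rule pass through unchanged.
        rule = rules.get(ev["op"], lambda row: True)
        if rule is not None and rule(ev["row"]):
            kept.append(ev)
    return kept

if __name__ == "__main__":
    events = [
        {"op": "INSERT", "row": {"id": 1, "status": "paid"}},
        {"op": "INSERT", "row": {"id": 2, "status": "test"}},
        {"op": "DELETE", "row": {"id": 1}},
        {"op": "UPDATE", "row": {"id": 3, "status": "paid"}},
    ]
    rules = {"INSERT": lambda r: r["status"] != "test", "DELETE": None}
    print([(e["op"], e["row"]["id"]) for e in apply_dml_rules(events, rules)])
    # [('INSERT', 1), ('UPDATE', 3)]
```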

Dynamic partitioning

If the destination table is partitioned, data can be written to partitions dynamically, based on either a source field or the time at which the change event occurred at the source.

Important

Too many partitions can reduce synchronization efficiency. If more than 1,000 new partitions are created in a single day, partition creation fails and the task stops.
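The following sketch shows how a partition value might be derived from the source event time, and how a per-day cap on newly created partitions could stop the task once it is exceeded. It assumes a `yyyymmdd` partition format by default, event timestamps in Unix seconds interpreted as UTC, and counting by the calendar day of `now`. `PartitionRouter` and `PartitionLimitExceeded` are hypothetical; only the 1,000-partition limit comes from the note above.

```python
from datetime import datetime, timezone
from typing import Dict, Set

MAX_NEW_PARTITIONS_PER_DAY = 1000  # limit from the Important note

class PartitionLimitExceeded(Exception):
    """Raised to stop the (simulated) task when the daily cap is exceeded."""

class PartitionRouter:
    """Route rows to partitions derived from event time and cap daily partition creation."""

    def __init__(self, fmt: str = "%Y%m%d"):
        self.fmt = fmt
        self.existing: Set[str] = set()
        self.created_per_day: Dict[str, int] = {}

    def route(self, event_ts: float, now: datetime) -> str:
        part = datetime.fromtimestamp(event_ts, tz=timezone.utc).strftime(self.fmt)
        if part not in self.existing:
            day = now.strftime("%Y-%m-%d")
            count = self.created_per_day.get(day, 0)
            if count >= MAX_NEW_PARTITIONS_PER_DAY:
                raise PartitionLimitExceeded(f"over {MAX_NEW_PARTITIONS_PER_DAY} new partitions on {day}")
            self.created_per_day[day] = count + 1
            self.existing.add(part)
        return part

if __name__ == "__main__":
    router = PartitionRouter()
    now = datetime(2025, 12, 9, tzinfo=timezone.utc)
    print(router.route(0, now))  # 19700101
```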

Task O&M

Online intervention

Supports resumable transfer: if a task is interrupted, it can resume from a specified time offset so that no data is lost. Also supports reruns to backfill data, recover from exceptions, or validate logic changes. Together, these help ensure data consistency and business continuity.
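Resuming from a specified time offset can be sketched as finding the first logged change at or after that time. The sketch below assumes a change log sorted by event timestamp and a saved checkpoint timestamp. Because the event at exactly the resume time is replayed, writes at the destination should be idempotent (for example, upserts). `resume_position` is a hypothetical helper, not a DataWorks API.

```python
import bisect
from typing import List, Optional, Tuple

Event = Tuple[float, str]  # (event timestamp, payload), sorted by timestamp

def resume_position(log: List[Event],
                    checkpoint_ts: Optional[float],
                    override_ts: Optional[float] = None) -> int:
    """Return the index of the first event to replay.

    A user-specified time (override_ts) takes precedence over the saved
    checkpoint; with neither, replay starts from the beginning of the log.
    """
    ts = override_ts if override_ts is not None else checkpoint_ts
    if ts is None:
        return 0
    timestamps = [t for t, _ in log]
    # bisect_left finds the first event whose timestamp is >= ts.
    return bisect.bisect_left(timestamps, ts)

if __name__ == "__main__":
    log = [(100.0, "a"), (200.0, "b"), (300.0, "c")]
    print(log[resume_position(log, checkpoint_ts=200.0)])  # (200.0, 'b')
    print(resume_position(log, 200.0, override_ts=150.0))  # 1
```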

Monitoring and alerting

Supports monitoring rules for business latency, task status, failover, and DDL notifications, and sends alerts when a rule is triggered.
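A business-latency rule could be evaluated along the lines of the sketch below. It assumes latency is sampled periodically (for example, as the current time minus the source timestamp of the latest synchronized change) and that an alert fires only after several consecutive breaches, to avoid alerting on a single spike. `LatencyRule` and `should_alert` are hypothetical; the rule parameters that DataWorks offers may differ.

```python
from dataclasses import dataclass
from typing import Sequence

@dataclass
class LatencyRule:
    threshold_s: float    # alert when latency exceeds this many seconds
    consecutive: int = 3  # ...in this many consecutive samples

def should_alert(rule: LatencyRule, samples: Sequence[float]) -> bool:
    """True if the most recent `consecutive` samples all exceed the threshold."""
    if rule.consecutive < 1 or len(samples) < rule.consecutive:
        return False
    return all(s > rule.threshold_s for s in samples[-rule.consecutive:])

if __name__ == "__main__":
    rule = LatencyRule(threshold_s=60)
    print(should_alert(rule, [10, 70, 80, 90]))  # True
    print(should_alert(rule, [70, 80, 10]))      # False
```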

Resource optimization

DataWorks Data Integration uses Serverless resource groups to provide elastic scaling at the task level.

You can also configure a time-based elastic policy to preset different resource specifications for tasks at different times, such as during peak and off-peak hours.
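A time-based elastic policy can be thought of as a list of daily time windows, each mapped to a resource size. The sketch below assumes half-open `[start, end)` windows that may wrap past midnight, an integer resource size in abstract units, and a default size outside all windows. `units_at` and its unit are hypothetical, not DataWorks settings.

```python
from datetime import time
from typing import List, Tuple

# Hypothetical policy: (start, end, resource units); windows are [start, end).
Policy = List[Tuple[time, time, int]]

def units_at(policy: Policy, now: time, default_units: int) -> int:
    """Return the resource size for the first window containing `now`."""
    for start, end, units in policy:
        if start <= end:
            if start <= now < end:
                return units
        elif now >= start or now < end:  # window wraps past midnight
            return units
    return default_units

if __name__ == "__main__":
    policy = [(time(9), time(18), 8), (time(22), time(6), 2)]
    print(units_at(policy, time(10, 30), 4))  # 8 (peak hours)
    print(units_at(policy, time(23), 4))      # 2 (overnight, off-peak)
    print(units_at(policy, time(20), 4))      # 4 (default)
```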

Get started

To create a real-time database synchronization task, see Configure a real-time database synchronization task.

Supported data sources

Source data source

Destination data source

MaxCompute

AnalyticDB for MySQL (V3.0)

ApsaraDB for OceanBase

Data Lake Formation (DLF)

DataHub

Doris

Elasticsearch

Hologres

Kafka

LogHub

OSS

OSS-HDFS

SelectDB

StarRocks

Lindorm