DataWorks: Real-time database synchronization features

Last Updated: Feb 13, 2026

DataWorks Data Integration provides real-time database synchronization, which replicates entire databases or specific tables from a source to a destination in a single low-latency process that combines full and incremental synchronization. Powered by a real-time computing engine, it automates the initial full data load and then transitions to continuous Change Data Capture (CDC). Typical scenarios include real-time database migration to the cloud and building the Operational Data Store (ODS) layer of a real-time data warehouse.

Use cases

  • Build a real-time data warehouse ODS layer

    Synchronize data in real time from online transaction processing (OLTP) databases, such as MySQL or Oracle, to a real-time data warehouse like Hologres or StarRocks. This provides data support for business intelligence (BI) dashboards, ad hoc queries, and other applications.

  • Enable real-time database replication and disaster recovery

    Create a real-time replication task between two database instances. You can use this for read/write splitting, creating read-only instances, or implementing real-time disaster recovery for homogeneous or heterogeneous databases.

  • Perform real-time data migration to the cloud

    Migrate databases smoothly from an on-premises data center to cloud database services.

  • Build a real-time data lake or data middle platform

    Collect real-time change data from multiple business databases into a data lake, such as Object Storage Service (OSS) or Data Lake Formation (DLF), or into a data warehouse, such as MaxCompute or Hologres, to build a unified, real-time data middle platform for your enterprise.

Core features

The core features of real-time database synchronization include:

Core feature

Specific feature

Description

Database synchronization between heterogeneous data sources

-

You can synchronize an entire database from an on-premises data center or a third-party cloud to a data warehouse or data lake, such as MaxCompute, Hologres, or Kafka. For more information, see Supported data sources and synchronization solutions.

Data synchronization in complex network environments

-

Real-time synchronization supports Alibaba Cloud databases, databases in an on-premises data center, self-managed databases on ECS, and third-party cloud databases. Before you begin, ensure that the resource group can connect to the source and destination. For more information, see Configure network connections.

Synchronization scenarios

Full synchronization

Synchronizes all data from the source to the destination table in a single operation.

Incremental synchronization

Captures streaming data from sources such as a message queue or CDC logs and writes the data to the destination table or a specified partition in real time.

Full and incremental synchronization

  • Automatic full load: When a task starts for the first time, it automatically reads all existing data from all tables in the source database and writes it to the destination.

  • Seamless transition to incremental mode: After the full load is complete, the task automatically and seamlessly switches to CDC mode. It continuously captures INSERT, UPDATE, and DELETE operations from the source and synchronizes them to the destination with millisecond-level latency.
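The handoff from full load to CDC described above can be pictured with a small sketch. It is an illustration only, not DataWorks code: the in-memory `snapshot` dictionary, the `ChangeEvent` type, and the integer log `position` are hypothetical stand-ins for a source table, captured change records, and a log offset. The sketch assumes the snapshot is taken at a known log position, so only changes logged after that position need to be replayed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChangeEvent:
    """Hypothetical CDC record: operation, primary key, new row, log position."""
    op: str                 # "INSERT", "UPDATE", or "DELETE"
    key: int
    row: Optional[dict]
    position: int


def full_then_incremental(snapshot, snapshot_position, change_log):
    """Load a snapshot, then replay only the changes logged after it."""
    # Phase 1: full load. Copy every existing row to the destination.
    destination = {key: dict(row) for key, row in snapshot.items()}

    # Phase 2: incremental. Apply changes newer than the snapshot position.
    for event in change_log:
        if event.position <= snapshot_position:
            continue  # already reflected in the snapshot
        if event.op == "DELETE":
            destination.pop(event.key, None)
        else:
            # INSERT and UPDATE are both applied as upserts in this sketch.
            destination[event.key] = dict(event.row)
    return destination


snapshot = {1: {"name": "a"}, 2: {"name": "b"}}
log = [
    ChangeEvent("UPDATE", 1, {"name": "a0"}, position=5),   # older than snapshot
    ChangeEvent("INSERT", 3, {"name": "c"}, position=11),
    ChangeEvent("UPDATE", 2, {"name": "b2"}, position=12),
    ChangeEvent("DELETE", 1, None, position=13),
]
result = full_then_incremental(snapshot, snapshot_position=10, change_log=log)
print(result)
```

Applying inserts and updates as upserts keeps the replay idempotent, so a change that is replayed twice leaves the destination in the same state.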

Task configuration

Batch table synchronization

You can synchronize all tables in a database or select specific tables by using checkboxes or configuring filter rules.

Automatic table creation

A single task configuration handles hundreds of tables in the source database. The system automatically creates the table schemas in the destination, eliminating the need for manual intervention.

Flexible mapping

You can define custom naming rules for destination databases and tables. You can also define custom mappings for field data types between the source and destination to accommodate the destination's data model.
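As a rough illustration of what such mapping rules amount to, the sketch below applies a naming rule and a type mapping. The `ods_` prefix, the function names, and the entries in `TYPE_MAP` are hypothetical examples, not DataWorks defaults or configuration syntax.

```python
# Hypothetical source-to-destination type mapping, for illustration only.
TYPE_MAP = {"VARCHAR": "STRING", "INT": "BIGINT", "DATETIME": "TIMESTAMP"}


def map_table_name(source_db, source_table, prefix="ods_"):
    """Build a destination table name from the source database and table."""
    return f"{prefix}{source_db}_{source_table}"


def map_column_type(source_type):
    """Translate a source column type; unmapped types pass through unchanged."""
    normalized = source_type.upper()
    return TYPE_MAP.get(normalized, normalized)


print(map_table_name("shop", "orders"))
print(map_column_type("varchar"))
```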

DDL change awareness (supported by some tasks)

When a source table schema changes (for example, a table or column is created or deleted), you can configure the synchronization task to respond with one of the following policies:

  • Normal: Automatically apply the corresponding schema changes to the destination.

  • Alert: Do not apply changes and send an alert, pending manual intervention.

  • Error: Immediately stop the task and set its status to error.
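The three policies above can be read as a simple dispatch on each DDL event. The following sketch is a conceptual model only: `DdlPolicy`, `TaskStopped`, and `handle_ddl` are hypothetical names, and the lists stand in for the destination and the alerting channel.

```python
from enum import Enum


class DdlPolicy(Enum):
    NORMAL = "normal"
    ALERT = "alert"
    ERROR = "error"


class TaskStopped(Exception):
    """Raised when the Error policy halts the task."""


def handle_ddl(policy, ddl_statement, destination_ddl, alerts):
    """Handle one source DDL event according to the configured policy."""
    if policy is DdlPolicy.NORMAL:
        # Apply the schema change to the destination.
        destination_ddl.append(ddl_statement)
    elif policy is DdlPolicy.ALERT:
        # Leave the destination unchanged and notify an operator.
        alerts.append(f"DDL not applied: {ddl_statement}")
    elif policy is DdlPolicy.ERROR:
        # Stop the task immediately.
        raise TaskStopped(f"Task stopped by DDL: {ddl_statement}")


applied, alerts = [], []
handle_ddl(DdlPolicy.NORMAL, "ALTER TABLE orders ADD COLUMN note TEXT", applied, alerts)
handle_ddl(DdlPolicy.ALERT, "ALTER TABLE orders DROP COLUMN note", applied, alerts)
```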

DML rule configuration

You can use DML message processing to filter and control the change data captured from the source, such as INSERT, UPDATE, and DELETE operations, before it is written to the destination. These rules define how each type of data manipulation operation is handled.
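One way to picture a DML rule is as a per-operation filter applied before the write. In the sketch below, the rule values `"apply"` and `"ignore"` and the event dictionaries are assumptions made for illustration; they do not reflect the actual rule options or record format in DataWorks.

```python
def apply_dml_rules(events, rules):
    """Keep events whose operation is set to "apply".

    Operations that have no rule default to "apply".
    """
    return [e for e in events if rules.get(e["op"], "apply") == "apply"]


events = [
    {"op": "INSERT", "id": 1},
    {"op": "UPDATE", "id": 1},
    {"op": "DELETE", "id": 1},
]
# Hypothetical rule set: keep deleted rows in the destination.
rules = {"DELETE": "ignore"}
kept = apply_dml_rules(events, rules)
print([e["op"] for e in kept])
```

Ignoring DELETE operations in this way is a common pattern when the destination must retain historical rows that the source has already purged.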

Dynamic partitioning

If the destination table is a partitioned table, you can enable dynamic partitioning based on a source field or the event time of a source change.

Important

Creating too many partitions can affect synchronization performance. If more than 1,000 new partitions are added in a single day, partition creation fails and the task is terminated.
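The sketch below shows, in simplified form, how a partition value might be derived from an event time and how a daily cap on new partitions behaves. The hourly `%Y%m%d%H` format and the `PartitionTracker` class are hypothetical; only the limit of 1,000 new partitions per day comes from the note above.

```python
from datetime import date, datetime

MAX_NEW_PARTITIONS_PER_DAY = 1000  # limit stated in the note above


class PartitionLimitExceeded(Exception):
    """Raised when too many new partitions are created in one day."""


class PartitionTracker:
    def __init__(self, limit=MAX_NEW_PARTITIONS_PER_DAY):
        self.limit = limit
        self.known = set()          # partition values that already exist
        self.created_by_day = {}    # creation date -> number of new partitions

    def partition_for(self, event_time, today):
        """Return the partition value for an event, creating it if needed."""
        value = event_time.strftime("%Y%m%d%H")  # assumed hourly format
        if value not in self.known:
            count = self.created_by_day.get(today, 0) + 1
            if count > self.limit:
                raise PartitionLimitExceeded(
                    f"more than {self.limit} new partitions on {today}"
                )
            self.created_by_day[today] = count
            self.known.add(value)
        return value


tracker = PartitionTracker(limit=2)  # small limit to keep the demo short
today = date(2026, 2, 13)
first = tracker.partition_for(datetime(2026, 2, 13, 9, 30), today)
same = tracker.partition_for(datetime(2026, 2, 13, 9, 45), today)
second = tracker.partition_for(datetime(2026, 2, 13, 10, 5), today)
```

A high-cardinality partition field, such as a user ID, would hit the cap quickly, which is why a coarse time-based value is the safer choice in this sketch.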

Task O&M

Online intervention

Tasks support checkpoint-based resumption: after an interruption, a task can resume from a specific point in time so that no data is lost. You can also rerun tasks to backfill data, fix anomalies, or validate logic changes, which helps maintain data consistency and business continuity.
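The general idea of checkpoint-based resumption can be sketched as follows. This is not how DataWorks stores its checkpoints: the JSON file, the `next_offset` field, and the `fail_at` parameter are hypothetical, and the sketch tracks a list offset where a real task would track a point in time or a log position.

```python
import json
import os
import tempfile


def run(events, checkpoint_path, sink, fail_at=None):
    """Process events in order, saving the next offset after each write."""
    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            start = json.load(f)["next_offset"]  # resume where the last run stopped

    for offset in range(start, len(events)):
        if offset == fail_at:
            raise RuntimeError("simulated interruption")
        sink.append(events[offset])
        with open(checkpoint_path, "w") as f:
            json.dump({"next_offset": offset + 1}, f)


events = ["e0", "e1", "e2", "e3"]
sink = []
path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")

try:
    run(events, path, sink, fail_at=2)   # first run is interrupted at offset 2
except RuntimeError:
    pass
run(events, path, sink)                  # second run resumes from the checkpoint
print(sink)
```

Because the checkpoint is written after the sink write, a crash between the two steps would replay one event on restart, so a design like this one relies on idempotent writes at the destination.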

Monitoring and alerting

You can define monitoring rules for business latency, task status, failover, and DDL notifications and configure alerts to be triggered based on these rules.

Resource optimization

DataWorks Data Integration runs on Serverless resource groups and supports elastic scaling at the task level.

You can also configure time-based elastic policies to preset different resource specifications for tasks during different periods, such as peak and off-peak business hours.
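A time-based elastic policy boils down to a lookup from the current time to a resource specification. The sketch below uses a hypothetical policy list and unit counts; the actual configuration options and resource units in DataWorks may differ.

```python
from datetime import time

# Hypothetical policy entries: (start, end, compute units).
POLICIES = [
    (time(9, 0), time(21, 0), 8),   # peak business hours
]
DEFAULT_UNITS = 2                   # off-peak specification


def units_for(now):
    """Return the resource specification that applies at the given time."""
    for start, end, units in POLICIES:
        if start <= now < end:
            return units
    return DEFAULT_UNITS


print(units_for(time(10, 30)))
print(units_for(time(23, 0)))
```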

Get started

To create a real-time database synchronization task, see Configure a real-time database synchronization task.

Supported data sources

Source

Destination

MaxCompute

AnalyticDB for MySQL (V3.0)

ApsaraDB for OceanBase

Data Lake Formation (DLF)

DataHub

Doris

Elasticsearch

Hologres

Kafka

LogHub

Object Storage Service (OSS)

OSS-HDFS

SelectDB

StarRocks

Lindorm