
DataWorks: Real-time database synchronization

Last Updated: Jun 02, 2026

DataWorks Data Integration replicates entire databases or selected tables from a source to a destination using integrated full and incremental (CDC) synchronization. It automatically performs a full data load and then continuously captures incremental changes, supporting use cases such as real-time cloud migration and ODS layer construction.

Use cases

  • Build a real-time ODS layer for your data warehouse

    Synchronize data from operational databases such as MySQL and Oracle to a real-time data warehouse like Hologres or StarRocks, providing fresh data for dashboards and ad hoc queries.

  • Real-time database replication for disaster recovery

    Establish a real-time replication link between two database instances for read/write splitting, read-only replicas, or disaster recovery across homogeneous or heterogeneous databases.

  • Real-time data migration to the cloud

    Seamlessly migrate databases from an on-premises data center to a cloud database service.

  • Build a real-time Data Lake or Data Middle Platform

    Collect real-time change data from multiple operational databases into a central data lake (OSS or DLF) or data warehouse (MaxCompute or Hologres) to build a unified real-time Data Middle Platform.

Core capabilities

Real-time database synchronization provides the following capabilities:


Core capability

Feature

Description

Synchronize entire databases between heterogeneous data sources

-

Migrates data from on-premises data centers or other cloud platforms to destinations such as MaxCompute, Hologres, or Kafka. For details, see Supported data sources and synchronization solutions.

Synchronize data in complex network environments

-

Supports data from Alibaba Cloud database services, on-premises data centers, self-managed databases on ECS, and other cloud providers. Ensure network connectivity between the resource group and both the source and the destination. For details, see Configure network connections.

Synchronization scenarios

Full synchronization

Performs a one-time synchronization of all data from the source to the destination table.

Incremental synchronization

Captures streaming data in real time from sources such as message queues or CDC logs, and writes it to a destination table or a specified partition.

Integrated full and incremental synchronization

  • Automatic full initialization: On first start, the task reads all existing source data and writes it to the destination.

  • Seamless transition to incremental mode: After the full load, the task switches to CDC mode, continuously capturing and applying insert, update, and delete operations with millisecond-level latency.
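The two phases above can be sketched in a few lines of Python. This is a minimal illustration, not the DataWorks implementation: the in-memory dictionary stands in for a destination table, and the names `full_load`, `apply_changes`, and the `(operation, key, row)` event shape are assumptions made for the example.

```python
from typing import Any, Dict, Iterable, Optional, Tuple

Row = Dict[str, Any]
# Hypothetical change event: (operation, primary key, row); row is None for deletes.
ChangeEvent = Tuple[str, int, Optional[Row]]


def full_load(source_rows: Iterable[Row], key: str = "id") -> Dict[int, Row]:
    """Phase 1: copy every existing source row into the destination."""
    return {row[key]: dict(row) for row in source_rows}


def apply_changes(destination: Dict[int, Row], events: Iterable[ChangeEvent]) -> None:
    """Phase 2: replay insert, update, and delete events in arrival order."""
    for op, pk, row in events:
        if op in ("insert", "update"):
            destination[pk] = dict(row)
        elif op == "delete":
            destination.pop(pk, None)
        else:
            raise ValueError(f"unknown operation: {op}")


snapshot = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
changes = [
    ("update", 1, {"id": 1, "name": "a2"}),
    ("delete", 2, None),
    ("insert", 3, {"id": 3, "name": "c"}),
]
dest = full_load(snapshot)
apply_changes(dest, changes)
```

After both phases, `dest` holds the updated row 1 and the new row 3, while row 2 is gone.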

Task configuration

Batch table synchronization

Synchronize all tables or select a subset using checkboxes or filter rules.
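As a rough illustration of how filter rules narrow a table list, the sketch below uses regular expressions. The rule syntax that DataWorks accepts is not specified here, so the `include` and `exclude` patterns and the `select_tables` helper are assumptions for the example only.

```python
import re
from typing import Iterable, List


def select_tables(tables: Iterable[str], include: str, exclude: str = "") -> List[str]:
    """Keep tables whose full name matches `include` and does not match `exclude`."""
    include_re = re.compile(include)
    exclude_re = re.compile(exclude) if exclude else None
    selected = []
    for name in tables:
        if not include_re.fullmatch(name):
            continue
        if exclude_re and exclude_re.fullmatch(name):
            continue
        selected.append(name)
    return selected


tables = ["orders", "orders_2023", "orders_tmp", "users"]
chosen = select_tables(tables, include=r"orders.*", exclude=r".*_tmp")
```

Here `chosen` contains `orders` and `orders_2023`; the temporary table and the unrelated `users` table are skipped.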

Automatic table creation

A single configuration processes hundreds of source tables. The system automatically creates the corresponding schema in the destination.

Flexible mapping

Define custom naming conventions for destination databases and tables, and customize data type mapping between source and destination.
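The idea of name and type mapping can be sketched as follows. The template `ods_{db}_{table}` and the entries in `TYPE_MAP` are hypothetical; actual naming rules and type mappings depend on the chosen source and destination.

```python
from typing import Dict

# Hypothetical source-to-destination type mapping, for illustration only.
TYPE_MAP: Dict[str, str] = {
    "int": "BIGINT",
    "varchar": "STRING",
    "datetime": "TIMESTAMP",
}


def destination_table(source_db: str, source_table: str,
                      template: str = "ods_{db}_{table}") -> str:
    """Build a destination table name from a naming template."""
    return template.format(db=source_db, table=source_table)


def map_columns(columns: Dict[str, str], default: str = "STRING") -> Dict[str, str]:
    """Map each source column type to a destination type, with a fallback."""
    return {name: TYPE_MAP.get(src_type.lower(), default)
            for name, src_type in columns.items()}
```

For example, `destination_table("shop", "orders")` yields `ods_shop_orders`, and an unmapped source type falls back to `STRING`.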

DDL change awareness (supported on select paths)

When the source schema changes (such as table or column creation or deletion), configure the task to respond in one of these ways:

  • Normal: Automatically applies schema changes to the destination.

  • Alert: Pauses synchronization, sends an alert, and waits for manual intervention.

  • Error: Stops the task and marks it as failed.
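The three response modes can be modeled as a small policy switch. This sketch only mirrors the behavior described above; the `SyncTask` class, `DdlPolicy` enum, and `TaskFailed` exception are hypothetical names, not part of any DataWorks SDK.

```python
from enum import Enum
from typing import List


class DdlPolicy(Enum):
    NORMAL = "normal"  # apply the schema change to the destination
    ALERT = "alert"    # pause, send an alert, wait for manual intervention
    ERROR = "error"    # stop the task and mark it as failed


class TaskFailed(Exception):
    pass


class SyncTask:
    def __init__(self, policy: DdlPolicy) -> None:
        self.policy = policy
        self.paused = False
        self.applied: List[str] = []
        self.alerts: List[str] = []

    def on_ddl(self, statement: str) -> None:
        """React to a source schema change according to the configured policy."""
        if self.policy is DdlPolicy.NORMAL:
            self.applied.append(statement)
        elif self.policy is DdlPolicy.ALERT:
            self.paused = True
            self.alerts.append(f"DDL detected: {statement}")
        else:
            raise TaskFailed(f"DDL not allowed: {statement}")
```

Keeping the policy as an explicit enum makes the choice visible in configuration and keeps the handler easy to extend if more modes are added.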

DML rules

Controls how change data (Insert, Update, and Delete operations) from the source is processed before it is written to the destination.
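To make the idea of per-operation rules concrete, here is a minimal sketch. The rule names `apply`, `ignore`, and `logical_delete`, the `is_deleted` flag column, and the event layout are all assumptions for illustration; the options that DataWorks actually offers may differ.

```python
from typing import Any, Dict, Optional

Event = Dict[str, Any]  # hypothetical shape: {"op": "insert"|"update"|"delete", "row": {...}}


def apply_dml_rules(event: Event, rules: Dict[str, str]) -> Optional[Event]:
    """Return the event to write, or None if the rule drops it.

    Hypothetical rules: "apply" passes the change through, "ignore" drops it,
    and "logical_delete" turns a delete into an update that sets a flag.
    """
    rule = rules.get(event["op"], "apply")
    if rule == "ignore":
        return None
    if rule == "logical_delete" and event["op"] == "delete":
        row = dict(event["row"], is_deleted=True)
        return {"op": "update", "row": row}
    return event


rules = {"delete": "logical_delete"}
kept = apply_dml_rules({"op": "delete", "row": {"id": 7}}, rules)
```

With these rules, the delete for row 7 reaches the destination as an update that marks the row as deleted instead of removing it.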

Dynamic partitioning

If the destination table is partitioned, enable dynamic partitioning based on a source field or event timestamp.

Important

Creating an excessive number of partitions can degrade synchronization performance. If more than 1,000 new partitions are created in a single day, partition creation fails and the task terminates.
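The daily limit can be pictured as a counter that guards partition creation. The sketch below assumes a `ds=YYYYMMDD` partition naming scheme and a hypothetical `PartitionTracker` helper; only the figure of 1,000 new partitions per day comes from the text.

```python
from datetime import datetime
from typing import Dict, Set

MAX_NEW_PARTITIONS_PER_DAY = 1000  # limit stated in the text above


class PartitionLimitExceeded(Exception):
    pass


def partition_for(event_time: datetime) -> str:
    """Derive a partition value from an event timestamp (assumed naming scheme)."""
    return event_time.strftime("ds=%Y%m%d")


class PartitionTracker:
    def __init__(self, limit: int = MAX_NEW_PARTITIONS_PER_DAY) -> None:
        self.limit = limit
        self.known: Set[str] = set()
        self.created_per_day: Dict[str, int] = {}

    def ensure(self, partition: str, creation_day: str) -> bool:
        """Create the partition if needed; return True when it is new."""
        if partition in self.known:
            return False
        count = self.created_per_day.get(creation_day, 0)
        if count >= self.limit:
            raise PartitionLimitExceeded(
                f"more than {self.limit} new partitions on {creation_day}")
        self.known.add(partition)
        self.created_per_day[creation_day] = count + 1
        return True
```

Partitioning on a low-cardinality field such as the event date keeps the number of new partitions per day small; a high-cardinality field such as a user ID would reach the limit quickly.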

Task O&M

Online intervention

Resume tasks from a specific checkpoint after an interruption to prevent data loss. Rerun tasks for data backfilling, exception handling, or logic validation.
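Resuming from a checkpoint amounts to skipping everything at or before a saved position and replaying the rest. The following sketch assumes a simple integer offset per event and a hypothetical `resume_from` helper; real checkpoints in DataWorks may be time-based or log-position-based.

```python
from typing import Callable, List, Tuple

Event = Tuple[int, str]  # hypothetical (offset, payload) pair


def resume_from(events: List[Event], checkpoint: int,
                apply: Callable[[str], None]) -> int:
    """Apply events after `checkpoint` and return the new checkpoint."""
    last = checkpoint
    for offset, payload in events:
        if offset <= checkpoint:
            continue  # already applied before the interruption
        apply(payload)
        last = offset
    return last


log = [(1, "a"), (2, "b"), (3, "c")]
written: List[str] = []
new_checkpoint = resume_from(log, checkpoint=1, apply=written.append)
```

Here only `b` and `c` are written, and the checkpoint advances to 3. Choosing an earlier checkpoint replays more events, which is how a rerun for backfilling would behave in this model.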

Monitoring and alerting

Configure monitoring rules for business latency, task status, failover events, and DDL notifications to trigger alerts.

Resource optimization

DataWorks Data Integration supports task-level elastic scaling with the Serverless Resource Group.

Configure time-based elasticity policies to automatically adjust resource specifications for peak and off-peak hours.
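A time-based elasticity policy can be thought of as a lookup from time of day to a resource specification. The sketch below is illustrative only: `ElasticityWindow`, `units_at`, and the numeric unit values are assumptions and do not reflect the actual DataWorks configuration format.

```python
from dataclasses import dataclass
from datetime import time
from typing import List


@dataclass
class ElasticityWindow:
    start: time  # inclusive
    end: time    # exclusive; may be earlier than start to wrap past midnight
    units: int   # hypothetical resource specification for this window


def units_at(now: time, windows: List[ElasticityWindow], default_units: int) -> int:
    """Return the resource specification that applies at `now`."""
    for w in windows:
        if w.start <= w.end:
            inside = w.start <= now < w.end
        else:  # window wraps past midnight
            inside = now >= w.start or now < w.end
        if inside:
            return w.units
    return default_units


policy = [
    ElasticityWindow(time(9), time(18), units=8),  # peak hours
    ElasticityWindow(time(22), time(2), units=1),  # overnight off-peak
]
```

With this policy, `units_at(time(10, 30), policy, default_units=2)` returns 8, while times outside both windows fall back to the default.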

Get started

To create your first task, see Configure a real-time database synchronization task.

Supported data sources

Source

Destination

MaxCompute

AnalyticDB for MySQL (V3.0)

ApsaraDB for OceanBase

Data Lake Formation (DLF)

DataHub

Doris

Elasticsearch

Hologres

Kafka

  • MySQL

  • MySQL (sharded)

  • PolarDB (sharded)

LogHub

Object Storage Service (OSS)

OSS-HDFS

SelectDB

StarRocks

Lindorm