DataWorks real-time database synchronization

DataWorks Data Integration replicates entire databases or selected tables from a source to a destination using integrated full and incremental (CDC) synchronization. It automatically performs a full data load and then continuously captures incremental changes, supporting use cases such as real-time cloud migration and ODS layer construction.
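
The two phases described above can be illustrated with a minimal Python sketch. This is not DataWorks code: `ChangeEvent`, `full_load`, and `apply_changes` are hypothetical names, and the destination is modelled as an in-memory dictionary keyed by primary key. It only shows the idea of a one-time full load followed by ordered insert, update, and delete events.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

Row = Dict[str, object]


@dataclass
class ChangeEvent:
    """Hypothetical change record, as a CDC reader might emit it."""
    op: str                    # "insert", "update", or "delete"
    key: int                   # primary key of the affected row
    row: Optional[Row] = None  # new row image; not needed for deletes


def full_load(source_rows: Iterable[Row], key_field: str) -> Dict[int, Row]:
    """Phase 1: copy every existing source row into a fresh destination."""
    return {row[key_field]: dict(row) for row in source_rows}


def apply_changes(destination: Dict[int, Row], events: Iterable[ChangeEvent]) -> None:
    """Phase 2: apply captured changes in order, mutating the destination."""
    for event in events:
        if event.op in ("insert", "update"):
            destination[event.key] = dict(event.row)
        elif event.op == "delete":
            destination.pop(event.key, None)
        else:
            raise ValueError(f"unknown operation: {event.op}")


# Existing data at the moment the task first starts.
source = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]
destination = full_load(source, key_field="id")

# Changes captured after the full load finished.
changes = [
    ChangeEvent("update", 1, {"id": 1, "name": "alicia"}),
    ChangeEvent("delete", 2),
    ChangeEvent("insert", 3, {"id": 3, "name": "carol"}),
]
apply_changes(destination, changes)
print(destination)
```

A real task also has to record the log position at which the full load started so that no change is lost or applied twice during the switch. The sketch leaves that bookkeeping out.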

Use cases

Build a real-time ODS layer for your data warehouse

Synchronize data from operational databases such as MySQL and Oracle to a real-time data warehouse such as Hologres or StarRocks, providing fresh data for dashboards and ad hoc queries.

Real-time database replication for disaster recovery

Establish a real-time replication link between two database instances, either homogeneous or heterogeneous, for read/write splitting, read-only replicas, or disaster recovery.

Real-time data migration to the cloud

Migrate databases from an on-premises data center to a cloud database service.

Build a real-time data lake or data middle platform

Collect real-time change data from multiple operational databases into a central data lake (OSS or DLF) or data warehouse (MaxCompute or Hologres) to build a unified real-time data middle platform.

Core capabilities

Real-time database synchronization provides the following capabilities:

Core capability	Feature	Description
Synchronize entire databases between heterogeneous data sources	-	Migrates data from on-premises data centers or other cloud platforms to a data warehouse, data lake, or message queue such as MaxCompute, Hologres, or Kafka. For details, see Supported data sources and synchronization solutions.
Synchronize data in complex network environments	-	Supports sources hosted on Alibaba Cloud database services, in on-premises data centers, as self-managed databases on ECS, and on other cloud providers. Make sure that the resource group can reach both the source and the destination. For details, see Configure network connections.
Synchronization scenarios	Full synchronization	Performs a one-time synchronization of all data from the source to the destination table.
	Incremental synchronization	Captures streaming data in real time from sources such as message queues or CDC logs, and writes it to a destination table or a specified partition.
	Integrated full and incremental synchronization	Automatic full initialization: On first start, the task reads all existing source data and writes it to the destination. Seamless transition to incremental mode: After the full load, the task switches to CDC mode, continuously capturing and applying insert, update, and delete operations with millisecond-level latency.
Task configuration	Batch table synchronization	Synchronize all tables or select a subset using checkboxes or filter rules.
	Automatic table creation	A single configuration processes hundreds of source tables. The system automatically creates the corresponding schema in the destination.
	Flexible mapping	Define custom naming conventions for destination databases and tables, and customize data type mapping between source and destination.
	DDL change awareness (supported on select paths)	When the source schema changes (such as table or column creation or deletion), configure the task to respond in one of these ways: Normal: Automatically applies schema changes to the destination. Alert: Pauses synchronization, sends an alert, and waits for manual intervention. Error: Stops the task and marks it as failed.
	DML rules	Controls how change data (`Insert`, `Update`, and `Delete` operations) from the source is processed before writing to the destination.
	Dynamic partitioning	If the destination table is partitioned, enable dynamic partitioning based on a source field or event timestamp. Important: Creating an excessive number of partitions can degrade synchronization performance. If more than 1,000 new partitions are created in a single day, partition creation fails and the task terminates.
Task O&M	Online intervention	Resume tasks from a specific checkpoint after an interruption to prevent data loss. Rerun tasks for data backfilling, exception handling, or logic validation.
	Monitoring and alerting	Configure monitoring rules for business latency, task status, failover events, and DDL notifications to trigger alerts.
	Resource optimization	DataWorks Data Integration supports task-level elastic scaling with the Serverless Resource Group. Configure time-based elasticity policies to automatically adjust resource specifications for peak and off-peak hours.
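
The dynamic partitioning rule in the table above, including the limit of 1,000 new partitions per day, can be modelled with a short Python sketch. Everything here is an assumption made for illustration: `PartitionTracker`, `PartitionLimitExceeded`, and the `ds=YYYYMMDD` partition format are hypothetical and do not describe how DataWorks implements the feature.

```python
from collections import defaultdict
from datetime import datetime

MAX_NEW_PARTITIONS_PER_DAY = 1000  # limit stated in the table above


class PartitionLimitExceeded(RuntimeError):
    """Raised when the modelled daily limit on new partitions is exceeded."""


class PartitionTracker:
    """Toy model: derive a daily partition from each event timestamp."""

    def __init__(self, limit=MAX_NEW_PARTITIONS_PER_DAY):
        self.limit = limit
        self.known = set()                       # partitions that already exist
        self.created_per_day = defaultdict(int)  # run date -> partitions created

    def partition_for(self, event_time, today):
        """Return the partition for an event, creating it on first use."""
        partition = "ds=" + event_time.strftime("%Y%m%d")
        if partition not in self.known:
            if self.created_per_day[today] >= self.limit:
                raise PartitionLimitExceeded(
                    f"more than {self.limit} new partitions on {today}"
                )
            self.known.add(partition)
            self.created_per_day[today] += 1
        return partition


tracker = PartitionTracker()
event_time = datetime(2024, 5, 17, 8, 30)
print(tracker.partition_for(event_time, today="2024-05-17"))
```

The model shows why a partition key with many distinct values is risky: every unseen value creates a partition, and a backfill that spans years of event timestamps can reach the daily limit within a single run.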

Get started

To create your first task, see Configure a real-time database synchronization task.

Supported data sources

Source	Destination
ApsaraDB for OceanBase, MongoDB, MySQL, Oracle, PolarDB, PolarDB-X 2.0, PostgreSQL	MaxCompute
ApsaraDB for OceanBase, MySQL, Oracle, PolarDB, PostgreSQL	AnalyticDB for MySQL (V3.0)
MySQL	ApsaraDB for OceanBase
ApsaraDB for OceanBase, MongoDB, MySQL, PolarDB, PostgreSQL	Data Lake Formation (DLF)
MySQL, Oracle, PolarDB	DataHub
MySQL, PolarDB, PostgreSQL	Doris
MySQL, PolarDB	Elasticsearch
ApsaraDB for OceanBase, MongoDB, MySQL, Oracle, PolarDB, PolarDB-X 2.0, PostgreSQL	Hologres
MySQL, PolarDB, PostgreSQL	Kafka
MySQL, MySQL (sharded), PolarDB (sharded)	LogHub
MySQL, PolarDB	Object Storage Service (OSS)
MySQL	OSS-HDFS
MySQL, PolarDB, PostgreSQL	SelectDB
MySQL, Oracle, PolarDB	StarRocks
PostgreSQL	Lindorm
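
When planning a migration, it can help to query the support matrix above programmatically. The following Python sketch transcribes the table into a dictionary. The helper names `is_supported` and `destinations_for` are hypothetical, and the data reflects only the table on this page, so it should be checked against the current documentation before use.

```python
# Destination -> set of supported sources, transcribed from the table above.
ALL_SEVEN = {
    "ApsaraDB for OceanBase", "MongoDB", "MySQL", "Oracle",
    "PolarDB", "PolarDB-X 2.0", "PostgreSQL",
}

SUPPORTED_SOURCES = {
    "MaxCompute": set(ALL_SEVEN),
    "AnalyticDB for MySQL (V3.0)": {
        "ApsaraDB for OceanBase", "MySQL", "Oracle", "PolarDB", "PostgreSQL",
    },
    "ApsaraDB for OceanBase": {"MySQL"},
    "Data Lake Formation (DLF)": {
        "ApsaraDB for OceanBase", "MongoDB", "MySQL", "PolarDB", "PostgreSQL",
    },
    "DataHub": {"MySQL", "Oracle", "PolarDB"},
    "Doris": {"MySQL", "PolarDB", "PostgreSQL"},
    "Elasticsearch": {"MySQL", "PolarDB"},
    "Hologres": set(ALL_SEVEN),
    "Kafka": {"MySQL", "PolarDB", "PostgreSQL"},
    "LogHub": {"MySQL", "MySQL (sharded)", "PolarDB (sharded)"},
    "Object Storage Service (OSS)": {"MySQL", "PolarDB"},
    "OSS-HDFS": {"MySQL"},
    "SelectDB": {"MySQL", "PolarDB", "PostgreSQL"},
    "StarRocks": {"MySQL", "Oracle", "PolarDB"},
    "Lindorm": {"PostgreSQL"},
}


def is_supported(source, destination):
    """Return True if the table lists this source for this destination."""
    return source in SUPPORTED_SOURCES.get(destination, set())


def destinations_for(source):
    """Return the destinations that list the given source, sorted by name."""
    return sorted(
        dest for dest, sources in SUPPORTED_SOURCES.items() if source in sources
    )


print(destinations_for("Oracle"))
print(is_supported("Oracle", "Kafka"))
```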