Features of real-time database synchronization - DataWorks

DataWorks Data Integration provides real-time database synchronization, which replicates entire databases or specific tables from a source to a destination in a single low-latency process that combines full and incremental synchronization. Powered by a real-time computing engine, the feature automates the initial full data load and then transitions to continuous Change Data Capture (CDC). It is a one-stop solution for scenarios such as real-time database migration to the cloud and building the Operational Data Store (ODS) layer of a real-time data warehouse.

Use cases

Build a real-time data warehouse ODS layer
Synchronize data in real time from online transaction processing (OLTP) databases, such as MySQL or Oracle, to a real-time data warehouse like Hologres or StarRocks. This provides data support for business intelligence (BI) dashboards, ad hoc queries, and other applications.
Enable real-time database replication and disaster recovery
Create a real-time replication task between two database instances. You can use this for read/write splitting, creating read-only instances, or implementing real-time disaster recovery for homogeneous or heterogeneous databases.
Perform real-time data migration to the cloud
Smoothly migrate databases from an on-premises data center to cloud database services.
Build a real-time data lake or data middle platform
Collect real-time change data from multiple business databases into a data lake, such as Object Storage Service (OSS) or Data Lake Formation (DLF), or into a data warehouse, such as MaxCompute or Hologres, to build a unified, real-time data middle platform for your enterprise.

Core features

The core features of real-time database synchronization include:

Core feature	Specific feature	Description
Database synchronization between heterogeneous data sources	-	You can synchronize an entire database from an on-premises data center or a third-party cloud to a data warehouse, data lake, or message queue, such as MaxCompute, Hologres, or Kafka. For more information, see Supported data sources and synchronization solutions.
Data synchronization in complex network environments	-	Real-time synchronization supports Alibaba Cloud databases, databases in an on-premises data center, self-managed databases on ECS, and third-party cloud databases. Before you begin, ensure that the resource group can connect to the source and destination. For more information, see Configure network connections.
Synchronization scenarios	Full synchronization	Synchronizes all data from the source to the destination table in a single operation.
	Incremental synchronization	Captures streaming data from sources such as a message queue or CDC logs and writes the data to the destination table or a specified partition in real time.
	Full and incremental synchronization	Automatic full load: When a task starts for the first time, it automatically reads all existing data from all tables in the source database and writes it to the destination. Seamless transition to incremental mode: After the full load is complete, the task automatically and seamlessly switches to CDC mode. It continuously captures INSERT, UPDATE, and DELETE operations from the source and synchronizes them to the destination with millisecond-level latency.
Task configuration	Batch table synchronization	You can synchronize all tables in a database or select specific tables by using checkboxes or configuring filter rules.
	Automatic table creation	A single task configuration handles hundreds of tables in the source database. The system automatically creates the table schemas in the destination, eliminating the need for manual intervention.
	Flexible mapping	You can define custom naming rules for destination databases and tables. You can also define custom mappings for field data types between the source and destination to accommodate the destination's data model.
	DDL change awareness (Supported by some tasks)	When a source table schema changes (for example, a table or column is created or deleted), you can configure the synchronization task to respond with one of the following policies. Normal: automatically apply the corresponding schema changes to the destination. Alert: do not apply the changes; send an alert and wait for manual intervention. Error: immediately stop the task and set its status to error.
	DML rule configuration	You can use DML message processing to filter and control change data captured from the source, such as INSERT, UPDATE, and DELETE operations, before it is written to the destination. These rules define how each type of data manipulation operation is handled.
	Dynamic partitioning	If the destination table is a partitioned table, you can enable dynamic partitioning based on a source field or the event time of a source change. Important: Creating too many partitions can affect synchronization performance. If more than 1,000 new partitions are added in a single day, partition creation fails and the task is terminated.
Task O&M	Online intervention	Tasks support checkpoint-based resumption: after an interruption, a task can resume from a specific point in time so that no data is lost. You can also rerun tasks to backfill data, fix anomalies, or validate logic changes, which helps ensure data consistency and business continuity.
	Monitoring and alerting	You can define monitoring rules for business latency, task status, failover, and DDL notifications and configure alerts to be triggered based on these rules.
	Resource optimization	DataWorks Data Integration is based on a Serverless resource group and provides elastic scaling capabilities at the task level. You can also configure time-based elastic policies to preset different resource specifications for tasks during different periods, such as peak and off-peak business hours.
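
The "Full and incremental synchronization" and "DML rule configuration" rows above can be illustrated with a small in-memory simulation. This is only a conceptual sketch, not DataWorks code or its API: the names `ChangeEvent`, `full_load`, `apply_changes`, and `allowed_ops` are hypothetical, the source and destination are plain Python dictionaries keyed by primary key, and the DML rule is assumed to be a simple allow-list of operation types.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Set


@dataclass
class ChangeEvent:
    """One captured change (hypothetical shape, for illustration only)."""
    op: str                     # "INSERT", "UPDATE", or "DELETE"
    key: int                    # primary key of the affected row
    row: Optional[dict] = None  # new row image; None for DELETE


def full_load(source_rows: Dict[int, dict]) -> Dict[int, dict]:
    """Initial full load: copy every existing source row to the destination."""
    return {key: dict(row) for key, row in source_rows.items()}


def apply_changes(
    destination: Dict[int, dict],
    events: Iterable[ChangeEvent],
    allowed_ops: Set[str],
) -> Dict[int, dict]:
    """Incremental phase: apply change events in order.

    Events whose operation type is not in allowed_ops are skipped,
    which stands in for a DML filter rule (an assumption of this sketch).
    """
    for event in events:
        if event.op not in allowed_ops:
            continue  # filtered out by the DML rule
        if event.op == "DELETE":
            destination.pop(event.key, None)
        else:  # INSERT and UPDATE both write the new row image
            destination[event.key] = dict(event.row)
    return destination


# Existing data at the moment the task first starts.
source_snapshot = {1: {"name": "a"}, 2: {"name": "b"}}

# Changes captured after the full load has finished.
events = [
    ChangeEvent("INSERT", 3, {"name": "c"}),
    ChangeEvent("UPDATE", 1, {"name": "a2"}),
    ChangeEvent("DELETE", 2),
]

destination = full_load(source_snapshot)
# Assumed rule: ignore DELETE so that the destination keeps historical rows.
destination = apply_changes(destination, events, allowed_ops={"INSERT", "UPDATE"})
print(destination)
```

With DELETE filtered out, row 2 remains in the destination even though it was deleted at the source. Passing all three operation types in `allowed_ops` makes the destination mirror the source instead.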

Get started

To create a real-time database synchronization task, see Configure a real-time database synchronization task.

Supported data sources

Source	Destination
ApsaraDB for OceanBase, MySQL, Oracle, PolarDB, PolarDB-X 2.0, PostgreSQL	MaxCompute
MySQL, Oracle, PolarDB, PostgreSQL	AnalyticDB for MySQL (V3.0)
MySQL	ApsaraDB for OceanBase
MySQL, MongoDB	Data Lake Formation (DLF)
MySQL, Oracle, PolarDB	DataHub
MySQL, PostgreSQL	Doris
MySQL, PolarDB	Elasticsearch
ApsaraDB for OceanBase, MongoDB, MySQL, Oracle, PolarDB, PolarDB-X 2.0, PostgreSQL	Hologres
MySQL, Oracle, PolarDB	Kafka
MySQL, MySQL (sharded)	LogHub
MySQL, PolarDB	Object Storage Service (OSS)
MySQL	OSS-HDFS
MySQL, PostgreSQL	SelectDB
MySQL, Oracle, PolarDB	StarRocks
PostgreSQL	Lindorm
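
When planning a migration, it can help to look up the table above by source rather than by destination. The following sketch transcribes the table into a Python dictionary and inverts it. The names `SUPPORTED_SOURCES`, `destinations_for`, and `is_supported` are hypothetical helpers written for this example and are not part of any DataWorks SDK. The data reflects only the table above and may not match the current product.

```python
# Destination -> set of supported sources, transcribed from the table above.
SUPPORTED_SOURCES = {
    "MaxCompute": {"ApsaraDB for OceanBase", "MySQL", "Oracle", "PolarDB",
                   "PolarDB-X 2.0", "PostgreSQL"},
    "AnalyticDB for MySQL (V3.0)": {"MySQL", "Oracle", "PolarDB", "PostgreSQL"},
    "ApsaraDB for OceanBase": {"MySQL"},
    "Data Lake Formation (DLF)": {"MySQL", "MongoDB"},
    "DataHub": {"MySQL", "Oracle", "PolarDB"},
    "Doris": {"MySQL", "PostgreSQL"},
    "Elasticsearch": {"MySQL", "PolarDB"},
    "Hologres": {"ApsaraDB for OceanBase", "MongoDB", "MySQL", "Oracle",
                 "PolarDB", "PolarDB-X 2.0", "PostgreSQL"},
    "Kafka": {"MySQL", "Oracle", "PolarDB"},
    "LogHub": {"MySQL", "MySQL (sharded)"},
    "Object Storage Service (OSS)": {"MySQL", "PolarDB"},
    "OSS-HDFS": {"MySQL"},
    "SelectDB": {"MySQL", "PostgreSQL"},
    "StarRocks": {"MySQL", "Oracle", "PolarDB"},
    "Lindorm": {"PostgreSQL"},
}


def is_supported(source: str, destination: str) -> bool:
    """Return True if the table lists this source for this destination."""
    return source in SUPPORTED_SOURCES.get(destination, set())


def destinations_for(source: str) -> list:
    """Return the destinations listed for a source, sorted alphabetically."""
    return sorted(
        destination
        for destination, sources in SUPPORTED_SOURCES.items()
        if source in sources
    )


print(destinations_for("MongoDB"))
print(is_supported("Oracle", "Lindorm"))
```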