DataWorks Data Integration provides a real-time database synchronization solution that replicates all or a subset of the tables in a source database to a destination data store. The solution combines full and incremental synchronization to deliver low-latency, automated replication. It runs on a real-time computing engine: it first performs an initial full data synchronization and then automatically switches to continuously capturing incremental changes through Change Data Capture (CDC). This provides a one-stop solution for scenarios such as real-time database migration to the cloud and building a real-time Operational Data Store (ODS) layer.
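The handoff from full to incremental synchronization can be illustrated with a minimal, self-contained Python sketch. It is not how DataWorks implements the feature. The ChangeEvent class, the integer log position, and the in-memory dictionaries are all hypothetical stand-ins chosen for illustration. The sketch assumes that the full copy corresponds to a known log position and that every change recorded after that position is replayed in order.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One CDC record. All field names are hypothetical."""
    position: int               # monotonically increasing log position
    op: str                     # "insert", "update", or "delete"
    key: int                    # primary key of the affected row
    row: Optional[dict] = None  # new row image; None for deletes


def full_sync(source_rows: Dict[int, dict]) -> Dict[int, dict]:
    """Phase 1: copy every row that exists at snapshot time."""
    return {key: dict(row) for key, row in source_rows.items()}


def apply_changes(dest: Dict[int, dict],
                  events: Iterable[ChangeEvent],
                  snapshot_position: int) -> int:
    """Phase 2: replay changes recorded after the snapshot.

    Returns the last log position that was applied.
    """
    last_applied = snapshot_position
    for event in sorted(events, key=lambda e: e.position):
        if event.position <= snapshot_position:
            continue  # already reflected in the full copy
        if event.op == "delete":
            dest.pop(event.key, None)
        else:  # inserts and updates are both applied as upserts
            dest[event.key] = dict(event.row)
        last_applied = event.position
    return last_applied


# Assumed scenario: the snapshot was taken at log position 10.
source_snapshot = {1: {"name": "a"}, 2: {"name": "b"}}
dest = full_sync(source_snapshot)
events = [
    ChangeEvent(5, "update", 1, {"name": "stale"}),  # older than the snapshot
    ChangeEvent(11, "update", 1, {"name": "a2"}),
    ChangeEvent(12, "delete", 2),
    ChangeEvent(13, "insert", 3, {"name": "c"}),
]
last_position = apply_changes(dest, events, snapshot_position=10)
```

Skipping events at or before the snapshot position is what keeps the switch seamless in this sketch: changes that the full copy already contains are not applied a second time.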
Scenarios
Build a real-time ODS layer for your data warehouse
You can synchronize data in real time from online business databases, such as MySQL and Oracle, to real-time data warehouses such as Hologres and StarRocks. This provides data for business scenarios such as large-screen displays and ad hoc queries.
Replicate databases in real time for disaster recovery
You can create a real-time replication task between two database instances. You can use this for read/write splitting, building read-only instances, or implementing real-time disaster recovery for homogeneous or heterogeneous databases.
Migrate data to the cloud in real time
You can smoothly migrate databases from on-premises data centers to cloud database services.
Build a real-time data lake or data mid-end
You can centrally collect real-time change data from multiple business databases into a data lake, such as Object Storage Service (OSS) or Data Lake Formation (DLF), or a data warehouse, such as MaxCompute or Hologres. This helps you build a unified, real-time data mid-end for your enterprise.
Core features
The core features of real-time database synchronization are as follows:
Core feature | Specific feature | Description |
Database synchronization between disparate data sources | - | Supports migrating data from on-premises data centers or other cloud platforms to destinations such as MaxCompute, Hologres, and Kafka. For more information, see Supported data sources and synchronization solutions. |
Data synchronization in complex network environments | - | Supports synchronizing data from ApsaraDB databases, on-premises data centers, self-managed databases on ECS, and non-Alibaba Cloud databases. Before you configure a task, make sure that the resource group can connect to both the source and the destination. For configuration details, see Configure network connections. |
Synchronization scenarios | Full synchronization | Supports one-time synchronization of all data from the source to the destination table. |
Synchronization scenarios | Incremental synchronization | Supports real-time capture of streaming data, such as from message queues or CDC logs, and writing it to a destination table or a specified partition. |
Synchronization scenarios | Full and incremental synchronization | |
Task configuration | Batch table synchronization | Supports synchronizing all tables in a database. You can also synchronize only specific tables by selecting them individually or by configuring filter rules. |
Task configuration | Automatic table creation | A single configuration can process hundreds of tables in the source database. The system automatically creates the table schemas at the destination without manual intervention. |
Task configuration | Flexible mapping | Supports custom naming conventions for destination databases and tables. Also supports custom mapping of field types between the source and destination to fit the destination's data model. |
Task configuration | DDL change awareness (supported for some sync links) | When the source table schema changes, such as when tables or columns are created or deleted, you can configure the sync task to use one of the following response policies: |
Task configuration | DML rule configuration | DML message processing is used for fine-grained filtering and control of change data ( |
Task configuration | Dynamic partitioning | If the destination table is a partitioned table, dynamic partitioning is supported based on a source field or the time of the source change event. Important: Too many partitions can reduce synchronization efficiency. If more than 1,000 new partitions are created in a single day, partition creation fails and the task is stopped. |
Task O&M | Online intervention | Supports checkpoint-based resumption. If a task is interrupted, it can resume from a specified time offset so that no data is lost. Supports reruns to backfill data, fix exceptions, or validate logic changes. This ensures data consistency and business continuity. |
Task O&M | Monitoring and alerting | Supports monitoring rules for business latency, task status, failover, and DDL notifications, and sends alerts when a rule is triggered. |
Task O&M | Resource optimization | DataWorks Data Integration uses Serverless resource groups to provide elastic scaling at the task level. You can also configure a time-based elastic policy to preset different resource specifications for tasks at different times, such as during peak and off-peak hours. |
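The dynamic partitioning behavior described in the table can be sketched in Python as follows. This is an illustration only, not the DataWorks implementation. The PartitionRouter class, its parameters, and the PartitionLimitExceeded exception are hypothetical names. The only values taken from the table are the two partitioning modes (a source field or the source event time) and the limit of 1,000 new partitions per day. The date format used for time-based partitions is an assumption.

```python
from collections import defaultdict
from datetime import date, datetime
from typing import Optional

MAX_NEW_PARTITIONS_PER_DAY = 1000  # limit stated in the feature table


class PartitionLimitExceeded(RuntimeError):
    """Raised when the daily cap is hit; the calling task is expected to stop."""


class PartitionRouter:
    """Chooses a destination partition per record (hypothetical helper)."""

    def __init__(self, field: Optional[str] = None,
                 time_format: str = "%Y%m%d",  # assumed format
                 daily_limit: int = MAX_NEW_PARTITIONS_PER_DAY) -> None:
        self.field = field              # partition by this source field, if set
        self.time_format = time_format  # otherwise partition by event time
        self.daily_limit = daily_limit
        self.known = set()              # partitions that already exist
        self.created_per_day = defaultdict(int)

    def partition_for(self, row: dict, event_time: datetime) -> str:
        if self.field is not None:
            return str(row[self.field])
        return event_time.strftime(self.time_format)

    def route(self, row: dict, event_time: datetime, today: date) -> str:
        """Return the partition for a record, creating it if allowed."""
        partition = self.partition_for(row, event_time)
        if partition not in self.known:
            if self.created_per_day[today] >= self.daily_limit:
                raise PartitionLimitExceeded(
                    f"more than {self.daily_limit} new partitions on {today}")
            self.known.add(partition)
            self.created_per_day[today] += 1
        return partition


# Time-based partitioning: the partition is derived from the event time.
by_time = PartitionRouter()
example_partition = by_time.route(
    {"id": 1}, datetime(2024, 1, 2, 3, 4), today=date(2024, 1, 2))
```

Partitioning by a high-cardinality source field is the case most likely to reach the daily limit, so a field with a small, bounded set of values is the safer choice in a design like this one.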
Get started
To create a real-time database synchronization task, see Configure a real-time database synchronization task.
Supported data sources
Source data source | Destination data source |
MaxCompute | |
AnalyticDB for MySQL (V3.0) | |
ApsaraDB for OceanBase | |
Data Lake Formation (DLF) | |
DataHub | |
Doris | |
Elasticsearch | |
Hologres | |
Kafka | |
LogHub | |
OSS | |
OSS-HDFS | |
SelectDB | |
StarRocks | |
Lindorm |