DataWorks Data Integration supports real-time synchronization of entire databases. You can replicate a whole database or specific tables from a source to a destination in a single low-latency process that combines full and incremental synchronization. Built on a real-time computing engine, the feature automates the initial full data load and then switches to continuous Change Data Capture (CDC). It is a one-stop solution for scenarios such as real-time database migration to the cloud and building the Operational Data Store (ODS) layer of a real-time data warehouse.
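The hand-off from the full load to incremental synchronization can be illustrated with a minimal Python sketch. It is not the DataWorks implementation: the `ChangeEvent` record, the integer log position, and the in-memory dictionaries that stand in for tables are all assumptions made for illustration. The idea shown is that the full load copies a snapshot taken at a known log position, and the incremental phase then applies only the changes recorded after that position.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """A hypothetical CDC record; the field names are illustrative only."""
    position: int                 # monotonically increasing log position
    op: str                       # "insert", "update", or "delete"
    key: int                      # primary key of the affected row
    row: Optional[dict] = None    # new row image; None for deletes


def full_load(snapshot: Dict[int, dict]) -> Dict[int, dict]:
    """Copy every row of the source snapshot into a new destination table."""
    return {key: dict(row) for key, row in snapshot.items()}


def apply_changes(dest: Dict[int, dict],
                  events: Iterable[ChangeEvent],
                  start_position: int) -> int:
    """Apply events newer than start_position; return the last applied position."""
    last = start_position
    for event in sorted(events, key=lambda e: e.position):
        if event.position <= last:
            continue  # already covered by the snapshot or an earlier batch
        if event.op == "delete":
            dest.pop(event.key, None)
        else:
            # Inserts and updates are both written as upserts.
            dest[event.key] = dict(event.row or {})
        last = event.position
    return last


# Snapshot taken at log position 10, followed by later changes.
dest = full_load({1: {"name": "a"}, 2: {"name": "b"}})
checkpoint = apply_changes(dest, [
    ChangeEvent(9, "update", 1, {"name": "stale"}),   # older than the snapshot
    ChangeEvent(11, "update", 1, {"name": "a2"}),
    ChangeEvent(12, "delete", 2),
    ChangeEvent(13, "insert", 3, {"name": "c"}),
], start_position=10)
# dest is now {1: {"name": "a2"}, 3: {"name": "c"}} and checkpoint is 13
```

Passing the returned position as `start_position` on the next call lets the sketch continue from where it stopped without applying any change twice.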
Use cases
Build a real-time data warehouse ODS layer
Synchronize data in real time from online transaction processing (OLTP) databases, such as MySQL or Oracle, to a real-time data warehouse, such as Hologres or StarRocks. The synchronized data supports business intelligence (BI) dashboards, ad hoc queries, and other applications.
Enable real-time database replication and disaster recovery
Create a real-time replication task between two database instances. You can use the task to implement read/write splitting, create read-only instances, or set up real-time disaster recovery for homogeneous or heterogeneous databases.
Perform real-time data migration to the cloud
Migrate databases smoothly from an on-premises data center to cloud database services.
Build a real-time data lake or data middle platform
Collect real-time change data from multiple business databases into a data lake, such as Object Storage Service (OSS) or Data Lake Formation (DLF), or into a data warehouse, such as MaxCompute or Hologres, to build a unified, real-time data middle platform for your enterprise.
Core features
The core features of real-time database synchronization include:
Core feature | Specific feature | Description |
Database synchronization between heterogeneous data sources | - | You can synchronize an entire database from an on-premises data center or a third-party cloud to a data warehouse, data lake, or message queue, such as MaxCompute, Hologres, or Kafka. For more information, see Supported data sources and synchronization solutions. |
Data synchronization in complex network environments | - | Real-time synchronization supports Alibaba Cloud databases, databases in an on-premises data center, self-managed databases on ECS, and third-party cloud databases. Before you begin, ensure that the resource group can connect to both the source and the destination. For more information, see Configure network connections. |
Synchronization scenarios | Full synchronization | Synchronizes all data from the source to the destination table in a single operation. |
| Incremental synchronization | Captures streaming data from sources such as a message queue or CDC logs and writes the data to the destination table or a specified partition in real time. |
| Full and incremental synchronization | Runs the initial full load and then switches automatically to continuous incremental synchronization. |
Task configuration | Batch table synchronization | You can synchronize all tables in a database or select specific tables by using checkboxes or configuring filter rules. |
| Automatic table creation | A single task configuration can handle hundreds of tables in the source database. The system automatically creates the table schemas in the destination, so no manual intervention is needed. |
| Flexible mapping | You can define custom naming rules for destination databases and tables. You can also define custom mappings for field data types between the source and destination to match the destination's data model. |
| DDL change awareness (supported by some tasks) | When a source table schema changes (for example, a table or column is created or deleted), you can configure how the synchronization task responds by selecting a processing policy. |
| DML rule configuration | You can use DML message processing to filter and control the change data captured from the source. |
| Dynamic partitioning | If the destination table is a partitioned table, you can enable dynamic partitioning based on a source field or the event time of a source change. Important: Creating too many partitions can affect synchronization performance. If more than 1,000 new partitions are added in a single day, partition creation fails and the task is terminated. |
Task O&M | Online intervention | Tasks can resume from a checkpoint, that is, from a specific point in time after an interruption, so that no data is lost. You can also rerun tasks to backfill data, fix anomalies, or validate logic changes, which helps ensure data consistency and business continuity. |
| Monitoring and alerting | You can define monitoring rules for business latency, task status, failover, and DDL notifications, and configure alerts that are triggered based on these rules. |
| Resource optimization | DataWorks Data Integration runs on Serverless resource groups and provides elastic scaling at the task level. You can also configure time-based elastic policies to preset different resource specifications for tasks during different periods, such as peak and off-peak business hours. |
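The dynamic partitioning behavior described in the table above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not DataWorks code: the `PartitionTracker` class, the `yyyymmdd` partition format, the use of UTC, and the millisecond event timestamp are all hypothetical. Only the limit of 1,000 new partitions per day comes from the table.

```python
from collections import defaultdict
from datetime import date, datetime, timezone
from typing import DefaultDict, Set

# Daily limit on new partitions, as stated in the feature table.
MAX_NEW_PARTITIONS_PER_DAY = 1000


class PartitionLimitExceeded(RuntimeError):
    """Raised when the daily quota of new partitions is already used up."""


def partition_from_event_time(event_time_ms: int) -> str:
    """Derive a yyyymmdd partition value (assumed format) from a change's event time."""
    ts = datetime.fromtimestamp(event_time_ms / 1000, tz=timezone.utc)
    return ts.strftime("%Y%m%d")


def partition_from_field(row: dict, field: str) -> str:
    """Derive the partition value from a source field of the row."""
    return str(row[field])


class PartitionTracker:
    """Hypothetical helper that counts the partitions created on each day."""

    def __init__(self) -> None:
        self._known: Set[str] = set()
        self._created_per_day: DefaultDict[date, int] = defaultdict(int)

    def ensure(self, partition: str, today: date) -> bool:
        """Register a partition; return True only if it had to be created."""
        if partition in self._known:
            return False
        if self._created_per_day[today] >= MAX_NEW_PARTITIONS_PER_DAY:
            # In the real service the task is terminated at this point.
            raise PartitionLimitExceeded(
                f"more than {MAX_NEW_PARTITIONS_PER_DAY} new partitions on {today}"
            )
        self._known.add(partition)
        self._created_per_day[today] += 1
        return True
```

Partitioning on a high-cardinality source field, such as a user ID, would exhaust the daily quota quickly. A coarse value such as the event date keeps the number of new partitions small.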
Get started
To create a real-time database synchronization task, see Configure a real-time database synchronization task.
Supported data sources
Source | Destination |
| MaxCompute |
| AnalyticDB for MySQL (V3.0) |
| ApsaraDB for OceanBase |
| Data Lake Formation (DLF) |
| DataHub |
| Doris |
| Elasticsearch |
| Hologres |
| Kafka |
| LogHub |
| Object Storage Service (OSS) |
| OSS-HDFS |
| SelectDB |
| StarRocks |
| Lindorm |