DataWorks Data Integration replicates entire databases or selected tables from a source to a destination by combining full and incremental synchronization. A task first performs a full load of the existing data and then continuously captures incremental changes through change data capture (CDC). This supports use cases such as real-time migration to the cloud and building the operational data store (ODS) layer of a data warehouse.
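To make the full-then-incremental pattern concrete, here is a minimal conceptual sketch, not DataWorks code. It assumes an in-memory snapshot keyed by primary key, a hypothetical `ChangeEvent` record with a monotonically increasing log `position` (similar in spirit to a binlog offset), and a checkpoint position recorded when the snapshot is taken. The helper names `apply_event` and `full_then_incremental` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ChangeEvent:
    """A hypothetical CDC event, loosely modeled on a binlog entry."""
    position: int             # monotonically increasing log offset
    op: str                   # "INSERT", "UPDATE", or "DELETE"
    key: int                  # primary key of the affected row
    row: Optional[dict]       # new row image; None for DELETE


def apply_event(dest: Dict[int, dict], event: ChangeEvent) -> None:
    """Apply one change by primary key; re-applying the same event is harmless."""
    if event.op == "DELETE":
        dest.pop(event.key, None)
    elif event.row is not None:
        dest[event.key] = dict(event.row)  # INSERT and UPDATE both act as upserts


def full_then_incremental(snapshot: Dict[int, dict],
                          snapshot_position: int,
                          change_log: List[ChangeEvent]) -> Dict[int, dict]:
    """Full load of a consistent snapshot, then replay of later CDC events."""
    dest: Dict[int, dict] = {}
    # Phase 1 (full): copy every row that existed at snapshot_position.
    for key, row in snapshot.items():
        dest[key] = dict(row)
    # Phase 2 (incremental): apply only events recorded after the checkpoint.
    for event in change_log:
        if event.position > snapshot_position:
            apply_event(dest, event)
    return dest


snapshot = {1: {"id": 1, "name": "alice"}, 2: {"id": 2, "name": "bob"}}
change_log = [
    ChangeEvent(10, "UPDATE", 1, {"id": 1, "name": "alice"}),  # already in snapshot
    ChangeEvent(11, "UPDATE", 2, {"id": 2, "name": "bob-v2"}),
    ChangeEvent(12, "INSERT", 3, {"id": 3, "name": "carol"}),
    ChangeEvent(13, "DELETE", 1, None),
]
result = full_then_incremental(snapshot, snapshot_position=10, change_log=change_log)
print(sorted(result))  # [2, 3]
```

In this model, recording the checkpoint with the snapshot and applying changes as keyed upserts and deletes is what lets the two phases join without gaps or duplicated rows. How DataWorks implements this internally is not described here.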
Use cases
- Build a real-time ODS layer for your data warehouse
  Synchronize data from operational databases such as MySQL and Oracle to a real-time data warehouse such as Hologres or StarRocks, providing fresh data for dashboards and ad hoc queries.
- Real-time database replication for disaster recovery
  Establish a real-time replication link between two database instances, homogeneous or heterogeneous, for read/write splitting, read-only replicas, or disaster recovery.
- Real-time data migration to the cloud
  Migrate databases from an on-premises data center to a cloud database service.
- Build a real-time data lake or data middle platform
  Collect real-time change data from multiple operational databases into a central data lake (OSS or DLF) or data warehouse (MaxCompute or Hologres) to build a unified real-time data middle platform.
Core capabilities
Real-time database synchronization provides the following capabilities:
| Core capability | Feature | Description |
| Synchronize entire databases between heterogeneous data sources | - | Synchronizes data from on-premises data centers or other cloud platforms to destinations such as MaxCompute, Hologres, or Kafka. For details, see Supported data sources and synchronization solutions. |
| Synchronize data in complex network environments | - | Supports data in Alibaba Cloud database services, on-premises data centers, self-managed databases on ECS, and other cloud providers. Before you configure a task, ensure network connectivity between the resource group and both the source and the destination. For details, see Configure network connections. |
| Synchronization scenarios | Full synchronization | Performs a one-time synchronization of all existing data from the source to the destination table. |
| Synchronization scenarios | Incremental synchronization | Captures streaming data in real time from sources such as message queues or CDC logs and writes it to a destination table or a specified partition. |
| Synchronization scenarios | Integrated full and incremental synchronization | Performs a full data load and then continuously captures incremental changes. |
| Task configuration | Batch table synchronization | Synchronize all tables, or select a subset by using checkboxes or filter rules. |
| Task configuration | Automatic table creation | A single configuration can process hundreds of source tables. The system automatically creates the corresponding schema in the destination. |
| Task configuration | Flexible mapping | Define custom naming conventions for destination databases and tables, and customize data type mapping between source and destination. |
| Task configuration | DDL change awareness (supported on select synchronization paths) | When the source schema changes (for example, a table or column is created or deleted), configure how the task responds to each type of change. |
| Task configuration | DML rules | Controls how change data (INSERT, UPDATE, and DELETE operations) from the source is processed at the destination. |
| Task configuration | Dynamic partitioning | If the destination table is partitioned, enable dynamic partitioning based on a source field or the event timestamp. Important: Creating an excessive number of partitions can degrade synchronization performance. If more than 1,000 new partitions are created in a single day, partition creation fails and the task terminates. |
| Task O&M | Online intervention | Resume a task from a specific checkpoint after an interruption to prevent data loss, or rerun a task for data backfilling, exception handling, or logic validation. |
| Task O&M | Monitoring and alerting | Configure monitoring rules that trigger alerts on business latency, task status, failover events, and DDL notifications. |
| Task O&M | Resource optimization | DataWorks Data Integration supports task-level elastic scaling with serverless resource groups. Configure time-based elasticity policies to automatically adjust resource specifications for peak and off-peak hours. |
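Several task-configuration features in the table above (batch table selection with filter rules, custom destination naming, and dynamic partitioning with its limit of 1,000 new partitions per day) can be modeled conceptually as follows. This is a minimal sketch, not the DataWorks API: the regular-expression filter rules, the `ods_{db}_{table}` naming template, the UTC `yyyymmdd` partition format, and all helper names (`select_tables`, `destination_name`, `partition_value`, `PartitionGuard`) are hypothetical assumptions.

```python
import re
from collections import defaultdict
from datetime import date, datetime, timezone
from typing import Dict, List, Set, Tuple

# Hypothetical filter rules: regular expressions matched against "db.table".
# DataWorks defines its own filter-rule syntax; these patterns are illustrative.
INCLUDE_RULES = [r"^shop\.order_.*$", r"^shop\.user$"]


def select_tables(tables: List[str], rules: List[str]) -> List[str]:
    """Batch table selection: keep source tables matched by any filter rule."""
    compiled = [re.compile(rule) for rule in rules]
    return [t for t in tables if any(p.match(t) for p in compiled)]


def destination_name(source: str, template: str = "ods_{db}_{table}") -> str:
    """Flexible mapping: apply a hypothetical naming template to "db.table"."""
    db, table = source.split(".", 1)
    return template.format(db=db, table=table)


def partition_value(event_ts: float) -> str:
    """Dynamic partitioning: derive a daily value (yyyymmdd, UTC) from a timestamp."""
    return datetime.fromtimestamp(event_ts, tz=timezone.utc).strftime("%Y%m%d")


class PartitionGuard:
    """Counts new partitions per day and fails once the daily limit is exceeded."""

    def __init__(self, daily_limit: int = 1000) -> None:
        self.daily_limit = daily_limit
        self.existing: Set[Tuple[str, str]] = set()
        self.created_per_day: Dict[date, int] = defaultdict(int)

    def ensure(self, table: str, value: str, today: date) -> None:
        if (table, value) in self.existing:
            return  # partition already exists; nothing new is created
        if self.created_per_day[today] >= self.daily_limit:
            raise RuntimeError(
                f"more than {self.daily_limit} new partitions on {today}; task terminates")
        self.existing.add((table, value))
        self.created_per_day[today] += 1


tables = ["shop.order_2024", "shop.user", "shop.audit_log", "crm.user"]
selected = select_tables(tables, INCLUDE_RULES)
print([destination_name(t) for t in selected])  # ['ods_shop_order_2024', 'ods_shop_user']
print(partition_value(0))  # 19700101
```

The default `daily_limit=1000` mirrors the documented limit: in this sketch the 1,001st new partition on the same day raises an error, while writes to an existing partition are not counted. In a real task, the limit is enforced by DataWorks, not by client code.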
Get started
Supported data sources
| Source | Destination |
|  | MaxCompute |
|  | AnalyticDB for MySQL (V3.0) |
|  | ApsaraDB for OceanBase |
|  | Data Lake Formation (DLF) |
|  | DataHub |
|  | Doris |
|  | Elasticsearch |
|  | Hologres |
|  | Kafka |
|  | LogHub |
|  | Object Storage Service (OSS) |
|  | OSS-HDFS |
|  | SelectDB |
|  | StarRocks |
|  | Lindorm |