DataWorks provides a real-time data synchronization feature. You can use it to synchronize data changes from a single table or an entire database to a destination database in real time, which keeps the destination database consistent with the source.
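Conceptually, keeping the destination consistent means replaying every insert, update, and delete from the source in order. The following is a minimal sketch of that idea under stated assumptions: the `ChangeEvent` type and the dictionary used as the destination table are hypothetical stand-ins, and this is not how DataWorks implements synchronization internally.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ChangeEvent:
    """A hypothetical change record captured from the source (not a DataWorks type)."""
    op: str                       # "insert", "update", or "delete"
    key: Any                      # primary-key value of the affected row
    row: Optional[dict] = None    # row image after the change; None for deletes


def apply_changes(destination: dict, events: list) -> dict:
    """Replay change events in source order so the destination mirrors the source."""
    for event in events:
        if event.op in ("insert", "update"):
            # Upsert: store the latest row image under its primary key.
            destination[event.key] = dict(event.row)
        elif event.op == "delete":
            # Remove the row if present; a delete for an unknown key is ignored.
            destination.pop(event.key, None)
        else:
            raise ValueError(f"unknown operation: {event.op}")
    return destination


if __name__ == "__main__":
    source_log = [
        ChangeEvent("insert", 1, {"id": 1, "status": "new"}),
        ChangeEvent("insert", 2, {"id": 2, "status": "new"}),
        ChangeEvent("update", 1, {"id": 1, "status": "paid"}),
        ChangeEvent("delete", 2),
    ]
    print(apply_changes({}, source_log))
```

Treating inserts and updates as upserts keyed by primary key makes replaying the same event twice harmless, which is a common design choice for change-data pipelines.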
Core features
Real-time synchronization supports the features described in the following table.
Capabilities | Description |
Data synchronization between various data sources | Real-time synchronization supports various data sources. You can combine different input and output data sources to create a sync link. For more information, see Supported data sources and sync solutions. |
Data synchronization in complex network environments | Real-time synchronization works with data sources in a variety of network environments, such as Alibaba Cloud databases, on-premises data centers, self-managed databases on ECS instances, and databases outside Alibaba Cloud. Before you configure a task, make sure that the resource group can connect to both the source and the destination. For configuration details, see Network connectivity solutions. |
Sync scenarios | Real-time synchronization can synchronize data in real time from a single source table to a single destination table. It can also synchronize incremental data from sharded databases and tables to a single destination table. |
Real-time sync task configuration | You can perform real-time ETL on data from a single table by using simple configurations, without writing code. For more information, see Configure a real-time sync task for a single table and Synchronize data from sharded databases and tables to MaxCompute. The following features are supported when you configure a real-time sync task.
Real-time synchronization for a single table:
Real-time synchronization for sharded databases and tables: |
Real-time sync task O&M | You can monitor sync tasks and configure alerts for them. |
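The sharded scenario in the table above merges changes from many physical tables into one logical destination table. The sketch below illustrates the routing idea only. The `shop_NN.orders_NN` naming pattern, the event dictionaries, and the choice to key destination rows by `(source_table, primary key)` so that equal keys from different shards do not overwrite each other are all assumptions for illustration, not documented DataWorks behavior.

```python
import re
from typing import Iterable

# Hypothetical shard naming convention: databases shop_00..shop_NN, tables orders_00..orders_NN.
SHARD_PATTERN = re.compile(r"^shop_\d+\.orders_\d+$")


def merge_shards(events: Iterable[dict], destination: dict) -> dict:
    """Merge incremental changes from matching shard tables into one destination table.

    Each event is a hypothetical dict with "table", "op", "pk", and (for non-deletes) "row".
    Rows are keyed by (source_table, pk), an assumption of this sketch.
    """
    for event in events:
        source_table = event["table"]
        if not SHARD_PATTERN.match(source_table):
            continue  # not one of the shard tables being merged
        key = (source_table, event["pk"])
        if event["op"] == "delete":
            destination.pop(key, None)
        else:
            # Inserts and updates both overwrite the row stored under this key.
            destination[key] = event["row"]
    return destination


if __name__ == "__main__":
    changes = [
        {"table": "shop_00.orders_00", "op": "insert", "pk": 1, "row": {"id": 1, "amount": 10}},
        {"table": "shop_01.orders_03", "op": "insert", "pk": 1, "row": {"id": 1, "amount": 25}},
        {"table": "shop_00.users", "op": "insert", "pk": 7, "row": {"id": 7}},
    ]
    print(merge_shards(changes, {}))
```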
Real-time sync tasks cannot be run from the Data Studio page. To run a real-time sync node, save and submit it, and then run it from Operation Center in the production environment.
Real-time sync tasks do not support synchronizing views.
Supported data sources
The data sources supported by Data Studio and Data Integration partially overlap. If Data Integration supports the data source type that you need, we recommend that you create the real-time sync task in Data Integration.
In Data Integration, only specific pairings of source and destination data sources are supported. To see which combinations are available, check the Sync Type options that appear when you configure the source and destination data sources.
Data Studio
Source: MySQL, DataHub, LogHub, Kafka, and PolarDB.
Destination: MaxCompute, Hologres, AnalyticDB for MySQL 3.0, Elasticsearch, DataHub, and Kafka.
Data processing: Data filtering, string replacement, and data masking.
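To make the three processing steps concrete, the following sketch applies them to a single record in order. It only illustrates what each operation does to data; the field names (`status`, `city`, `phone`), the filter rule, and the last-four-digits masking rule are hypothetical, and in DataWorks these steps are configured in the task rather than written as code.

```python
from typing import Optional


def filter_row(row: dict) -> bool:
    """Data filtering: keep only rows that satisfy a condition (hypothetical rule)."""
    return row.get("status") != "test"


def replace_strings(row: dict, field: str, old: str, new: str) -> dict:
    """String replacement: substitute a substring in one field."""
    updated = dict(row)
    updated[field] = updated[field].replace(old, new)
    return updated


def mask_phone(row: dict, field: str) -> dict:
    """Data masking: hide all but the last four characters of a value."""
    updated = dict(row)
    value = updated[field]
    updated[field] = "*" * max(len(value) - 4, 0) + value[-4:]
    return updated


def process(row: dict) -> Optional[dict]:
    """Apply filter, then replacement, then masking; return None if the row is dropped."""
    if not filter_row(row):
        return None
    row = replace_strings(row, "city", "HZ", "Hangzhou")
    return mask_phone(row, "phone")


if __name__ == "__main__":
    print(process({"status": "ok", "city": "HZ", "phone": "13800001234"}))
```

Filtering first avoids spending work on rows that will be discarded anyway.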
Data Integration
Source: Kafka, Hologres, Oracle, LogHub, and DataHub.
Destination: ApsaraDB for OceanBase, Data Lake Formation (DLF), Doris, Hologres, MaxCompute, OSS, OSS-HDFS, StarRocks, and Tablestore.
Data processing: Data filtering, string replacement, data masking, JSON parsing, and field editing and assignment.
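JSON parsing and field assignment are typical for message-queue sources such as Kafka, where a record often carries a JSON string. The sketch below parses such a string, extracts selected values into new top-level fields, and assigns constant values to additional fields. The record layout, the `paths` mapping, the `constants` parameter, and the decision to drop the raw JSON column are assumptions for illustration, not the DataWorks configuration format.

```python
import json


def parse_and_assign(record: dict, json_field: str, paths: dict, constants: dict) -> dict:
    """Parse a JSON string column and assign selected values to new top-level fields.

    paths maps a new field name to a key path inside the JSON document,
    for example {"user_id": ["user", "id"]} (hypothetical field names).
    constants maps field names to fixed values to assign on every record.
    """
    parsed = json.loads(record[json_field])
    # Keep every original field except the raw JSON column (a choice of this sketch).
    result = {k: v for k, v in record.items() if k != json_field}
    for new_field, path in paths.items():
        value = parsed
        for key in path:
            # Walk the path; a missing key or a non-object value yields None.
            value = value.get(key) if isinstance(value, dict) else None
        result[new_field] = value
    result.update(constants)  # field assignment: set fixed values
    return result


if __name__ == "__main__":
    msg = {"offset": 42, "value": '{"user": {"id": 7}, "event": "click"}'}
    print(parse_and_assign(msg, "value", {"user_id": ["user", "id"], "event": ["event"]}, {"env": "prod"}))
```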
Get started
To create a real-time sync task for a single table, see Configure a real-time sync task in DataStudio and Configure a real-time sync task in Data Integration.
To create a real-time sync task for sharded databases and tables, see Synchronize data from sharded databases and tables to MaxCompute.
FAQ
For answers to frequently asked questions about real-time sync tasks, see FAQ about real-time synchronization.