DataWorks provides a real-time data synchronization feature. You can use this feature to synchronize data changes from a single table or an entire source database to a destination database in real time. This way, the destination database contains all the data in the source database and stays up to date.
The real-time data synchronization feature is available in the following regions: China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Shenzhen), China (Chengdu), and China East 2 Finance.
- Diverse data sources and destinations
Data sources and destinations can be combined in a star-shaped topology: multiple supported data sources and destinations can be paired with each other to synchronize data.
- Sync solutions
You can configure a synchronization solution to synchronize both the full data and the incremental data of a database.
- Diverse synchronization methods
You can synchronize data from table shards, a single table, or multiple tables in a database, and configure different real-time synchronization rules for the messages generated by data definition language (DDL) operations.
- Data processing
You can perform data filtering or string replacement on the data from data sources based on your business requirements and synchronize the processed data to data destinations.
- Monitoring and alerting
This feature can send you alert notifications about service latency, failovers, dirty data, heartbeats, and failures by email, phone call, or DingTalk message. This way, you can detect and handle issues at the earliest opportunity.
- Graphical development
You can develop real-time sync nodes by performing drag-and-drop operations. You do not need to write code. The feature is easy to use even for beginners.
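As an illustration of the data processing item above, the following is a minimal Python sketch of what filtering followed by string replacement does to a stream of change records. It is not DataWorks code or a DataWorks API: the function `process_records`, the record layout, and the field names are hypothetical and exist only to show the idea.

```python
from typing import Callable, Iterable, Iterator

# Hypothetical record shape: one dict per changed row.
Record = dict[str, object]


def process_records(
    records: Iterable[Record],
    keep: Callable[[Record], bool],
    field: str,
    old: str,
    new: str,
) -> Iterator[Record]:
    """Yield the records that pass the filter, with old replaced by new in one field."""
    for record in records:
        if not keep(record):
            continue  # filtered out: this record is not sent to the destination
        value = record.get(field)
        if isinstance(value, str):
            # Build a new dict so the source record is left unchanged.
            record = {**record, field: value.replace(old, new)}
        yield record


# Assumed sample data, for illustration only.
changes = [
    {"id": 1, "status": "active", "phone": "+86-1380000"},
    {"id": 2, "status": "deleted", "phone": "+86-1390000"},
]
result = list(
    process_records(changes, lambda r: r["status"] == "active", "phone", "+86-", "")
)
```

In this sketch, only the record with `status` equal to `active` reaches the destination, and its `phone` value has the `+86-` prefix removed.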
Supported synchronization methods, data sources, and destinations
|Synchronization method||Data source||Data destination||References for configuring data sources||References for configuring sync nodes|
||Hologres||Configure and manage a real-time data synchronization node|
||DataHub||Configure and manage a real-time data synchronization node|
Resource usage and billing
Before you use a data synchronization node to synchronize data, you must purchase an exclusive resource group for Data Integration and add the resource group to DataWorks for subsequent use.
|Specification||Maximum concurrent threads for batch synchronization||Maximum real-time sync nodes|
You can estimate the required resources and purchase an exclusive resource group for Data Integration based on the amount of data that you want to synchronize. For more information about exclusive resource groups for Data Integration, see Exclusive resources for Data Integration.
Network connectivity solutions
This section describes the solutions that can be used to connect a data source to an exclusive resource group, based on where the data source is deployed. For more information about network connectivity solutions, see Solutions for ensuring connectivity in the data store dimension.
- The data source is deployed on the Internet.
Connect the Internet environment where the data source resides to the virtual private cloud (VPC) that is associated with the exclusive resource group.
- The data source is deployed in a VPC that is in the same region as the exclusive resource group.
- Same zone: Associate the VPC where the data source resides with the exclusive resource group.
- Different zones: Associate a VPC with the exclusive resource group. Then, configure a route between the associated VPC and the VPC where the data source resides.
- The data source is deployed in a VPC that is in a different region from the exclusive resource group.
- Associate a VPC with the exclusive resource group. Then, configure a route between the associated VPC and the VPC where the data source resides.
- Associate a VPC with the exclusive resource group. Then, use Express Connect or VPN Gateway to connect the associated VPC to the VPC where the data source resides.
- The data source is deployed in a data center.
- Associate a VPC with the exclusive resource group. Then, configure a route between the associated VPC and the network where the data center is located.
- Associate a VPC with the exclusive resource group. Then, use Express Connect or VPN Gateway to connect the data center to the associated VPC.
- The data source is deployed in the Alibaba Cloud classic network.
The classic network and VPCs cannot be connected. Therefore, we recommend that you migrate the data source to a VPC.
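The scenarios above amount to a lookup from where the data source is deployed to the available connectivity options. The following Python sketch encodes that lookup as a planning aid. The function `connectivity_options` and its scenario keys are hypothetical names chosen for this example and are not part of DataWorks.

```python
def connectivity_options(location: str, same_zone: bool = True) -> list[str]:
    """Return the connectivity options listed in this section for a deployment scenario.

    location is a hypothetical key: "internet", "vpc_same_region",
    "vpc_other_region", "data_center", or "classic_network".
    same_zone is only used when location is "vpc_same_region".
    """
    associate = "Associate a VPC with the exclusive resource group"
    route = f"{associate}, then configure a route to the network of the data source"
    link = f"{associate}, then connect the networks by using Express Connect or VPN Gateway"

    if location == "internet":
        return ["Connect the Internet environment to the VPC associated with the exclusive resource group"]
    if location == "vpc_same_region":
        if same_zone:
            return ["Associate the VPC of the data source with the exclusive resource group"]
        return [route]
    if location in ("vpc_other_region", "data_center"):
        # Two alternatives are listed for these scenarios.
        return [route, link]
    if location == "classic_network":
        # The classic network cannot be connected to a VPC.
        return ["Migrate the data source to a VPC"]
    raise ValueError(f"unknown location: {location!r}")


for option in connectivity_options("data_center"):
    print(option)
```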
- Plan and configure resources.
Estimate the required resources based on the amount of data that you want to synchronize and on the network environment, and purchase an exclusive resource group for Data Integration. Then, configure the resources to ensure network connectivity.
- Configure data sources.
After network connectivity is established for the source and the destination, configure the data sources so that the exclusive resource group can access them. For example, make sure that the IP addresses of the exclusive resource group are added to the IP address whitelists of the data sources. Otherwise, the synchronization fails.
- Add the data sources.
Add the data sources to DataWorks as the source and destination. This way, you can associate the data sources when you create a synchronization solution.
- Create and configure a synchronization solution.
Create a synchronization solution and configure parameters based on the synchronization scenario.
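The whitelist requirement in the data source configuration step can be checked before you create a solution. The following is a minimal sketch that uses only the Python standard library `ipaddress` module. The function name `missing_from_whitelist` is hypothetical, and the addresses are documentation-range placeholders, not real resource group addresses. How you obtain the resource group IP addresses and the whitelist entries is outside the scope of this sketch.

```python
import ipaddress


def missing_from_whitelist(resource_group_ips: list[str], whitelist: list[str]) -> list[str]:
    """Return the resource group IP addresses that no whitelist entry covers.

    Whitelist entries can be single addresses or CIDR blocks.
    """
    # strict=False accepts entries such as "192.0.2.10/24" that have host bits set.
    networks = [ipaddress.ip_network(entry, strict=False) for entry in whitelist]
    missing = []
    for ip in resource_group_ips:
        address = ipaddress.ip_address(ip)
        if not any(address in network for network in networks):
            missing.append(ip)
    return missing


# Placeholder values, for illustration only.
uncovered = missing_from_whitelist(
    ["192.0.2.10", "198.51.100.7"],
    ["192.0.2.0/24", "203.0.113.5"],
)
print(uncovered)
```

An empty result means that every resource group address is covered. Any address that is returned must be added to the whitelist of the data source before synchronization can succeed.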