DataWorks provides a real-time data synchronization feature. You can use this feature to synchronize data changes from a single table or from all tables in a source to a destination in real time, so that data in the destination stays consistent with data in the source.

Limits

The real-time data synchronization feature is available in the following regions: China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Shenzhen), China (Chengdu), and China East 2 Finance.

Architecture

Note The Groovy plug-in and the feature for synchronizing data to multiple destinations are in the canary release stage and will be supported soon.
The real-time data synchronization feature has the following benefits:
  • Diverse data sources

    Star-shaped combinations are supported: you can combine multiple supported sources with multiple supported destinations to synchronize data.

  • Synchronization solutions

    You can configure a synchronization solution to synchronize the full data and incremental data from a database.

  • Diverse synchronization methods

    You can synchronize data from table shards, a single table, or multiple tables in a database. You can also configure different real-time synchronization rules for different types of data definition language (DDL) messages.

  • Data processing

    You can filter data or replace strings in the data read from a source based on your business requirements, and then synchronize the processed data to a destination.

  • Monitoring and alerting

    The feature can send alert notifications about latency, failovers, dirty data, heartbeats, and failures by email, phone call, or DingTalk message, so that you can detect and handle issues promptly.

  • Graphical development

    You can develop real-time synchronization nodes by dragging and dropping components instead of writing code, which makes the feature easy to use even for beginners.

Supported synchronization methods, sources, and destinations

The following list describes the sources and destinations that real-time synchronization nodes support for each synchronization method, together with the related configuration topics.
  • Single-table synchronization
    • Sources: MySQL Binlog, DataHub, LogHub, Kafka, PolarDB, SQL Server
    • Destinations: MaxCompute, Hologres, AnalyticDB MySQL, Elasticsearch, DataHub, Kafka
    • Synchronization node configuration: Configure and manage a real-time data synchronization node
  • Database synchronization to MaxCompute
    • Sources: PolarDB MySQL, Oracle, MySQL, PolarDB-X
    • Synchronization node configuration: Configure and manage a real-time data synchronization node
  • Database synchronization to Hologres
    • Sources: PolarDB MySQL, Oracle, MySQL, PolarDB-X
    • Synchronization node configuration: Configure and manage a real-time data synchronization node
  • Database synchronization to DataHub
    • Sources: PolarDB MySQL (Note: Only PolarDB for MySQL is supported.), OceanBase, MySQL, Oracle
    • Synchronization node configuration: Configure and manage a real-time data synchronization node
  • Database synchronization to Kafka
    • Source: MySQL
    • Data source configuration: Configure a data source (MySQL)
    • Synchronization node configuration: Configure and manage a real-time data synchronization node
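The supported combinations above can also be expressed as a lookup structure, for example to pre-check a planned node in a script before you create it. The following Python sketch hand-transcribes them. The names `is_supported`, `SINGLE_TABLE_SOURCES`, and `DATABASE_SYNC`, and the method labels `"single-table"` and `"database"`, are hypothetical and are not part of any DataWorks API. The sketch assumes that any listed single-table source can be paired with any listed single-table destination, following the star-shaped combination described earlier. Confirm a specific pair in the node configuration topic.

```python
# Hand-transcribed from the supported combinations listed above.
# That list is authoritative; this lookup is only illustrative.

SINGLE_TABLE_SOURCES = {"MySQL Binlog", "DataHub", "LogHub", "Kafka", "PolarDB", "SQL Server"}
SINGLE_TABLE_DESTINATIONS = {"MaxCompute", "Hologres", "AnalyticDB MySQL",
                             "Elasticsearch", "DataHub", "Kafka"}

# Database synchronization: each destination maps to the sources it accepts.
DATABASE_SYNC = {
    "MaxCompute": {"PolarDB MySQL", "Oracle", "MySQL", "PolarDB-X"},
    "Hologres": {"PolarDB MySQL", "Oracle", "MySQL", "PolarDB-X"},
    "DataHub": {"PolarDB MySQL", "OceanBase", "MySQL", "Oracle"},
    "Kafka": {"MySQL"},
}


def is_supported(method: str, source: str, destination: str) -> bool:
    """Check a source/destination pair against the transcribed combinations.

    method is "single-table" or "database" (hypothetical labels).
    ASSUMPTION: for single-table synchronization, any listed source can be
    paired with any listed destination (star-shaped combination).
    """
    if method == "single-table":
        return source in SINGLE_TABLE_SOURCES and destination in SINGLE_TABLE_DESTINATIONS
    if method == "database":
        return source in DATABASE_SYNC.get(destination, set())
    raise ValueError(f"unknown synchronization method: {method!r}")


print(is_supported("database", "OceanBase", "DataHub"))     # True
print(is_supported("database", "OceanBase", "MaxCompute"))  # False
```

Sets keep each membership check constant-time. Keying database synchronization by destination mirrors how the list groups sources under each destination.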

Resource usage and billing

Before you use a real-time synchronization node to synchronize data, you must purchase an exclusive resource group for Data Integration and add the resource group to DataWorks.

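The following table lists the capacity of each specification of exclusive resource groups for Data Integration.

Specification | Maximum concurrent threads for batch synchronization | Maximum real-time synchronization nodes
4c8g (4 vCPUs, 8 GiB memory) | 8 | 3
8c16g (8 vCPUs, 16 GiB memory) | 16 | 6
12c24g (12 vCPUs, 24 GiB memory) | 24 | 9
16c32g (16 vCPUs, 32 GiB memory) | 32 | 12
24c48g (24 vCPUs, 48 GiB memory) | 48 | 18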
For information about the pricing of exclusive resource groups for Data Integration in different regions, see Billing standards of exclusive resource groups for Data Integration. The actual prices on the buy page prevail.

You can estimate the required resources and purchase an exclusive resource group for Data Integration based on the amount of data that you want to synchronize. For more information about exclusive resource groups for Data Integration, see Exclusive resource groups for Data Integration.
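As a rough first step in that estimate, you can pick the smallest specification whose node limit covers the number of real-time synchronization nodes you plan to run. The following Python sketch does only that. It copies the limits from the specification table above. The helper name `smallest_spec_for` is hypothetical. The sketch assumes that node count is the only constraint. In practice, the amount of data you synchronize also affects sizing, so treat the result as a lower bound.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Spec:
    name: str                # e.g. "8c16g" = 8 vCPUs, 16 GiB memory
    max_batch_threads: int   # maximum concurrent threads for batch synchronization
    max_realtime_nodes: int  # maximum real-time synchronization nodes


# Copied from the specification table above, ordered from smallest to largest.
SPECS = [
    Spec("4c8g", 8, 3),
    Spec("8c16g", 16, 6),
    Spec("12c24g", 24, 9),
    Spec("16c32g", 32, 12),
    Spec("24c48g", 48, 18),
]


def smallest_spec_for(realtime_nodes: int) -> Optional[Spec]:
    """Return the smallest specification that can host the given number of
    real-time synchronization nodes, or None if no single specification can."""
    for spec in SPECS:  # relies on SPECS being sorted by size
        if spec.max_realtime_nodes >= realtime_nodes:
            return spec
    return None


print(smallest_spec_for(5).name)  # 8c16g
print(smallest_spec_for(20))      # None
```

A return value of None only means that no single specification in the table covers the requested node count.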

Network connectivity solutions

This section describes the solutions that you can use to connect a data source to an exclusive resource group for Data Integration. For more information about network connectivity solutions, see Overview.

An exclusive resource group for Data Integration is essentially a group of Elastic Compute Service (ECS) instances. After you purchase an exclusive resource group, it is isolated from other services. To ensure network connectivity between the resource group and your data sources during data synchronization, you must associate the resource group with a virtual private cloud (VPC).

The network connectivity solutions vary based on the network environments of the data sources.
  • The data source is accessible over the Internet.

    Connect the data source to the VPC that is associated with the exclusive resource group.

  • The data source is deployed in a VPC that is in the same region as the exclusive resource group.
    • Same zone: Associate the VPC where the data source resides with the exclusive resource group.
    • Different zones: Associate a VPC with the exclusive resource group. Then, configure a route between the associated VPC and the VPC where the data source resides.
  • The data source is deployed in a VPC that is in a different region from the exclusive resource group. Use either of the following methods:
    • Associate a VPC with the exclusive resource group. Then, configure a route between the associated VPC and the VPC where the data source resides.
    • Associate a VPC with the exclusive resource group. Then, use Express Connect or VPN Gateway to connect the associated VPC to the VPC where the data source resides.
  • The data source is deployed in a data center. Use either of the following methods:
    • Associate a VPC with the exclusive resource group. Then, configure a route between the associated VPC and the network where the data center is located.
    • Associate a VPC with the exclusive resource group. Then, use Express Connect or VPN Gateway to connect the data center to the associated VPC.
  • The data source is deployed in the Alibaba Cloud classic network.

    The classic network and VPCs cannot be connected. Therefore, we recommend that you migrate the data source to a VPC.
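The connectivity options above form a small decision table, which a planning script could encode as a mapping from data source location to alternative solutions. The following Python sketch is illustrative only. The enum `SourceLocation`, the mapping `OPTIONS`, and the helper `connectivity_options` are hypothetical names. The option strings paraphrase the list above and do not call any DataWorks or networking API.

```python
from enum import Enum, auto
from typing import List


class SourceLocation(Enum):
    """Hypothetical labels for where a data source is deployed."""
    INTERNET = auto()
    VPC_SAME_REGION_SAME_ZONE = auto()
    VPC_SAME_REGION_DIFFERENT_ZONE = auto()
    VPC_DIFFERENT_REGION = auto()
    DATA_CENTER = auto()
    CLASSIC_NETWORK = auto()


# Each location maps to the alternative connectivity options described above.
OPTIONS = {
    SourceLocation.INTERNET: [
        "Connect the data source to the VPC associated with the resource group.",
    ],
    SourceLocation.VPC_SAME_REGION_SAME_ZONE: [
        "Associate the data source's VPC with the resource group.",
    ],
    SourceLocation.VPC_SAME_REGION_DIFFERENT_ZONE: [
        "Associate a VPC with the resource group, then route it to the data source's VPC.",
    ],
    SourceLocation.VPC_DIFFERENT_REGION: [
        "Associate a VPC with the resource group, then route it to the data source's VPC.",
        "Associate a VPC with the resource group, then connect the two VPCs "
        "through Express Connect or VPN Gateway.",
    ],
    SourceLocation.DATA_CENTER: [
        "Associate a VPC with the resource group, then route it to the data center network.",
        "Associate a VPC with the resource group, then connect the data center to it "
        "through Express Connect or VPN Gateway.",
    ],
    SourceLocation.CLASSIC_NETWORK: [
        "Migrate the data source to a VPC; the classic network cannot connect to VPCs.",
    ],
}


def connectivity_options(location: SourceLocation) -> List[str]:
    """Return the alternative connectivity options for a data source location."""
    return OPTIONS[location]


for option in connectivity_options(SourceLocation.DATA_CENTER):
    print("-", option)
```

Keeping every alternative in a list preserves the fact that cross-region VPCs and data centers each have two valid options, and you choose one of them.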

Procedure

To use a synchronization solution of DataWorks, perform the following steps:
  1. Plan and configure resources.

    Based on the amount of data that you want to synchronize and on your network environment, estimate the required resources, purchase an exclusive resource group for Data Integration, and configure network connectivity between the resource group and your data sources.

  2. Configure data sources.

    After network connectivity is established between the exclusive resource group and the data sources between which you want to synchronize data, configure the data sources so that the resource group can access them. For example, add the IP addresses of the exclusive resource group to the IP address whitelists of the data sources. Otherwise, the synchronization fails.

  3. Add data sources.

    Add the data sources to DataWorks as the source and the destination, so that you can select them when you create a synchronization solution.

  4. Create and configure a synchronization solution.

    Create a synchronization solution and configure the parameters based on the synchronization scenario.
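Step 2 notes that synchronization fails if the IP addresses of the exclusive resource group are missing from a data source's whitelist. Before you start a node, you can run a quick offline check like the following Python sketch. It uses only the standard `ipaddress` module. The function name `missing_from_whitelist` is hypothetical. The sketch assumes that whitelist entries are single IP addresses or CIDR blocks. It does not retrieve the resource group's IP addresses or read any data source's actual whitelist; you supply both lists yourself. The addresses in the example come from documentation-only ranges (RFC 5737).

```python
import ipaddress
from typing import Iterable, List


def missing_from_whitelist(resource_group_ips: Iterable[str],
                           whitelist: Iterable[str]) -> List[str]:
    """Return the resource group IP addresses that no whitelist entry covers.

    Whitelist entries may be single addresses ("192.0.2.10") or CIDR blocks
    ("192.0.2.0/24"); a single address is treated as a /32 (or /128) network.
    """
    networks = [ipaddress.ip_network(entry, strict=False) for entry in whitelist]
    missing = []
    for ip in resource_group_ips:
        address = ipaddress.ip_address(ip)
        # Membership is False when IP versions differ, so mixed IPv4/IPv6 lists are safe.
        if not any(address in network for network in networks):
            missing.append(ip)
    return missing


# Example values from documentation-only ranges, not real resource group addresses.
print(missing_from_whitelist(["192.0.2.10", "198.51.100.7"], ["192.0.2.0/24"]))
# ['198.51.100.7']
```

Parsing entries with `strict=False` tolerates CIDR blocks written with host bits set, such as "192.0.2.10/24".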

For more information about the synchronization between data sources, see the following topics: