
DataWorks:Supported data sources and synchronization solutions

Last Updated: Feb 03, 2026

DataWorks Data Integration synchronizes data between a wide range of data sources, including MySQL, MaxCompute, Hologres, and Kafka. It provides solutions for offline batch synchronization, real-time data synchronization, and entire-database migration to support use cases like T+1 offline ETL, real-time data replication with latency in seconds, and database migration.

Synchronization solutions

| Synchronization type | Source granularity | Destination granularity | Latency | Synchronization scenario |
| --- | --- | --- | --- | --- |
| Batch synchronization for a single table | A single table | A single table or partition | T+1 or periodic | Periodic full synchronization, periodic incremental synchronization |
| Batch synchronization for sharded tables | Multiple tables with identical structures | A single table or partition | T+1 or periodic | Periodic full synchronization, periodic incremental synchronization |
| Real-time synchronization for a single table | A single table | A single table or partition | Seconds to minutes | Real-time incremental synchronization (Change Data Capture, CDC) |
| Real-time synchronization for sharded tables | Multiple logical tables (logical aggregations of multiple physical tables) | Map to multiple destination tables or merge into a single destination table | Seconds to minutes | Full synchronization + real-time incremental synchronization (CDC) |
| Batch synchronization for an entire database | An entire database or multiple tables | Map to multiple tables and their partitions | One-time or periodic | One-time/periodic full synchronization, one-time/periodic incremental synchronization, one-time full + periodic incremental synchronization |
| Real-time synchronization for an entire database | An entire database or multiple tables | Map to multiple tables and their partitions | Seconds to minutes | Full synchronization + real-time incremental synchronization (CDC) |
| Full and incremental synchronization for an entire database | An entire database or multiple tables | Map to multiple tables and their partitions | Initial full synchronization: batch; subsequent incremental synchronization: T+1 | One-time full + T+1 incremental synchronization (Base + Log pattern) |

Recommended synchronization solutions

When choosing a data synchronization solution, consider two key factors:

  1. Latency requirement: How often does your business need data to be synchronized? Is a daily update (batch) sufficient, or do you need real-time updates with second-to-minute latency?

  2. Scale and complexity: How many tables do you need to synchronize? Is the processing logic for these tables uniform (entire database) or varied (single table)?

Based on these factors, we recommend two main categories of solutions: batch synchronization and real-time synchronization.

1. Batch synchronization (T+1/periodic)

Batch solutions are suitable for scenarios that do not require high data freshness (for example, T+1) and involve periodic batch processing.

Important

Prerequisite: To perform incremental batch synchronization, the source table must have a watermark field that defines incremental data, such as a timestamp like gmt_modified or an auto-incrementing ID. Without such a field, you can only perform periodic full synchronization.
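As a minimal illustration of why the watermark field matters, the sketch below selects only the rows whose watermark falls inside one business day. The table rows and the `incremental_rows` helper are hypothetical; in DataWorks this logic is expressed as a WHERE-style filter on the source reader, not as Python code.

```python
from datetime import datetime, timedelta

def incremental_rows(rows, watermark_field, bizdate):
    """Select rows whose watermark falls inside [bizdate, bizdate + 1 day).

    Mirrors the filter an incremental batch task applies to the source,
    conceptually: gmt_modified >= bizdate AND gmt_modified < bizdate + 1 day.
    """
    start = bizdate
    end = bizdate + timedelta(days=1)
    return [r for r in rows if start <= r[watermark_field] < end]

# Hypothetical source rows with a gmt_modified watermark field.
rows = [
    {"id": 1, "gmt_modified": datetime(2026, 2, 2, 23, 50)},
    {"id": 2, "gmt_modified": datetime(2026, 2, 3, 0, 10)},
    {"id": 3, "gmt_modified": datetime(2026, 2, 3, 18, 0)},
]

# Only rows modified during the Feb 3 business day are picked up.
delta = incremental_rows(rows, "gmt_modified", datetime(2026, 2, 3))
```

Without such a field there is no way to tell which rows changed since the last run, which is why only periodic full synchronization remains possible.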

a. Batch synchronization for a single table

Use this option for fine-grained processing of a small number of critical or heterogeneous data sources.

  • Key advantage: Flexible processing logic.

    • Fine-grained transformations: Supports complex field mapping, data filtering, constant assignment, function-based transformations, and even AI-assisted processing.

    • Heterogeneous source integration: The best choice for processing non-standard data sources like APIs and log files.

  • Key limitation: High cost at scale.

    • High configuration overhead: Configuring and maintaining individual tasks for a large number of tables requires significant effort.

    • High resource consumption: Because each task is scheduled independently, 100 single-table tasks consume far more resources than one entire-database task.

Recommended solution: Create a batch synchronization task

b. Batch synchronization for an entire database

Use this option to efficiently migrate a large number of homogeneous tables from one location to another.

  • Key advantages: High O&M efficiency and low cost.

    • High efficiency: Configure hundreds of tables at once with automatic object matching, greatly improving development efficiency.

    • Cost-effective: Resources are scheduled and optimized together, resulting in extremely low costs. For example, the resource consumption of one entire-database task might be 2 CUs, while 100 single-table tasks could consume 100 CUs.

    • Typical use cases: Building the ODS layer of a data warehouse, performing periodic database backups, and migrating data to the cloud.

  • Key limitation: Limited processing logic.

    • Designed primarily for "copying" data and does not support complex, table-specific transformation logic.

Recommended solution: Configure a solution task for data synchronization.

2. Real-time synchronization (second-to-minute latency)

Real-time solutions are suitable for use cases that require capturing real-time data changes (inserts, updates, and deletes) from the source to support real-time analytics and business responses.

Important

Prerequisite: The source must support Change Data Capture (CDC) or be a message queue. For example, a MySQL source must have its binary log enabled, or the source must be a Kafka instance.
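To show what happens once CDC is available, the sketch below applies a simplified stream of insert/update/delete events to a destination keyed by primary key. The `(op, pk, row)` event shape and the `apply_cdc` helper are illustrative assumptions, not the actual change-record format emitted by a binlog-based source.

```python
def apply_cdc(dest, events):
    """Apply a stream of CDC events to a destination keyed by primary key.

    Each event is an (op, pk, row) tuple where op is 'I' (insert),
    'U' (update), or 'D' (delete) -- a simplified stand-in for the
    change records captured from a source's change log.
    """
    for op, pk, row in events:
        if op in ("I", "U"):
            dest[pk] = row          # upsert semantics
        elif op == "D":
            dest.pop(pk, None)      # physical delete at the destination
    return dest

dest = {}
events = [
    ("I", 1, {"id": 1, "name": "a"}),
    ("I", 2, {"id": 2, "name": "b"}),
    ("U", 1, {"id": 1, "name": "a2"}),  # update overwrites row 1
    ("D", 2, None),                     # delete removes row 2
]
apply_cdc(dest, events)
```

Note that this only works when the destination supports in-place updates and deletes; append-only destinations need the Base + Log pattern described later.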

Real-time: single table vs. entire database

The decision logic is similar to that for batch solutions:

  • Real-time synchronization for a single table: Best for complex processing of real-time change streams from a single, critical table.

  • Real-time synchronization for an entire database: The primary choice for building real-time data warehouses, implementing real-time disaster recovery for databases, and enabling real-time data lakes. It offers the same significant advantages in efficiency and cost-effectiveness.

Recommended solutions: Create a real-time synchronization task, Configure a solution task for data synchronization

3. Real-time CDC to append-only destinations

Important

The challenge: Real-time synchronization captures CDC data, which includes Insert, Update, and Delete operations. For append-only storage systems like MaxCompute non-Delta tables that do not natively support physical Update or Delete operations, writing a raw CDC stream can cause data inconsistencies. For example, a delete operation would not be reflected.

  • DataWorks solution: The Base + Log pattern

    • This solution is implemented as the full and incremental synchronization for an entire database task. It solves the problem by creating a Base table (for the full snapshot) and a Log table (for incremental changes) at the destination.

    • How it works: The CDC data stream is written to the Log table in real time. Then, on a T+1 schedule, a system-scheduled task automatically applies the changes from the Log table to the Base table using a MERGE operation, creating an updated full snapshot. This approach provides minute-level latency for ingesting incremental data, while the final, consistent state becomes visible on a T+1 basis. It balances real-time data capture with the eventual consistency required by offline data warehouses.
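The T+1 MERGE step can be sketched as follows, assuming the Base table is a keyed full snapshot and the Log table is an ordered list of the day's captured changes. The `merge_base_with_log` helper and the `(op, pk, row)` event shape are illustrative, not the actual implementation of the system-scheduled task.

```python
def merge_base_with_log(base, log):
    """Produce the next full snapshot by replaying the Log onto the Base.

    base: {pk: row} full snapshot from the previous cycle.
    log:  ordered CDC records (op, pk, row) written in real time during the day.
    Returns the new Base snapshot, leaving the previous one untouched.
    """
    snapshot = dict(base)
    for op, pk, row in log:
        if op in ("I", "U"):
            snapshot[pk] = row
        elif op == "D":
            snapshot.pop(pk, None)
    return snapshot

base = {1: {"id": 1, "amount": 10}, 2: {"id": 2, "amount": 20}}
log = [
    ("U", 1, {"id": 1, "amount": 15}),  # update captured in real time
    ("D", 2, None),                     # delete that append-only storage cannot apply directly
    ("I", 3, {"id": 3, "amount": 30}),  # newly inserted row
]
new_base = merge_base_with_log(base, log)
```

The delete that could not be physically applied to the append-only Log is reflected in the merged snapshot, which is why the consistent state becomes visible on a T+1 basis.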

Recommended solution: Configure a solution task for data synchronization.

Data source capabilities

| Data source | Batch (single table) | Real-time (single table) | Batch (database) | Real-time (database) | Full and incremental (database) |
| --- | --- | --- | --- | --- | --- |
| ApsaraDB for Memcache data source | Read | - | - | - | - |
| ApsaraDB for MongoDB data source | Read/Write | - | - | - | - |
| ApsaraDB for PolarDB-X 1.0 data source | Read/Write | - | - | - | - |
| ApsaraDB for Redis data source | Read/Write | - | - | - | - |
| ApsaraDB RDS for MySQL data source | Read/Write | Write | Read | Write | - |
| ApsaraDB RDS for PostgreSQL data source | Read/Write | - | Read | - | - |
| ApsaraDB for OceanBase data source | Read/Write | - | - | Read | Read |
| ApsaraDB RDS for SQL Server data source | Read | - | - | - | - |
| ApsaraDB RDS for MariaDB TX data source | Read | - | - | - | - |
| AnalyticDB for MySQL data source | Read/Write | - | - | - | - |
| AnalyticDB for PostgreSQL data source | Read | - | - | - | - |
| API data source | Read | - | - | - | - |
| Data Lake Analytics (DLA) data source | Read/Write | Read/Write | - | Write | - |
| Doris data source | Read/Write | Write | Write | Write | - |
| DB2 for LUW data source | Read/Write | - | Read | - | - |
| SelectDB data source | Read/Write | Write | Read | - | - |
| DRDS data source | Read/Write | - | Read | - | - |
| Enterprise Distributed Application Service (EDAS) data source | Read/Write | - | Read | - | - |
|  | Read/Write | Write | Write | Write | - |
| EMR ClickHouse data source | Read/Write | - | - | - | - |
|  | Read/Write | - | - | - | - |
|  | Read/Write (Native API); Read (SQL 2.0.x); Write (SQL 1.1.x) | - | - | - | - |
| EMR HDFS data source | Read/Write | - | - | - | - |
|  | Read/Write | - | Read/Write | - | - |
| Hologres data source | Read/Write | Read/Write | Read/Write | Write | - |
| EMR Impala data source | Read | - | - | - | - |
| Kafka data source | Read/Write | Read/Write | - | Write | - |
| EMR Kudu data source | Read/Write | - | - | - | - |
| MaxCompute data source | Read/Write | Write | - | Write | - |
| Message Queue for Apache RocketMQ data source | Read/Write | Read | - | - | - |
| OpenSearch data source | Read/Write | Write | Write | Write | Write |
| Object Storage Service (OSS) data source | Read/Write | - | - | - | - |
| On-premises ApsaraDB for Lindorm data source | Write | - | - | - | - |
| On-premises Cassandra data source | Write | - | - | - | - |
| On-premises FTP data source | Read | - | - | - | - |
| On-premises Greenplum data source | Read/Write | - | - | - | - |
| On-premises HDFS data source | Read/Write | - | - | Read | - |
| On-premises MySQL data source | Read/Write | Read | Read | Read | Read |
| On-premises odpshttp data source | Write | - | - | - | - |
| On-premises Oracle data source | Read/Write | Read | Read | Read | Read |
| On-premises PolarDB for PostgreSQL (compatible with Oracle) data source | Read/Write | - | Write | Write | - |
| On-premises PolarDB-O data source | Read/Write | - | Write | Write | - |
| On-premises PostgreSQL data source | Read/Write | Read | Read | Read | Read |
| On-premises SQL Server data source | Read/Write | - | Read | Read | - |
| On-premises Teradata data source | Read/Write | - | Read | Read | - |
| On-premises Phoenix data source | Write | - | - | - | - |
| PolarDB for MySQL data source | Read/Write | - | - | - | - |
| PolarDB for PostgreSQL data source | Read/Write | - | - | - | - |
| PostGIS data source | Read/Write | - | - | - | - |
| SFTP data source | Write | - | - | - | - |
| StarRocks data source | Read/Write | - | - | - | - |
| Tablestore data source | Read/Write | Write | Write | Write | - |
| TDDW data source | Read/Write | - | Read | - | - |
| TIDB data source | Read/Write | Write | - | - | - |
| EMR Presto data source | Read/Write | - | - | - | - |
| Trino data source | Read/Write | - | - | - | - |
| VVP data source | Write | - | - | - | - |
|  | Read/Write | - | - | - | - |
| VPC ECS API data source | Read | - | - | - | - |


Related documentation

The following topics provide essential information to help you get started with Data Integration.