All Products
Search
Document Center

DataWorks:Supported data sources and synchronization solutions

Last Updated:May 09, 2026

DataWorks Data Integration supports data synchronization between multiple data sources, such as MySQL, MaxCompute, Hologres, and Kafka. Data Integration offers batch processing, real-time data synchronization, and whole-database migration solutions. You can use these solutions for scenarios such as T+1 batch extract, transform, and load (ETL), real-time data replication with second-level latency, and whole-database migration.

Synchronization solutions

Type

Source granularity

Target granularity

Timeliness

Synchronization scenario

Single-table batch

A single table

A single table or partition

T+1 or periodic

Periodic full, periodic incremental

Sharded database and table batch

Multiple tables with identical structures

A single table or partition

T+1 or periodic

Periodic full, periodic incremental

Single-table real-time

A single table

A single table or partition

Seconds to minutes

Real-time incremental (CDC)

Whole-database batch

An entire database or multiple tables

Corresponding multiple tables and their partitions

One-time or periodic

One-time/periodic full, one-time/periodic incremental, one-time full + periodic incremental

Whole-database real-time

An entire database or multiple tables

Corresponding multiple tables and their partitions

Seconds to minutes

Full + real-time incremental (CDC)

Whole-database full-plus-incremental

An entire database or multiple tables

Corresponding multiple tables and their partitions

Initial full load: Batch processing

Subsequent incremental: T+1

Full + real-time incremental (CDC)

Recommended synchronization solutions

When selecting a data synchronization solution, consider two key questions:

  1. Timeliness requirements: How frequently does your business need data synchronized — once a day (batch), or with second-level or minute-level real-time updates (real-time)?

  2. Synchronization scale and complexity: How many tables need to be synchronized, and whether the processing logic is uniform across tables (single-table vs. whole-database)?

Based on these considerations, we recommend synchronization solutions in two categories: batch synchronization solutions and real-time synchronization solutions.

1. Choosing a batch synchronization solution (T+1/periodic)

Batch solutions are suitable for scenarios where data timeliness requirements are not high (for example, T+1) and periodic batch processing is needed.

Important

Key prerequisite: To implement batch incremental synchronization, source tables must contain columns that can be used to identify incremental data, such as timestamps like gmt_modified or auto-increment IDs. If such columns are unavailable, you can only fall back to periodic full synchronization.

1. Choose single-table batch

Use this when you need fine-grained processing of a small number of core, heterogeneous data sources.

  • Key advantages: Flexible processing logic.

    • Fine-grained transformation: Supports complex column mapping, data filtering, constant assignment, function-based transformations, and even AI-assisted processing.

    • Heterogeneous source integration: The best choice for non-standard data sources such as APIs and log files.

  • Key limitations: High cost at scale.

    • High configuration overhead: When synchronizing a large number of tables, configuring and maintaining tasks one by one involves significant effort.

    • High resource consumption: Each task is scheduled independently. The resource consumption of 100 single-table tasks far exceeds that of one whole-database task.

Single-table batch solution: Configure a single-table batch synchronization task
2. Choose whole-database batch

Use this when you need to efficiently migrate a large number of homogeneous tables from one location to another.

  • Key advantages: High O&M efficiency and low cost.

    • High efficiency: Configure hundreds of tables at once with automatic object matching, significantly improving development efficiency.

    • Cost-effective: Resources are scheduled and optimized holistically at a very low cost. For example, one whole-database task vs. 100 single-table tasks may consume 2 CUs vs. 100 CUs.

    • Typical scenarios: Building the ODS layer of a data warehouse, periodic database backup, and data migration to the cloud.

  • Key limitations: Limited processing logic.

    • Primarily designed for data replication and does not support complex transformation logic for individual tables.

Whole-database batch solution: Configure a whole-database batch synchronization task.

2. Choosing a real-time synchronization solution (seconds to minutes)

Real-time solutions are suitable for scenarios that require capturing real-time data changes (inserts, updates, and deletes) at the source to support real-time analytics and business responses.

Important

Key prerequisite: The source must support change data capture (CDC) or be a message queue. For example, MySQL requires Binlog to be enabled, or the source must be a Kafka instance.

Choose single-table real-time or whole-database real-time

The selection logic is similar to that of batch solutions:

  • Single-table real-time: Suitable for scenarios that require complex processing of real-time change streams from a single core table.

  • Whole-database real-time: The mainstream choice for building real-time data warehouses, implementing real-time database disaster recovery, and connecting to real-time data lakes. This option also provides significant advantages in efficiency and cost-effectiveness.

Real-time solutions: Configure a single-table real-time synchronization task, Configure a whole-database real-time synchronization task

3. Special scenario: Writing real-time CDC data to append-only target tables

Important

Background: CDC data captured by real-time synchronization includes three types of operations: Insert, Update, and Delete. For append-only storage systems that do not natively support Update/Delete operations at the physical level, such as non-Delta Table types in MaxCompute, directly writing the CDC stream causes data state inconsistencies (for example, delete operations cannot be reflected).

  • DataWorks solution: Base + Log mode

    • This solution uses the whole-database full-plus-incremental task and resolves this issue by creating a Base table (full snapshot) and a Log table (incremental log) at the target.

    • How it works: The CDC data stream is written in real time to the Log table. Then, on a T+1 basis, the system automatically schedules a task to merge the changes from the Log table into the Base table to generate the latest full snapshot. The timeliness of this solution is "incremental data written to the log table within minutes, with the final state merged and visible at T+1." It balances real-time data capture with eventual consistency for offline data warehouses.

Recommended solution: Configure a whole-database full-plus-incremental synchronization task.

Data source read/write capabilities

Data source

Single-table batch

Single-table real-time

Whole-database batch

Whole-database real-time

Whole-database full-plus-incremental

Public dataset data source

Read

-

-

-

-

Amazon S3 data source

Read/Write

-

-

-

-

Amazon Redshift data source

Read/Write

-

-

-

-

AnalyticDB for MySQL 2.0

Read/Write

-

-

-

-

AnalyticDB for MySQL 3.0 data source

Read/Write

Write

Read

Write

-

AnalyticDB for PostgreSQL data source

Read/Write

-

Read

-

-

ApsaraDB for OceanBase data source

Read/Write

Write

Read

Read/Write

Read

Azure Blob Storage data source

Read

-

-

-

-

BigQuery

Read

-

-

-

-

ClickHouse data source

Read/Write

-

Read

-

-

COS data source

Read

-

-

-

-

Databricks data source

Read

-

-

-

-

DataHub data source

Read/Write

Read/Write

-

Write

-

Data Lake Formation data source

Read/Write

Write

Write

Write

-

DB2

Read/Write

-

Read

-

-

Doris

Read/Write

Write

Read

-

-

DM (Dameng)

Read/Write

-

Read

-

-

DRDS (PolarDB-X 1.0)

Read/Write

-

Read

-

-

Elasticsearch

Read/Write

Write

Write

Write

-

FTP data source

Read/Write

-

-

-

-

GBase8a

Read/Write

-

-

-

-

HBase

hbase Read/Write

HBase 20xsql Read

HBase 11xsql Write

-

-

-

-

HDFS

Read/Write

-

-

-

-

Hive

Read/Write

-

Read/Write

-

-

Hologres data source

Read/Write

Read/Write

Read/Write

Write

-

HttpFile

Read

-

-

-

-

Kafka data source

Read/Write

Read/Write

-

Write

-

KingbaseES

Read/Write

-

-

-

-

Lindorm data source

Read/Write

Write

-

Write

-

LogHub (SLS)

Read/Write

Read

-

-

-

MaxCompute data source

Read/Write

Write

Write

Write

Write

MariaDB

Read/Write

-

-

-

-

Maxgraph

Write

-

-

-

-

Memcache (OCS)

Write

-

-

-

-

MetaQ data source

Read

-

-

-

-

Milvus

Read/Write

-

-

-

-

MongoDB

Read/Write

-

-

Read

-

MySQL data source

Read/Write

-

Read

Read

Read

OpenSearch

Write

-

-

-

-

Oracle data source

Read/Write

Read

Read

Read

Read

OSS

Read/Write

-

Write

Write

-

OSS-HDFS

Read/Write

-

Write

Write

-

PolarDB data source

Read/Write

-

Read

Read

Read

PolarDB-X 2.0

Read/Write

-

Read

Read

-

PostgreSQL

Read/Write

-

Read

Read

-

Redis

Write

-

-

-

-

RestAPI (HTTP)

Read/Write

-

-

-

-

Salesforce data source

Read/Write

-

-

-

-

SAP HANA

Read/Write

-

-

-

-

Sensors Data data source

Write

-

-

-

-

Snowflake

Read/Write

-

-

-

-

StarRocks data source

Read/Write

Write

Write

Write

-

SQL Server data source

Read/Write

-

Read

-

-

Tablestore data source

Read/Write

Write

-

-

-

TiDB data source

Read/Write

-

-

-

-

TSDB

Write

-

-

-

-

Vertica

Read/Write

-

-

-

-

TOS data source

Read

-

-

-

-

References

The following is a curated list of core Data Integration documents to help you get started quickly.