DataWorks: Product ecosystem

Last Updated: Mar 27, 2026

DataWorks is a one-stop platform for big data development and governance. It integrates with compute engines to run data processing tasks, and connects to data sources to move data in and out of those engines. This topic lists the compute engines and data sources that DataWorks supports.

Compute engine ecosystem

DataWorks does not execute computing tasks directly. Instead, it relies on engine binding: you bind a computing resource to register its engine with the platform, and then create, orchestrate, and manage data processing tasks for that engine from a unified interface.

The following compute engines are supported:

| Engine | Typical use case |
| --- | --- |
| MaxCompute | Large-scale offline batch processing |
| Hologres | Real-time interactive queries on large datasets |
| Flink | Real-time stream processing |
| EMR on ECS | Open-source big data workloads (Hadoop, Spark, Hive) on ECS |
| EMR on ACK | Container-based open-source big data workloads |
| EMR Serverless StarRocks | Serverless real-time analytics with StarRocks |
| EMR Serverless Spark | Serverless Spark jobs without cluster management |
| CDH | On-premises Cloudera Hadoop clusters |
| AnalyticDB for MySQL | Cloud-native data warehousing compatible with MySQL |
| AnalyticDB for PostgreSQL | Massively parallel processing (MPP) analytics |
| AnalyticDB for Spark | Spark workloads integrated with AnalyticDB |
| OpenSearch | Full-text search and intelligent search |
| ClickHouse | High-performance OLAP and real-time reporting |
| Lindorm | Multi-model storage for IoT and time series data |
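
The binding mechanism described in this section can be pictured as a small registry: an engine becomes usable only after a computing resource is bound to it, and tasks are then created against the registered engine name. The sketch below is a conceptual model only. `EngineRegistry`, `bind`, `create_task`, and the resource name `example_project` are hypothetical and are not DataWorks APIs.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    engine: str  # name of the bound engine that will run the task
    code: str


@dataclass
class EngineRegistry:
    """Hypothetical model of engine binding; not a DataWorks API."""
    engines: dict = field(default_factory=dict)

    def bind(self, name: str, engine_type: str, resource: str) -> None:
        # Binding a computing resource registers the engine under a name.
        self.engines[name] = {"type": engine_type, "resource": resource}

    def create_task(self, name: str, engine: str, code: str) -> Task:
        # The platform only orchestrates, so an unbound engine is rejected.
        if engine not in self.engines:
            raise KeyError(f"engine {engine!r} is not bound")
        return Task(name=name, engine=engine, code=code)


registry = EngineRegistry()
registry.bind("mc_prod", engine_type="MaxCompute", resource="example_project")
task = registry.create_task("daily_agg", engine="mc_prod", code="SELECT 1;")
```

Creating a task against a name that was never bound raises `KeyError`, which mirrors the idea that the platform itself has nothing to execute the task on.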

Data source ecosystem

A data source is the unified entry point in DataWorks for connecting to external systems. Configure the connection information and network settings once in Management Center, then reuse the connection across Data Integration, Data Studio, Data Map, DataAnalysis, and DataService Studio without repeating the configuration. In standard mode, you can also configure data source isolation to keep development and production environments physically separate.
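
As a rough illustration of "configure once, reuse everywhere", the sketch below models one named data source that holds a separate connection for each environment, which is the kind of isolation standard mode allows. The structure, the `resolve` helper, and all host and database names are assumptions made for the example. They do not reflect how DataWorks stores data sources internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    host: str
    port: int
    database: str


# One named data source, configured once, with a separate connection per
# environment. All values are placeholders.
DATA_SOURCES = {
    "orders_mysql": {
        "dev": Connection("dev-db.example.internal", 3306, "orders_dev"),
        "prod": Connection("prod-db.example.internal", 3306, "orders"),
    }
}


def resolve(name: str, env: str) -> Connection:
    """Return the connection a module would use for this data source."""
    try:
        return DATA_SOURCES[name][env]
    except KeyError:
        raise LookupError(f"no {env!r} connection for data source {name!r}") from None


# Modules refer to the data source by name only; the environment decides
# which physical system is reached.
sync_conn = resolve("orders_mysql", "prod")    # e.g. a production sync task
studio_conn = resolve("orders_mysql", "dev")   # e.g. a development node
```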

The subsections below list the data sources supported by each DataWorks module.

Module support overview

Use this matrix to check which modules support a specific data source. See the relevant subsections below for links to setup guides.

Data source Data Integration Data Studio Data Map DataAnalysis DataService Studio
MaxCompute
Hologres
MySQL
PostgreSQL
Oracle
SQL Server
AnalyticDB for MySQL
AnalyticDB for PostgreSQL
StarRocks
ClickHouse
Doris
PolarDB
SelectDB
OceanBase
Tablestore
Tablestore Stream
Lindorm
HBase
Kafka
Object Storage Service (OSS)
Simple Log Service (SLS) / LogHub
DataHub
HDFS
Amazon S3
Azure Blob Storage
BigQuery
Amazon Redshift
Elasticsearch
MongoDB
Redis
Maxgraph
EMR (Hive, Spark SQL, Impala, Presto, Trino)
CDH (Hive, Spark SQL)
Data Lake Formation (DLF)
SAP HANA
DB2
DM
DRDS (PolarDB-X 1.0)
PolarDB-X 2.0
MariaDB
KingbaseES
Vertica
GBase8a
Milvus
TiDB
FTP
HttpFile
RestAPI (HTTP)
Salesforce
Sensors Data
Memcache (OCS)
MetaQ
OSS-HDFS
TOS
TSDB
Graph Database (GDB)
AnalyticDB for Spark
E-MapReduce Hive

The table above covers the data sources listed in this topic. For the full list of supported data sources and synchronization methods, see Supported data sources and synchronization solutions.
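
A module support matrix of this kind can be derived mechanically from per-module lists. The sketch below does so for a small sample taken from the module subsections of this topic. The sample is deliberately partial, so a `False` cell only means the data source is absent from the sample, not that the module lacks support. The function name `support_matrix` is illustrative.

```python
MODULES = [
    "Data Integration",
    "Data Studio",
    "Data Map",
    "DataAnalysis",
    "DataService Studio",
]

# Partial sample of the per-module lists in this topic.
MODULE_SOURCES = {
    "Data Integration": {"MaxCompute", "Hologres", "MySQL", "Kafka", "MongoDB"},
    "Data Studio": {"MySQL"},
    "Data Map": {"MaxCompute", "Hologres", "MySQL"},
    "DataAnalysis": {"MaxCompute", "Hologres", "MySQL"},
    "DataService Studio": {"MaxCompute", "MongoDB"},
}


def support_matrix(module_sources):
    """Map each data source to {module: listed?} across all modules."""
    sources = sorted(set().union(*module_sources.values()))
    return {
        source: {module: source in module_sources[module] for module in MODULES}
        for source in sources
    }


matrix = support_matrix(MODULE_SOURCES)
kafka_modules = [module for module, listed in matrix["Kafka"].items() if listed]
```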

Data Integration

Data Integration is the primary module for moving data between systems. Configure a data source once in Management Center, then use it to set up sync tasks. For each task, choose a scope (single-table or full-database) and a mode (offline or real-time). Supported sync patterns include full migration, incremental capture (CDC), and automatic full-and-incremental synchronization.

For setup instructions, see Data source management and Supported data sources and synchronization solutions.
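
The choices involved in a sync task (scope, mode, and pattern) can be summarized as a small data model. The sketch below is only a way to visualize that option space. The class names and the data source names `orders_mysql` and `dw_maxcompute` are hypothetical, and the model does not encode which combinations a given data source actually supports.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    SINGLE_TABLE = "single-table"
    FULL_DATABASE = "full-database"


class Mode(Enum):
    OFFLINE = "offline"
    REAL_TIME = "real-time"


class Pattern(Enum):
    FULL = "full migration"
    INCREMENTAL = "incremental capture (CDC)"
    FULL_AND_INCREMENTAL = "full-and-incremental"


@dataclass(frozen=True)
class SyncTask:
    source: str        # name of a configured data source
    destination: str   # name of a configured data source
    scope: Scope
    mode: Mode
    pattern: Pattern

    def describe(self) -> str:
        return (
            f"{self.source} -> {self.destination}: "
            f"{self.scope.value}, {self.mode.value}, {self.pattern.value}"
        )


sync_task = SyncTask(
    source="orders_mysql",
    destination="dw_maxcompute",
    scope=Scope.FULL_DATABASE,
    mode=Mode.REAL_TIME,
    pattern=Pattern.FULL_AND_INCREMENTAL,
)
summary = sync_task.describe()
```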

Cloud storage

OSS data source

Amazon S3 data source

Azure Blob Storage data source

FTP data source

HDFS data source

OSS-HDFS data source

HttpFile data source

TOS data source

Databases

MySQL data source

PostgreSQL data source

Oracle data source

SQL Server data source

PolarDB data source

PolarDB-X 2.0 data source

DRDS (PolarDB-X 1.0) data source

MariaDB data source

Vertica data source

DB2 data source

DM data source

GBase8a data source

KingbaseES data source

TiDB data source

ApsaraDB for OceanBase data source

Amazon Redshift data source

BigQuery data source

Alibaba Cloud data stores

MaxCompute data source

Hologres data source

AnalyticDB for MySQL 2.0 data source

AnalyticDB for MySQL 3.0 data source

AnalyticDB for PostgreSQL data source

ClickHouse data source

Lindorm data source

Tablestore data source

Tablestore Stream data source

LogHub (SLS) data source

DataHub data source

Maxgraph data source

OpenSearch data source

Data Lake Formation data source

Memcache (OCS) data source

MetaQ data source

SelectDB data source

Graph Database (GDB) data source

Big data and open-source systems

Hive data source

HBase data source

Kafka data source

StarRocks data source

Doris data source

Milvus data source

NoSQL, APIs, and SaaS

MongoDB data source

Redis data source

SAP HANA data source

Elasticsearch data source

RestAPI (HTTP) data source

Salesforce data source

Sensors Data data source

TSDB data source

Data Studio

Data Studio supports hybrid orchestration and scheduling across compute engines and databases. In addition to engines such as MaxCompute, E-MapReduce (EMR), and AnalyticDB, you can connect databases directly as nodes in your development pipeline. Configure data source connections and scheduling policies once, then call them from the development and O&M modules.

For more information, see Database nodes.
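
Hybrid orchestration amounts to scheduling a dependency graph whose nodes run on different engines and databases. The sketch below orders such a graph with Python's standard `graphlib` module. The node names, data source names, and the `run_order` helper are invented for the example, and the DataWorks scheduler itself is not modeled.

```python
from graphlib import TopologicalSorter

# node name -> (data source or engine it runs on, upstream node names)
WORKFLOW = {
    "extract_orders": ("orders_mysql", []),
    "clean_orders": ("maxcompute_prod", ["extract_orders"]),
    "load_report": ("starrocks_bi", ["clean_orders"]),
    "refresh_dim": ("orders_mysql", []),
    "publish": ("starrocks_bi", ["load_report", "refresh_dim"]),
}


def run_order(workflow):
    """Return one valid execution order in which upstream nodes come first."""
    graph = {node: set(upstream) for node, (_, upstream) in workflow.items()}
    return list(TopologicalSorter(graph).static_order())


order = run_order(WORKFLOW)
for node in order:
    target, _ = WORKFLOW[node]
    print(f"run {node} on {target}")
```

Several orders can be valid for the same graph. The only guarantee is that every node appears after all of its upstream nodes.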

MySQL data source

PolarDB MySQL data source

SAP HANA data source

SQL Server data source

PolarDB PostgreSQL data source

Vertica data source

Oracle data source

Doris data source

DM data source

PostgreSQL data source

MariaDB data source

KingbaseES data source

StarRocks data source

SelectDB data source

OceanBase data source

DRDS data source

Amazon Redshift data source

DB2 data source

GBase8a data source

Data Map

Data Map uses pre-configured data source connections to collect metadata automatically. The built-in collector retrieves database table schemas, partition information, and cross-system data lineage. After collection, view table details and visualize the lineage graph in Data Map to trace the origin and flow of your data assets.

For more information, see Metadata acquisition.
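
Tracing the origin of a data asset is a graph walk over lineage edges. The sketch below runs a breadth-first search over a hand-written lineage map. The table names and the `upstream_tables` helper are hypothetical, and in practice the edges would come from the collected metadata rather than a literal dictionary.

```python
from collections import deque

# downstream table -> tables it is derived from (hypothetical lineage edges)
LINEAGE = {
    "starrocks.sales_report": ["maxcompute.dws_sales"],
    "maxcompute.dws_sales": ["maxcompute.ods_orders", "maxcompute.ods_customers"],
    "maxcompute.ods_orders": ["mysql.orders"],
    "maxcompute.ods_customers": ["mysql.customers"],
}


def upstream_tables(table, lineage):
    """Breadth-first walk returning every table that feeds into `table`."""
    seen = {table}
    queue = deque([table])
    result = []
    while queue:
        current = queue.popleft()
        for parent in lineage.get(current, []):
            if parent not in seen:  # guard against revisiting shared parents
                seen.add(parent)
                result.append(parent)
                queue.append(parent)
    return result


origins = upstream_tables("starrocks.sales_report", LINEAGE)
```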

AnalyticDB for PostgreSQL data source

MySQL data source

Hologres data source

AnalyticDB for MySQL data source

PostgreSQL data source

Lindorm data source

AnalyticDB for Spark data source

SQL Server data source

MaxCompute data source

CDH Hive data source

Oracle data source

StarRocks data source

Data Lake Formation (DLF) data source

Tablestore (OTS) data source

ClickHouse data source

E-MapReduce Hive data source

DataAnalysis

DataAnalysis lets you query, analyze, transform, and visualize data interactively using the engines and data sources registered in DataWorks.

For more information, see SQL query and analysis.

MaxCompute data source

Hologres data source

EMR Hive data source

EMR Spark SQL data source

EMR Impala data source

EMR Presto data source

EMR Trino data source

CDH Hive data source

CDH Spark SQL data source

StarRocks data source

ClickHouse data source

SelectDB data source

Doris data source

AnalyticDB for MySQL 3.0 data source

AnalyticDB for PostgreSQL data source

Tablestore (OTS) data source

MySQL data source

PostgreSQL data source

Oracle data source

SQL Server data source

DataService Studio

DataService Studio generates APIs from data sources, exposing data as standard service endpoints for sharing across teams and applications.

For more information, see Generate an API.
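
Conceptually, generating an API from a data source means turning request parameters into a parameterized query against a table and returning the rows as structured records. The sketch below illustrates that idea with an in-memory SQLite database from the Python standard library. SQLite is only a stand-in so the example runs anywhere, and `make_endpoint` is a hypothetical helper, not part of DataService Studio.

```python
import sqlite3


def make_endpoint(conn, table, filter_columns):
    """Build a handler that turns request parameters into a filtered SELECT."""
    def handler(params):
        unknown = set(params) - set(filter_columns)
        if unknown:
            raise ValueError(f"unsupported parameters: {sorted(unknown)}")
        # Identifiers come from the endpoint definition; values are bound.
        where = " AND ".join(f"{column} = ?" for column in params)
        sql = f"SELECT * FROM {table}" + (f" WHERE {where}" if where else "")
        cursor = conn.execute(sql, list(params.values()))
        columns = [description[0] for description in cursor.description]
        return [dict(zip(columns, row)) for row in cursor.fetchall()]
    return handler


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "east", 10.0), (2, "west", 25.5), (3, "east", 7.25)],
)

get_orders = make_endpoint(conn, "orders", filter_columns=["id", "region"])
east_orders = get_orders({"region": "east"})
```

Restricting filters to a declared list of columns keeps request input out of the SQL text, so only bound values vary between calls.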

AnalyticDB for MySQL 2.0 data source

StarRocks data source

MaxCompute data source

AnalyticDB for MySQL 3.0 data source

Doris data source

HBase data source

AnalyticDB for PostgreSQL data source

PolarDB data source

DB2 data source

Tablestore Stream data source

ApsaraDB for OceanBase data source

DM data source

MongoDB data source

SAP HANA data source