DataWorks: Product ecosystem

Last Updated: Nov 14, 2025

As Alibaba Cloud's one-stop platform for big data development and governance, DataWorks is often used with compute engine products. For data integration, DataWorks also works with data source products to enable data transmission. This topic describes other cloud products that are commonly used with DataWorks in typical scenarios.

Compute engine product ecosystem

DataWorks provides an open compute engine ecosystem. It integrates with mainstream engines such as MaxCompute, EMR, Hologres, and Flink to support collaborative development across engines. After you bind a computing resource to DataWorks, that resource becomes available on the platform, which enables one-stop big data development and governance. DataWorks does not execute computing tasks itself. Instead, it uses an engine binding mechanism that lets developers create, orchestrate, and manage data processing tasks from a unified interface, while the bound engines perform the computation.
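The binding idea above can be pictured with a small conceptual model. This is only a sketch of the pattern (a platform that registers engines and delegates work to them), not DataWorks code or its API. The `Platform` class, the `bind` and `run` methods, and the lambda "engines" are all hypothetical names introduced for illustration.

```python
from typing import Callable

# Hypothetical type: a function that submits a task to one engine
# and returns that engine's response.
Submitter = Callable[[str], str]


class Platform:
    """Toy model of a platform that orchestrates but never computes."""

    def __init__(self) -> None:
        self._bound: dict[str, Submitter] = {}

    def bind(self, engine: str, submit: Submitter) -> None:
        """Register a computing resource under an engine name."""
        self._bound[engine] = submit

    def run(self, engine: str, task: str) -> str:
        """Hand the task to the bound engine and return its response."""
        if engine not in self._bound:
            raise LookupError(f"engine {engine!r} is not bound")
        return self._bound[engine](task)


platform = Platform()
# Stand-in engines: real engines would execute the task remotely.
platform.bind("maxcompute", lambda sql: f"maxcompute accepted: {sql}")
platform.bind("hologres", lambda sql: f"hologres accepted: {sql}")

result = platform.run("maxcompute", "SELECT 1")
```

In this model, a task sent to an engine that has not been bound fails immediately, which mirrors the point that a computing resource must be bound before the platform can use it.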

Currently, DataWorks supports the following compute engines:

MaxCompute

Hologres

Flink

EMR on ECS

EMR on ACK

EMR Serverless StarRocks

EMR Serverless Spark

CDH

AnalyticDB for MySQL

AnalyticDB for PostgreSQL

AnalyticDB for Spark

OpenSearch

ClickHouse

Lindorm

Data source product ecosystem

A data source is the unified entry point in DataWorks for connecting to external systems. It supports standardized access to disparate data sources, such as databases, big data storage, and message queues. You define the connection information and configure network connectivity once in the Management Center, and then reference the data source from multiple product modules without repeating the configuration. In standard mode, you can also configure separate data sources for the development and production environments so that the two environments are physically isolated.
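As a rough illustration of "define once, reference everywhere" with separate development and production connections, consider the following minimal sketch. It is a conceptual model only and does not use any DataWorks API. The `Connection` and `DataSource` classes, the `resolve` method, the environment keys `dev` and `prod`, and the host names are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Connection:
    """Connection details for one environment (illustrative fields)."""
    host: str
    port: int
    database: str


@dataclass
class DataSource:
    """A named data source with one connection per environment."""
    name: str
    kind: str
    connections: dict[str, Connection] = field(default_factory=dict)

    def resolve(self, env: str) -> Connection:
        # Each environment needs its own connection in this sketch;
        # there is no fallback from one environment to the other.
        if env not in self.connections:
            raise KeyError(f"{self.name!r} has no {env!r} connection")
        return self.connections[env]


# Define the data source once (hypothetical hosts and databases).
orders = DataSource(
    name="orders_mysql",
    kind="mysql",
    connections={
        "dev": Connection("dev-db.example.internal", 3306, "orders_dev"),
        "prod": Connection("prod-db.example.internal", 3306, "orders"),
    },
)

# Different modules look the data source up by name instead of
# repeating the connection details.
registry = {orders.name: orders}
sync_target = registry["orders_mysql"].resolve("prod")
query_target = registry["orders_mysql"].resolve("dev")
```

Because each environment resolves to its own connection, a task running against `dev` in this model can never reach the production database by accident.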

Data Integration

A data source is a standard unit in DataWorks for connecting to external systems. It provides connection templates for disparate data sources, such as MaxCompute, MySQL, and OSS, which offer unified read and write endpoints for data integration tasks. Based on this configuration, the Data Integration module lets you select a synchronization method in a unified interface. You can synchronize a single table or an entire database, in offline or real-time mode. This supports data ingestion through full migration, incremental capture, and automatic full and incremental synchronization.

For more information, see Data Source Management and Supported data sources and synchronization solutions.

Amazon S3 data source

HDFS data source

PolarDB data source

Amazon Redshift data source

Hive data source

PolarDB-X 2.0 data source

AnalyticDB for MySQL 2.0 data source

Hologres data source

PostgreSQL data source

AnalyticDB for MySQL 3.0 data source

HttpFile data source

Redis data source

AnalyticDB for PostgreSQL data source

Kafka data source

RestAPI (HTTP) data source

ApsaraDB for OceanBase data source

KingbaseES data source

Salesforce data source

Azure Blob Storage data source

Lindorm data source

SAP HANA data source

BigQuery data source

LogHub (SLS) data source

SelectDB data source

ClickHouse data source

MaxCompute data source

Sensors Data data source

DataHub data source

MariaDB data source

StarRocks data source

Data Lake Formation data source

MaxGraph data source

SQL Server data source

DB2 data source

Memcache (OCS) data source

Tablestore data source

Doris data source

MetaQ data source

Tablestore Stream data source

DM data source

Milvus data source

TiDB data source

DRDS (PolarDB-X 1.0) data source

MongoDB data source

TSDB data source

Elasticsearch data source

MySQL data source

Vertica data source

FTP data source

OpenSearch data source

TOS data source

GBase8a data source

Oracle data source

HBase data source

Graph Database (GDB) data source

OSS data source

OSS-HDFS data source

Data Studio

DataWorks supports task development using disparate compute engines such as MaxCompute, EMR, and AnalyticDB (ADB) as the underlying computing resources. You can also connect databases such as MySQL and Oracle to the development pipeline as nodes. You configure data source connections and scheduling policies in the unified interface, and then use them from modules such as development and O&M to achieve hybrid orchestration and scheduling across different engines and databases.

For more information, see Database nodes.

MySQL data source

PolarDB MySQL data source

SAP HANA data source

SQL Server data source

PolarDB PostgreSQL data source

Vertica data source

Oracle data source

Doris data source

DM data source

PostgreSQL data source

MariaDB data source

KingbaseES data source

StarRocks data source

SelectDB data source

OceanBase data source

DRDS data source

Redshift data source

DB2 data source

GBase8a data source

Data Map

A data source is the basic unit that Data Map uses for unified metadata acquisition. Using the pre-configured data source connection, the system's built-in collector can obtain database table schemas, partition information, and cross-link data lineage. After acquisition, you can view table information and visualize the data lineage graph in Data Map. This lets you perform traceability analysis on your data assets.

For more information, see Metadata acquisition.

AnalyticDB for PostgreSQL data source

MySQL data source

Hologres data source

AnalyticDB for MySQL data source

PostgreSQL data source

Lindorm data source

AnalyticDB for Spark data source

SQL Server data source

MaxCompute data source

CDH Hive data source

Oracle data source

StarRocks data source

Data Lake Formation (DLF)

Tablestore (OTS) data source

ClickHouse data source

E-MapReduce Hive data source

DataAnalysis

DataAnalysis uses engines and data sources to allow you to smoothly process, analyze, transform, and visualize data in DataWorks.

For more information, see SQL query and analysis.

MaxCompute data source

Hologres data source

EMR Hive data source

EMR Spark SQL data source

EMR Impala data source

EMR Presto data source

EMR Trino data source

CDH Hive data source

CDH Spark SQL data source

StarRocks data source

ClickHouse data source

SelectDB data source

Doris data source

AnalyticDB for MySQL 3.0 data source

AnalyticDB for PostgreSQL data source

Tablestore (OTS) data source

MySQL data source

PostgreSQL data source

Oracle data source

SQL Server data source

DataService Studio

DataService Studio can generate APIs to transform disparate data sources into standard data service capabilities, enabling data sharing.

For more information, see Generate an API.

AnalyticDB for MySQL 2.0 data source

StarRocks data source

MaxCompute data source

AnalyticDB for MySQL 3.0 data source

Doris data source

HBase data source

AnalyticDB for PostgreSQL data source

PolarDB data source

DB2 data source

Tablestore Stream data source

ApsaraDB for OceanBase data source

DM data source

MongoDB data source

SAP HANA data source