Supported Cloud Products & Compute Engines in DataWorks

DataWorks is a one-stop platform for big data development and governance. It integrates with compute engines to run data processing tasks, and connects to data sources to move data in and out of those engines. This topic lists the compute engines and data sources that DataWorks supports.

Compute engine ecosystem

DataWorks does not execute computing tasks directly. Instead, it relies on engine binding: you bind a computing resource to register an engine with the platform, and then create, orchestrate, and manage that engine's data processing tasks from a single interface.

The following compute engines are supported:

Engine	Typical use case
MaxCompute	Large-scale offline batch processing
Hologres	Real-time interactive queries on large datasets
Flink	Real-time stream processing
EMR on ECS	Open-source big data workloads (Hadoop, Spark, Hive) on ECS
EMR on ACK	Container-based open-source big data workloads
EMR Serverless StarRocks	Serverless real-time analytics with StarRocks
EMR Serverless Spark	Serverless Spark jobs without cluster management
CDH	On-premises Cloudera Hadoop clusters
AnalyticDB for MySQL	Cloud-native data warehousing compatible with MySQL
AnalyticDB for PostgreSQL	Massively parallel processing (MPP) analytics
AnalyticDB for Spark	Spark workloads integrated with AnalyticDB
OpenSearch	Full-text search and intelligent search
ClickHouse	High-performance OLAP and real-time reporting
Lindorm	Multi-model storage for IoT and time-series data

Data source ecosystem

A data source is the unified entry point in DataWorks for connecting to external systems. Configure the connection information and network settings once in Management Center, then reuse that connection across Data Integration, Data Studio, Data Map, DataAnalysis, and DataService Studio. In standard mode, you can also configure data source isolation to keep development and production environments physically separate.
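
The configure-once, reuse-everywhere idea and standard-mode isolation can be pictured with a small model. This is a minimal sketch, not a DataWorks API: the `Connection` and `DataSource` classes, the `resolve` method, and the host names are all hypothetical, and the fallback to a shared connection when no development connection is set is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Connection:
    """Hypothetical connection settings for one environment."""
    host: str
    port: int
    database: str


@dataclass(frozen=True)
class DataSource:
    """Hypothetical data source: defined once, resolved per environment."""
    name: str
    prod: Connection
    dev: Optional[Connection] = None  # set only when isolation is configured

    def resolve(self, environment: str) -> Connection:
        """Return the connection that a module should use in `environment`."""
        if environment not in ("dev", "prod"):
            raise ValueError(f"unknown environment: {environment}")
        if environment == "dev" and self.dev is not None:
            return self.dev
        # Assumption: without isolation, both environments share one connection.
        return self.prod


# Illustrative values only; these hosts and databases do not exist.
orders = DataSource(
    name="orders_mysql",
    prod=Connection("prod-db.example.internal", 3306, "orders"),
    dev=Connection("dev-db.example.internal", 3306, "orders_dev"),
)
```

In this model every module looks up the same `orders` object, and only the environment passed to `resolve` decides which physical database is reached.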

The subsections below list the data sources supported by each DataWorks module.

Module support overview

Use this matrix to check which modules support a specific data source. See the relevant subsections below for links to setup guides.

Data source	Data Integration	Data Studio	Data Map	DataAnalysis	DataService Studio
MaxCompute	✓		✓	✓	✓
Hologres	✓		✓	✓
MySQL	✓	✓	✓	✓
PostgreSQL	✓	✓	✓	✓
Oracle	✓	✓	✓	✓
SQL Server	✓	✓	✓	✓
AnalyticDB for MySQL	✓		✓	✓	✓
AnalyticDB for PostgreSQL	✓		✓	✓	✓
StarRocks	✓	✓	✓	✓	✓
ClickHouse	✓		✓	✓
Doris	✓	✓		✓	✓
PolarDB	✓	✓			✓
SelectDB	✓	✓		✓
OceanBase	✓	✓			✓
Tablestore	✓		✓	✓
Tablestore Stream	✓				✓
Lindorm	✓		✓
HBase	✓				✓
Kafka	✓
Object Storage Service (OSS)	✓
Simple Log Service (SLS) / LogHub	✓
DataHub	✓
HDFS	✓
Amazon S3	✓
Azure Blob Storage	✓
BigQuery	✓
Amazon Redshift	✓	✓
Elasticsearch	✓
MongoDB	✓				✓
Redis	✓
Maxgraph	✓
EMR (Hive, Spark SQL, Impala, Presto, Trino)				✓
CDH (Hive, Spark SQL)			✓	✓
Data Lake Formation (DLF)	✓		✓
SAP HANA	✓	✓			✓
DB2	✓	✓			✓
DM	✓	✓			✓
DRDS (PolarDB-X 1.0)	✓	✓
PolarDB-X 2.0	✓
MariaDB	✓	✓
KingbaseES	✓	✓
Vertica	✓	✓
GBase8a	✓	✓
Milvus	✓
TiDB	✓
FTP	✓
HttpFile	✓
RestAPI (HTTP)	✓
Salesforce	✓
Sensors Data	✓
Memcache (OCS)	✓
MetaQ	✓
OSS-HDFS	✓
TOS	✓
TSDB	✓
Graph Database (GDB)	✓
Hive	✓
OpenSearch	✓
AnalyticDB for Spark			✓
E-MapReduce Hive			✓

The table above covers the data sources listed in this topic. For the full list of supported data sources and synchronization methods, see Supported data sources and synchronization solutions.
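
If you need to check support programmatically, the matrix can be held as a simple lookup structure. The following is a minimal sketch that copies only a few rows from the matrix above; the names `MODULES`, `SUPPORT`, `modules_for`, and `sources_for` are hypothetical helpers, not part of DataWorks.

```python
from typing import Dict, List, Set

# Column order of the matrix.
MODULES = (
    "Data Integration",
    "Data Studio",
    "Data Map",
    "DataAnalysis",
    "DataService Studio",
)

# A few rows copied from the matrix; extend as needed.
SUPPORT: Dict[str, Set[str]] = {
    "MaxCompute": {"Data Integration", "Data Map", "DataAnalysis", "DataService Studio"},
    "MySQL": {"Data Integration", "Data Studio", "Data Map", "DataAnalysis"},
    "StarRocks": set(MODULES),
    "Kafka": {"Data Integration"},
    "HBase": {"Data Integration", "DataService Studio"},
}


def modules_for(source: str) -> List[str]:
    """Return the modules that support `source`, in matrix column order."""
    supported = SUPPORT.get(source, set())
    return [module for module in MODULES if module in supported]


def sources_for(module: str) -> List[str]:
    """Return the data sources (from the rows above) usable in `module`."""
    return sorted(name for name, mods in SUPPORT.items() if module in mods)
```

For example, `modules_for("HBase")` returns `["Data Integration", "DataService Studio"]`, and a data source that is not in the dictionary yields an empty list.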

Data Integration

Data Integration is the primary module for moving data between systems. Configure a data source once in Management Center, then use it to set up synchronization tasks. Each task has a scope (single table or full database) and a mode (offline or real-time). Supported synchronization patterns include full migration, incremental capture through change data capture (CDC), and automatic full-and-incremental synchronization.
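
To make the three patterns concrete, here is a generic sketch of a full load followed by incremental changes. It illustrates the general technique only and makes no claim about how Data Integration implements it; the in-memory dictionaries, the `(operation, key, row)` change format, and all function names are assumptions chosen for the example.

```python
from typing import Dict, Iterable, Tuple

Row = Dict[str, object]
# Hypothetical change event: (operation, primary key, row).
# The operation is "upsert" or "delete".
Change = Tuple[str, int, Row]


def full_load(source_rows: Dict[int, Row]) -> Dict[int, Row]:
    """Full migration: copy every source row into a fresh target."""
    return {key: dict(row) for key, row in source_rows.items()}


def apply_changes(target: Dict[int, Row], changes: Iterable[Change]) -> None:
    """Incremental capture: replay change events on the target, in order."""
    for operation, key, row in changes:
        if operation == "upsert":
            target[key] = dict(row)
        elif operation == "delete":
            target.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {operation}")


def full_and_incremental(
    source_rows: Dict[int, Row], changes: Iterable[Change]
) -> Dict[int, Row]:
    """Full-and-incremental: take a snapshot, then replay later changes."""
    target = full_load(source_rows)
    apply_changes(target, changes)
    return target
```

The key design point is ordering: the snapshot is taken first, and changes captured afterward are replayed in sequence so that the target converges on the current state of the source.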

For setup instructions, see Data source management and Supported data sources and synchronization solutions.

File and object storage

OSS data source	Amazon S3 data source	Azure Blob Storage data source
FTP data source	HDFS data source	OSS-HDFS data source
HttpFile data source	TOS data source

Databases

MySQL data source	PostgreSQL data source	Oracle data source
SQL Server data source	PolarDB data source	PolarDB-X 2.0 data source
DRDS (PolarDB-X 1.0) data source	MariaDB data source	Vertica data source
DB2 data source	DM data source	GBase8a data source
KingbaseES data source	TiDB data source	ApsaraDB for OceanBase data source
Amazon Redshift data source	BigQuery data source

Alibaba Cloud data stores

MaxCompute data source	Hologres data source	AnalyticDB for MySQL 2.0 data source
AnalyticDB for MySQL 3.0 data source	AnalyticDB for PostgreSQL data source	ClickHouse data source
Lindorm data source	Tablestore data source	Tablestore Stream data source
LogHub (SLS) data source	DataHub data source	Maxgraph data source
OpenSearch data source	Data Lake Formation data source	Memcache (OCS) data source
MetaQ data source	SelectDB data source	Graph Database (GDB) data source

Big data and open-source systems

Hive data source	HBase data source	Kafka data source
StarRocks data source	Doris data source	Milvus data source

NoSQL, APIs, and SaaS

MongoDB data source	Redis data source	SAP HANA data source
Elasticsearch data source	RestAPI (HTTP) data source	Salesforce data source
Sensors Data source	TSDB data source

Data Studio

Data Studio supports hybrid orchestration and scheduling across compute engines and databases. In addition to engines such as MaxCompute, E-MapReduce (EMR), and AnalyticDB, you can connect databases directly as nodes in your development pipeline. Configure data source connections and scheduling policies once, then call them from the development and O&M modules.

For more information, see Database nodes.

MySQL data source	PolarDB MySQL data source	SAP HANA data source
SQL Server data source	PolarDB PostgreSQL data source	Vertica data source
Oracle data source	Doris data source	DM data source
PostgreSQL data source	MariaDB data source	KingbaseES data source
StarRocks data source	SelectDB data source	OceanBase data source
DRDS data source	Amazon Redshift data source	DB2 data source
GBase8a data source

Data Map

Data Map uses pre-configured data source connections to collect metadata automatically. The built-in collector retrieves database table schemas, partition information, and cross-system data lineage. After collection, view table details and visualize the lineage graph in Data Map to trace the origin and flow of your data assets.

For more information, see Metadata acquisition.

AnalyticDB for PostgreSQL data source	MySQL data source	Hologres data source
AnalyticDB for MySQL data source	PostgreSQL data source	Lindorm data source
AnalyticDB for Spark data source	SQL Server data source	MaxCompute data source
CDH Hive data source	Oracle data source	StarRocks data source
Data Lake Formation (DLF) data source	Tablestore (OTS) data source	ClickHouse data source
E-MapReduce Hive data source

DataAnalysis

DataAnalysis lets you query, analyze, transform, and visualize data interactively using the engines and data sources registered in DataWorks.

For more information, see SQL query and analysis.

MaxCompute data source	Hologres data source	EMR Hive data source
EMR Spark SQL data source	EMR Impala data source	EMR Presto data source
EMR Trino data source	CDH Hive data source	CDH Spark SQL data source
StarRocks data source	ClickHouse data source	SelectDB data source
Doris data source	AnalyticDB for MySQL 3.0 data source	AnalyticDB for PostgreSQL data source
Tablestore (OTS) data source	MySQL data source	PostgreSQL data source
Oracle data source	SQL Server data source

DataService Studio

DataService Studio generates APIs from data sources, exposing data as standard service endpoints for sharing across teams and applications.

For more information, see Generate an API.

AnalyticDB for MySQL 2.0 data source	StarRocks data source	MaxCompute data source
AnalyticDB for MySQL 3.0 data source	Doris data source	HBase data source
AnalyticDB for PostgreSQL data source	PolarDB data source	DB2 data source
Tablestore Stream data source	ApsaraDB for OceanBase data source	DM data source
MongoDB data source	SAP HANA data source