As Alibaba Cloud's one-stop platform for big data development and governance, DataWorks is often used with compute engine products. For data integration, DataWorks also works with data source products to enable data transmission. This topic describes other cloud products that are commonly used with DataWorks in typical scenarios.
Compute engine product ecosystem
DataWorks provides an open compute engine ecosystem. It integrates with mainstream engines such as MaxCompute, EMR, Hologres, and Flink to support collaborative development across engines. When you bind an engine's computing resources to DataWorks, they become available for use on the platform. DataWorks itself does not execute computing tasks. Instead, its engine binding mechanism lets developers create, orchestrate, and manage data processing tasks from a unified interface, while the bound engines run them.
Currently, DataWorks supports the following compute engines:
Data source product ecosystem
A data source is the unified entry point in DataWorks for connecting to external systems. It supports standardized access to disparate data sources, such as databases, big data storage, and message queues. You define the connection information and configure network connectivity once in the Management Center, and then reuse the data source from multiple product modules without repeating the configuration. In standard mode, you can also configure separate data sources for the development and production environments so that the two are physically isolated.
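The "define once, reuse everywhere" idea with development and production isolation can be pictured with a small Python model. This is only a conceptual sketch, not the DataWorks API: the `Connection` and `DataSource` classes, the `resolve` method, the field names, and the host names are all hypothetical and exist purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Connection:
    """Connection details for one environment (field names are illustrative)."""
    host: str
    port: int
    database: str


@dataclass
class DataSource:
    """A named data source holding one connection per environment."""
    name: str
    source_type: str
    connections: Dict[str, Connection]  # keyed by "dev" or "prod"

    def resolve(self, environment: str) -> Connection:
        """Return the connection for the requested environment."""
        if environment not in self.connections:
            raise KeyError(f"{self.name!r} has no {environment!r} connection")
        return self.connections[environment]


# Defined once; the hosts below are placeholders, not real endpoints.
orders = DataSource(
    name="orders_mysql",
    source_type="mysql",
    connections={
        "dev": Connection("dev-db.example.internal", 3306, "orders_dev"),
        "prod": Connection("prod-db.example.internal", 3306, "orders"),
    },
)

# Any module can reuse the same definition and pick the environment it runs in.
dev_conn = orders.resolve("dev")
prod_conn = orders.resolve("prod")
```

In this model, a task refers to the data source only by name, and the environment it runs in decides which physical database it reaches. That separation is what keeps development work away from production data.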
Data Integration
A data source is a standard unit in DataWorks for connecting to external systems. It provides connection templates for disparate data sources, such as MaxCompute, MySQL, and OSS, which give data integration tasks unified read and write endpoints. Based on this configuration, the Data Integration module lets you select a synchronization method in a unified interface: you can synchronize a single table or a full database, in offline or real-time mode. This supports data ingestion through full migration, incremental capture, and automatic full and incremental synchronization.
For more information, see Data Source Management and Supported data sources and synchronization solutions.
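The synchronization methods described above vary along two independent dimensions: scope (single table or full database) and timing (offline or real-time). The Python sketch below simply enumerates the combinations. The `Scope` and `Timing` enums and the `sync_methods` function are hypothetical names used for illustration. Whether a particular combination is available for a given data source is not stated here and should be checked against the supported data sources documentation.

```python
from enum import Enum
from itertools import product
from typing import List, Tuple


class Scope(Enum):
    """How much of the source is synchronized."""
    SINGLE_TABLE = "single table"
    FULL_DATABASE = "full database"


class Timing(Enum):
    """When the synchronization runs."""
    OFFLINE = "offline"
    REAL_TIME = "real-time"


def sync_methods() -> List[Tuple[Scope, Timing]]:
    """Return every scope/timing pair as a candidate synchronization method."""
    return list(product(Scope, Timing))


for scope, timing in sync_methods():
    print(f"{timing.value} sync of a {scope.value}")
```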
Data Studio
DataWorks supports task development using disparate compute engines such as MaxCompute, EMR, and ADB as the underlying computing resources. You can also connect databases such as MySQL and Oracle to the development pipeline as nodes. You can configure data source connections and scheduling policies in the unified interface. Then, you can call them from modules such as development and O&M to achieve hybrid orchestration and scheduling across different engines and databases.
For more information, see Database nodes.
MySQL data source | PolarDB MySQL data source | SAP HANA data source |
SQL Server data source | PolarDB PostgreSQL data source | Vertica data source |
Oracle data source | Doris data source | DM data source |
PostgreSQL data source | MariaDB data source | KingbaseES data source |
StarRocks data source | SelectDB data source | OceanBase data source |
DRDS data source | Redshift data source | DB2 data source |
GBase 8a data source |
Data Map
A data source is the basic unit that Data Map uses for unified metadata acquisition. Using the pre-configured data source connection, the system's built-in collector obtains database table schemas, partition information, and cross-source data lineage. After acquisition, you can view table information and a visual data lineage graph in Data Map, and use them to trace your data assets.
For more information, see Metadata acquisition.
AnalyticDB for PostgreSQL data source | MySQL data source | Hologres data source |
AnalyticDB for MySQL data source | PostgreSQL data source | Lindorm data source |
AnalyticDB for Spark data source | SQL Server data source | MaxCompute data source |
CDH Hive data source | Oracle data source | StarRocks data source |
Data Lake Formation (DLF) | Tablestore (OTS) data source | ClickHouse data source |
E-MapReduce Hive data source |
DataAnalysis
DataAnalysis uses engines and data sources so that you can process, analyze, transform, and visualize data within DataWorks.
For more information, see SQL query and analysis.
MaxCompute data source | Hologres data source | EMR Hive data source |
EMR Spark SQL data source | EMR Impala data source | EMR Presto data source |
EMR Trino data source | CDH Hive data source | CDH Spark SQL data source |
StarRocks data source | ClickHouse data source | SelectDB data source |
Doris data source | AnalyticDB for MySQL 3.0 data source | AnalyticDB for PostgreSQL data source |
Tablestore (OTS) data source | MySQL data source | PostgreSQL data source |
Oracle data source | SQL Server data source |
DataService Studio
DataService Studio can generate APIs to transform disparate data sources into standard data service capabilities, enabling data sharing.
For more information, see Generate an API.