DataWorks is an all-in-one big data development and governance platform for data engineers, data analysts, architects, and operations teams. Use it to integrate, develop, model, analyze, monitor, serve, and govern data across your organization — and build an enterprise-level data middle platform without switching between separate tools.
How data flows through DataWorks
Data moves through five stages in a typical DataWorks workflow:
Connect and collect — Data Integration syncs data from 50+ heterogeneous sources into your data lake or warehouse, without writing custom connectors or managing network infrastructure manually.
Transform and enrich — Data Studio lets you develop, test, and schedule processing tasks across multiple compute engines. Operation Center keeps those pipelines running reliably in production.
Model and standardize — Data modeling applies consistent definitions, dimensional models, and metrics across departments — eliminating data silos without changing existing architectures.
Analyze and serve — Data Analysis lets business users run ad hoc SQL queries without data engineering skills. DataService Studio publishes results as governed data APIs with zero O&M overhead.
Govern and extend — Data Quality blocks dirty data before it propagates downstream. Data Map traces lineage across assets. Open Platform integrates DataWorks into your existing systems.
Module overview
| Module | What it does |
|---|---|
| Data Integration | Synchronize data across 50+ heterogeneous sources in offline, real-time, or integrated modes |
| Data Studio and Operation Center | Develop, orchestrate, deploy, and monitor data processing tasks across multiple compute engines |
| Data modeling | Plan data warehouse layers, define standards, build dimensional models, and manage metrics |
| Data Analysis | Run SQL queries, upload datasets, and visualize data without data engineering skills |
| Data Quality | Monitor data at the table and field levels and block problematic tasks to prevent dirty data propagation |
| Data Map | Search, categorize, and trace data lineage across your data assets |
| DataService Studio | Build, publish, and manage data APIs with serverless architecture |
| Open Platform | Integrate external systems through OpenAPI, OpenEvent, and Extensions |
| Migration Assistant | Migrate jobs from open-source scheduling engines or between DataWorks environments |
Data Integration
Data Integration syncs data across heterogeneous sources — no custom connectors or manual network configuration required. It supports full and incremental synchronization in offline, real-time, or integrated modes.
Capabilities
| Feature | Description |
|---|---|
| Batch synchronization | Configure scheduling cycles for synchronization tasks |
| 50+ data sources | Synchronize between relational databases, data warehouses, non-relational databases, file storage, and message queues |
| Network flexibility | Connect to data sources across the public internet, IDCs, or VPCs |
| Security | Monitor operations and enforce access controls during synchronization |
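To make the batch-synchronization idea concrete, here is a minimal sketch of an offline sync job in the DataX-style reader/writer format that Data Integration tasks are commonly expressed in. The plugin names, field names, and `${bizdate}` placeholder are illustrative assumptions, not the exact DataWorks schema:

```python
import json

# Hypothetical offline sync job in a DataX-style reader/writer layout
# (plugin and parameter names are illustrative, not the exact DataWorks schema).
sync_job = {
    "job": {
        "content": [{
            "reader": {
                "name": "mysqlreader",          # source: a MySQL database
                "parameter": {
                    "column": ["id", "name", "gmt_modified"],
                    "connection": [{
                        "table": ["orders"],
                        "jdbcUrl": ["jdbc:mysql://example-host:3306/shop"],
                    }],
                    # Incremental sync: only rows changed since the last run.
                    "where": "gmt_modified >= '${bizdate}'",
                },
            },
            "writer": {
                "name": "odpswriter",           # destination: a MaxCompute table
                "parameter": {
                    "table": "ods_orders",
                    "partition": "ds=${bizdate}",  # one partition per run
                },
            },
        }],
        "setting": {"speed": {"channel": 2}},   # parallelism of the sync
    }
}

print(json.dumps(sync_job, indent=2))
```

The reader/writer split is what makes the star-shaped architecture work: any reader can be paired with any writer, so adding one new source plugin connects it to every existing destination.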
Engine architecture
Data Integration uses a star-shaped engine architecture: any connected data source can form synchronization links with any other supported source. For a full list, see Supported data sources and synchronization solutions.

Before synchronizing data, establish network connectivity between your data source and a resource group. Data Integration tasks run on serverless resource groups (recommended) or exclusive resource groups for Data Integration (legacy). For network solutions, see Network connectivity solutions.

Use cases
| Use case | Description |
|---|---|
| Data lake and warehouse ingestion | Ingest data from source systems into your data lake or data warehouse |
| Sharded database synchronization | Consolidate data from sharded databases and tables into a single destination |
| Real-time data archiving | Archive streaming data for long-term storage and analysis |
| Cross-cloud migration | Move data between cloud environments |
Data Studio and Operation Center
Data Studio is a development platform for data processing. Operation Center is an intelligent operations and maintenance (O&M) platform. Together, they give you a standardized end-to-end workflow — from writing code to keeping pipelines healthy in production.
Capabilities
| Feature | Description |
|---|---|
| Multi-engine support | Develop, test, deploy, and manage tasks across MaxCompute, E-MapReduce, CDH, Hologres, AnalyticDB, and ClickHouse from a single platform |
| Intelligent editor and visual orchestration | Write code in an intelligent editor and build task workflows with drag-and-drop dependency orchestration. The scheduling system is proven by Alibaba Group's internal workloads |
| Environment isolation | Standard mode separates development and production environments. Version control, code review, smoke testing, deployment control, and operational auditing standardize your development lifecycle |
| Operational monitoring | Operation Center provides data timeliness assurance, task diagnostics, impact analysis, automated O&M, and mobile-based O&M |
DataWorks provides workspaces in standard mode to isolate development and production environments. For more information, see Differences between workspace modes.
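The core of drag-and-drop dependency orchestration is that each task declares its upstream dependencies, and the scheduler derives a safe execution order from them. This is not the DataWorks API — just a minimal sketch of the idea using Python's standard-library topological sorter, with hypothetical task names:

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: task -> set of upstream dependencies.
deps = {
    "ods_sync":   set(),            # ingest raw data first
    "dwd_clean":  {"ods_sync"},     # cleanse after ingestion
    "dws_agg":    {"dwd_clean"},    # aggregate after cleansing
    "ads_report": {"dws_agg"},      # reports depend on aggregates
    "dqc_check":  {"dwd_clean"},    # quality check runs alongside aggregation
}

# static_order() yields every task after all of its upstream tasks.
order = list(TopologicalSorter(deps).static_order())
print(order)
```

In production the scheduler additionally handles cycles (rejected at deployment), retries, and cross-workflow dependencies, but the ordering guarantee is the same: a task runs only after everything upstream of it has succeeded.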
Development and operations workflow
Development workflow
Task monitoring, troubleshooting, and resolution
Data modeling
Data modeling builds on over a decade of Alibaba's data warehouse modeling best practices. Use it to establish consistent data definitions across departments, build dimensional models, and manage metrics — without rebuilding your existing architecture.
Four core modules
| Module | Capabilities |
|---|---|
| Data Warehouse Planning | Plan data warehouse layers, data domains, and data marts. Configure model design spaces so different departments share a common set of data standards and models |
| Data Standard | Define field standards, standard codes, units of measurement, and naming dictionaries. Automatically generate data quality rules from standard codes to simplify compliance checks |
| Dimensional Modeling | Reverse modeling addresses the cold-start problem for existing data warehouses. Import models from Excel files or build them with FML, an SQL-like domain-specific language. Visual dimensional modeling integrates with Data Studio to automatically generate ETL (extract, transform, load) code |
| Data Metrics | Define atomic metrics and derived metrics. Batch-create derived metrics based on atomic metrics and various dimensions. Integrates with dimensional modeling |
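Batch-creating derived metrics follows an "atomic metric + time period + dimension" pattern: each combination yields one derived metric. The sketch below illustrates that combinatorial expansion in plain Python; the metric and dimension names are hypothetical, not Data Metrics output:

```python
from itertools import product

# Hypothetical atomic metrics, time periods, and dimensions.
atomic_metrics = ["order_count", "payment_amount"]
time_periods = ["1d", "7d", "30d"]
dimensions = ["by_region", "by_channel"]

# One derived metric per (atomic metric, period, dimension) combination.
derived = [
    f"{metric}_{period}_{dim}"
    for metric, period, dim in product(atomic_metrics, time_periods, dimensions)
]

print(len(derived))  # 2 atomic metrics x 3 periods x 2 dimensions = 12
```

Defining the two atomic metrics once and deriving the rest is what keeps metric definitions consistent: every derived metric traces back to a single governed formula.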
Architecture

Use cases
| Use case | Description |
|---|---|
| Structured data management | Organize and store large-scale enterprise data in a structured, consistent manner |
| Cross-department data integration | Break data silos between departments to give decision-makers a complete view of business data |
| Unified data standards | Establish consistent data definitions across systems without changing existing architectures, enabling upstream and downstream data interconnection |
| Data value realization | Deliver more effective data services using various types of enterprise data |
Data Analysis
Data Analysis gives data analysts, product managers, and operations staff the tools to query and visualize data directly — no data engineering skills required.
Capabilities
Upload personal datasets and access public datasets
Search and bookmark tables
Run online SQL queries
Share SQL files and download query results
Visualize data in spreadsheets and on dashboard-style large screens
Use cases
| Use case | Description |
|---|---|
| Scalable analysis | Use compute engine resources to analyze full-scale datasets |
| Cross-system data flow | Analyze data from databases across different business systems. Export data to MaxCompute tables or share result sets with specified users and grant them permissions |
| Secure operations | Integrate SQL queries and result downloads with security auditing |
Data Quality
Data Quality monitors data at the table and field levels using over 30 preset monitoring templates and custom templates. It detects source data changes and dirty data during ETL processing, then automatically blocks problematic tasks — preventing dirty data from propagating downstream.
Monitoring and verification
Data Quality monitors datasets across various engines, including MaxCompute. When offline data changes, Data Quality verifies the data and, if verification fails, blocks the production pipeline to prevent data pollution. Historical verification results are stored for quality analysis and classification. For more information, see Data Quality.
Data Quality addresses the following issues:
| Issue type | Description |
|---|---|
| Database changes | Frequent schema or structural changes in source databases |
| Business changes | Evolving business logic that causes data inconsistencies |
| Data definition issues | Mismatched or undefined field standards |
| Dirty data from business systems | Invalid or malformed data originating in upstream systems |
| System interaction issues | Quality degradation caused by cross-system dependencies |
| Data correction issues | Errors introduced during manual data corrections |
| Data warehouse quality issues | Quality problems originating within the warehouse itself |
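The blocking behavior described above boils down to a simple contract: a quality check runs after a partition is produced, and a failed check raises an error that stops downstream tasks. Here is an illustrative sketch of one table-level rule (row-count fluctuation) and one field-level rule (null ratio) — not the Data Quality rule engine, and the thresholds and field names are assumptions:

```python
# Illustrative quality gate: raise to "block" downstream tasks on failure.
def check_partition(rows, prev_count, null_field="user_id",
                    max_fluctuation=0.5, max_null_ratio=0.01):
    count = len(rows)
    # Table-level rule: day-over-day row count should not swing too far.
    if prev_count and abs(count - prev_count) / prev_count > max_fluctuation:
        raise RuntimeError(f"row count {count} deviates from {prev_count}")
    # Field-level rule: null ratio of a key column must stay low.
    nulls = sum(1 for r in rows if r.get(null_field) is None)
    if count and nulls / count > max_null_ratio:
        raise RuntimeError(f"null ratio {nulls / count:.2%} on {null_field}")
    return count  # checks passed; downstream tasks may proceed

rows = [{"user_id": i} for i in range(1000)]
print(check_partition(rows, prev_count=980))  # passes: prints 1000
```

The preset templates cover checks like these (and around 30 more) so you configure thresholds rather than write the logic yourself.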
Data Map
Data Map is built on data search capabilities. It provides table usage instructions, data categories, data lineage, and field lineage — giving data consumers and data owners a shared space to manage assets and collaborate on development.

DataService Studio
DataService Studio is a flexible, lightweight, secure, and stable platform for building and publishing data APIs. It provides publication approval, access control, usage metering, and resource isolation — with zero O&M overhead.
Unified API service bus
DataService Studio acts as a unified service bus between the data warehouse and applications, closing the gap between the data warehouse, databases, and data applications.

| Feature | Description |
|---|---|
| No-code and SQL-mode API generation | Generate data APIs from tables in various data sources using no-code or self-service SQL mode. Use Function Compute to process API request parameters and returned results |
| Single-click publishing | Publish API services to an API gateway with a single click |
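The "SQL mode" idea is that a data API is essentially a parameterized query: request parameters are bound into the SQL, and the result rows become the response payload. The sketch below illustrates that mapping with SQLite from the Python standard library — it is not DataService Studio itself, and the table and parameter names are hypothetical:

```python
import sqlite3

# In-memory stand-in for a warehouse table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("east", 10.0), ("east", 5.0), ("west", 7.5)])

# The query behind the API; "?" marks a bindable request parameter.
API_SQL = "SELECT region, SUM(amount) AS total FROM orders WHERE region = ?"

def handle_request(params):
    # Bind request parameters via placeholders, never string concatenation,
    # so the generated API is safe from SQL injection.
    cur = conn.execute(API_SQL, (params["region"],))
    cols = [c[0] for c in cur.description]
    return [dict(zip(cols, row)) for row in cur.fetchall()]

print(handle_request({"region": "east"}))  # [{'region': 'east', 'total': 15.0}]
```

The Function Compute hooks mentioned above slot in on either side of this mapping: one function can rewrite `params` before the query runs, and another can reshape the row dictionaries before they are returned.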
Serverless architecture
DataService Studio uses a serverless architecture, so you focus on API query logic instead of managing infrastructure. Computing resources are provisioned automatically with elastic scaling, resulting in zero O&M costs.

Open Platform
Open Platform exposes DataWorks data and capabilities to external systems through OpenAPI, OpenEvent, and Extensions. Use it to manage data workflows, govern data, and respond to business status changes from your own applications.
Integration capabilities
| Capability | Description |
|---|---|
| OpenAPI | Integrate your applications with DataWorks. Batch-create, publish, and manage tasks to improve processing efficiency and reduce manual operations. For more information, see OpenAPI |
| OpenEvent | Subscribe to system events for real-time notifications. For example, subscribe to table change events to monitor core tables, or subscribe to task change events to build a real-time task monitoring dashboard. For more information, see OpenEvent |
| Extensions | Service-level plug-ins that combine OpenAPI and OpenEvent to customize workflow controls in DataWorks. For example, create a deployment control plug-in to block tasks that do not comply with your standards. For more information, see Extensions |
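An OpenEvent consumer typically follows a subscribe, receive, dispatch-on-event-type pattern. The sketch below shows that shape; the event names and payload fields are illustrative assumptions, not the exact DataWorks event schema:

```python
import json

# Hypothetical event handler: dispatch on event type.
def on_event(message: str) -> str:
    event = json.loads(message)
    etype = event.get("eventType")
    if etype == "TableChanged":
        # e.g. audit schema changes on core tables
        return f"audit schema change on {event['tableName']}"
    if etype == "TaskStatusChanged":
        # e.g. feed a real-time task monitoring dashboard
        return f"task {event['taskId']} -> {event['status']}"
    return "ignored"

msg = json.dumps({"eventType": "TaskStatusChanged",
                  "taskId": 42, "status": "FAILED"})
print(on_event(msg))  # task 42 -> FAILED
```

An Extension combines this consumer with OpenAPI calls: on receiving a deployment event, for example, a plug-in could inspect the task via the API and reject the deployment if it violates your standards.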
Use cases
Open Platform supports deep system integration, automated operations, workflow definition, and business monitoring. Build industry-specific and scenario-based data applications and plug-ins on the DataWorks Open Platform.
Migration Assistant
Migration Assistant migrates jobs from open-source scheduling engines to DataWorks and supports cross-cloud, cross-region, and cross-account job migration — so you can clone and deploy DataWorks jobs without rebuilding from scratch. The DataWorks team, in collaboration with big data expert service teams, also offers cloud migration services to help you move your data and tasks to the cloud.
Migration capabilities
| Capability | Description |
|---|---|
| Task migration to the cloud | Migrate jobs from open-source scheduling engines to DataWorks |
| DataWorks migration | Migrate development assets within the DataWorks ecosystem |
Use cases
| Use case | Description |
|---|---|
| Task migration to the cloud | Migrate jobs from open-source scheduling engines to DataWorks |
| Task backup | Regularly back up task code to minimize losses from accidental project deletion |
| Business replication | Abstract common business logic and use the export/import feature to replicate it across projects |
| Test environment setup | Replicate business code and switch the data input from production to test data |
| Cross-cloud development | Import and export between DataWorks on the public cloud and DataWorks in a private cloud for collaborative development |