DataWorks: Features

Last Updated: Feb 28, 2026

DataWorks is an all-in-one big data development and governance platform that supports end-to-end data processing. Use DataWorks to integrate, develop, model, analyze, monitor, serve, and govern data across your organization, and build an enterprise-level data middle platform.

Module overview

| Module | Description |
| --- | --- |
| Data Integration | Synchronize data across 50+ heterogeneous sources in offline, real-time, or integrated modes |
| Data Studio and Operation Center | Develop, orchestrate, deploy, and monitor data processing tasks across multiple compute engines |
| Data modeling | Plan data warehouse layers, define standards, build dimensional models, and manage metrics |
| Data Analysis | Run SQL queries, upload datasets, and visualize data without data engineering skills |
| Data Quality | Monitor data at the table and field levels and block problematic tasks to prevent dirty data propagation |
| Data Map | Search, categorize, and trace data lineage across your data assets |
| DataService Studio | Build, publish, and manage data APIs with serverless architecture |
| Open Platform | Integrate external systems through OpenAPI, OpenEvent, and Extensions |
| Migration Assistant | Migrate jobs from open-source scheduling engines or between DataWorks environments |

Data Integration

Data Integration is a stable, efficient, and elastic data synchronization platform that connects heterogeneous data sources across network environments.

Synchronization modes and capabilities

Data Integration supports full and incremental data synchronization in offline, real-time, or integrated modes.

  • Batch synchronization: Configure scheduling cycles for synchronization tasks.

  • 50+ data sources: Synchronize data between relational databases, data warehouses, non-relational databases, file storage, and message queues.

  • Network flexibility: Connect to data sources over the public internet, in on-premises data centers (IDCs), or in virtual private clouds (VPCs).

  • Security: Monitor operations and enforce access controls during synchronization.

Engine architecture

Data Integration uses a star-shaped engine architecture. Any connected data source can form synchronization links with any other supported source. For a list of supported data sources, see Supported data sources and synchronization solutions.

Figure: Star-shaped engine architecture of Data Integration showing interconnected data sources
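The star-shaped design described above can be illustrated with a minimal sketch. It assumes that each source type contributes one reader and one writer that exchange a common record format, so any reader can be paired with any writer. The `READERS` and `WRITERS` registries, the `sync` function, and the in-memory stores are hypothetical names for illustration, not part of the Data Integration API.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]  # hypothetical common record format shared by all plug-ins

# Hypothetical plug-in registries: one reader and one writer per source type,
# instead of one dedicated connector per source/target pair.
READERS: Dict[str, Callable[[], Iterable[Record]]] = {}
WRITERS: Dict[str, Callable[[Iterable[Record]], int]] = {}


def sync(source: str, target: str) -> int:
    """Pipe records from any registered reader to any registered writer."""
    return WRITERS[target](READERS[source]())


def _append_to(store: List[Record]) -> Callable[[Iterable[Record]], int]:
    """Build a writer that appends records to an in-memory list."""
    def write(records: Iterable[Record]) -> int:
        rows = list(records)  # materialize first so reading and writing do not interleave
        store.extend(rows)
        return len(rows)
    return write


# Toy in-memory stand-ins for real data sources (illustration only).
mysql_rows: List[Record] = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
queue: List[Record] = []
warehouse: List[Record] = []

READERS["mysql"] = lambda: iter(mysql_rows)
READERS["queue"] = lambda: iter(queue)
WRITERS["queue"] = _append_to(queue)
WRITERS["warehouse"] = _append_to(warehouse)

# Any registered source can feed any registered target.
moved_to_queue = sync("mysql", "queue")
moved_to_warehouse = sync("queue", "warehouse")
```

With this layout, adding a new source type means registering one reader and one writer rather than writing a connector for every existing source.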

Before synchronizing data, establish network connectivity between your data source and a resource group. Data Integration tasks run on serverless resource groups (recommended) or exclusive resource groups for Data Integration (legacy). For network solutions, see Network connectivity solutions.

Figure: Resource groups and network connectivity between data sources and Data Integration

Typical use cases

  • Ingesting data into data lakes and data warehouses

  • Sharding databases and tables

  • Archiving real-time data

  • Moving data between clouds

Data Studio and Operation Center

Data Studio is a development platform for data processing. Operation Center is an intelligent operations and maintenance (O&M) platform. Together, they provide a standardized way to build and manage data development workflows.

Multi-engine development and environment isolation

  • Multi-engine support: Develop, test, deploy, and manage tasks across MaxCompute, E-MapReduce, CDH, Hologres, AnalyticDB, and ClickHouse from a unified platform.

  • Intelligent editor and visual orchestration: Build task workflows with an intelligent code editor and drag-and-drop dependency orchestration. The scheduling system has been proven on Alibaba Group's internal workloads.

  • Environment isolation: Separate development and production environments in standard mode. Version control, code review, smoke testing, deployment control, and operational auditing standardize your development lifecycle.

  • Operational monitoring: Operation Center provides data timeliness assurance, task diagnostics, impact analysis, automated O&M, and mobile-based O&M.
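Dependency orchestration ultimately means running each task only after its upstream tasks have finished. The following is a minimal sketch of that idea using Python's standard `graphlib` module. The task names and the `run_order` helper are hypothetical, and the sketch does not represent how the DataWorks scheduler is implemented.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set

# Hypothetical workflow: each task maps to the set of upstream tasks it depends on.
dependencies: Dict[str, Set[str]] = {
    "sync_orders": set(),
    "sync_users": set(),
    "clean_orders": {"sync_orders"},
    "join_orders_users": {"clean_orders", "sync_users"},
    "daily_report": {"join_orders_users"},
}


def run_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Return one valid execution order in which every task follows its upstream tasks."""
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError as err:
        # A cyclic dependency can never be scheduled, so reject the workflow.
        raise ValueError(f"workflow contains a dependency cycle: {err.args[1]}") from err


order = run_order(dependencies)
```

A drag-and-drop canvas can be seen as a visual editor for a mapping like `dependencies`. Rejecting cycles up front is one reasonable design choice, because a cycle would leave its tasks waiting on each other indefinitely.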

DataWorks provides workspaces in standard mode to isolate development and production environments. For more information, see Differences between workspace modes.

Development and operations workflow

  • Development workflow

    Figure: Development workflow from code editing through testing to deployment

  • Task monitoring, troubleshooting, and resolution

    Figure: Task monitoring, troubleshooting, and resolution workflow in Operation Center

Data modeling

Data modeling in DataWorks incorporates over a decade of best practices from Alibaba's data warehouse modeling methodologies. Build enterprise data assets through structured modeling and reverse modeling for data marts and data middle platforms.

Four core modules

Data modeling includes four modules: Data Warehouse Planning, Data Standard, Dimensional Modeling, and Data Metrics.

| Module | Capabilities |
| --- | --- |
| Data Warehouse Planning | Plan data warehouse layers, data domains, and data marts. Configure model design spaces so that different departments share a common set of data standards and models. |
| Data Standard | Define field standards, standard codes, units of measurement, and naming dictionaries. Automatically generate data quality rules from standard codes to simplify compliance checks. |
| Dimensional Modeling | Reverse modeling addresses the cold-start problem for existing data warehouses. Import models from Excel files or build them with FML, an SQL-like domain-specific language. Visual dimensional modeling integrates with Data Studio to automatically generate ETL code. |
| Data Metrics | Define atomic metrics and derived metrics. Batch-create derived metrics based on atomic metrics and various dimensions. Integrates with dimensional modeling. |
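Batch-creating derived metrics from atomic metrics and dimensions amounts to enumerating their combinations. The sketch below shows this with a Cartesian product. The metric names, dimension names, and the `atomic_dimension` naming pattern are assumptions for illustration, not DataWorks conventions.

```python
from itertools import product
from typing import List

# Hypothetical atomic metrics and dimensions.
atomic_metrics = ["order_amount", "order_count"]
dimensions = ["by_region", "by_channel"]


def derive_metric_names(atomics: List[str], dims: List[str]) -> List[str]:
    """Combine every atomic metric with every dimension into one derived metric name."""
    return [f"{metric}_{dim}" for metric, dim in product(atomics, dims)]


derived = derive_metric_names(atomic_metrics, dimensions)
for name in derived:
    print(name)
```

Two atomic metrics and two dimensions yield four derived metrics, so the number of definitions grows multiplicatively, which is why batch creation saves manual effort.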

Architecture

Figure: Data modeling architecture showing relationships between Data Warehouse Planning, Data Standard, Dimensional Modeling, and Data Metrics

Typical use cases

  • Structured data management: Organize and store large-scale enterprise data in a structured and consistent manner.

  • Cross-department data integration: Break data silos between departments and business domains to give decision-makers a complete view of business data.

  • Unified data standards: Establish consistent data definitions across systems without changing existing architectures. Enable upstream and downstream data interconnection.

  • Data value realization: Use various types of enterprise data to deliver more effective data services.

Data Analysis

Data Analysis provides tools for data analysts, product managers, and operations staff to retrieve and analyze data without data engineering skills, so that anyone can work as a data analyst.

Core capabilities

  • Upload personal datasets and access public datasets

  • Search and bookmark tables

  • Run online SQL queries

  • Share SQL files and download query results

  • Visualize data on large screens using spreadsheets

Typical use cases

| Use case | Description |
| --- | --- |
| Scalable analysis | Leverage compute engine resources to analyze full-scale datasets. |
| Cross-system data flow | Analyze data from databases across different business systems. Export data to MaxCompute tables or share result sets with specified users and grant them permissions. |
| Secure operations | Integrate SQL queries and result downloads with security auditing. |

Data Quality

Data Quality monitors data at the table and field levels using over 30 preset monitoring templates and custom templates. It detects source data changes, identifies dirty data during ETL (extract, transform, load) processing, and automatically blocks problematic tasks to prevent dirty data from propagating downstream.

Monitoring and verification

Data Quality monitors datasets across various engines, including MaxCompute. When offline data changes, Data Quality verifies the data and, if a check fails, blocks the production pipeline so that problematic data does not pollute downstream tables. It stores historical verification results for quality analysis and classification. For more information, see Data Quality.
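The verify-then-block behavior can be sketched as follows. This is a minimal illustration that assumes a single rule type, a maximum null rate for one field, and a per-rule flag that decides whether a failure blocks downstream tasks or only produces a warning. The `Rule` class, the `check` function, and the `BlockedByQualityCheck` exception are hypothetical names, not the Data Quality API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]


@dataclass
class Rule:
    field: str            # field to monitor
    max_null_rate: float  # highest tolerated share of null values (0.0 to 1.0)
    blocking: bool        # True: a failure stops downstream tasks; False: warn only


class BlockedByQualityCheck(Exception):
    """Raised to stop the pipeline before bad data reaches downstream tasks."""


def null_rate(rows: List[Row], field: str) -> float:
    """Share of rows in which `field` is missing or None."""
    if not rows:
        return 0.0
    return sum(1 for row in rows if row.get(field) is None) / len(rows)


def check(rows: List[Row], rules: List[Rule]) -> List[str]:
    """Evaluate every rule; raise on a blocking failure, otherwise collect warnings."""
    warnings: List[str] = []
    for rule in rules:
        rate = null_rate(rows, rule.field)
        if rate > rule.max_null_rate:
            message = f"{rule.field}: null rate {rate:.2f} exceeds {rule.max_null_rate:.2f}"
            if rule.blocking:
                raise BlockedByQualityCheck(message)
            warnings.append(message)
    return warnings


# Sample partition in which one of four rows is missing `amount`.
rows: List[Row] = [{"amount": 10.0}, {"amount": None}, {"amount": 3.5}, {"amount": 7.0}]
soft_warnings = check(rows, [Rule("amount", 0.10, blocking=False)])
```

A scheduler would call `check` after the task that produces the table and before any downstream task starts, treating `BlockedByQualityCheck` as a signal to halt that branch of the workflow.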

Data Quality addresses the following issues:

  • Frequent database changes

  • Frequent business changes

  • Data definition issues

  • Dirty data from business systems

  • Quality issues caused by system interactions

  • Issues caused by data correction

  • Quality issues originating from the data warehouse

Data Map

Data Map is built on data search capabilities. It provides tools for table usage instructions, data categories, data lineage, and field lineage. Data consumers and data owners use Data Map to manage data and collaborate on development.

Figure: Data Map interface showing data search, categorization, and lineage features
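Lineage tracing can be thought of as a graph traversal. The sketch below assumes table-level lineage stored as a mapping from each table to the tables that read from it, and uses a breadth-first search to list everything downstream of a given table. The table names and the `impacted_tables` function are hypothetical, and this is not how Data Map stores or queries lineage.

```python
from collections import deque
from typing import Dict, List

# Hypothetical table-level lineage: table -> tables that read from it.
downstream: Dict[str, List[str]] = {
    "ods_orders": ["dwd_orders"],
    "dwd_orders": ["dws_sales_daily", "dws_user_orders"],
    "dws_sales_daily": ["ads_sales_report"],
    "dws_user_orders": [],
    "ads_sales_report": [],
}


def impacted_tables(start: str, edges: Dict[str, List[str]]) -> List[str]:
    """Return every table reachable downstream of `start`, nearest tables first."""
    seen = {start}
    queue = deque([start])
    reached: List[str] = []
    while queue:
        current = queue.popleft()
        for table in edges.get(current, []):
            if table not in seen:  # visit each table once, even with converging paths
                seen.add(table)
                queue.append(table)
                reached.append(table)
    return reached


impact = impacted_tables("dwd_orders", downstream)
```

Reversing the edge direction gives the upstream view, which answers where a table's data comes from rather than what a change to it would affect.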

DataService Studio

DataService Studio is a flexible, lightweight, secure, and stable platform for building and publishing data APIs. It provides publication approval, access control, usage metering, and resource isolation.

Unified API service bus

DataService Studio acts as a unified service bus between the data warehouse and applications. It unifies the creation and management of API services, closing the gap between the data warehouse, databases, and data applications.

Figure: DataService Studio architecture showing the bridge between data warehouse and applications

  • Generate data APIs from tables in various data sources using no-code or self-service SQL mode. Use Function Compute to process API request parameters and returned results.

  • Publish API services to an API gateway with a single click.
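Conceptually, generating a data API from a table means turning an API definition (a table, request fields, and response fields) into a parameterized query. The sketch below illustrates that idea with Python's built-in `sqlite3` module as a stand-in data source. The `make_table_api` function and the sample `orders` table are hypothetical, and the sketch does not model publication, approval, or the Function Compute pre- and post-processing step.

```python
import sqlite3
from typing import Callable, Dict, List


def make_table_api(
    conn: sqlite3.Connection,
    table: str,
    request_fields: List[str],
    response_fields: List[str],
) -> Callable[[Dict[str, object]], List[Dict[str, object]]]:
    """Build a handler that filters `table` by the request fields and returns the response fields."""
    # Table and column names come from the API definition (trusted configuration).
    # Caller-supplied values are always bound as parameters, never interpolated.
    where = " AND ".join(f"{field} = ?" for field in request_fields) or "1 = 1"
    sql = f"SELECT {', '.join(response_fields)} FROM {table} WHERE {where}"

    def handler(params: Dict[str, object]) -> List[Dict[str, object]]:
        values = [params[field] for field in request_fields]
        rows = conn.execute(sql, values).fetchall()
        return [dict(zip(response_fields, row)) for row in rows]

    return handler


# In-memory sample table standing in for a real data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "east", 10.0), (2, "west", 20.0), (3, "east", 5.5)],
)

get_orders_by_region = make_table_api(conn, "orders", ["region"], ["id", "amount"])
result = get_orders_by_region({"region": "east"})
```

Binding request values as query parameters keeps caller input out of the SQL text, which matters for any API that exposes a database to external applications.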

Serverless architecture

DataService Studio uses a serverless architecture. Focus on API query logic instead of managing infrastructure. DataService Studio automatically provisions computing resources and scales them elastically, so you have no servers to operate or maintain.

Figure: DataService Studio serverless architecture with elastic scaling

Open Platform

Open Platform exposes DataWorks data and capabilities to external systems through OpenAPI, OpenEvent, and Extensions. Integrate applications with DataWorks to manage data workflows, govern data, and respond to business status changes.

Three integration capabilities

  • OpenAPI: Integrate your applications with DataWorks. Batch create, publish, and manage tasks to improve processing efficiency and reduce manual operations. For more information, see OpenAPI.

  • OpenEvent: Subscribe to system events for real-time notifications. For example, subscribe to table change events to monitor core tables, or subscribe to task change events to build a real-time task monitoring dashboard. For more information, see OpenEvent.

  • Extensions: Service-level plug-ins that combine OpenAPI and OpenEvent. Customize workflow controls in DataWorks. For example, create a deployment control plug-in to block tasks that do not comply with your standards. For more information, see Extensions.
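The deployment control example can be sketched as an event handler that approves or blocks an operation. The following is a minimal illustration, assuming events arrive as dictionaries with a `type` field and that an operation proceeds only if every subscribed check passes. The `ExtensionRegistry` class, the `task.deploy` event type, the field names, and the naming prefixes are all hypothetical and do not reflect the actual OpenEvent message format or Extensions API.

```python
from typing import Callable, Dict, List

Event = Dict[str, str]
Check = Callable[[Event], bool]


class ExtensionRegistry:
    """Routes events to subscribed checks and decides whether an operation may proceed."""

    def __init__(self) -> None:
        self._checks: Dict[str, List[Check]] = {}

    def subscribe(self, event_type: str, check: Check) -> None:
        self._checks.setdefault(event_type, []).append(check)

    def dispatch(self, event: Event) -> bool:
        """Return True only if every check subscribed to this event type passes."""
        return all(check(event) for check in self._checks.get(event["type"], []))


def owner_is_set(event: Event) -> bool:
    """Hypothetical standard: every deployed task must have an owner."""
    return bool(event.get("owner"))


def name_follows_convention(event: Event) -> bool:
    """Hypothetical standard: task names start with a warehouse layer prefix."""
    return event.get("task_name", "").startswith(("ods_", "dwd_", "dws_", "ads_"))


registry = ExtensionRegistry()
registry.subscribe("task.deploy", owner_is_set)
registry.subscribe("task.deploy", name_follows_convention)

allowed = registry.dispatch({"type": "task.deploy", "task_name": "dwd_orders", "owner": "alice"})
blocked = registry.dispatch({"type": "task.deploy", "task_name": "tmp_test", "owner": "bob"})
```

In this sketch a deployment proceeds when `dispatch` returns `True` and is blocked otherwise. Event types with no subscribed checks pass by default, which keeps the registry from interfering with operations it does not govern.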

Typical use cases

Open Platform supports deep system integration, automated operations, workflow definition, and business monitoring. Build industry-specific and scenario-based data applications and plug-ins on the DataWorks Open Platform.

Migration Assistant

Migration Assistant migrates jobs from open-source scheduling engines to DataWorks. It supports cross-cloud, cross-region, and cross-account job migration, allowing you to quickly clone and deploy DataWorks jobs. The DataWorks team, in collaboration with big data expert service teams, also offers cloud migration services to help you move your data and tasks to the cloud.

Migration capabilities

| Capability | Description |
| --- | --- |
| Task migration to the cloud | Migrate jobs from open-source scheduling engines to DataWorks. |
| DataWorks migration | Migrate development assets within the DataWorks ecosystem. |

Typical use cases

| Use case | Description |
| --- | --- |
| Task migration to the cloud | Migrate jobs from open-source scheduling engines to DataWorks. |
| Task backup | Regularly back up task code to minimize losses from accidental project deletion. |
| Business replication | Abstract common business logic and use the export/import feature to replicate it across projects. |
| Test environment setup | Replicate business code and change the data input from production to test data. |
| Cross-cloud development | Import and export between DataWorks on the public cloud and DataWorks in a private cloud for collaborative development. |