
DataWorks: What is DataWorks

Last Updated: Oct 27, 2025

DataWorks is a comprehensive data platform that integrates with big data and AI services such as MaxCompute, E-MapReduce, Hologres, Realtime Compute for Apache Flink, AnalyticDB, StarRocks, and PAI. It provides end-to-end data integration, development, governance, and analytics capabilities for modern data architectures, including data warehouses, data lakes, and lakehouses. Built on big data practices that Alibaba Group has refined since 2009, DataWorks helps enterprises manage the entire data lifecycle. Thousands of customers across finance, retail, and manufacturing use it to drive digital transformation.

Capabilities


DataWorks provides a suite of powerful features to streamline data workflows across the entire data lifecycle. Its capabilities are organized into six core modules that cover everything from data modeling and integration to security and analytics.

Core modules and key features:

  • Data Modeling: Systematically design and manage standardized, reusable data warehouse assets.

  • Data Integration: Synchronize data in real time or offline (batch) between diverse sources in cloud or on-premises environments.

  • Data Studio: Develop batch, stream, and machine learning tasks in an online IDE that supports SQL, Spark, and Python, and configure complex task dependencies and schedules.

  • Data Asset Governance: Define quality rules, trace data lineage, and manage your data catalog with features such as Data Quality, Data Map, and Data Asset Management.

  • Data Security: Ensure data compliance and security throughout the data lifecycle with features such as data masking, fine-grained access control, and security auditing.

  • Data Analytics Services: Perform interactive analysis and gain business insights with tools such as SQL Query and Smart Data Discovery. Generate high-performance data APIs without writing code for seamless application integration.

Advantages

Comprehensive features and exceptional performance

The platform's unified capabilities cover the entire data lifecycle, from integration and development to governance and services. It is engineered for high performance, processing petabytes of data daily and scheduling millions of complex tasks to handle demanding, large-scale enterprise workloads.

Intelligent and easy to use

A fully graphical user interface and a flexible web-based IDE reduce the learning curve. With support for SQL and Python, plus a built-in intelligent Copilot for code generation and smart Q&A, both new and experienced users can become productive quickly.

Cost-effective and ready to use

Its cloud-native, fully managed architecture provides data development and governance capabilities out of the box. This model significantly reduces the R&D and O&M costs associated with self-built platforms, freeing up resources to focus on business innovation.

Secure, stable, and reliable

Strict tenant isolation and a fine-grained access control system provide financial-grade security. The platform's stability has been proven at enterprise scale, including under peak loads during Alibaba Group's 11.11 Global Shopping Festival.

Target users and typical customers

  • Technical personnel: Data engineers and algorithm engineers who perform core data development and modeling work.

  • Business personnel: Operations specialists and BI analysts who perform self-service data queries and analysis.

  • Management personnel: Data asset administrators and data security officers who handle data governance and compliance control.

DataWorks serves customers across various industries, including public service, finance, retail, internet, automotive, and manufacturing. Typical customers include:

  • China State Grid's big data center: Uses DataWorks to centrally manage petabytes of data from its headquarters and 27 provincial/municipal companies. The resulting end-to-end governance and monitoring system for its data platform accelerates the company's digital transformation.

  • Fortune 500 company Mondelēz China: Leverages DataWorks intelligent data modeling for end-to-end data governance. This has significantly enhanced the self-service capabilities of its data platform, enabling decentralized data-driven decision-making and digital retail transformation.

  • Listed company iDreamSky: Replaced its self-developed scheduling system with DataWorks running on the open-source-based EMR engine. This lets the company's technical staff focus more on business needs and supports data-driven decisions in its gaming business.

For more customer stories, see Customer cases.

Getting started

Activate the service

Important

DataWorks can only be used on a PC with Chrome 69 or later.

For most enterprise users, the best starting point is to activate DataWorks with the Professional Edition, which covers most core data development and governance features.

Before making a purchase, see Billing overview (AI translation), Purchase guide, and Editions and billing.

Learning path

Use the learning path on the DataWorks documentation homepage to quickly understand its concepts, basic operations, and advanced features.


Product support

You can submit a ticket for pre-sales and after-sales inquiries.

Appendix: The evolution of DataWorks

Development history within Alibaba Group

Since its inception in 2009, DataWorks has evolved alongside Alibaba's business. Building on big data computing engines such as MaxCompute and Hologres, it has progressed through multiple technological stages and supported the development of Alibaba's data platform and data governance initiatives. DataWorks now has more than 50,000 daily active users within Alibaba Group, which means roughly one in three employees uses it. It supports more than 300 data applications and serves more than 100 business units across Alibaba Group.

Phase 1 (2012): Diverse business growth and data value discovery

Business development: Multiple business teams developed in parallel, including 1688, AliExpress, Taobao, and eTao.

Platform development: Various data platforms coexisted to support digital transformation:

  • The largest Oracle cluster in China.

  • Yunti 1 (Hadoop) reached 4,000 servers, serving multiple clusters for various BUs.

  • Tianwang (the predecessor to DataWorks), a unified scheduling tool for Taobao's Hadoop.

  • Development of Yunti 2 (ODPS, now MaxCompute) began. Ant Financial's micro-loan business, "Muyangquan," was launched.

Phase 2 (2015): Vertical business silos emerge

Business development: Business units grew vertically.

  • 2013: Cainiao was founded, and the "All-in-Wireless" strategy was launched.

  • 2014: Alibaba invested in Amap (Gaode), formed a joint venture with Intime Retail, and established Alitrip.

  • 2015: Alibaba launched DingTalk and Lingshoutong, established Koubei, and acquired a controlling stake in AliHealth.

  • 2015: Alibaba launched the "Middle Platform Strategy" to build an organizational and business mechanism of "big middle platform, small front office" and address data silos.

Platform development: The Moon Landing Plan was launched to unify data platforms.

  • Yunti 1 (Hadoop) hit the scaling limit of open-source Hadoop at 5,000 servers.

  • Through the "5K Project," Yunti 2 proved that it could scale beyond 5,000 servers.

  • DataWorks supported the group's unified data exchange platform through the "Firebird Project".

  • All of Alibaba's data was consolidated into Yunti 2 to create a unified group data platform.

Phase 3 (2018): The data platform supports sustainable business growth

Business development: Business became data-driven.

  • Operations staff adopted fine-grained operational strategies covering the entire user lifecycle.

  • Personalized intelligent marketing was achieved.

  • A data analytics tool for merchants enabled data monetization.

  • Business operations moved toward real-time processing.

Platform development: The data platform was built out, and data was turned into assets.

  • The data platform fully supported the construction of the data middle platform.

  • DataWorks built one-stop capabilities for large-scale data development and governance.

  • MaxCompute supported a 100,000-server cluster, serving the daily operations of more than 100 Alibaba Group BUs and more than 200,000 Alibaba employees.

Phase 4 (2021): The cloud data platform grows with the business

Business development: Business became fully cloud-native, with deep integration of business and data.

  • 100% of the core systems for the 11.11 Global Shopping Festival migrated to the cloud, and Alibaba Cloud handled a traffic peak of 538,000 transactions per second.

  • The data platform supported all Alibaba Group BUs, enabling operations staff to promptly identify and analyze issues and make real-time decisions.

  • The platform supported the emergence of new services such as short video and live streaming.

Platform development: The data platform served the business, creating a positive feedback loop.

  • The DataWorks-built data platform fully served the business, supporting more than 300 data applications within Alibaba Group.

  • MaxCompute's intelligent data warehouse made very large workloads easy to handle.

  • The MaxCompute lakehouse architecture gradually became the next-generation data platform architecture.

  • The platform implemented end-to-end data governance, supporting 60% business growth with only a 10% increase in costs.

Development history on Alibaba Cloud

DataWorks was officially launched on Alibaba Cloud in 2015, bringing years of big data expertise to cloud customers. Since then, it has continuously evolved and now works with customers and partners across industries. Its end-to-end data governance helps them manage and use data effectively and improve data quality and efficiency.

  • 2009: DataWorks project initiated at Alibaba Group. Developed the in-house DataX data synchronization engine and a task scheduling engine to serve ultra-large-scale Hadoop clusters.

  • 2013: "Moon Landing Plan" initiated and tech stack transformed. Under the codename Moon Landing Plan, the group launched a platform unification plan and fully migrated from Hadoop to MaxCompute. DataWorks was adapted to fully serve MaxCompute.

  • 2015: DataWorks officially launched on the cloud. Codename: Shujia Platform. As a core product of the "Shujia Platform," DataWorks entered the public cloud market and began serving enterprise and government customers.

  • 2017: International expansion. DataWorks was deployed in more than 12 Alibaba Cloud regions worldwide and began serving global customers.

  • 2018: DataWorks V2.0 released. DataWorks evolved into a one-stop intelligent big data cloud R&D platform covering data integration, data development, data services, and application development.

  • 2019: DataWorks V3.0 released. Added hybrid orchestration of tasks from multiple computing engines and introduced a new comprehensive data governance system.

  • 2020: Full openness and ecosystem building. Launched a new open platform to build a partner ecosystem. Recognized as a Strong Performer in the Forrester Wave™ for Cloud Data Warehouses.

  • 2022: Data governance capabilities upgraded. Launched the "Data Modeling and Governance Center" product and, according to IDC, ranked first by market share in China's data governance market.

  • 2024: Embraced AIGC and launched new Data+AI capabilities. Core release: Copilot. Upgraded data development and analysis capabilities, released the Copilot product, and provided end-to-end Data+AI development and governance based on the OpenLake lakehouse architecture.

More information