DataWorks is a comprehensive data platform that integrates with big data and AI services like MaxCompute, E-MapReduce, Hologres, Realtime Compute for Apache Flink, AnalyticDB, StarRocks, and PAI. It provides end-to-end data integration, development, governance, and analytics capabilities for modern data architectures including data warehouses, data lakes, and lakehouses. Built on Alibaba Group's proven big data methodologies since 2009, DataWorks helps enterprises manage the entire data lifecycle and is used by thousands of customers across finance, retail, and manufacturing to drive digital transformation.
Capabilities
DataWorks provides a suite of powerful features to streamline data workflows across the entire data lifecycle. Its capabilities are organized into six core modules that cover everything from data modeling and integration to security and analytics.
Core module | Key features |
Data modeling | Systematically design and manage standardized, reusable data warehouse assets. |
Data integration | Enable real-time and offline data synchronization between diverse sources in cloud or on-premises environments. |
Data development | Develop batch, stream, and machine learning tasks in an online IDE with support for SQL, Spark, and Python. Configure complex task dependencies and schedules. |
Data governance | Govern data assets with features like Data Quality, Data Map, and Data Asset Management to define quality rules, trace data lineage, and manage your data catalog. |
Data security | Ensure data compliance and security throughout the data lifecycle with features like data masking, fine-grained access control, and security auditing. |
Data analytics services | Perform interactive analysis and gain business insights using tools like SQL Query and Smart Data Discovery. Generate high-performance data APIs with no code for seamless application integration. |
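The table mentions configuring task dependencies and schedules. As a rough illustration of what dependency-aware scheduling involves, the sketch below orders a handful of tasks so that each one runs only after its upstream tasks. It is a generic Python example built on the standard-library `graphlib` module, not the DataWorks API; the task names and the `execution_order` helper are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical tasks: each key maps to the set of upstream tasks it depends on.
dependencies = {
    "sync_orders": set(),
    "sync_users": set(),
    "clean_orders": {"sync_orders"},
    "daily_report": {"clean_orders", "sync_users"},
}


def execution_order(deps):
    """Return one valid run order in which every task follows its upstream tasks."""
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError as err:
        # err.args[1] holds the nodes that form the detected cycle.
        raise ValueError(f"circular dependency: {err.args[1]}") from err


order = execution_order(dependencies)
print(order)
```

Several valid orders can exist for the same graph, so the printed list is only one of them. A circular dependency, such as two tasks that each wait on the other, is rejected with a `ValueError` rather than scheduled.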
Advantages
Comprehensive features and exceptional performance
The platform's unified capabilities cover the entire data lifecycle, from integration and development to governance and services. It is engineered for high performance, processing petabytes of data daily and scheduling millions of complex tasks to handle demanding, large-scale enterprise workloads.
Intelligent and easy to use
A fully graphical user interface and a flexible web-based IDE reduce the learning curve. With support for SQL and Python, plus a built-in intelligent Copilot for code generation and smart Q&A, both new and experienced users can become productive quickly.
Cost-effective and ready to use
Its cloud-native, fully managed architecture provides data development and governance capabilities out of the box. This model significantly reduces the R&D and O&M costs associated with self-built platforms, freeing up resources to focus on business innovation.
Secure, stable, and reliable
Financial-grade security is ensured through strict tenant isolation and a fine-grained access control system. Platform stability is proven at enterprise scale, handling peak loads during events like Alibaba Group's Global Shopping Festival.
Target users and typical customers
Technical personnel: Data engineers and algorithm engineers who perform core data development and modeling work.
Business personnel: Operations specialists and BI analysts who perform self-service data queries and analysis.
Management personnel: Data asset administrators and data security officers who handle data governance and compliance control.
DataWorks serves customers across various industries, including public service, finance, retail, internet, automotive, and manufacturing. Typical customers include:
China State Grid's big data center: Uses DataWorks to centrally manage petabytes of data from its headquarters and 27 provincial/municipal companies. The resulting end-to-end data platform governance and monitoring system accelerates the company's overall digital transformation.
Fortune 500 company Mondelēz China: Leverages DataWorks intelligent data modeling for end-to-end data governance. This has significantly enhanced the self-service capabilities of its data platform, enabling decentralized data-driven decision-making and digital retail transformation.
Listed company iDreamSky: Replaced its self-developed scheduling system with DataWorks based on the open-source EMR engine. This allows the company's technical staff to focus more on business operations, supporting data-driven decisions in gaming.
For more customer stories, see Customer cases.
Getting started
Activate the service
DataWorks can only be used on a PC with Chrome 69 or later.
Most enterprise users should start by activating DataWorks Professional Edition, which covers the majority of core data development and governance features.
Before making a purchase, see Billing overview, Purchase guide, and Editions and billing.
Learning path
Use the learning path on the DataWorks documentation homepage to quickly understand its concepts, basic operations, and advanced features.

Product support
You can submit a ticket for pre-sales and after-sales inquiries.
Appendix: The evolution of DataWorks
Development history within Alibaba Group
Since its inception in 2009, DataWorks has evolved alongside Alibaba's business operations. By leveraging the capabilities of big data computing engines like MaxCompute and Hologres, it has progressed through multiple technological stages, supporting the development of Alibaba's data platform and data governance initiatives. Currently, DataWorks has over 50,000 daily active users within Alibaba Group, meaning an average of one in three employees uses it. It supports over 300 data applications and serves more than 100 business units across Alibaba Group.
Phase (Year) | Theme | Business development | Platform development |
Phase 1 (2012) | Diverse business growth and data value discovery | Multiple business teams developed in parallel, including 1688, AliExpress, Taobao, and eTao. | Various data platforms coexisted to support digital transformation. |
Phase 2 (2015) | Vertical business silos emerge | Vertical business development. | Moon Landing Plan launched to unify data platforms. |
Phase 3 (2018) | Data platform supports sustainable business growth | Data-driven business development. | Data platform construction and assetization. |
Phase 4 (2021) | Cloud data platform grows with the business | Fully cloud-native with deep business and data integration. | Data platform served the business, creating a positive feedback loop. |
Development history on Alibaba Cloud
DataWorks was officially launched on Alibaba Cloud in 2015, bringing years of big data expertise to cloud customers. Through continuous iteration, it gives customers and partners across industries end-to-end data governance that helps them manage and use data effectively and improve data quality and efficiency.
Year | Milestone | Description |
2009 | DataWorks project initiated at Alibaba Group | Developed the in-house DataX data synchronization engine and a task scheduling engine to serve ultra-large-scale Hadoop clusters. |
2013 | "Moon Landing Plan" initiated; tech stack transformation | Codename: Moon Landing Plan. The group launched a platform unification plan, fully migrating from Hadoop to MaxCompute. DataWorks was adapted to fully serve MaxCompute. |
2015 | DataWorks officially launched on the cloud | Codename: Shujia Platform. As a core product of the "Shujia Platform", DataWorks entered the public cloud market and began serving enterprise and government customers. |
2017 | International expansion | Completed deployment in 12+ Alibaba Cloud regions worldwide and began serving global customers. |
2018 | DataWorks V2.0 released | Evolved into a one-stop intelligent big data cloud R&D platform, covering data integration, data development, data services, and application development. |
2019 | DataWorks V3.0 released | Supported hybrid orchestration of tasks from multiple computing engines and introduced a new comprehensive data governance system. |
2020 | Full openness and ecosystem building | Launched a new open platform to build a partner ecosystem. Recognized as a Strong Performer in the Forrester Wave™ for Cloud Data Warehouses. |
2022 | Data governance capabilities upgraded | Launched the "Data Modeling and Governance Center" product. Achieved the number one market share in China's data governance market (IDC). |
2024 | Embraced AIGC, launched new Data+AI capabilities | Core release: Copilot. Upgraded data development and analysis capabilities, released the Copilot product, and provided end-to-end Data+AI development and governance based on the OpenLake lakehouse architecture. |
More information
For DataWorks concepts and the product ecosystem, see Terms and Product ecosystem.