DataWorks helps enterprises across industries eliminate data silos, reduce operational costs, and accelerate data-driven decision-making. The following cases show how organizations in retail, finance, energy, logistics, media, and gaming solved real data challenges using DataWorks.
New retail: Cloud data mid-end for RT-Mart
400+ TB migrated in 15 days · Data mid-end built on Alibaba Cloud
RT-Mart planned a full IT migration to Alibaba Cloud within two years, replacing self-managed data centers with a cloud-native data mid-end. The goal was to reduce total cost of ownership (TCO) and establish closed-loop control over data assets across the cloud ecosystem.
Challenge: The original system ran on open source Hadoop with poor stability and high hardware and software maintenance costs. Surging online business created a growing backlog of unmet requirements, and RT-Mart needed a solution that could scale with business growth.
Outcome: MaxCompute Migration Assist (MMA) migrated more than 400 TB of historical data to the cloud in 15 days with high accuracy, providing RT-Mart with a smooth and efficient data migration experience. DataWorks and MaxCompute together improved data development efficiency and powered RT-Mart's new cloud data mid-end.
New finance: Data lakehouse for an internet financial company
Unified metadata and permissions · Seamless data flow between data lake and data warehouse
The company ran a dual-engine architecture: Hadoop and Object Storage Service (OSS) for the data lake, and MaxCompute for the data mid-end. The two heterogeneous engines caused redundant storage, inconsistent metadata and permissions, and interrupted cross-engine calculations.
Challenge: The company needed to support different business scenarios with different compute engines while managing metadata and user permissions from a single control plane.
Outcome: The solution integrates E-MapReduce (EMR) engine metadata into Alibaba Cloud Data Lake Formation (DLF) and uses OSS as the unified storage layer. This builds a data lakehouse that connects the EMR-based data lake to the MaxCompute-based data warehouse. Data flows freely between the two, and calculations run without interruption. Intermediate tables for dimensional modeling are stored in MaxCompute; data consumed by EMR and other engines is stored at the application data service (ADS) layer. DataWorks provides end-to-end data governance across the entire architecture to improve data quality and enhance data application.
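As a rough illustration of the storage layering described above, the sketch below registers a hypothetical ADS-layer table on OSS using Spark SQL on EMR (with DLF as the metastore), so that EMR engines and MaxCompute can reference the same underlying data. The database name, table schema, and OSS path are invented for illustration, not taken from the customer's actual setup.

```sql
-- Hypothetical ADS-layer table: data lives on OSS, metadata is
-- registered in the shared DLF catalog, so both the EMR-based data
-- lake and the MaxCompute-based warehouse can consume it.
CREATE TABLE IF NOT EXISTS ads.daily_user_metrics (
  user_id STRING,
  pv      BIGINT,
  ds      STRING
)
USING parquet
PARTITIONED BY (ds)
LOCATION 'oss://example-bucket/warehouse/ads/daily_user_metrics/';
```

With this pattern, intermediate dimensional-model tables stay in MaxCompute, and only the application-layer outputs are materialized on OSS for consumption by other engines.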
New energy: End-to-end data governance for an energy company
CNY 100M in cost savings · Data refresh from 1 day to 10 minutes · Service release from 1 week to 1 day
The company operated multiple subsidiaries with a large number of systems built on varied and complex technical stacks. Data was scattered across silos, defined under inconsistent standards, and lacked effective mechanisms for permission management, governance, and sharing.
Challenge: Data gaps reduced analytical accuracy, and the absence of centralized governance prevented data from flowing freely across business units.
Outcome: DataWorks and MaxCompute eliminated data silos with a unified data mid-end. Realtime Compute for Apache Flink and Hologres added real-time processing capabilities alongside offline batch pipelines. End-to-end data governance through DataWorks improved data quality, accuracy, and consistency across the organization. Results:
- Cost reduction of approximately CNY 100 million
- Data refresh cycle shortened from 1 day to 10 minutes
- New service release time reduced from 1 week to 1 day
- An intelligent business-to-business (B2B) marketing system integrating smart manufacturing with internet marketing
Internet: Cloud big data warehouse for GOGOX
30%+ server cost reduction · 100% improvement in data development efficiency · Zero cluster failures
GOGOX is a logistics platform that uses digital methods (network connection, transport resource sharing, process digitalization, and intelligent matching) to distribute idle transport resources precisely to the markets that need them. This saves energy, reduces emissions, lowers the empty-load rate, and supports the development of green logistics.
Challenge: Massive data processing was slow, and the duration of offline data calculation was unpredictable. Maintaining Realtime Compute for Apache Flink required significant development effort. The company needed comprehensive data warehouse governance.
Outcome: The Apsara big data platform reduced server costs by more than 30% and doubled data development efficiency. Flink SQL replaced the original Java-based Apache Storm, cutting the development cycle for real-time computing and simplifying maintenance. Flink SQL also improved data consistency, service monitoring accuracy, and real-time performance. Alibaba Cloud's 24-hour O&M service ensures cluster stability with zero failures.
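To illustrate the kind of job that moves from hand-coded Storm topologies to declarative Flink SQL, here is a minimal sketch of a windowed real-time aggregation. The table name, Kafka topic, and fields are hypothetical, not GOGOX's actual schema.

```sql
-- Hypothetical source: order events streamed from Kafka.
CREATE TABLE order_events (
  order_id   STRING,
  city       STRING,
  amount     DECIMAL(10, 2),
  event_time TIMESTAMP(3),
  WATERMARK FOR event_time AS event_time - INTERVAL '5' SECOND
) WITH (
  'connector' = 'kafka',
  'topic' = 'order-events',
  'properties.bootstrap.servers' = 'broker:9092',
  'format' = 'json'
);

-- Per-city order counts and totals over 1-minute tumbling windows:
-- the sort of aggregation previously implemented as Storm bolts in Java.
SELECT
  city,
  TUMBLE_START(event_time, INTERVAL '1' MINUTE) AS window_start,
  COUNT(*)    AS order_cnt,
  SUM(amount) AS order_total
FROM order_events
GROUP BY city, TUMBLE(event_time, INTERVAL '1' MINUTE);
```

Expressing the pipeline as a single SQL statement is what shortens the development cycle: the windowing, state handling, and fault tolerance that required custom Java code in Storm are handled by the Flink runtime.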
Internet: Cloud big data warehouse for Babytree
10x task performance improvement · Storage reduced from 3 PB to 900 TB · 30%+ cost reduction
Babytree, founded in 2007, is the largest and most active community platform for maternity and infant care in China. The company ran its own data centers from its early days; as its infrastructure scaled rapidly, it became increasingly difficult to manage.
Challenge: Self-managed data centers delivered poor performance and high annual operating costs. The company needed a comprehensive big data governance solution that reduced costs while improving efficiency.
Outcome: After migrating to MaxCompute, Realtime Compute for Apache Flink, and DataWorks:
- Specific task performance improved by more than 10 times
- Data storage reduced from 3 PB in the self-managed Hadoop system to 900 TB in the cloud
- Overall costs reduced by more than 30%
Realtime Compute for Apache Flink enabled real-time processing for existing Babytree scenarios, including real-time recommendations based on user IDs and content types, real-time group chat IDs of users, and real-time article publishing signals — increasing the behavior conversion rate across the platform.
Game: Full-link game operations for DeNA China
First in gaming to use Lightning Cube + MMA · 300 TB incremental + 50 TB historical migrated in one month without leased lines
DeNA is a game service provider operating in an industry where project lifecycles are getting shorter. A cost-effective, efficient, and data-driven operations system is essential to manage each project stage with precision.
Challenge: Two separate clusters running Hadoop 1.0 and 2.0 created architectural complexity, reducing stability, security, and scaling performance. Diversified log sources and growing log volumes degraded the performance and stability of the Fluentd-based log collection service. Data development relied on manual coding, and Hive-based computing could not meet throughput requirements.
Outcome: DeNA China became the first company in the gaming industry to use Lightning Cube together with MMA. About 300 TB of incremental data and 50 TB of historical data accumulated in RDS databases over 10 years were migrated to the cloud in just over a month — without leased lines. Compared to the original Python-based Airflow workflow:
- Task management is clearly visualized, and errors are detected and resolved immediately
- Hundreds of data sources are managed in one place without redundant effort
- Resource scheduling is done through a GUI, eliminating manual coding
The Apsara big data platform now manages the full operational link — from data collection, storage, and computing to real-time and offline analysis.