DataWorks: Use cases

Last Updated: Oct 23, 2025

This topic describes how to use DataWorks to build an enterprise-grade cloud data warehouse that integrates offline and real-time data processing. This solution helps enterprises reduce data latency and accelerate business decisions.

Build a cloud data warehouse with integrated offline and real-time capabilities

Business challenges

Modern businesses need faster access to data than ever before. Traditional data architectures struggle to meet this demand:

  • Fragmented architecture and high complexity: Businesses often operate two separate technology stacks. One stack batch-processes historical data in an offline data warehouse using engines such as Hive or Spark. The other processes real-time data streams using systems such as Flink and Kafka. This dual-stack approach increases development and maintenance costs and makes it harder to keep data consistent.

  • Analytics latency and delayed decisions: Offline warehouse data is not immediately available for ad hoc queries or interactive analysis. Business users often wait hours, or even a full day, before they can explore new data. In addition, correlating real-time data with massive historical datasets is difficult, which limits the depth of insights.

  • Poor resource elasticity and high costs: Peak batch processing workloads and spikes in real-time computing traffic require significant reserved resources. This results in low resource utilization and high total cost of ownership (TCO).

  • High technical barriers: Managing two complex, separate systems requires a large, highly skilled big data team—a significant challenge for most enterprises.

Solution

DataWorks, combined with cloud-native big data engines like MaxCompute and Hologres, provides an all-in-one data platform. Built on a Data Lakehouse architecture with integrated stream and batch processing, this platform helps enterprises reduce data latency.

  1. Unified data ingestion and layering

    Use Data Integration to ingest data from various sources into a unified cloud data lake or data warehouse. Sources include structured data from business databases, log files, and real-time message queues such as Kafka. The data follows a standard layering model (ODS → DWD → DWS → ADS). This model allows a single copy of the data to serve both offline and real-time computing, ensuring consistency from the source.

  2. Batch data processing

    In Data Studio, use MaxCompute SQL nodes to efficiently and cost-effectively process, clean, and model terabytes or petabytes of historical data. The scheduling system automatically runs these ETL tasks daily after midnight, building a comprehensive data foundation for decision analysis, user profiling, and machine learning. A minimal ODS-to-DWD sketch follows this list.

  3. Real-time and near-real-time computing

    • Real-time computing: Use Flink SQL nodes in DataWorks to process and analyze data streams with millisecond-level latency. This suits scenarios that require sub-second results, such as real-time risk control, real-time dashboards, and real-time recommendations. A Flink SQL sketch follows this list.

    • Near-real-time analysis (ad hoc queries): Hologres lets you run interactive queries with second-level latency on massive offline data in your data lake or data warehouse. Business analysts and operations staff can perform multi-dimensional drill-downs and exploration directly on the latest data using BI tools, without waiting for scheduled reports.

  4. Integrated analytics and unified services

    DataWorks allows Hologres to directly accelerate queries on MaxCompute data. This enables seamless, federated analysis of real-time and historical offline data, breaking down data silos. Use DataWorks DataService Studio to package analysis results into standard APIs, providing a unified, high-performance data service endpoint for upstream business applications, BI reports, and dashboards. A Hologres acceleration sketch follows this list.
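The following sketch illustrates steps 1 and 2: a MaxCompute SQL node that builds a DWD-layer table from an ODS-layer table under the standard layering model. All table and column names (ods_orders, dwd_orders, order_id, user_id, amount) are placeholders, and ${bizdate} is a DataWorks scheduling parameter that resolves to the business date of each daily run.

    -- Minimal MaxCompute SQL sketch of a daily ODS -> DWD task.
    -- All table and column names are placeholders; ${bizdate} is
    -- resolved by the DataWorks scheduler at run time.
    INSERT OVERWRITE TABLE dwd_orders PARTITION (ds = '${bizdate}')
    SELECT
        order_id,
        user_id,
        CAST(amount AS DECIMAL(16, 2)) AS pay_amount  -- normalize types
    FROM ods_orders
    WHERE ds = '${bizdate}'
      AND order_id IS NOT NULL;                       -- drop malformed rows

Because each layer is materialized as partitioned tables in the same warehouse, the same DWD output can feed both daily batch reports and the Hologres queries sketched below.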
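For step 3, a minimal Flink SQL sketch might aggregate a Kafka click stream into per-minute counts and write them to Hologres for a live dashboard. The topic, broker, endpoint, database, and credential values are placeholders, and the connector options are abbreviated; refer to the Kafka and Hologres connector documentation for the full option sets.

    -- Minimal Flink SQL sketch: per-minute event counts, Kafka -> Hologres.
    CREATE TABLE clicks (
        user_id    STRING,
        event_time TIMESTAMP(3),
        WATERMARK FOR event_time AS event_time - INTERVAL '5' SECOND
    ) WITH (
        'connector' = 'kafka',
        'topic'     = 'clicks',                 -- placeholder topic
        'properties.bootstrap.servers' = '<broker>:9092',
        'scan.startup.mode' = 'earliest-offset',
        'format'    = 'json'
    );

    CREATE TABLE clicks_per_minute (
        window_start TIMESTAMP(3),
        cnt          BIGINT
    ) WITH (
        'connector' = 'hologres',               -- Hologres result table
        'endpoint'  = '<hologres-endpoint>',
        'dbname'    = '<database>',
        'tablename' = 'clicks_per_minute',
        'username'  = '<access-key-id>',        -- placeholder credentials
        'password'  = '<access-key-secret>'
    );

    INSERT INTO clicks_per_minute
    SELECT TUMBLE_START(event_time, INTERVAL '1' MINUTE) AS window_start,
           COUNT(*) AS cnt
    FROM clicks
    GROUP BY TUMBLE(event_time, INTERVAL '1' MINUTE);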
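For step 4, Hologres can map MaxCompute tables as foreign tables and query them in place, which is the basis of the query acceleration described above. This sketch reuses the placeholder names from the earlier examples; my_project stands in for your MaxCompute project, and odps_server is the foreign server that Hologres provides for MaxCompute.

    -- Minimal Hologres SQL sketch: accelerate queries on MaxCompute data.
    -- my_project and dwd_orders are placeholders from the earlier sketch.
    IMPORT FOREIGN SCHEMA my_project LIMIT TO (dwd_orders)
    FROM SERVER odps_server INTO public
    OPTIONS (if_table_exists 'update');

    -- Interactive, second-level query over the offline DWD table.
    SELECT user_id, SUM(pay_amount) AS total_spend
    FROM dwd_orders
    WHERE ds = '<bizdate>'                      -- partition to scan
    GROUP BY user_id
    ORDER BY total_spend DESC
    LIMIT 100;

A query like this can then be published in DataService Studio as an API, so that BI reports and applications consume one stable endpoint instead of connecting to each engine directly.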

Core values

  • Simplified architecture and reduced TCO: The Data Lakehouse architecture unifies the technology stack with a single storage layer, a single development platform, and multiple compute engines. This reduces development, management, and operational complexity and can lower TCO by over 50%.

  • Accelerated time to insight: The data analysis cycle shrinks from days to minutes or seconds. This shift from periodic reviews to real-time insights enables faster, more agile decisions.

  • Self-service analytics: High-performance interactive queries enable business users to perform self-service data exploration. This frees data analysts from manually fulfilling ad hoc data requests.

  • Accelerated innovation: A unified, real-time, and high-performance data foundation provides a powerful platform for data-driven innovation, such as user behavior analysis, precision marketing, financial risk control, and intelligent supply chains.

Customer case study

Financial services: A data lakehouse implementation at an Internet finance company