MiniMax Builds a Cloud-Native Data + AI Platform with Alibaba Cloud: A Case Study in Scaling Data Infrastructure for the LLM Era

Discover how MiniMax leveraged Alibaba Cloud to build a scalable, cloud-native Data + AI platform powering multimodal LLMs and global user growth.

MiniMax is a global AI foundation model company. Founded in early 2022, MiniMax is committed to advancing the frontiers of AI towards AGI, guided by its mission, "Intelligence with Everyone."

MiniMax's proprietary multimodal models, led by MiniMax M2.1, Hailuo 2.3, Speech 2.6, and Music 2.0, offer advanced coding capability, high agentic performance, and ultra-long context processing. They can understand, generate, and integrate a wide range of modalities, including text, audio, images, video, and music. These models power MiniMax's major AI-native products, including MiniMax Agent, Hailuo AI, MiniMax Audio, Talkie, and the enterprise and developer-facing Open API Platform, which collectively deliver intelligent, dynamic experiences that enhance productivity and quality of life for users worldwide.

To date, MiniMax's proprietary models and AI-native products have cumulatively served more than 212 million individual users in over 200 countries and regions, and more than 130,000 enterprises and developers in over 100 countries and regions.

Big Data in Practice: Data-Driven Efficient Business Iteration at MiniMax

Business Challenges

Starting in 2024, MiniMax’s products—including Hailuo AI, MiniMax Audio, and Talkie—experienced rapid growth both domestically and internationally. This surge led to an explosive increase in user data volume, quickly scaling to tens of petabytes (PBs), which posed significant technical challenges for building a robust data platform:

Efficiency Bottlenecks Caused by Heterogeneous Architectures

  • Fragmented Technology Stack: Initially, MiniMax adopted different cloud providers in China and overseas, deploying separate data platforms. This required development teams to maintain multiple sets of development standards across disparate engines.
  • Low Development Efficiency: The company had built its own big data governance tools on a customized version of the open-source Apache DolphinScheduler. However, this approach made feature iteration and maintenance costly, and the tooling could not keep pace with rapidly evolving business needs.
  • High Operational Overhead: Running dual systems in parallel meant that operational tasks—such as permission management, resource monitoring, and troubleshooting—had to be performed across platforms, causing management costs to rise linearly.

Imbalance Between Resource Costs and Utilization

  • High Total Cost of Ownership (TCO): Pay-as-you-go pricing models from certain cloud services made it difficult to predict or control costs related to data scanning, data transfer, and compute node runtime.
  • Low Resource Utilization: Some cloud services lacked maturity, offering limited optimization capabilities in complex scenarios like large-scale real-time data warehousing.
  • Resource Optimization Bottlenecks: Task tuning for open-source big data components heavily relied on manual expertise, consuming substantial human resources.

Alibaba Cloud’s Cloud-Native Data Warehouse Solution

Alibaba Cloud helped MiniMax build a globally unified, cloud-native data warehouse architecture. Centered around Alibaba Cloud’s DataWorks—a one-stop data development and governance platform—this solution enables seamless integration of heterogeneous data sources, unified stream-batch processing, real-time/offline data collaboration, and end-to-end data lifecycle management.

Data Source Layer

Aggregates diverse, heterogeneous storage systems, covering OLTP databases, unstructured data, and real-time streaming data.

Compute Layer

  • Data Governance: DataWorks’ real-time data integration offers one-stop synchronization from heterogeneous sources into the data warehouse, featuring comprehensive metadata management, quality monitoring, and access control.
  • Real-Time Computing: Leverages Alibaba Cloud’s Realtime Compute for Apache Flink to process Kafka streaming data, supporting low-latency real-time processing.
  • Real-Time Data Warehouse: Hologres handles massive-scale real-time data ingestion, updates, and analytics, delivering sub-second response times.
  • Offline Data Warehouse: MaxCompute performs batch data processing and supports complex offline analytical workloads.
  • Data Search: Elasticsearch stores near-real-time data processed by Flink, fulfilling full-text search and ad-hoc query requirements.

Storage Layer

Object Storage Service (OSS) serves as the cold data tier, seamlessly integrated with MaxCompute to enable intelligent hot/cold data tiering, optimizing the balance between cost and performance.
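To make the idea of hot/cold tiering concrete, here is a minimal Python sketch of an access-age rule. It is not MaxCompute's or OSS's actual policy engine: the thresholds, the tier names, and the `TableStats` and `choose_tier` helpers are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical thresholds; real lifecycle rules depend on workload and pricing.
INFREQUENT_AFTER_DAYS = 30
ARCHIVE_AFTER_DAYS = 180


@dataclass
class TableStats:
    name: str
    last_accessed: date


def choose_tier(stats: TableStats, today: date) -> str:
    """Pick a storage tier from the number of days since the last access."""
    idle_days = (today - stats.last_accessed).days
    if idle_days >= ARCHIVE_AFTER_DAYS:
        return "archive"
    if idle_days >= INFREQUENT_AFTER_DAYS:
        return "infrequent"
    return "standard"


today = date(2025, 1, 1)
tables = [
    TableStats("events_daily", date(2024, 12, 30)),   # idle for 2 days
    TableStats("events_2024_q3", date(2024, 10, 1)),  # idle for 92 days
    TableStats("events_2023", date(2024, 1, 15)),     # idle for 352 days
]
plan = {t.name: choose_tier(t, today) for t in tables}
print(plan)
```

The point of the sketch is only that tiering decisions can be derived from access metadata rather than made by hand, which is what keeps the cost and performance trade-off manageable at tens of PBs.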

Business Value Delivered

Leveraging Alibaba Cloud’s cloud-native data warehouse solution, MiniMax established a unified global data warehouse technology stack. Powered by high performance, low latency, and serverless elasticity, this infrastructure provides efficient and stable support for critical business scenarios such as operational analytics and user growth.

Accelerated Data Ingestion, Faster Decision-Making

Through DataWorks’ visual ETL capabilities, MiniMax achieved real-time full and incremental data synchronization from source systems directly into Hologres. By utilizing cross-engine data federation between MaxCompute and Hologres, the company decoupled real-time storage from offline computation. As a result, key data now lands in the warehouse approximately one hour earlier, significantly improving the timeliness of business decisions.
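Full and incremental synchronization can be pictured as a one-time snapshot load followed by an ordered stream of change events. The sketch below uses plain Python dictionaries to show that pattern. It does not use the DataWorks or Hologres APIs, and the event shape (`op`, `row`) and function names are assumptions made for illustration.

```python
from typing import Dict, Iterable


def apply_full_snapshot(rows: Iterable[dict], key: str) -> Dict[object, dict]:
    """Load a full snapshot into a table keyed by primary key."""
    return {row[key]: row for row in rows}


def apply_changes(table: Dict[object, dict], changes: Iterable[dict], key: str) -> Dict[object, dict]:
    """Apply ordered change events (upsert or delete) on top of the snapshot."""
    for change in changes:
        pk = change["row"][key]
        if change["op"] == "delete":
            table.pop(pk, None)
        else:
            # "insert" and "update" are both treated as upserts.
            table[pk] = change["row"]
    return table


snapshot = [{"id": 1, "plan": "free"}, {"id": 2, "plan": "pro"}]
changes = [
    {"op": "update", "row": {"id": 1, "plan": "pro"}},
    {"op": "insert", "row": {"id": 3, "plan": "free"}},
    {"op": "delete", "row": {"id": 2}},
]
table = apply_changes(apply_full_snapshot(snapshot, "id"), changes, "id")
print(table)
```

Treating inserts and updates as the same upsert keeps the replay idempotent for repeated events, which is one common reason primary-key tables are used as the landing zone for real-time synchronization.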

Unified Architecture, Improved Operational Efficiency

A globally consistent tech stack—built on Alibaba Cloud’s serverless, storage-compute decoupled architecture—dramatically reduced operational complexity and enhanced team delivery velocity.

Stable Support for Large-Scale Data Processing

The integrated big data platform, comprising DataWorks, MaxCompute, and Hologres, enables unified management across development, scheduling, operations, and governance. It currently handles tens of PBs of total data, with daily processing volumes reaching hundreds of TBs.

Optimized Resource Utilization, Significant Cost Reduction

Through techniques like storage-compute separation and operator-level optimizations, MiniMax reduced compute resource consumption by 50%. Further refinements brought overall compute usage down by 75%. Additionally, implementing data lifecycle management policies lowered storage costs by 40%, achieving an optimal balance between performance and cost.

Building Cloud-Native Data Pipelines with Alibaba Cloud MaxFrame: Accelerating AI Workflows

In the era of rapid advancement in large language models (LLMs), the deep integration of data and artificial intelligence has become essential for enterprises seeking competitive advantage. LLM training continuously drives innovation in large-scale data processing technologies, demanding greater elasticity, higher-performance preprocessing operators, and unified data governance frameworks.

Building on MiniMax’s extensive experience with Alibaba Cloud’s cloud-native data warehouse solution, both parties are jointly exploring next-generation solutions that further fuse large-scale data processing with AI. By leveraging Alibaba Cloud’s MaxFrame—a next-generation distributed computing framework—they aim to enhance data processing efficiency and accelerate the practical deployment of AI innovations.

Business Challenges

Limited Resource Elasticity

Model training cycles are fast-paced, often requiring temporary access to massive elastic resources for short-duration, high-efficiency preprocessing of PB-scale datasets, followed by immediate resource release. Traditional architectures struggle to simultaneously meet demands for elasticity, processing speed, and cost control.

Insufficient Preprocessing Operator Performance

Common issues during data preprocessing—such as file size limits, out-of-memory (OOM) errors, and failed full-dataset MinHash deduplication tasks—led to low job success rates and poor stability, severely impacting overall pipeline efficiency.

Lack of Unified Task Management and Visualization

The original workflow relied on Python scripts for development, debugging, and production execution, lacking visual tools for task development, management, scheduling, and operations. This made it difficult to evaluate the impact of multi-parameter iterations and hampered developer productivity.

Constrained Engineering Resources for Development and Operations

Custom data preprocessing pipelines (e.g., for Common Crawl datasets) demanded significant engineering effort for development and maintenance, diverting talent away from core AI innovation.

Solution

MiniMax built a fully managed, one-stop Data + AI data processing platform on Alibaba Cloud’s MaxCompute, powered by the MaxFrame distributed computing framework. This solution delivers unified management and elastic, large-scale preprocessing capabilities for diverse data types—including structured, unstructured, and multimodal data.

Key features include:

  • A proprietary distributed computing framework from Alibaba Cloud that unifies the Python development ecosystem while seamlessly integrating with MaxCompute’s compute resources and data.
  • Distributed operators compatible with open-source libraries (e.g., Pandas, MinHash), dramatically boosting data processing efficiency.
  • Support for distributed data processing and offline inference, enabling end-to-end Data + AI pipeline construction.
  • An out-of-the-box Python environment with support for custom Docker images, offering a more streamlined development experience.
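For readers unfamiliar with MinHash deduplication, the following single-machine Python sketch shows the underlying technique using only the standard library. It is not MaxFrame's operator and does not use MaxFrame's API. The shingle size, the number of hash functions, the similarity threshold, and all function names are illustrative assumptions.

```python
import hashlib
from typing import List, Set

NUM_HASHES = 64  # number of hash functions; an illustrative choice


def shingles(text: str, k: int = 5) -> Set[str]:
    """Character k-shingles of a lowercased, whitespace-normalized text."""
    norm = " ".join(text.lower().split())
    return {norm[i:i + k] for i in range(max(1, len(norm) - k + 1))}


def seeded_hash(seed: int, token: str) -> int:
    """64-bit hash of a token; the seed selects one member of the hash family."""
    digest = hashlib.blake2b(
        token.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash(text: str) -> List[int]:
    """Signature: the minimum hash value of the shingle set per hash function."""
    tokens = shingles(text)
    return [min(seeded_hash(seed, t) for t in tokens) for seed in range(NUM_HASHES)]


def estimated_jaccard(a: List[int], b: List[int]) -> float:
    """Fraction of matching signature slots, which estimates Jaccard similarity."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def deduplicate(docs: List[str], threshold: float = 0.8) -> List[str]:
    """Keep a document only if it is not too similar to any document kept so far."""
    kept, signatures = [], []
    for doc in docs:
        sig = minhash(doc)
        if all(estimated_jaccard(sig, s) < threshold for s in signatures):
            kept.append(doc)
            signatures.append(sig)
    return kept


docs = [
    "The quick brown fox jumps over the lazy dog.",
    "the  quick brown fox jumps over the LAZY dog.",
    "Storage and compute scale independently in a cloud data warehouse.",
]
unique = deduplicate(docs)
print(len(unique), "of", len(docs), "documents kept")
```

This version compares every new signature against all kept signatures, which is quadratic in the number of documents. Large-scale systems typically add locality-sensitive hashing to bucket candidate pairs and distribute the work across many nodes, which is the part a distributed operator is meant to take care of.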

Business Value Achieved

By adopting the MaxFrame distributed computing framework, MiniMax achieved significant improvements in resource utilization, processing efficiency, and platform architecture:

Significantly Enhanced Resource Utilization

  • Using MaxCompute’s hybrid model of “monthly reserved resources + on-demand elastic resources,” MiniMax can flexibly allocate resources according to business cycles, scaling up to hundreds of thousands of CPU cores within minutes during peak periods.
  • Compute resource utilization improved by 30%, striking an optimal balance between efficiency and cost.
  • Leveraging MaxCompute’s native hot/cold data tiering, low-access-frequency large tables are automatically moved to infrequent or archival storage, reducing historical data storage costs by 40%.
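The economics of the hybrid "reserved + on-demand" model can be sketched with a toy calculation. The demand curve, core counts, and unit prices below are made-up numbers, not MiniMax's workload or Alibaba Cloud's pricing, and the function names are hypothetical.

```python
from typing import Dict, List


def split_demand(demand_cores: List[int], reserved_cores: int) -> List[Dict[str, int]]:
    """Split each period's demand into a reserved part and an elastic part."""
    plan = []
    for demand in demand_cores:
        from_reserved = min(demand, reserved_cores)
        plan.append({"reserved": from_reserved, "elastic": demand - from_reserved})
    return plan


def total_cost(plan: List[Dict[str, int]], reserved_cores: int,
               reserved_price: int, elastic_price: int) -> int:
    """Reserved cores are billed every period; elastic cores only when used."""
    reserved_bill = reserved_cores * reserved_price * len(plan)
    elastic_bill = sum(p["elastic"] for p in plan) * elastic_price
    return reserved_bill + elastic_bill


# Hypothetical demand in CPU cores per period, with one short preprocessing burst.
demand = [1000, 1200, 50000, 900]
plan = split_demand(demand, reserved_cores=1000)

# Hypothetical prices per core per period: elastic costs twice as much as reserved.
hybrid = total_cost(plan, reserved_cores=1000, reserved_price=1, elastic_price=2)
peak_only = max(demand) * 1 * len(demand)  # reserve enough for the peak all the time
print(hybrid, peak_only)
```

Under these assumed numbers the hybrid plan costs 102,400 units against 200,000 for provisioning the peak permanently, even though elastic cores are priced higher. The gap grows as bursts become shorter and rarer, which matches the bursty profile of model-training data preparation.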

Performance Breakthroughs via Distributed Computing

  • The MaxFrame-based distributed architecture replaced legacy open-source solutions. Its built-in high-performance operators—such as optimized MinHash—significantly shortened data preprocessing time for large models.
  • For text classification tasks using the FastText model, batch inference executed on MaxCompute’s elastic CPU resources delivered markedly higher throughput.
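Batch inference on CPU resources usually follows a simple pattern: stream the records, group them into fixed-size batches, and call the model once per batch. The sketch below illustrates that pattern in plain Python. The `classify_batch` function is a keyword-based stand-in, not FastText, and none of the names reflect the MaxFrame or MaxCompute API.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def batched(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def classify_batch(texts: List[str]) -> List[str]:
    """Stand-in for a real model call; a trivial keyword rule, NOT FastText."""
    return ["code" if ("def " in t or "import " in t) else "prose" for t in texts]


def run_inference(texts: Iterable[str],
                  predict: Callable[[List[str]], List[str]],
                  batch_size: int = 2) -> List[str]:
    """Apply `predict` batch by batch and return labels in input order."""
    labels: List[str] = []
    for batch in batched(texts, batch_size):
        labels.extend(predict(batch))
    return labels


samples = ["import os", "hello world", "def f(): pass"]
labels = run_inference(samples, classify_batch)
print(labels)
```

Because each batch is independent, the same loop can be spread across many workers, which is why elastic CPU capacity translates fairly directly into throughput for this kind of workload.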

Upgraded Data Platform Architecture, Higher Operational Efficiency

  • A unified global Data + AI processing platform—built on MaxCompute and MaxFrame—reduced self-development and maintenance costs through Alibaba Cloud’s fully managed, cloud-native PaaS capabilities, cutting operational resource investment by 50%.
  • End-to-end unified management of development, scheduling, and operations now supports efficient orchestration of multimodal data and complex AI workflows.

Conclusion and Outlook

Through deep technical collaboration with Alibaba Cloud, MiniMax has successfully built a highly efficient, cost-effective, cloud-native Data + AI integrated data processing platform centered on a modern data warehouse—effectively addressing the challenges of rapid business iteration and elastic scalability in the age of large models.

This solution not only delivers substantial gains in data processing performance and significant reductions in operational costs but also establishes a widely reusable engineering paradigm for AI application development driven by large models.

Looking ahead, MiniMax and Alibaba Cloud will continue to deepen their joint innovation in frontier areas such as large-model data preprocessing and multimodal data processing, working together to advance the large-scale industrial adoption of Data + AI technologies worldwide.
