MaxFrame: The distributed AI compute engine

Last Updated: Mar 02, 2026

Function introduction

MaxFrame is a distributed computing framework from Alibaba Cloud MaxCompute that provides a Python programming interface. It addresses two key challenges in traditional Python data processing: performance bottlenecks and inefficient data movement. With MaxFrame, you can directly process and analyze petabyte-scale big data on MaxCompute. You can perform visual data exploration and analytics, scientific computing, machine learning, and AI development—meeting the growing demand for efficient big data processing and AI development within the Python ecosystem.

Use cases

Interactive data exploration

MaxFrame removes the memory limits of a single machine. You can explore, manipulate, and visualize massive datasets interactively, just as you would in a local Jupyter Notebook.

Large-scale data preprocessing (ETL)

For multi-terabyte raw data cleansing, format conversion, feature engineering, and other tasks, you can replace complex SQL+UDF logic with more expressive and maintainable Python code—while benefiting from the high performance of distributed execution.

AI and machine learning

In the model development workflow, MaxFrame unifies data processing and model training. Use it to prepare training data efficiently, then use the image feature to bring in libraries such as Scikit-learn and XGBoost for end-to-end AI workflows.

Usage notes

Supported regions

China (Hangzhou), China (Shanghai), China (Beijing), China (Shenzhen), China (Hong Kong), Japan (Tokyo), Singapore, Indonesia (Jakarta), Germany (Frankfurt), US (Silicon Valley), and US (Virginia).

Supported environments

  • Local Python development environment.

  • MaxCompute Notebook.

  • DataWorks Notebook.

  • DataWorks Data Development PyODPS 3 task nodes.

Billing

MaxFrame billing is based on compute resource usage per job. It supports both pay-as-you-go and subscription billing methods.

  • Subscription: Jobs consume the quota of purchased resource groups with no additional charges.

For more information, see Analyze MaxCompute bill and usage details.

Core advantages

Compared to other Python development tools, MaxFrame better aligns with familiar development habits, enables more efficient data processing, provides more elastic computing resources, and delivers a more convenient development experience.

  • Pandas-compatible API: MaxFrame provides an API highly compatible with Pandas. This supports the smooth migration of existing code to the MaxCompute platform and significantly reduces learning and migration costs.

  • Server-side distributed execution: MaxFrame jobs run directly within the MaxCompute cluster. Data does not need to be pulled to a local machine. This eliminates performance bottlenecks caused by insufficient client memory and enables efficient processing of petabyte-scale data.

  • Elastic computing resources: MaxFrame relies on the MaxCompute serverless architecture to allocate computing resources on demand. This lets you process data tasks of any scale without managing a cluster.

  • Simplified development environment: MaxFrame provides built-in Python 3.7 and Python 3.11 environments with pre-installed common libraries such as Pandas and XGBoost. Manage third-party dependencies with simple annotations. This simplifies environment configuration and dependency management. It is more convenient than manually packaging and uploading user-defined function (UDF) dependencies.
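As a rough illustration of the Pandas-compatible API and server-side execution described above, the following minimal sketch prefixes a string column and writes the result to a new table. It assumes the `maxframe` and `pyodps` packages are installed and that `odps_entry` is an already configured `odps.ODPS` object. The function name, the table names, and the column `a` are hypothetical placeholders, not part of the product documentation.

```python
def add_prefix_column(odps_entry, source_table, target_table):
    """Prefix string column "a" of source_table and write it to target_table.

    The table names and the column "a" are hypothetical placeholders.
    """
    # Imported lazily so this module loads even where MaxFrame is not installed.
    import maxframe.dataframe as md
    from maxframe import new_session

    # Open a MaxFrame session backed by the given MaxCompute entry object.
    session = new_session(odps_entry)
    try:
        # Lazy reference to the table: no rows are downloaded to the client.
        df = md.read_odps_table(source_table)

        # Pandas-style column expression, recorded rather than run locally.
        df["a"] = "prefix_" + df["a"]

        # execute() submits the job; the computation runs inside MaxCompute.
        md.to_odps_table(df, target_table).execute()
    finally:
        # Release the server-side session when done.
        session.destroy()


# Example call (requires real credentials and an existing source table):
#   from odps import ODPS
#   o = ODPS("<access-id>", "<access-key>", project="<project>", endpoint="<endpoint>")
#   add_prefix_column(o, "my_source_table", "my_target_table")
```

The column expression reads like ordinary Pandas code, but under the stated assumptions nothing is computed on the client until `execute()` is called.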

The following table compares this tool with other Python development tools:

| Comparison item | MaxFrame | PyODPS | Mars | SQL+UDF |
| --- | --- | --- | --- | --- |
| Development API | Compatible with Pandas. | Syntax and API differ significantly from Pandas DataFrame. | | Requires using two sets of APIs: SQL and Python. |
| Data processing | At runtime, data is processed on the server and does not need to be pulled to a local machine. This reduces unnecessary local data transfer and improves job execution efficiency. | The to_pandas method in PyODPS reads data, which requires the data to be pulled locally for computation. Distributed execution is supported for only some operators. | Cluster creation is required during initialization, which is slow and offers low stability. | Supports distributed jobs based on MaxCompute SQL capabilities. |
| Computing resources | Not limited by the size of local resources, breaking the single-machine performance bottleneck of Python. | Limited by the size of local resources. | Limited by resource size. The size of workers, CPU, and memory must be specified. | Enables elastic computing for SQL jobs based on the MaxCompute serverless capabilities. |
| Development experience | Out-of-the-box interactive development environment and offline scheduling capabilities. Common libraries are built-in. Manage third-party dependencies using annotations, with no need for manual packaging. | Out-of-the-box interactive development environment and offline scheduling capabilities. | Requires preparing the corresponding runtime environment and launching a Mars cluster. | Dependency packages for Python UDFs must be manually packaged and uploaded. |

How it works

MaxFrame hides the complexity of distributed computing from the user. The automated workflow is as follows:

  1. Code submission: Write and execute Python code on a client, such as a Notebook. The MaxFrame software development kit (SDK) captures the code and submits it to MaxCompute.

  2. Parsing and optimization: After the MaxCompute execution engine receives the job, it performs syntax parsing and logical optimization. It then transforms the job into a physical plan that can be executed in parallel.

  3. Distributed execution: The optimized task is distributed to numerous compute nodes in the MaxCompute cluster. The nodes directly read the data and perform parallel computing.

  4. Result return: After the computation is complete, the results are aggregated and returned to your client.
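The four steps above can be mimicked with a toy, single-process model. The sketch below is not MaxFrame's implementation: `LazyColumn` is a hypothetical class, threads stand in for compute nodes, and fusing the recorded functions stands in for plan optimization. It only illustrates the general pattern of recording operations lazily, building one plan, running it per partition in parallel, and aggregating the results.

```python
from concurrent.futures import ThreadPoolExecutor


class LazyColumn:
    """Toy column split into partitions; operations are recorded, not run."""

    def __init__(self, partitions, ops=()):
        self.partitions = partitions
        self.ops = tuple(ops)

    def map(self, fn):
        # Step 1 (capture): remember the operation and return a new lazy object.
        return LazyColumn(self.partitions, self.ops + (fn,))

    def _plan(self):
        # Step 2 (optimize): fuse all recorded operations into a single
        # function, so each partition is traversed only once.
        ops = self.ops

        def fused(partition):
            out = []
            for value in partition:
                for op in ops:
                    value = op(value)
                out.append(value)
            return out

        return fused

    def execute(self, workers=4):
        task = self._plan()
        # Step 3 (distributed execution): each partition is processed by a
        # worker; threads play the role of compute nodes in this toy model.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            parts = list(pool.map(task, self.partitions))
        # Step 4 (result return): concatenate partition results in order.
        return [value for part in parts for value in part]


col = LazyColumn([[1, 2], [3, 4], [5]])
result = col.map(lambda x: x * 10).map(lambda x: x + 1).execute()
print(result)  # [11, 21, 31, 41, 51]
```

No work happens when `map` is called. Everything is deferred until `execute()`, which is what allows an engine to see the whole job before deciding how to run it.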

References