All Products
Search
Document Center

MaxCompute:Data + AI and data science

Last Updated:Jun 17, 2026

MaxFrame is a distributed scientific computing framework developed by Alibaba Cloud. Building on PyODPS and Mars, MaxFrame provides Pandas-compatible APIs and lets you work with MaxCompute the same way you work with Python.

Background information

Python is the mainstream programming language for machine learning and AI model development. It offers rich scientific computing and visualization libraries such as NumPy (multidimensional array operations), Pandas (data analytics), Matplotlib (2D plotting), and scikit-learn (data analytics and mining algorithms), as well as training frameworks such as TensorFlow, PyTorch, XGBoost, and LightGBM.

MaxCompute provides a Python development ecosystem for large-scale data processing, analytics, mining, and model training. With unified Python APIs, you can perform data processing and mining efficiently.

Development history

The following figure shows the development history of the MaxCompute Python development ecosystem.image.png

PyODPS

PyODPS was officially released in 2015 as the MaxCompute SDK for Python. It lets you operate on MaxCompute data through Python interfaces. Over multiple iterations, PyODPS added DataFrame support with Pandas-like syntax and built-in operators for aggregation, sorting, and deduplication.

Core features of PyODPS:

  • Support for basic operations on MaxCompute objects (in 2015):

    • Access MaxCompute objects such as tables, resources, and functions.

    • Submit SQL requests by using the run_sql or execute_sql method.

    • Run Platform for AI (PAI) machine learning tasks by using the run_xflow or execute_xflow method.

    • Upload and download data by using the open_write, open_reader, or Tunnel API operations.

  • DataFrame APIs and Pandas-like interfaces for MaxCompute DataFrame computing (from 2016 to 2022):

    • PyODPS DataFrame lets you perform data operations in Python, leveraging its native language features.

    • PyODPS DataFrame provides Pandas-like interfaces with extended syntax, including MapReduce APIs for the big data environment.

    • PyODPS DataFrame provides built-in functions for aggregation, sorting, deduplication, sampling, and visualization.

Mars

The Python ecosystem offers rich scientific computing libraries such as NumPy, Pandas, and scikit-learn with convenient analytics and mining operators. However, most of these libraries are limited to standalone resources. Mars is a tensor-based distributed computing framework that implements approximately 70% of NumPy interfaces in a distributed manner, significantly reducing the difficulty of writing distributed scientific computing code.

Core features of Mars:

Compatibility and distributed capability: Mars was officially open-sourced in January 2019. It enables NumPy, Pandas, scikit-learn, and Python functions to run in a distributed manner while remaining compatible with most interfaces.

MaxFrame

Mars and PyODPS target different scenarios. Users familiar with Pandas who want to run NumPy or scikit-learn in parallel are better suited to Mars. Users who need DataFrame-based processing with high stability at terabyte scale or above are better suited to PyODPS. However, the complexity of this split architecture creates difficulties for users.

MaxFrame is a distributed scientific computing framework developed by Alibaba Cloud. It is fully compatible with Pandas interfaces and automatically selects the optimal underlying engine to run jobs, improving both performance and development efficiency. You no longer need to choose an execution engine manually, and can efficiently complete the entire workflow from data development and analytics to AI training and inference. The architecture is shown in the following diagram.

image

Core features of MaxFrame:

  • More familiar development habits

    MaxFrame is compatible with the Python ecosystem and provides unified development interfaces for MaxCompute. You can use the same Python code throughout the entire data and AI development process.

    MaxFrame can directly reference third-party libraries such as NumPy, SciPy, Pandas, and Matplotlib for scientific computing, data analysis, and visualization, reducing operation costs.

  • Higher processing performance

    MaxFrame accesses MaxCompute data directly without pulling data to your on-premises machine, eliminating data transfers and improving execution efficiency.

    MaxFrame leverages the elastic computing resources in MaxCompute with automatic distribution and parallel processing, significantly reducing data processing time.

  • More convenient development experience

    MaxFrame is integrated with MaxCompute Notebook and DataWorks. You can use MaxFrame directly in either environment without additional configuration, or install it in your on-premises environment.

    MaxFrame supports both built-in and custom images in MaxCompute, reducing environment setup time and preventing version conflicts.

  • Improved operator support

    MaxFrame is fully compatible with Pandas interfaces and automatically distributes processing, delivering powerful data processing capabilities with high computing efficiency.