
MaxCompute:Automated packaging service

Last Updated:Mar 26, 2026

Managing third-party Python dependencies in distributed MaxFrame jobs typically requires manually uploading packages to MaxCompute before each run. The automatic packaging service removes this step: declare your dependencies in code using with_python_requirements, and MaxFrame resolves and bundles them at runtime automatically.

Prerequisites

Before you begin, ensure that you have:

  • A MaxFrame session connected to MaxCompute

  • (If using the on-premises MaxFrame client) MaxFrame SDK version V0.1.0b5 or later. See Preparations for setup instructions.

How it works

  1. Decorate your UDF with @with_python_requirements, listing the packages your function needs.

  2. When the job runs, MaxFrame resolves the listed packages from PyPI and bundles them into the job environment.

  3. On subsequent runs, if the packaged result is already cached, MaxFrame skips repackaging.

Packaging is triggered on the first run. If the cache is cleared (temporary resources are deleted daily when force_rebuild=False), MaxFrame repackages automatically on the next run, which adds latency.
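The caching behavior described above can be summarized as a small decision rule. The sketch below is a mental model for illustration, not MaxFrame's actual implementation:

```python
# Mental model (illustration only, not MaxFrame internals) of the decision
# MaxFrame makes at the start of each job run.
def packaging_action(cache_hit: bool, force_rebuild: bool) -> str:
    if force_rebuild:
        # Always rebuild from the latest PyPI image; the result is stored
        # as a long-term resource.
        return "repackage"
    # Reuse the cached bundle when present; otherwise build it (first run,
    # or the daily cleanup removed the temporary resource).
    return "skip" if cache_hit else "repackage"
```

For example, `packaging_action(cache_hit=False, force_rebuild=False)` returns `"repackage"`, which is why the first run (and any run after the daily cleanup) incurs extra latency.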

Declare dependencies with with_python_requirements

The with_python_requirements decorator is the entry point for the automatic packaging service.

def with_python_requirements(
    *requirements: str,
    force_rebuild: bool = False,
    prefer_binary: bool = False,
    pre_release: bool = False,
): ...

Parameters

`requirements` (required)

One or more dependency package specifiers, following PEP 508 syntax — the same format pip uses.

@with_python_requirements("scikit_learn>1.0", "xgboost>1.0")
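PEP 508 covers more than simple version pins. The snippet below shows a few specifier forms pip accepts; the package names and versions are illustrative only, and the optional check uses the `packaging` library (usually installed alongside pip) rather than anything MaxFrame-specific:

```python
# A few valid PEP 508 specifier strings of the kind pip (and therefore
# with_python_requirements) accepts. Names/versions here are illustrative.
specs = [
    "scikit_learn>1.0",         # simple lower bound
    "xgboost>=1.7,<2.0",        # bounded version range
    "requests[socks]==2.31.*",  # extras plus a wildcard pin
]

# Optional local sanity check with the `packaging` library; fall back to a
# naive name extraction if it is not installed.
try:
    from packaging.requirements import Requirement
    names = [Requirement(s).name for s in specs]
except ImportError:
    names = [s.split("[")[0].split(">")[0].split("=")[0].split("<")[0] for s in specs]
```

Validating specifiers locally like this catches syntax errors before a job is submitted.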

`force_rebuild` (optional, default: False)

Controls whether MaxFrame repackages dependencies that are already cached.

  • False (default): Skip repackaging if a cached result exists. The cached package is stored as a temporary resource and deleted daily.

  • True: Always repackage using the latest PyPI image version. The result is stored as a long-term resource and is not deleted automatically.

For development and iterative testing, keep the default False. Use force_rebuild=True when you want to force an upgrade to the latest package version and have the result stored as a long-term resource.

With force_rebuild=False, the temporary resource is deleted daily. If the cache is cleared between runs, MaxFrame repackages automatically, which adds latency to the next run.

`prefer_binary` (optional, default: False)

Controls whether MaxFrame prefers pre-built binary wheel files over source distributions.

  • False (default): No preference; pip resolves the best match normally.

  • True: Prefer binary wheels, equivalent to passing --prefer-binary to pip.

Preferring binary wheels can speed up packaging, but the selected version may not be the latest release.

`pre_release` (optional, default: False)

Controls whether pre-release (alpha or beta) package versions are eligible for packaging.

  • False (default): Only stable releases are packaged.

  • True: Alpha and beta (pre-release) versions are included.
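Both option flags correspond to familiar pip behavior. The hypothetical helper below makes the mapping explicit; the --prefer-binary equivalence is stated above, while mapping pre_release to pip's --pre flag is an assumption based on pip's CLI, not documented MaxFrame internals:

```python
def pip_equivalent_flags(prefer_binary: bool = False, pre_release: bool = False) -> list:
    """Return the pip CLI flags that roughly correspond to these options."""
    flags = []
    if prefer_binary:
        # Documented equivalence: prefer pre-built wheels over sdists.
        flags.append("--prefer-binary")
    if pre_release:
        # Assumed equivalence: allow alpha/beta versions, as pip's --pre does.
        flags.append("--pre")
    return flags
```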

Example

The following example uses with_python_requirements to inject jieba, cloudpickle, and pandas into a DataFrame apply job.

import os
import maxframe.dataframe as md
from maxframe import new_session
from maxframe.udf import with_python_requirements
from odps import ODPS

# Initialize the ODPS client.
# Load credentials from environment variables — avoid hardcoding AccessKey ID
# and AccessKey secret in your code.
o = ODPS(
    os.getenv('ALIBABA_CLOUD_ACCESS_KEY_ID'),
    os.getenv('ALIBABA_CLOUD_ACCESS_KEY_SECRET'),
    project='your-default-project',
    endpoint='your-end-point',
)

session = new_session(o)

data = [["abcd"], ["efgh"], ["ijkl"], ["mno"]]
md_df = md.DataFrame(data, columns=["col1"])

# Declare dependencies. MaxFrame packages them automatically at runtime.
@with_python_requirements("jieba==0.40", "cloudpickle", "pandas")
def process(row):
    import jieba
    row["col1"] = row["col1"] + "_" + jieba.__version__
    return row

md_result = (
    md_df.apply(
        process,
        axis=1,
        result_type="expand",
        output_type="dataframe",
        dtypes=md_df.dtypes.copy(),
    )
    .execute()
    .fetch()
)

Replace the following placeholders with your actual values:

  • your-default-project: Your MaxCompute project name

  • your-end-point: Your MaxCompute endpoint
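To preview what the apply job produces without a MaxCompute connection, the row transformation can be checked locally. The stand-in below stubs jieba.__version__ to the pinned "0.40" so the check runs without installing jieba:

```python
# Local stand-in for the decorated UDF above, with jieba.__version__ stubbed
# so the check runs without jieba or a MaxFrame session.
def process_local(row, jieba_version="0.40"):
    row["col1"] = row["col1"] + "_" + jieba_version
    return row

rows = [{"col1": v} for v in ["abcd", "efgh", "ijkl", "mno"]]
results = [process_local(r)["col1"] for r in rows]
# Each value gains a "_0.40" suffix, e.g. "abcd" becomes "abcd_0.40".
```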

FAQ

When does packaging happen?

Packaging is triggered at the start of the first job run. If the packaged result is already cached, MaxFrame skips repackaging and the job starts immediately.

What if the cached package is deleted before my next run?

With force_rebuild=False, the cached package is stored as a temporary resource and deleted daily. If it is deleted before your next run, MaxFrame repackages automatically. This adds latency to that run but does not cause the job to fail.

How do I ensure the latest package versions are used across runs?

Set force_rebuild=True. MaxFrame repackages using the latest PyPI image version and stores the result as a long-term resource that is not deleted automatically.