
MaxCompute:Develop AI Functions on GU resources

Last Updated: Jan 14, 2026

The MaxFrame AI Function is an end-to-end solution from Alibaba Cloud MaxCompute for offline inference scenarios with Large Language Models (LLMs). It integrates data processing with AI capabilities to lower the barrier for enterprise-level LLM applications. This topic describes how to use the MaxFrame AI Function to call LLMs with GU resources.

Prerequisites

  • Environment preparation

    • The MaxFrame software development kit (SDK) 2.3.0 or later is installed.

    • Python 3.11 is installed.

    • A GPU resource quota (GU) is enabled for the MaxCompute project.

  • Permission configuration

Configure the environment

A GU quota name (gu_quota_name) must be configured before GPU resources can be used. The following code prepares the environment and initializes a MaxFrame session:

import os
import maxframe.dataframe as md
import numpy as np
from maxframe import new_session
from maxframe.config import options
from maxframe.udf import with_running_options
from odps import ODPS
import logging

options.dag.settings = {
    "engine_order": ["DPE", "MCSQL"],
    "unavailable_engines": ["SPE"],
}

logging.basicConfig(level=logging.INFO)

# -------------------------------
# MaxFrame Session initialization
# -------------------------------
o = ODPS(
    # Make sure the ALIBABA_CLOUD_ACCESS_KEY_ID environment variable is set to your AccessKey ID,
    # and the ALIBABA_CLOUD_ACCESS_KEY_SECRET environment variable is set to your AccessKey secret.
    # Do not use the AccessKey ID and AccessKey secret strings directly.
    os.getenv('ALIBABA_CLOUD_ACCESS_KEY_ID'),
    os.getenv('ALIBABA_CLOUD_ACCESS_KEY_SECRET'),
    project='<your project>',
    endpoint='https://service.cn-<your region>.maxcompute.aliyun.com/api',
)

session = new_session(o)

options.session.gu_quota_name = "xxxxx"  # Replace with your GU quota name.

print("LogView address:", session.get_logview_address())
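Because the session setup above reads credentials from environment variables, it can help to fail fast when they are missing. A minimal sketch (check_credentials is a hypothetical helper, not part of the MaxFrame SDK):

```python
import os

def check_credentials(env=None):
    """Return the names of required credential variables that are not set."""
    env = os.environ if env is None else env
    required = ["ALIBABA_CLOUD_ACCESS_KEY_ID", "ALIBABA_CLOUD_ACCESS_KEY_SECRET"]
    return [name for name in required if not env.get(name)]

# Example: with an empty environment, both variables are reported as missing.
print(check_credentials(env={}))
# -> ['ALIBABA_CLOUD_ACCESS_KEY_ID', 'ALIBABA_CLOUD_ACCESS_KEY_SECRET']
```

Calling check_credentials() before ODPS(...) turns a confusing authentication failure into an explicit error message.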

Call a managed LLM (LLM.generate)

Step 1: Prepare input data

import pandas as pd
from IPython.display import HTML

# Set display options for debugging.
pd.set_option("display.max_colwidth", None)
pd.set_option("display.max_columns", None)
HTML("<style>div.output_area pre {white-space: pre-wrap;}</style>")

# Create a query list.
query_list = [
    "What is the average distance between the Earth and the Sun?",
    "In what year did the American Revolutionary War begin?",
    "What is the boiling point of water?",
    "How can I quickly relieve a headache?",
    "Who is the main character in the Harry Potter series?",
]

# Convert to a MaxFrame DataFrame.
df = md.DataFrame({"query": query_list})
df.execute()

Step 2: Initialize the LLM instance

from maxframe.learn.contrib.llm.models.managed import ManagedTextGenLLM

llm = ManagedTextGenLLM(
    name="Qwen3-4B-Instruct-2507-FP8"  # The model name must be an exact match.
)

For more information about the supported models, see Supported models for MaxFrame AI Function (continuously updated).

Step 3: Define the prompt template

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Please answer the following question: {query}"},
]

Template syntax:

  • The {column_name} placeholder is automatically replaced with the value of the corresponding DataFrame column for each row.

  • Multi-turn conversations are supported: pass multiple entries in the messages list.

  • The system prompt (the "system" role) defines the assistant's behavior.
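The substitution that the prompt template performs can be illustrated with plain Python string formatting. This is an illustration only; MaxFrame applies the template server-side for every row of the DataFrame:

```python
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Please answer the following question: {query}"},
]

# One row of the input DataFrame, as a dict keyed by column name.
row = {"query": "What is the boiling point of water?"}

# Fill every {column_name} placeholder with the row's field values.
rendered = [
    {"role": m["role"], "content": m["content"].format(**row)}
    for m in messages
]
print(rendered[1]["content"])
# -> Please answer the following question: What is the boiling point of water?
```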

Step 4: Run the generation task

result_df = llm.generate(
    df,  # Input data
    prompt_template=messages,
    running_options={
        "max_tokens": 4096,  # Maximum output length
        "verbose": True,     # Enable verbose log output
    },
    params={"temperature": 0.7},  # Sampling temperature
)

# Execute and get the result.
result_df.execute()

Output description

result_df is a MaxFrame DataFrame that contains the following fields:

Field                      Type     Description
query                      string   Original input query
generated_text             string   Response generated by the model
finish_reason              string   Reason the generation finished, such as stop or length
usage.prompt_tokens        int      Number of input tokens
usage.completion_tokens    int      Number of output tokens
usage.total_tokens         int      Total number of tokens
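The usage fields are what matters for billing and capacity planning. The aggregation below uses a plain pandas DataFrame with the same column names as result_df; the values are made up for illustration:

```python
import pandas as pd

# Hypothetical sample mirroring the result_df schema above; values are illustrative.
results = pd.DataFrame({
    "query": ["What is the boiling point of water?"],
    "generated_text": ["Water boils at 100 degrees Celsius at sea level."],
    "finish_reason": ["stop"],
    "usage.prompt_tokens": [21],
    "usage.completion_tokens": [14],
    "usage.total_tokens": [35],
})

# Sanity check: total tokens equal prompt tokens plus completion tokens.
assert (results["usage.total_tokens"]
        == results["usage.prompt_tokens"] + results["usage.completion_tokens"]).all()

# Aggregate token consumption for the whole batch.
total_tokens = int(results["usage.total_tokens"].sum())
print(total_tokens)  # -> 35
```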

Debugging and performance tuning tips

Performance and cost optimization

Optimization                 Recommendation
Batch size                   Keep each batch under 100 items to avoid out-of-memory (OOM) errors.
GU allocation                gu=2 is suitable for 4B models. Larger models require more GU.
Degree of parallelism        MaxFrame schedules concurrent jobs automatically. Control this with num_workers.
Cache intermediate results   Use to_odps_table() to save intermediate tables and avoid recomputation.
Timeout setting              Add timeout=3600 to prevent jobs from getting stuck.
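The batch-size recommendation above can be applied by splitting the query list before building each DataFrame. A minimal sketch (chunk is a hypothetical helper, not part of the MaxFrame SDK):

```python
def chunk(items, size=100):
    """Split the input list into batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

# 250 queries become three batches of at most 100 items each.
queries = [f"question {i}" for i in range(250)]
batches = chunk(queries, size=100)
print([len(b) for b in batches])  # -> [100, 100, 50]
```

Each batch can then be wrapped in its own MaxFrame DataFrame and passed to llm.generate separately.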

Debugging tips

  • View execution logs

    print(session.get_logview_address())  # Click the link to view real-time MaxFrame job logs.
  • Small-scale test

    df_sample = df.head(2)  # Get two data entries for testing.
    result_sample = llm.generate(df_sample, prompt_template=messages, running_options={"gu": 2})
    result_sample.execute()
  • Check resource usage

    You can view the detailed execution status of jobs in the MaxFrame Logview.