MaxFrame AI Function is an end-to-end solution for offline large language model (LLM) inference in MaxCompute. It combines data processing with AI capabilities, letting you run batch LLM inference directly on GPU resource quota (GU) resources without leaving your data warehouse workflow.
This topic shows you how to call a managed LLM using llm.generate on GU resources.
Quick start
The following snippet shows the minimal end-to-end call. Read the sections below for parameter details and debugging guidance.
import os
import maxframe.dataframe as md
from maxframe import new_session
from maxframe.config import options
from maxframe.learn.contrib.llm.models.managed import ManagedTextGenLLM
from odps import ODPS
# Initialize session
o = ODPS(
    os.getenv('ALIBABA_CLOUD_ACCESS_KEY_ID'),
    os.getenv('ALIBABA_CLOUD_ACCESS_KEY_SECRET'),
    project='<your project>',
    endpoint='https://service.cn-<your region>.maxcompute.aliyun.com/api',
)
session = new_session(o)
options.session.gu_quota_name = "<your GU quota name>" # Required to route tasks to GPU workers.
# Prepare input data
df = md.DataFrame({"query": ["What is the boiling point of water?"]})
# Initialize the managed LLM client
llm = ManagedTextGenLLM(name="Qwen3-4B-Instruct-2507-FP8") # Name must be an exact match.
# Define the prompt template and run inference
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Please answer the following question: {query}"},
]
result_df = llm.generate(df, prompt_template=messages)
result_df.execute()
Prerequisites
Before you begin, make sure you have:
MaxFrame software development kit (SDK) version 2.3.0 or later
Python 3.11
A GU quota enabled for your MaxCompute project — see Purchase and use MaxCompute AI computing resources
At least project-level read and write permissions for MaxCompute
Set up the environment
The following code initializes a MaxFrame session and assigns a GU quota to it. All subsequent LLM tasks run under this session.
import os
import maxframe.dataframe as md
import numpy as np
from maxframe import new_session
from maxframe.config import options
from maxframe.udf import with_running_options # Helper for attaching resource options to UDF-based tasks.
from odps import ODPS
import logging
# Route tasks to the DPE and MCSQL engines; disable SPE for GPU workloads.
options.dag.settings = {
    "engine_order": ["DPE", "MCSQL"],
    "unavailable_engines": ["SPE"],
}
logging.basicConfig(level=logging.INFO)
# Initialize the MaxFrame session with your project credentials.
o = ODPS(
    # Make sure the ALIBABA_CLOUD_ACCESS_KEY_ID environment variable is set to your AccessKey ID,
    # and the ALIBABA_CLOUD_ACCESS_KEY_SECRET environment variable is set to your AccessKey secret.
    # Do not hard-code the AccessKey ID and AccessKey secret strings in your code.
    os.getenv('ALIBABA_CLOUD_ACCESS_KEY_ID'),
    os.getenv('ALIBABA_CLOUD_ACCESS_KEY_SECRET'),
    project='<your project>',
    endpoint='https://service.cn-<your region>.maxcompute.aliyun.com/api',
)
session = new_session(o)
options.session.gu_quota_name = "xxxxx"  # Replace with your GU quota name.
print("LogView address:", session.get_logview_address())
gu_quota_name is required to use GU resources. Without it, inference tasks are not dispatched to GPU workers.
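Because os.getenv returns None for an unset variable, a missing credential may only surface later as an authentication failure. The following is a minimal sketch of a fail-fast check you could run before creating the ODPS client. The helper name require_env is hypothetical and is not part of MaxFrame or PyODPS.

```python
import os


def require_env(*names):
    """Return the values of the named environment variables.

    Raises RuntimeError if any of them is unset or empty.
    (Hypothetical helper for illustration; not a MaxFrame or PyODPS API.)
    """
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return tuple(os.environ[name] for name in names)
```

For example, you could call require_env("ALIBABA_CLOUD_ACCESS_KEY_ID", "ALIBABA_CLOUD_ACCESS_KEY_SECRET") and pass the two returned values to ODPS(...) in place of the os.getenv calls.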
Call a managed LLM
The four steps below use llm.generate to submit an asynchronous inference task against a managed model.
Step 1: Prepare input data
import pandas as pd
from IPython.display import HTML
# Set display options for debugging.
pd.set_option("display.max_colwidth", None)
pd.set_option("display.max_columns", None)
HTML("<style>div.output_area pre {white-space: pre-wrap;}</style>")
# Create a query list.
query_list = [
    "What is the average distance between the Earth and the Sun?",
    "In what year did the American Revolutionary War begin?",
    "What is the boiling point of water?",
    "How can I quickly relieve a headache?",
    "Who is the main character in the Harry Potter series?",
]
# Convert to a MaxFrame DataFrame.
df = md.DataFrame({"query": query_list})
df.execute()
Step 2: Initialize the LLM instance
ManagedTextGenLLM is the client for managed text generation models hosted on MaxCompute. Pass the exact model name to identify which model to use.
from maxframe.learn.contrib.llm.models.managed import ManagedTextGenLLM
llm = ManagedTextGenLLM(
    name="Qwen3-4B-Instruct-2507-FP8"  # The model name must be an exact match.
)
A typo or version mismatch in the model name causes the task to fail. For the full list of supported models, see Supported models for MaxFrame AI Function.
Step 3: Define the prompt template
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Please answer the following question: {query}"},
]
Template syntax:
Use {column_name} as a placeholder. It is automatically replaced with the value from the corresponding DataFrame column at runtime.
Pass a messages list to support multi-turn conversations.
Use the system role to define the model's behavior.
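The placeholder behavior described above can be pictured with a small local sketch. It uses Python's built-in str.format_map to fill {query} from one row. The function render_messages is hypothetical, and the sketch only illustrates the substitution semantics. It is not MaxFrame's implementation, and whether MaxFrame handles other brace characters the same way is not confirmed here.

```python
def render_messages(template, row):
    """Fill {column_name} placeholders in each message with values from one row.

    Illustration only: a local stand-in for the substitution described above.
    """
    return [
        {"role": message["role"], "content": message["content"].format_map(row)}
        for message in template
    ]


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Please answer the following question: {query}"},
]

# One input row, keyed by DataFrame column name.
row = {"query": "What is the boiling point of water?"}

rendered = render_messages(messages, row)
print(rendered[1]["content"])
```

A placeholder that names a column missing from the row raises KeyError in this sketch, which is a useful reminder to keep template placeholders and DataFrame column names in sync.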
Step 4: Run the generation task
result_df = llm.generate(
    df,  # Input DataFrame
    prompt_template=messages,
    running_options={
        "max_tokens": 4096,  # Maximum output length
        "verbose": True,  # Enable verbose log output
    },
    params={"temperature": 0.7},
)
# Execute and get the result.
result_df.execute()
Output schema
result_df is a MaxFrame DataFrame with the following fields:
| Field | Type | Description |
|---|---|---|
| query | string | Original input |
| generated_text | string | Response generated by the model |
| finish_reason | string | Completion reason: stop or length |
| usage.prompt_tokens | int | Number of input tokens |
| usage.completion_tokens | int | Number of output tokens |
| usage.total_tokens | int | Total number of tokens |
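Once the results are available locally, these fields can be used to spot responses that were likely cut off and to total token usage. The sketch below works on plain Python dictionaries, one per row. It assumes that the column names are the flat strings listed in the table (for example, "usage.total_tokens") and that a finish_reason of length means the output reached the max_tokens limit. The function summarize_results is hypothetical, and the sample rows are made-up placeholders that only mirror the schema.

```python
def summarize_results(rows):
    """Collect queries whose output was likely truncated and sum total token usage."""
    truncated = [row["query"] for row in rows if row["finish_reason"] == "length"]
    total_tokens = sum(row["usage.total_tokens"] for row in rows)
    return {"truncated_queries": truncated, "total_tokens": total_tokens}


# Placeholder rows that only mirror the schema above; the values are made up.
sample_rows = [
    {"query": "q1", "generated_text": "a1", "finish_reason": "stop",
     "usage.prompt_tokens": 20, "usage.completion_tokens": 10, "usage.total_tokens": 30},
    {"query": "q2", "generated_text": "a2", "finish_reason": "length",
     "usage.prompt_tokens": 25, "usage.completion_tokens": 4096, "usage.total_tokens": 4121},
]

summary = summarize_results(sample_rows)
print(summary)
```

Queries reported as truncated are candidates for a rerun with a larger max_tokens value or a tighter prompt.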
Performance and debugging
Performance optimization
| Optimization | Recommendation |
|---|---|
| Batch size | Keep each batch to fewer than 100 items to avoid out-of-memory (OOM) errors |
| GU allocation | gu=2 is suitable for 4B models — larger models require more GU |
| Degree of parallelism | MaxFrame automatically schedules concurrent jobs; control this with num_workers |
| Cache intermediate results | Use to_odps_table() to save intermediate tables and avoid recomputation |
| Timeout | Add timeout=3600 to prevent jobs from getting stuck |
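One way to follow the batch size recommendation is to split the input on the client side before building each DataFrame. The following is a minimal sketch in plain Python. The helper chunked is hypothetical, and reading "batch" as a client-side split of the query list is an assumption rather than something the table states.

```python
def chunked(items, size):
    """Yield consecutive slices of `items`, each holding at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# 250 placeholder queries split into batches of at most 99 items each.
queries = [f"question {i}" for i in range(250)]
batches = list(chunked(queries, 99))
print([len(batch) for batch in batches])
```

Each batch could then be wrapped in its own md.DataFrame({"query": batch}) and passed to llm.generate in turn.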
Debugging tips
View execution logs
print(session.get_logview_address())  # Click the link to view real-time MaxFrame job logs.
Run a small-scale test before full execution
df_sample = df.head(2) # Use two rows for testing.
result_sample = llm.generate(df_sample, prompt_template=messages, running_options={"gu": 2})
result_sample.execute()
Check resource usage
View detailed job execution status in MaxFrame Logview.