The MaxFrame AI Function is an end-to-end solution from Alibaba Cloud MaxCompute for offline inference scenarios with Large Language Models (LLMs). It integrates data processing with AI capabilities to lower the barrier for enterprise-level LLM applications. This topic describes how to use the MaxFrame AI Function to call LLMs with GU resources.
Applicability
Environment preparation
MaxFrame software development kit (SDK) version 2.3.0 or later.
Python version 3.11.
A GPU resource quota (GU) is enabled for the MaxCompute project.
Permission configuration
The current account has at least project-level read and write permissions for MaxCompute.
You have requested and purchased a MaxCompute GU quota (gu_quota_name).
Configure the environment
A gu_quota_name is required to use GPUs.
import os
import maxframe.dataframe as md
import numpy as np
from maxframe import new_session
from maxframe.config import options
from maxframe.udf import with_running_options
from odps import ODPS
import logging
options.dag.settings = {
    "engine_order": ["DPE", "MCSQL"],
    "unavailable_engines": ["SPE"],
}
logging.basicConfig(level=logging.INFO)
# -------------------------------
# MaxFrame Session initialization
# -------------------------------
o = ODPS(
    # Make sure the ALIBABA_CLOUD_ACCESS_KEY_ID environment variable is set to your AccessKey ID,
    # and the ALIBABA_CLOUD_ACCESS_KEY_SECRET environment variable is set to your AccessKey secret.
    # Do not use the AccessKey ID and AccessKey secret strings directly.
    os.getenv('ALIBABA_CLOUD_ACCESS_KEY_ID'),
    os.getenv('ALIBABA_CLOUD_ACCESS_KEY_SECRET'),
    project='<your project>',
    endpoint='https://service.cn-<your region>.maxcompute.aliyun.com/api',
)
session = new_session(o)
options.session.gu_quota_name = "xxxxx" # Replace with your GU Quota Name.
print("LogView address:", session.get_logview_address())
Call a managed LLM (LLM.generate)
Step 1: Prepare input data
import pandas as pd
from IPython.display import HTML
# Set display options for debugging.
pd.set_option("display.max_colwidth", None)
pd.set_option("display.max_columns", None)
HTML("<style>div.output_area pre {white-space: pre-wrap;}</style>")
# Create a query list.
query_list = [
"What is the average distance between the Earth and the Sun?",
"In what year did the American Revolutionary War begin?",
"What is the boiling point of water?",
"How can I quickly relieve a headache?",
"Who is the main character in the Harry Potter series?",
]
# Convert to a MaxFrame DataFrame.
df = md.DataFrame({"query": query_list})
df.execute()
Step 2: Initialize the LLM instance
from maxframe.learn.contrib.llm.models.managed import ManagedTextGenLLM
llm = ManagedTextGenLLM(
    name="Qwen3-4B-Instruct-2507-FP8"  # The model name must be an exact match.
)
For more information about the supported models, see Supported models for MaxFrame AI Function (continuously updated).
Step 3: Define the prompt template
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Please answer the following question: {query}"},
]
Template syntax:
Use the {column_name} placeholder; it is automatically replaced with the value of the corresponding field in the DataFrame.
Multi-turn conversations are supported (the messages list).
The system prompt (the system role) defines the assistant's behavior.
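The placeholder substitution described above behaves like Python's per-row string formatting. The sketch below is plain Python that illustrates the idea locally; the `render` helper is an assumption for illustration only, not MaxFrame's internal implementation.

```python
# Illustrative sketch: how {column_name} placeholders are filled per row.
# The render() helper is hypothetical; MaxFrame performs this server-side.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Please answer the following question: {query}"},
]

def render(template, row):
    # Replace each {column_name} placeholder with the row's field value.
    return [{**m, "content": m["content"].format(**row)} for m in template]

row = {"query": "What is the boiling point of water?"}
rendered = render(messages, row)
print(rendered[1]["content"])
# → Please answer the following question: What is the boiling point of water?
```

One rendered message list is produced per DataFrame row, so every record in the input becomes an independent generation request.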
Step 4: Run the generation task
result_df = llm.generate(
    df,  # Input data
    prompt_template=messages,
    running_options={
        "max_tokens": 4096,  # Maximum output length
        "verbose": True,     # Enable verbose log output mode
    },
    params={"temperature": 0.7},
)
# Execute and get the result.
result_df.execute()
Output description
result_df is a MaxFrame DataFrame that contains the following fields:
| Field | Type | Description |
| --- | --- | --- |
|  | string | Original input |
|  | string | Response generated by the model |
|  | string | Reason the generation finished |
|  | int | Number of input tokens |
|  | int | Number of output tokens |
|  | int | Total number of tokens |
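Once the result is pulled to the client, the token-count fields can be aggregated to estimate consumption. The sketch below uses a local pandas DataFrame as a stand-in for the executed result; the column names ("response", "prompt_tokens", "completion_tokens") are hypothetical placeholders, not the documented field names.

```python
import pandas as pd

# Hypothetical local stand-in for the executed result DataFrame.
# Column names here are illustrative assumptions.
local = pd.DataFrame({
    "query": ["What is the boiling point of water?"],
    "response": ["100 degrees Celsius at standard atmospheric pressure."],
    "prompt_tokens": [18],
    "completion_tokens": [12],
})

# Sum input and output tokens per row, then total across the batch.
local["total_tokens"] = local["prompt_tokens"] + local["completion_tokens"]
print("total tokens consumed:", int(local["total_tokens"].sum()))
# → total tokens consumed: 30
```

A per-batch total like this is a quick way to sanity-check cost before scaling a job up.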
Debugging and performance tuning tips
Performance and cost optimization
| Optimization | Recommendation |
| --- | --- |
| Batch size | Keep each batch to |
| GU allocation |  |
| Degree of parallelism | MaxFrame automatically schedules concurrent jobs. Control this with |
| Cache intermediate results | Use |
| Timeout setting | Add |
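The batch-size recommendation above can be applied on the client side by splitting the query list before building DataFrames. Below is a minimal plain-Python chunking helper; it is a generic sketch of the workflow, not a MaxFrame API.

```python
def chunk(items, size):
    """Yield successive fixed-size batches from a list."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# Split 10 queries into batches of at most 4.
queries = [f"question {i}" for i in range(10)]
batches = list(chunk(queries, 4))
print([len(b) for b in batches])
# → [4, 4, 2]
```

Each batch can then be wrapped in its own `md.DataFrame({"query": batch})` and submitted as a separate generate call.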
Debugging tips
View execution logs
print(session.get_logview_address())  # Click the link to view real-time MaxFrame job logs.
Small-scale test
df_sample = df.head(2)  # Get two data entries for testing.
result_sample = llm.generate(df_sample, prompt_template=messages, running_options={"gu": 2})
result_sample.execute()
Check resource usage
You can view the detailed execution status of jobs in the MaxFrame Logview.