MaxCompute: MaxFrame-specific APIs

Last Updated: Mar 26, 2026

MaxFrame provides a set of APIs beyond the standard pandas interface for managing sessions, reading and writing MaxCompute tables, triggering distributed computation, and retrieving results locally.

Session

new_session

Source code: new_session

new_session(
    session_id: str = None,
    default: bool = True,
    new: bool = True,
    odps_entry: Optional[ODPS] = None
)

Creates a MaxFrame session and connects to MaxCompute.

Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| session_id | String | No | None | A unique identifier for the session. If not specified, MaxFrame generates one automatically. When new=False, this identifies the existing session to reuse. |
| default | Boolean | No | True | Sets the session as the global default. When True, subsequent calls to execute() and fetch() use this session without requiring an explicit session argument. |
| new | Boolean | No | True | Creates a new session. Set to False to connect to an existing session identified by session_id. |
| odps_entry | ODPS | Yes | | The MaxCompute entry object. See Create a MaxCompute entry point. |

Returns: The session object.

Example

import os
from maxframe import new_session
from odps import ODPS

# Initialize the MaxCompute entry object.
# Store credentials in environment variables — do not hardcode them.
o = ODPS(
    os.environ.get('ALIBABA_CLOUD_ACCESS_KEY_ID'),
    os.environ.get('ALIBABA_CLOUD_ACCESS_KEY_SECRET'),
    project='your-default-project',
    endpoint='your-endpoint',
)

# Create the MaxFrame session.
session = new_session(odps_entry=o)
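
Because os.environ.get returns None when a variable is unset, a missing credential in the example above would only surface later, when the connection is attempted. The following is a minimal sketch of a fail-fast check. The helper names require_env and load_credentials are hypothetical and are not part of MaxFrame or PyODPS.

```python
import os
from typing import Tuple


def require_env(name: str) -> str:
    """Return an environment variable's value, or raise if it is unset or empty.

    Hypothetical helper for illustration; not part of MaxFrame or PyODPS.
    """
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


def load_credentials() -> Tuple[str, str]:
    """Read the two credential variables used in the example above."""
    key_id = require_env("ALIBABA_CLOUD_ACCESS_KEY_ID")
    key_secret = require_env("ALIBABA_CLOUD_ACCESS_KEY_SECRET")
    return key_id, key_secret
```

The two returned values could then be passed to ODPS(...) in place of the os.environ.get calls.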

Input/Output

The following functions read data from and write data to MaxCompute.

| Function | Description |
| --- | --- |
| read_odps_table | Reads a MaxCompute table into a DataFrame |
| read_odps_query | Runs a SQL query and returns results as a DataFrame |
| to_odps_table | Writes a DataFrame to a MaxCompute table |
| to_odps_model | Saves a trained XGBoost model to MaxCompute |

Choosing between `read_odps_table` and `read_odps_query`: Use read_odps_table when reading from a specific table (with optional partition and column filters). Use read_odps_query when you need SQL-level filtering or joins across multiple tables.

read_odps_table

Source code: read_odps_table

read_odps_table(
    table_name: Union[str, Table],
    partitions: Union[None, str, List[str]] = None,
    columns: Optional[List[str]] = None,
    index_col: Union[None, str, List[str]] = None,
    odps_entry: ODPS = None,
    string_as_binary: bool = None,
    append_partitions: bool = False
)

Reads data from a MaxCompute table and returns it as a DataFrame. If no index columns are specified, a RangeIndex is generated.

Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| table_name | String/Table | Yes | | The MaxCompute table name or table object to read from. |
| partitions | String/List | No | None | The partition or list of partitions to read. Format: <partition_name>=<partition_value>. If not specified, all partitions are read. |
| columns | List | No | None | The columns to read. Format: <column1>, <column2>, .... If not specified, all non-partition columns are read. |
| index_col | String/List | No | None | One or more columns to use as the DataFrame index. |
| odps_entry | ODPS | No | None | The MaxCompute entry object. See Create a MaxCompute entry point. |
| string_as_binary | Boolean | No | None | Reads string columns in binary form. |
| append_partitions | Boolean | No | False | When True and columns is not specified, includes partition key columns in the result. |

Returns: A DataFrame object.

Example

import maxframe.dataframe as md

df = md.read_odps_table(
    'BIGDATA_PUBLIC_DATASET.data_science.maxframe_ml_100k_users',
    index_col='user_id',
    columns=['age', 'sex']
)
print(df.execute().fetch())

# Output:
#          age sex
# user_id
# 1         24   M
# 2         53   F
# 3         23   M
# 4         24   M
# 5         33   F
# ...      ...  ..
# 939       26   F
# 940       32   M
# 941       20   M
# 942       48   F
# 943       22   M
#
# [943 rows x 2 columns]
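
The partitions parameter of read_odps_table expects strings of the form <partition_name>=<partition_value>. The sketch below builds such strings from plain Python values. The helper names are hypothetical, and joining multi-level keys with a comma is an assumption based on the pt1=xxx, pt2=yyy example given for to_odps_table.

```python
from typing import Dict, List


def partition_spec(keys: Dict[str, str]) -> str:
    """Join partition key/value pairs into one spec string.

    Hypothetical helper. The comma separator for multi-level
    partitions is an assumption, not confirmed by this page.
    """
    if not keys:
        raise ValueError("at least one partition key is required")
    return ",".join(f"{name}={value}" for name, value in keys.items())


def partition_specs(key: str, values: List[str]) -> List[str]:
    """Build one single-level spec per value, for example one per date."""
    return [partition_spec({key: value}) for value in values]


# 'pt' is a placeholder partition column name.
print(partition_specs("pt", ["20240101", "20240102"]))
```

The resulting list could be passed as partitions=... to read_odps_table for a table that is actually partitioned on that column.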

read_odps_query

Source code: read_odps_query

read_odps_query(
    query: str,
    odps_entry: ODPS = None,
    index_col: Union[None, str, List[str]] = None,
    string_as_binary: bool = None
)

Runs a MaxCompute SQL query and returns the results as a DataFrame. If no index columns are specified, a RangeIndex is generated.

Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| query | String | Yes | | The MaxCompute SQL statement to run. |
| odps_entry | ODPS | No | None | The MaxCompute entry object. See Create a MaxCompute entry point. |
| index_col | String/List | No | None | One or more columns to use as the DataFrame index. |
| string_as_binary | Boolean | No | None | Reads string columns in binary form. |

Returns: A DataFrame object.

Example

import maxframe.dataframe as md

df = md.read_odps_query(
    'SELECT user_id, age, sex FROM `BIGDATA_PUBLIC_DATASET.data_science.maxframe_ml_100k_users`'
)
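
When SQL-level filtering is the reason for choosing read_odps_query, the query string is often assembled from a table name, a column list, and a condition. The following sketch shows one way to do that. The build_select helper is hypothetical, performs no escaping, and should only be given trusted input. Wrapping the full table name in backticks follows the examples on this page.

```python
from typing import Optional, Sequence


def build_select(table: str, columns: Sequence[str], where: Optional[str] = None) -> str:
    """Assemble a simple SELECT statement as a string.

    Hypothetical helper for illustration. It does no quoting or
    escaping of columns or conditions, so pass trusted values only.
    """
    if not columns:
        raise ValueError("columns must not be empty")
    query = f"SELECT {', '.join(columns)} FROM `{table}`"
    if where:
        query += f" WHERE {where}"
    return query


query = build_select(
    "BIGDATA_PUBLIC_DATASET.data_science.maxframe_ml_100k_users",
    ["user_id", "age", "sex"],
    where="age >= 30",
)
print(query)
```

The resulting string could be passed as the query argument of read_odps_query.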

to_odps_table

Source code: to_odps_table

to_odps_table(
    table: Union[Table, str],
    partition: Optional[str] = None,
    partition_col: Union[None, str, List[str]] = None,
    overwrite: bool = False,
    unknown_as_string: Optional[bool] = None,
    index: bool = True,
    index_label: Union[None, str, List[str]] = None,
    lifecycle: Optional[int] = None
)

Writes a DataFrame to a MaxCompute table. If the table does not exist, MaxFrame creates it automatically.

Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| table | String/Table | Yes | | The target table name or table object. |
| partition | String | No | None | The target partition. Example: pt1=xxx, pt2=yyy. |
| partition_col | String/List | No | None | DataFrame columns to use as partition key columns in the output table. |
| overwrite | Boolean | No | False | Overwrites data if the table or partition already exists. |
| unknown_as_string | Boolean | No | False | When True, object-type columns in the DataFrame are written as STRING. An error may occur if type conversion fails. |
| index | Boolean | No | True | Writes the DataFrame index as a column in the output table. |
| index_label | String/List | No | None | Column name for the index. Defaults to index for a single-level index, or level_x (where x is the level of the index) for a multi-level index. |
| lifecycle | Integer | No | None | Lifecycle of the output table in days (positive integer). If the table already exists, this overwrites its current lifecycle setting. |

Returns: A DataFrame object. Because MaxFrame uses lazy execution, call .execute() on it to trigger the write.

Example

import maxframe.dataframe as md

df = md.read_odps_query(
    'SELECT user_id, age, sex FROM `BIGDATA_PUBLIC_DATASET.data_science.maxframe_ml_100k_users`',
    index_col='user_id'
)
# to_odps_table is lazy: call execute() to trigger the write.
df.to_odps_table('output_table', lifecycle=7).execute()
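
The index_label parameter has a documented default naming rule: index for a single-level index, and level_x for each level of a multi-level index. The sketch below spells that rule out. The helper is hypothetical, and numbering levels from zero is an assumption that mirrors pandas' reset_index behavior rather than something this page states.

```python
from typing import List


def default_index_labels(n_levels: int) -> List[str]:
    """Return the default index column names for an index with n_levels levels.

    Hypothetical illustration of the documented rule. Zero-based
    level numbering is an assumption.
    """
    if n_levels < 1:
        raise ValueError("an index has at least one level")
    if n_levels == 1:
        return ["index"]
    return [f"level_{i}" for i in range(n_levels)]


print(default_index_labels(1))
print(default_index_labels(2))
```

Passing index_label explicitly avoids relying on these defaults when the output table's column names matter.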

to_odps_model

to_odps_model(
    model_name: str,
    model_version: str = None,
    schema: str = None,
    project: str = None,
    description: Optional[str] = None,
    version_description: Optional[str] = None,
    create_model: bool = True,
    set_default_version: bool = False
)

Saves an XGBoost model trained in a MaxFrame job as a MaxCompute model object. Call .execute() on the returned Scalar to trigger the save operation.

Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| model_name | String | Yes | | The model name. If project and schema are specified separately, provide only the model name. Otherwise, use the format project.schema.model_name. |
| model_version | String | No | None | The model version. If not specified, the system generates a version automatically. |
| schema | String | No | "default" | The schema the model belongs to. |
| project | String | No | None | The project the model belongs to. |
| description | String | No | None | A description of the model. |
| version_description | String | No | None | A description of the model version. |
| create_model | Boolean | No | True | Creates the model if it does not already exist. |
| set_default_version | Boolean | No | False | Sets the saved version as the default version of the model. |

Returns: A Scalar object. Call .execute() to trigger the model saving operation.

Example

from maxframe.learn.contrib.xgboost import XGBClassifier
import maxframe.dataframe as md

# Train an XGBoost model.
# X, y, and cols are placeholders for your feature data, labels, and feature column names.
X_df = md.DataFrame(X, columns=cols)
clf = XGBClassifier(n_estimators=10)
clf.fit(X_df, y)

# Save the model to MaxCompute.
clf.to_odps_model(
    model_name='my_model',
    # If project and schema are not specified separately,
    # use the format: model_name='project.schema.my_model'
    model_version='version1'
).execute()
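
The model_name parameter accepts either a bare name (with project and schema passed separately) or the fully qualified form project.schema.model_name. The sketch below builds the qualified form and rejects ambiguous input. The helper and the names my_project and my_schema are hypothetical placeholders.

```python
from typing import Optional


def qualified_model_name(
    model_name: str,
    project: Optional[str] = None,
    schema: Optional[str] = None,
) -> str:
    """Build a project.schema.model_name string.

    Hypothetical helper for illustration; not part of MaxFrame.
    """
    if project is None and schema is None:
        # Nothing to prepend: return the name unchanged.
        return model_name
    if project is None or schema is None:
        raise ValueError("project and schema must be given together")
    if "." in model_name:
        raise ValueError("model_name is already qualified")
    return f"{project}.{schema}.{model_name}"


print(qualified_model_name("my_model", project="my_project", schema="my_schema"))
```

The returned string corresponds to the single-argument form of model_name. When project and schema are passed to to_odps_model directly, the bare name is used instead.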

Execute

execute

Source code: execute

execute(
    session: SessionType = None
)

Submits a data processing task to MaxCompute for execution. Because MaxFrame uses lazy execution, operations on a DataFrame are not computed until you call execute().

Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| session | Session | No | None | The session to use for execution. If not specified, the global default session created by new_session is used. |

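Returns: The object on which execute() was called, which allows chained calls such as df.execute().fetch(), as used in the examples on this page.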

Example

import maxframe.dataframe as md

df = md.read_odps_query(
    'SELECT user_id, age, sex FROM `BIGDATA_PUBLIC_DATASET.data_science.maxframe_ml_100k_users`',
    index_col='user_id'
)
df.execute()
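
Lazy execution can be easier to reason about with a small model. The following toy class is not MaxFrame's implementation and runs entirely locally. It only illustrates the pattern described above: operations are recorded when they are declared, computed when execute() is called, and read back with fetch(). The class name LazyValue and its map method are hypothetical.

```python
from typing import Callable, List, Optional


class LazyValue:
    """Toy stand-in for a lazily evaluated object; not MaxFrame's implementation."""

    def __init__(self, source: List[int]):
        self._source = source
        self._ops: List[Callable[[List[int]], List[int]]] = []
        self._result: Optional[List[int]] = None

    def map(self, fn: Callable[[int], int]) -> "LazyValue":
        # Only record the operation; nothing is computed here.
        self._ops.append(lambda rows: [fn(row) for row in rows])
        return self

    def execute(self) -> "LazyValue":
        # Run every recorded operation in order and keep the result.
        rows = list(self._source)
        for op in self._ops:
            rows = op(rows)
        self._result = rows
        # Returning self is what makes execute().fetch() chaining possible.
        return self

    def fetch(self) -> List[int]:
        if self._result is None:
            raise RuntimeError("call execute() before fetch()")
        return self._result


lazy = LazyValue([1, 2, 3]).map(lambda x: x * 2)  # nothing computed yet
print(lazy.execute().fetch())
```

In this model, calling fetch() before execute() raises an error, which is the same ordering the real API asks for.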

Fetch

fetch

Source code: fetch

fetch(
    session: SessionType = None
)

Retrieves the computation result from MaxCompute and returns it as a pandas DataFrame or Series in your local environment. Always call execute() before fetch().

Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| session | Session | No | None | The session to use for fetching results. If not specified, the global default session created by new_session is used. |

Returns: A pandas DataFrame or Series.

Example

import maxframe.dataframe as md

df = md.read_odps_query(
    'SELECT user_id, age, sex FROM `BIGDATA_PUBLIC_DATASET.data_science.maxframe_ml_100k_users`',
    index_col='user_id'
)
result = df.execute().fetch()
print(result)

# Output:
#          age sex
# user_id
# 1         24   M
# 2         53   F
# 3         23   M
# 4         24   M
# 5         33   F
# ...      ...  ..
# 939       26   F
# 940       32   M
# 941       20   M
# 942       48   F
# 943       22   M
#
# [943 rows x 2 columns]