MaxCompute: Scenario practices

Last Updated: Mar 26, 2026

MaxCompute supports custom images for SQL user-defined function (UDF), PyODPS, and MaxFrame development jobs. Attaching an image to a job gives it access to Python packages—such as pandas or scipy—that are not available in the default runtime.

Each interface specifies the image differently:

Interface | How to specify the image
SQL UDF   | set odps.session.image = <image>; flag before the query
PyODPS    | image parameter in .execute() or .persist()
MaxFrame  | config.options.sql.settings = {"odps.session.image": "<image>"}
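
All three forms set the same session key, odps.session.image. As a rough illustration only, the sketch below derives each form from a single image name. The helper image_settings and its dictionary keys are hypothetical and not part of any MaxCompute SDK; it only builds strings and dictionaries and does not submit a job.

```python
def image_settings(image):
    """Return the three ways of naming `image`, one per interface (hypothetical helper)."""
    return {
        # SQL UDF: a session flag placed before the query
        "sql_flag": "set odps.session.image = {};".format(image),
        # PyODPS: keyword argument passed to .execute() or .persist()
        "pyodps_kwargs": {"image": image},
        # MaxFrame: value assigned to config.options.sql.settings
        "maxframe_settings": {"odps.session.image": image},
    }


forms = image_settings("scipy")
print(forms["sql_flag"])  # prints: set odps.session.image = scipy;
```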

Use images in SQL UDF development

The following example uses pandas in a UDF that sums two string-encoded columns.

Prerequisites

Before you begin, ensure that you have:

  • A MaxCompute project with the necessary MaxCompute permissions

  • Access to the MaxCompute console or an SQL client

Steps

  1. Write the Python UDF script and save it as sum_pandas.py:

    from odps.udf import annotate
    import pandas as pd
    
    @annotate("string, string -> string")
    class SumColumns(object):
        def evaluate(self, arg1, arg2):
            # Convert input parameters to pandas DataFrame
            df = pd.DataFrame({'col1': arg1.split(','), 'col2': arg2.split(',')})
    
            # Calculate the sum of two columns
            df['sum'] = df['col1'].astype(int) + df['col2'].astype(int)
    
            # Convert the result to a string and return
            result = ','.join(df['sum'].astype(str).values)
            return result
  2. Upload sum_pandas.py as a resource to your MaxCompute project. For details, see Add resources.

    ADD PY sum_pandas.py -f;
  3. Register the script as the SumColumns UDF. For details, see Create a UDF.

    CREATE FUNCTION SumColumns AS 'sum_pandas.SumColumns' USING 'sum_pandas.py';
  4. Create the test table testsum and insert test data:

    CREATE TABLE testsum (col1 string, col2 string);
    INSERT INTO testsum VALUES ('1,2,3','1,2,3'),('1,2,3','3,2,1'),('1,2,3','4,5,6');
  5. Set the image using the odps.session.image flag, then run the UDF. Replace <image> with the name of an image that contains pandas:

    set odps.sql.python.version=cp37;
    set odps.session.image = <image>;
    SELECT SumColumns(col1,col2) AS result FROM testsum;

    Expected output:

    +------------+
    | result     |
    +------------+
    | 2,4,6      |
    | 4,4,4      |
    | 5,7,9      |
    +------------+
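
The expected output can be checked locally before the UDF is uploaded. The following is a minimal sketch in plain Python that mirrors the element-wise sum performed in evaluate, without pandas or the odps package. The function sum_columns is a hypothetical stand-in for the UDF, and the rows are the test data inserted into testsum.

```python
def sum_columns(arg1, arg2):
    """Element-wise sum of two comma-separated integer strings."""
    left = [int(x) for x in arg1.split(",")]
    right = [int(x) for x in arg2.split(",")]
    return ",".join(str(a + b) for a, b in zip(left, right))


# Same rows as the INSERT INTO testsum statement
rows = [("1,2,3", "1,2,3"), ("1,2,3", "3,2,1"), ("1,2,3", "4,5,6")]
for col1, col2 in rows:
    print(sum_columns(col1, col2))
```

One difference to keep in mind: zip silently truncates inputs of unequal length, whereas building a pandas DataFrame from lists of different lengths raises an error.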

Use images in PyODPS development

The following example uses the psi function from the scipy package to process a table of floating-point values.

Prerequisites

Before you begin, ensure that you have:

  • A MaxCompute project with the necessary MaxCompute permissions

  • Your AccessKey ID and AccessKey secret stored as environment variables

Steps

  1. Create the test table test_float_col and insert test data:

    CREATE TABLE test_float_col (col1 double);
    INSERT INTO test_float_col VALUES (3.75),(2.51);
  2. Write the PyODPS script and save it as psi_col.py. Pass the image name to .execute() or .persist() to apply the scipy runtime:

    import os
    from odps import ODPS, options
    
    def my_psi(v):
        from scipy.special import psi
    
        return float(psi(v))
    
    # If the project enables isolation, the following option is not required
    options.sql.settings = {"odps.isolation.session.enable": True}
    
    o = ODPS(
          # Ensure that the ALIBABA_CLOUD_ACCESS_KEY_ID environment variable is set to your AccessKey ID,
          # and the ALIBABA_CLOUD_ACCESS_KEY_SECRET environment variable is set to your AccessKey secret.
          # It is not recommended to directly use the AccessKey ID and AccessKey secret strings.
          os.getenv('ALIBABA_CLOUD_ACCESS_KEY_ID'),
          os.getenv('ALIBABA_CLOUD_ACCESS_KEY_SECRET'),
          project='your-default-project',
          endpoint='your-end-point'
    )
    
    df = o.get_table("test_float_col").to_df()
    # Execute directly and get the result
    df.col1.map(my_psi).execute(image='scipy')
    # Save to another table
    df.col1.map(my_psi).persist("result_table", image='scipy')

    Replace the following placeholders with actual values:

    Placeholder                     | Description                                        | How to get it
    ALIBABA_CLOUD_ACCESS_KEY_ID     | AccessKey ID with MaxCompute permissions           | AccessKey management page
    ALIBABA_CLOUD_ACCESS_KEY_SECRET | AccessKey secret corresponding to the AccessKey ID | Same page as above
    your-default-project            | MaxCompute project name                            | MaxCompute console > Workspace > Projects
    your-end-point                  | Endpoint for the region, e.g., http://service.cn-chengdu.maxcompute.aliyun.com/api | Endpoints
  3. Query the results from result_table:

    SELECT * FROM result_table;

    Expected output:

    +----------------------+
    | col1                 |
    +----------------------+
    | 1.1825373886117962   |
    | 0.7080484451910534   |
    +----------------------+
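
psi is the digamma function. To cross-check the values above on a machine without scipy, the sketch below approximates it with the recurrence psi(x) = psi(x + 1) - 1/x and a truncated asymptotic series. It is an illustration for positive inputs only, not a replacement for scipy.special.psi, and digamma is a hypothetical helper name.

```python
import math


def digamma(x):
    """Approximate psi(x) for x > 0 (illustrative, not scipy's implementation)."""
    if x <= 0:
        raise ValueError("this sketch only handles x > 0")
    result = 0.0
    # Shift x upward with psi(x) = psi(x + 1) - 1/x until the series is accurate.
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # Asymptotic expansion:
    # ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6) + 1/(240x^8)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 * (1.0 / 252 - inv2 / 240)))
    return result + math.log(x) - 0.5 / x - series


# Same inputs as the rows in test_float_col
for v in (3.75, 2.51):
    print(digamma(v))
```

With the series truncated at this point, the results should agree with the table above to well beyond six decimal places for these two inputs.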

Use images in MaxFrame development

The following example uses the psi function from the scipy package in a MaxFrame job. Unlike PyODPS, MaxFrame specifies the image globally through config.options.sql.settings before the session starts.

Prerequisites

Before you begin, ensure that you have:

  • A MaxCompute project with the necessary MaxCompute permissions

  • Your AccessKey ID and AccessKey secret stored as environment variables

Steps

  1. Create the test table test_float_col and insert test data:

    CREATE TABLE test_float_col (col1 double);
    INSERT INTO test_float_col VALUES (3.75),(2.51);
  2. Write the MaxFrame script and save it as psi_col.py. Set odps.session.image in config.options.sql.settings before creating the session:

    import os
    from odps import ODPS
    from maxframe import config
    from maxframe.session import new_session
    import maxframe.dataframe as md
    
    # Use the built-in scipy image
    config.options.sql.settings = {
        "odps.session.image": "scipy"
    }
    
    def my_psi(v):
        from scipy.special import psi
        return float(psi(v))
    
    o = ODPS(
          # Ensure that the ALIBABA_CLOUD_ACCESS_KEY_ID environment variable is set to your AccessKey ID,
          # and the ALIBABA_CLOUD_ACCESS_KEY_SECRET environment variable is set to your AccessKey secret.
          # It is not recommended to directly use the AccessKey ID and AccessKey secret strings.
          os.getenv('ALIBABA_CLOUD_ACCESS_KEY_ID'),
          os.getenv('ALIBABA_CLOUD_ACCESS_KEY_SECRET'),
          project='your-default-project',
          endpoint='your-end-point'
    )
    
    # Create a MaxFrame session
    session = new_session(o)
    df = md.read_odps_table('test_float_col')
    
    # Execute and get the result
    print(df.col1.map(my_psi).execute().fetch())

    Replace the following placeholders with actual values:

    Placeholder                     | Description                                        | How to get it
    ALIBABA_CLOUD_ACCESS_KEY_ID     | AccessKey ID with MaxCompute permissions           | AccessKey management page
    ALIBABA_CLOUD_ACCESS_KEY_SECRET | AccessKey secret corresponding to the AccessKey ID | Same page as above
    your-default-project            | MaxCompute project name                            | MaxCompute console > Workspace > Projects
    your-end-point                  | Endpoint for the region, e.g., http://service.cn-chengdu.maxcompute.aliyun.com/api | Endpoints

    Expected output:

    0    1.182537
    1    0.708048
    Name: col1, dtype: float64
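
Assigning a new dictionary to config.options.sql.settings replaces whatever settings were there before. If a script already sets other SQL flags, one option is to merge the image key into a copy of the existing dictionary first. The sketch below uses plain dictionaries so that it runs without MaxFrame installed. The helper with_session_image is hypothetical, and the odps.sql.python.version entry is only an example of a flag that might already be present.

```python
def with_session_image(settings, image):
    """Return a copy of `settings` with odps.session.image set to `image`."""
    merged = dict(settings or {})  # copy so the caller's dictionary is not mutated
    merged["odps.session.image"] = image
    return merged


existing = {"odps.sql.python.version": "cp37"}  # example of a pre-existing flag
merged = with_session_image(existing, "scipy")
print(merged)

# In a MaxFrame script, the result would then be assigned before new_session(o):
# config.options.sql.settings = merged
```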