MaxCompute supports custom images for SQL user-defined function (UDF), PyODPS, and MaxFrame development jobs. Attaching an image to a job gives it access to Python packages—such as pandas or scipy—that are not available in the default runtime.
Each interface specifies the image differently:
| Interface | How to specify the image |
|---|---|
| SQL UDF | set odps.session.image = <image>; flag before the query |
| PyODPS | image parameter in .execute() or .persist() |
| MaxFrame | config.options.sql.settings = {"odps.session.image": "<image>"} |
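For the SQL interface, the flag is simply a statement placed ahead of the query. The sketch below composes such a script as a string; `with_session_image` is a hypothetical helper written for illustration (it is not part of any MaxCompute SDK), and `my_image` is a placeholder image name.

```python
def with_session_image(query: str, image: str) -> str:
    """Prefix a SQL query with the odps.session.image flag.

    Hypothetical helper: it only builds a string and does not
    submit anything to MaxCompute.
    """
    image = image.strip()
    if not image:
        raise ValueError("image name must not be empty")
    if ";" in image or "\n" in image:
        raise ValueError("image name must not contain ';' or line breaks")
    # Normalize the query so that it ends with exactly one semicolon.
    body = query.strip().rstrip(";")
    return f"set odps.session.image = {image};\n{body};"


if __name__ == "__main__":
    # 'my_image' is a placeholder, not a real image name.
    print(with_session_image("SELECT SumColumns(col1,col2) AS result FROM testsum", "my_image"))
```

The resulting text can be pasted into an SQL client as a single script.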
Use images in SQL UDF development
The following example uses pandas in a UDF that sums two string-encoded columns.
Prerequisites
Before you begin, ensure that you have:
- A MaxCompute project with the necessary MaxCompute permissions
- Access to the MaxCompute console or an SQL client
Steps
- Write the Python UDF script and save it as sum_pandas.py:

      from odps.udf import annotate
      import pandas as pd


      @annotate("string, string -> string")
      class SumColumns(object):
          def evaluate(self, arg1, arg2):
              # Convert input parameters to pandas DataFrame
              df = pd.DataFrame({'col1': arg1.split(','), 'col2': arg2.split(',')})
              # Calculate the sum of two columns
              df['sum'] = df['col1'].astype(int) + df['col2'].astype(int)
              # Convert the result to a string and return
              result = ','.join(df['sum'].astype(str).values)
              return result

- Upload sum_pandas.py as a resource to your MaxCompute project. For details, see Add resources.

      ADD PY sum_pandas.py -f;

- Register the script as the SumColumns UDF. For details, see Create a UDF.

      CREATE FUNCTION SumColumns AS 'sum_pandas.SumColumns' USING 'sum_pandas.py';

- Create the test table testsum and insert test data:

      CREATE TABLE testsum (col1 string, col2 string);
      INSERT INTO testsum VALUES ('1,2,3','1,2,3'),('1,2,3','3,2,1'),('1,2,3','4,5,6');

- Set the image using the odps.session.image flag, then run the UDF. Replace the image placeholder with the name of an image that provides pandas:

      set odps.sql.python.version=cp37;
      set odps.session.image = <image>;
      SELECT SumColumns(col1,col2) AS result FROM testsum;

  Expected output:

      +------------+
      | result     |
      +------------+
      | 2,4,6      |
      | 4,4,4      |
      | 5,7,9      |
      +------------+
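The UDF's arithmetic can be sanity-checked locally before the script is uploaded. The following is a minimal pure-Python sketch of the same element-wise sum; `sum_columns` is an illustrative stand-in for the `evaluate` method and does not use pandas or any MaxCompute API.

```python
def sum_columns(arg1: str, arg2: str) -> str:
    """Element-wise sum of two comma-separated integer lists."""
    left = [int(x) for x in arg1.split(",")]
    right = [int(x) for x in arg2.split(",")]
    if len(left) != len(right):
        raise ValueError("both inputs must contain the same number of values")
    return ",".join(str(a + b) for a, b in zip(left, right))


if __name__ == "__main__":
    # Same rows as the testsum table.
    rows = [("1,2,3", "1,2,3"), ("1,2,3", "3,2,1"), ("1,2,3", "4,5,6")]
    for col1, col2 in rows:
        print(sum_columns(col1, col2))
```

Running it against the three test rows should give the same three strings as the expected output of the query.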
Use images in PyODPS development
The following example uses the psi function from the scipy package to process a table of floating-point values.
Prerequisites
Before you begin, ensure that you have:
- A MaxCompute project with the necessary MaxCompute permissions
- Your AccessKey ID and AccessKey secret stored as environment variables
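A quick way to confirm the second prerequisite is to check the environment before running any job. The sketch below assumes the variable names used in the scripts on this page (`ALIBABA_CLOUD_ACCESS_KEY_ID` and `ALIBABA_CLOUD_ACCESS_KEY_SECRET`); `missing_credentials` is a hypothetical helper, not an SDK function.

```python
import os

# Variable names taken from the example scripts on this page.
REQUIRED_VARS = (
    "ALIBABA_CLOUD_ACCESS_KEY_ID",
    "ALIBABA_CLOUD_ACCESS_KEY_SECRET",
)


def missing_credentials(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("Both AccessKey environment variables are set.")
```

The check only reports whether the variables are present; it never prints their values.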
Steps
- Create the test table test_float_col and insert test data:

      CREATE TABLE test_float_col (col1 double);
      INSERT INTO test_float_col VALUES (3.75),(2.51);

- Write the PyODPS script and save it as psi_col.py. Pass the image name to .execute() or .persist() to apply the scipy runtime:

      import os
      from odps import ODPS, options


      def my_psi(v):
          from scipy.special import psi
          return float(psi(v))


      # If the project enables isolation, the following option is not required
      options.sql.settings = {"odps.isolation.session.enable": True}

      o = ODPS(
          # Ensure that the ALIBABA_CLOUD_ACCESS_KEY_ID environment variable is set to your AccessKey ID,
          # and the ALIBABA_CLOUD_ACCESS_KEY_SECRET environment variable is set to your AccessKey secret.
          # It is not recommended to directly use the AccessKey ID and AccessKey secret strings.
          os.getenv('ALIBABA_CLOUD_ACCESS_KEY_ID'),
          os.getenv('ALIBABA_CLOUD_ACCESS_KEY_SECRET'),
          project='your-default-project',
          endpoint='your-end-point'
      )

      df = o.get_table("test_float_col").to_df()

      # Execute directly and get the result
      df.col1.map(my_psi).execute(image='scipy')

      # Save to another table
      df.col1.map(my_psi).persist("result_table", image='scipy')

  Replace the following placeholders with actual values:

  | Placeholder | Description | How to get it |
  |---|---|---|
  | ALIBABA_CLOUD_ACCESS_KEY_ID | AccessKey ID with MaxCompute permissions | AccessKey management page |
  | ALIBABA_CLOUD_ACCESS_KEY_SECRET | AccessKey secret corresponding to the AccessKey ID | Same page as above |
  | your-default-project | MaxCompute project name | MaxCompute console > Workspace > Projects |
  | your-end-point | Endpoint for the region, e.g., http://service.cn-chengdu.maxcompute.aliyun.com/api | Endpoints |

- Query the results from result_table:

      SELECT * FROM result_table;

  Expected output:

      +----------------------+
      | col1                 |
      +----------------------+
      | 1.1825373886117962   |
      | 0.7080484451910534   |
      +----------------------+
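The psi function is the digamma function, so the expected values can be cross-checked on a machine without scipy. The sketch below is an independent approximation (recurrence plus a truncated asymptotic series), not the algorithm scipy uses; `digamma` is an illustrative name and the function only handles positive inputs.

```python
import math


def digamma(x: float) -> float:
    """Approximate the digamma function psi(x) for x > 0."""
    if x <= 0:
        raise ValueError("this sketch only handles x > 0")
    total = 0.0
    # Recurrence psi(x) = psi(x + 1) - 1/x: shift x upward until the
    # asymptotic series below is accurate.
    while x < 10.0:
        total -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # Truncated asymptotic series:
    # ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6) + 1/(240x^8)
    series = inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 * (1 / 252 - inv2 / 240)))
    return total + math.log(x) - 0.5 / x - series


if __name__ == "__main__":
    # Same input values as the test_float_col table.
    for value in (3.75, 2.51):
        print(value, digamma(value))
```

The printed values should agree with the rows of result_table to several decimal places; small differences in the last digits are expected because the series is truncated.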
Use images in MaxFrame development
The following example uses the psi function from the scipy package in a MaxFrame job. Unlike PyODPS, MaxFrame specifies the image globally through config.options.sql.settings before the session starts.
Prerequisites
Before you begin, ensure that you have:
- A MaxCompute project with the necessary MaxCompute permissions
- Your AccessKey ID and AccessKey secret stored as environment variables
Steps
- Create the test table test_float_col and insert test data:

      CREATE TABLE test_float_col (col1 double);
      INSERT INTO test_float_col VALUES (3.75),(2.51);

- Write the MaxFrame script and save it as psi_col.py. Set odps.session.image in config.options.sql.settings before creating the session:

      import os
      from odps import ODPS
      from maxframe.session import new_session
      import maxframe.dataframe as md
      from maxframe import config

      # Use the built-in scipy image
      config.options.sql.settings = {
          "odps.session.image": "scipy"
      }


      def my_psi(v):
          from scipy.special import psi
          return float(psi(v))


      o = ODPS(
          # Ensure that the ALIBABA_CLOUD_ACCESS_KEY_ID environment variable is set to your AccessKey ID,
          # and the ALIBABA_CLOUD_ACCESS_KEY_SECRET environment variable is set to your AccessKey secret.
          # It is not recommended to directly use the AccessKey ID and AccessKey secret strings.
          os.getenv('ALIBABA_CLOUD_ACCESS_KEY_ID'),
          os.getenv('ALIBABA_CLOUD_ACCESS_KEY_SECRET'),
          project='your-default-project',
          endpoint='your-end-point'
      )

      # Create a MaxFrame session
      session = new_session(o)

      df = md.read_odps_table('test_float_col')

      # Execute and get the result
      print(df.col1.map(my_psi).execute().fetch())

  Replace the following placeholders with actual values:

  | Placeholder | Description | How to get it |
  |---|---|---|
  | ALIBABA_CLOUD_ACCESS_KEY_ID | AccessKey ID with MaxCompute permissions | AccessKey management page |
  | ALIBABA_CLOUD_ACCESS_KEY_SECRET | AccessKey secret corresponding to the AccessKey ID | Same page as above |
  | your-default-project | MaxCompute project name | MaxCompute console > Workspace > Projects |
  | your-end-point | Endpoint for the region, e.g., http://service.cn-chengdu.maxcompute.aliyun.com/api | Endpoints |

  Expected output:

      0    1.182537
      1    0.708048
      Name: col1, dtype: float64
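Because the MaxFrame script assigns a whole dictionary to config.options.sql.settings, any flags placed there earlier are replaced. When other flags need to survive, merging is one option. The sketch below shows the idea with plain dictionaries; `with_image` is a hypothetical helper, and the `odps.sql.python.version` entry is only an illustrative existing flag borrowed from the SQL UDF example.

```python
def with_image(settings, image):
    """Return a copy of `settings` with odps.session.image set.

    Hypothetical helper: the input mapping is left unmodified.
    """
    merged = dict(settings or {})
    merged["odps.session.image"] = image
    return merged


if __name__ == "__main__":
    # Illustrative pre-existing flag; not required by MaxFrame.
    existing = {"odps.sql.python.version": "cp37"}
    print(with_image(existing, "scipy"))
```

In a MaxFrame script, the returned dictionary would be assigned to config.options.sql.settings before the session is created.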