MaxFrame supports two ways to bring external dependencies into your Python jobs: upload a third-party package as an Archive resource, or specify a built-in image that already contains the libraries you need.
Use this table to choose the right approach:
| Approach | When to use |
|---|---|
| Third-party package | You need a specific library version or a custom package not available in the default environment. Upload the package once, then reference it per function with @with_resource_libraries. |
| Image | Your job depends on multiple pre-installed libraries (for example, the built-in scipy image). Set the image once at the session level and all functions in that job use it automatically. |
Prerequisites
Before you begin, ensure that you have:
- A MaxCompute project with the necessary permissions to create resources
- The local MaxFrame client installed and configured
- Your AccessKey ID and AccessKey secret stored as environment variables: ALIBABA_CLOUD_ACCESS_KEY_ID and ALIBABA_CLOUD_ACCESS_KEY_SECRET
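Before running any of the scripts on this page, it can help to confirm that both variables are actually set, because os.getenv returns None for a missing variable and the failure then only surfaces later as an authentication error. The following is a minimal sketch of such a check; the helper name missing_credentials is hypothetical and not part of MaxFrame or PyODPS.

```python
import os

# The two variable names listed in the prerequisites.
REQUIRED_VARS = (
    "ALIBABA_CLOUD_ACCESS_KEY_ID",
    "ALIBABA_CLOUD_ACCESS_KEY_SECRET",
)


def missing_credentials(env=None):
    """Return the names of required variables that are unset or empty.

    Hypothetical helper: `env` defaults to os.environ and can be any
    mapping, which makes the check easy to exercise in isolation.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("Both AccessKey variables are set.")
```

An empty list means both variables are present and non-empty.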
Use a third-party package
Step 1: Upload the package
Upload your package to MaxCompute as an Archive resource before referencing it in code. Two methods are available.
Method 1: Upload using the ODPS SDK
import os
from odps import ODPS
o = ODPS(
    os.getenv('ALIBABA_CLOUD_ACCESS_KEY_ID'),
    os.getenv('ALIBABA_CLOUD_ACCESS_KEY_SECRET'),
    project='<your-project>',
    endpoint='<your-endpoint>',
)
# Replace packages.tar.gz with the path and filename of your package.
# The with block closes the file once the upload finishes.
with open("packages.tar.gz", "rb") as f:
    o.create_resource("packages.tar.gz", "archive", fileobj=f)
| Placeholder | Description |
|---|---|
| <your-project> | Your MaxCompute project name |
| <your-endpoint> | Your MaxCompute endpoint URL |
For more information about packaging a third-party library for upload, see Create a third-party package for PyODPS.
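When the package lives somewhere other than the working directory, the local path and the resource name differ: the resource name is what you later pass to with_resource_libraries. The sketch below wraps the same create_resource call used in Method 1 and derives the resource name from the file's base name. The helper upload_archive is hypothetical, and it assumes `o` is an odps.ODPS client created as shown earlier.

```python
import os


def upload_archive(o, path):
    """Upload the file at `path` as an Archive resource and return its name.

    Hypothetical helper. `o` is assumed to be an odps.ODPS client; only its
    create_resource method is called, with the same arguments as in Method 1.
    """
    # The resource name is the bare filename, without any directory part.
    resource_name = os.path.basename(path)
    # The with block closes the file even if the upload raises.
    with open(path, "rb") as f:
        o.create_resource(resource_name, "archive", fileobj=f)
    return resource_name
```

For example, upload_archive(o, "build/packages.tar.gz") would register the resource as packages.tar.gz, which is the name to reference in Step 2.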
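Method 2: Upload using DataWorks
You can also upload the package as an Archive resource through DataWorks. For more information, see Create and use MaxCompute resources.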
Step 2: Reference the package in your code
Import with_resource_libraries from maxframe.udf and apply it as a decorator to functions that use the package.
from maxframe.udf import with_resource_libraries
@with_resource_libraries("packages.tar.gz", "demo.py")
def my_function(v):
    # Import the library from the referenced package.
    import some_library
    return some_library.process(v)
Pass the resource filename (as uploaded to MaxCompute) as the argument. The decorator accepts multiple resource names, including .py files. For more details on automatic packaging, see Automatic packaging service.
Example: Calculate psi (digamma) values with a third-party package
This example uses packages.tar.gz to calculate the psi (digamma) function, through scipy.special.psi, for each value in a column of a test table.
You can download the sample package from packages.tar.gz.
1. Create the test table and insert sample data.
CREATE TABLE test_float_col (col1 double);
INSERT INTO test_float_col VALUES (3.75),(2.51);
2. Write the MaxFrame job and save it as `demo.py`.
import os
from odps import ODPS
from maxframe.session import new_session
import maxframe.dataframe as md
from maxframe.udf import with_resource_libraries
# Reference the third-party package.
@with_resource_libraries("packages.tar.gz")
def my_psi(v):
    # Import inside the function so the library is loaded where the UDF runs.
    from scipy.special import psi
    return float(psi(v))
o = ODPS(
    os.getenv('ALIBABA_CLOUD_ACCESS_KEY_ID'),
    os.getenv('ALIBABA_CLOUD_ACCESS_KEY_SECRET'),
    project='<your-project>',
    endpoint='<your-endpoint>',
)
# Create a MaxFrame session.
session = new_session(o)
df = md.read_odps_table('test_float_col')
# Run the UDF and print results.
print(df.col1.map(my_psi).execute().fetch())
3. Run the script using the local MaxFrame client.
python demo.py
Expected output:
0 1.182537
1 0.708048
Name: col1, dtype: float64
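scipy.special.psi is the digamma function, so the expected values can be sanity-checked locally without scipy or a MaxCompute connection. The sketch below is a pure-Python approximation (upward recurrence followed by a truncated asymptotic series). It is an independent stand-in written for this check, not how scipy computes the function, and it only handles positive inputs.

```python
import math


def digamma(x):
    """Approximate the digamma (psi) function for x > 0."""
    if x <= 0:
        raise ValueError("this sketch only handles x > 0")
    result = 0.0
    # Recurrence psi(x) = psi(x + 1) - 1/x: shift x upward until the
    # asymptotic series below is accurate.
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # Truncated asymptotic series:
    # ln(x) - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += math.log(x) - 0.5 / x
    result -= inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result


# Same inputs as the rows inserted into test_float_col.
for value in (3.75, 2.51):
    print(f"{value}: {digamma(value):.6f}")
```

The printed values should agree with the job output above to about six decimal places.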
Use an image
Set odps.session.image in config.options.sql.settings to specify a built-in image for the job. All functions in the session run in that image's environment.
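Assigning a new dictionary to config.options.sql.settings replaces whatever settings were there before. If a job already carries other SQL settings, one option is to merge the image key into a copy instead. The sketch below shows that merge as a plain function; the helper name with_session_image is hypothetical, and it assumes the existing settings value is either a dict or None.

```python
def with_session_image(settings, image):
    """Return a copy of `settings` with odps.session.image set to `image`.

    Hypothetical helper: `settings` is assumed to be a dict or None.
    The input is not modified.
    """
    merged = dict(settings or {})
    merged["odps.session.image"] = image
    return merged


# Possible usage with MaxFrame (not executed here):
# from maxframe import config
# config.options.sql.settings = with_session_image(
#     config.options.sql.settings, "scipy"
# )
```

Because the function returns a new dictionary, settings added earlier in the script are kept alongside the image.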
Example: Calculate psi (digamma) values with the scipy image
This example uses the built-in scipy image to calculate the psi (digamma) function, through scipy.special.psi, for each value in a column of a test table.
1. Create the test table and insert sample data.
CREATE TABLE test_float_col (col1 double);
INSERT INTO test_float_col VALUES (3.75),(2.51);
2. Write the MaxFrame job and save it as `demo.py`.
import os
from odps import ODPS
from maxframe.session import new_session
import maxframe.dataframe as md
from maxframe import config
# Specify the built-in scipy image for this session.
config.options.sql.settings = {
    "odps.session.image": "scipy"
}
def my_psi(v):
    # scipy is available because the session runs in the scipy image.
    from scipy.special import psi
    return float(psi(v))
o = ODPS(
    os.getenv('ALIBABA_CLOUD_ACCESS_KEY_ID'),
    os.getenv('ALIBABA_CLOUD_ACCESS_KEY_SECRET'),
    project='<your-project>',
    endpoint='<your-endpoint>',
)
# Create a MaxFrame session.
session = new_session(o)
df = md.read_odps_table('test_float_col')
# Run the function and print results.
print(df.col1.map(my_psi).execute().fetch())
3. Run the script using the local MaxFrame client.
python demo.py
Expected output:
0 1.182537
1 0.708048
Name: col1, dtype: float64
What's next
- Automatic packaging service: learn how MaxFrame packages dependencies automatically
- Create a third-party package for PyODPS: prepare a package for upload
- Create and use MaxCompute resources: manage resources via DataWorks