MaxCompute: Reference a third-party package in a PyODPS node

Last Updated: Mar 26, 2026

Use third-party Python packages, such as SciPy or python-dateutil, in PyODPS by uploading them as MaxCompute resources and referencing them in your code. For instructions on generating a package with pyodps-pack, see Generate a third-party package for PyODPS.

Prerequisites

Before you begin, make sure that you have:

Choose a method

Select the method that fits your scenario:

| Scenario | Recommended method |
| --- | --- |
| New project, Python UDF or DataFrame | Use pyodps-pack to package and upload, then reference via sys.path or the libraries parameter |
| DataWorks PyODPS node with built-in packages | Use the DataWorks built-in method or load_resource_package |
| Existing project with manually uploaded WHL files | Manual upload (legacy maintenance only; use pyodps-pack for new projects) |

Upload a third-party package

Before referencing a third-party package, upload it to MaxCompute as an archive resource. Use one of the following methods:

  • Upload with code. Replace packages.tar.gz with the path and name of your package file.

    import os
    from odps import ODPS
    
    # Load credentials from environment variables.
    # Avoid hardcoding your AccessKey ID or AccessKey secret in code.
    o = ODPS(
        os.getenv('ALIBABA_CLOUD_ACCESS_KEY_ID'),
        os.getenv('ALIBABA_CLOUD_ACCESS_KEY_SECRET'),
        project='<your-default-project>',
        endpoint='<your-end-point>',
    )
    # Open the file in a with block so the handle is closed after the upload.
    with open("packages.tar.gz", "rb") as f:
        o.create_resource("test_packed.tar.gz", "archive", fileobj=f)
  • Upload with DataWorks. See Step 1: Create a resource or upload an existing resource.
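
Re-running the upload code fails if a resource with the same name already exists. The following is a minimal sketch of a replace-on-upload helper, not part of the original instructions. It assumes `o` is an ODPS entry object and relies on its `exist_resource`, `delete_resource`, and `create_resource` methods; the helper name `upload_archive` and its `overwrite` parameter are hypothetical.

```python
def upload_archive(o, resource_name, local_path, overwrite=True):
    """Upload local_path as an archive resource named resource_name.

    `o` is assumed to be an ODPS entry object (or any object exposing the
    same exist_resource / delete_resource / create_resource methods).
    """
    if o.exist_resource(resource_name):
        if not overwrite:
            raise ValueError("resource already exists: %s" % resource_name)
        # Drop the old copy so the name can be reused.
        o.delete_resource(resource_name)
    # Use a context manager so the file handle is closed after the upload.
    with open(local_path, "rb") as f:
        return o.create_resource(resource_name, "archive", fileobj=f)
```

A call such as `upload_archive(o, "test_packed.tar.gz", "packages.tar.gz")` then behaves the same on the first and on later runs.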

Reference a third-party package in a Python UDF

To use a third-party package in a Python user-defined function (UDF), modify the UDF class:

  1. Add the package path to sys.path in the __init__ method.

  2. Place the import statement inside the function body (the evaluate function or process method).

Important

The import statement must go inside the function body, not at the top of the file. Third-party packages are available only at runtime. When MaxCompute parses the UDF, the parsing environment does not include third-party packages, so a top-level import causes an error.
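
The pattern can be tried locally without MaxCompute. The sketch below is an illustration only: a temporary directory stands in for the unpacked archive, and the module name `fake_third_party` and class name `MyUdf` are hypothetical. Defining the class does not require the package; the import is resolved only when `evaluate` runs, after `__init__` has extended `sys.path`.

```python
import os
import sys
import tempfile

# Build a throwaway directory that stands in for the unpacked archive
# (work/<resource name>/packages on MaxCompute). The module name
# fake_third_party is hypothetical.
bundle_dir = tempfile.mkdtemp()
with open(os.path.join(bundle_dir, "fake_third_party.py"), "w") as f:
    f.write("def double(x):\n    return 2 * x\n")


class MyUdf(object):
    def __init__(self):
        # Runs when the UDF is instantiated, before evaluate is called.
        sys.path.insert(0, bundle_dir)

    def evaluate(self, arg0):
        # Deferred import: resolved only when the function runs, by which
        # time __init__ has already extended sys.path.
        from fake_third_party import double

        return double(arg0)


result = MyUdf().evaluate(21)
```

Moving the import to the top of the file would make it run before `sys.path` is extended, which is the failure the note above describes.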

Example: use SciPy in a UDF

This example uses the psi function from SciPy in a UDF.

  1. Package SciPy.

    pyodps-pack -o scipy-bundle.tar.gz scipy
  2. Write the UDF code and save it as test_psi_udf.py.

    import sys
    from odps.udf import annotate
    
    @annotate("double->double")
    class MyPsi(object):
        def __init__(self):
            # Add the package path to sys.path.
            # MaxCompute decompresses archive resources into folders under the work/ directory.
            # The folder name matches the resource name.
            # packages/ is the subdirectory created by pyodps-pack.
            sys.path.insert(0, "work/scipy-bundle.tar.gz/packages")
    
        def evaluate(self, arg0):
            # Place the import statement inside the function body.
            from scipy.special import psi
    
            return float(psi(arg0))
  3. Upload test_psi_udf.py as a Python resource and scipy-bundle.tar.gz as an archive resource.

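  4. Create the UDF, reference both resources, and set the class name to test_psi_udf.MyPsi. Do this in a PyODPS node or on the MaxCompute client. Both samples below also upload the resources from step 3; skip those lines if the resources already exist.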

    • In a PyODPS node:

      import os
      from odps import ODPS
      
      # Load credentials from environment variables.
      # Avoid hardcoding your AccessKey ID or AccessKey secret in code.
      o = ODPS(
          os.getenv('ALIBABA_CLOUD_ACCESS_KEY_ID'),
          os.getenv('ALIBABA_CLOUD_ACCESS_KEY_SECRET'),
          project='<your-default-project>',
          endpoint='<your-end-point>',
      )
      
      # Open each file in a with block so the handle is closed after the upload.
      with open("scipy-bundle.tar.gz", "rb") as f:
          bundle_res = o.create_resource("scipy-bundle.tar.gz", "archive", fileobj=f)
      with open("test_psi_udf.py", "rb") as f:
          udf_res = o.create_resource("test_psi_udf.py", "py", fileobj=f)
      o.create_function(
          "test_psi_udf", class_type="test_psi_udf.MyPsi", resources=[bundle_res, udf_res]
      )
    • On the MaxCompute client:

      add archive scipy-bundle.tar.gz;
      add py test_psi_udf.py;
      create function test_psi_udf as test_psi_udf.MyPsi using test_psi_udf.py,scipy-bundle.tar.gz;
  5. Run the UDF in a SQL statement.

    set odps.pypy.enabled=false;
    set odps.isolation.session.enable=true;
    select test_psi_udf(sepal_length) from iris;
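
The path passed to sys.path.insert in step 2 depends on two things: the resource name and the packages/ subdirectory inside the bundle. The following sketch shows one way to derive the path and to confirm the bundle layout before uploading. The helper names `sys_path_entry` and `top_level_names` are hypothetical, and the demo builds a placeholder archive rather than a real pyodps-pack bundle.

```python
import io
import os
import tarfile
import tempfile


def sys_path_entry(resource_name, subdir="packages"):
    """Return the sys.path entry for an archive resource unpacked under work/."""
    return "work/%s/%s" % (resource_name, subdir)


def top_level_names(archive_path):
    """Return the set of top-level entry names inside a .tar.gz file."""
    names = set()
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            parts = [p for p in member.name.split("/") if p not in ("", ".")]
            if parts:
                names.add(parts[0])
    return names


# Placeholder archive with the same layout as the bundle in this example.
demo_path = os.path.join(tempfile.mkdtemp(), "scipy-bundle.tar.gz")
with tarfile.open(demo_path, "w:gz") as tar:
    data = b"# placeholder\n"
    info = tarfile.TarInfo("packages/demo/__init__.py")
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

has_packages_dir = "packages" in top_level_names(demo_path)
entry = sys_path_entry("scipy-bundle.tar.gz")
```

If the resource is uploaded under a different name, the sys.path entry in the UDF must change to match.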

Reference a third-party package in PyODPS DataFrame

Pass the libraries parameter to the execute or persist method. The following example uses the map method; the procedure is the same for the apply and map_reduce methods.

  1. Package SciPy.

    pyodps-pack -o scipy-bundle.tar.gz scipy
  2. Run the following code to apply the package to a DataFrame operation. This example calculates psi(col1) on a table named test_float_col, which has a single column of the FLOAT type.

    import os
    from odps import ODPS, options
    
    def my_psi(v):
        from scipy.special import psi
    
        return float(psi(v))
    
    # Skip this setting if isolation is already enabled for your project.
    options.sql.settings = {"odps.isolation.session.enable": True}
    
    # Load credentials from environment variables.
    # Avoid hardcoding your AccessKey ID or AccessKey secret in code.
    o = ODPS(
        os.getenv('ALIBABA_CLOUD_ACCESS_KEY_ID'),
        os.getenv('ALIBABA_CLOUD_ACCESS_KEY_SECRET'),
        project='<your-default-project>',
        endpoint='<your-end-point>',
    )
    df = o.get_table("test_float_col").to_df()
    
    # Execute and return the result.
    df.col1.map(my_psi).execute(libraries=["scipy-bundle.tar.gz"])
    
    # Save the result to another table.
    df.col1.map(my_psi).persist("result_table", libraries=["scipy-bundle.tar.gz"])

    The input data looks like this:

       col1
    0  3.75
    1  2.51
  3. (Optional) To use the same package across all DataFrame operations in the session, set the global parameter.

    from odps import options
    options.df.libraries = ["scipy-bundle.tar.gz"]

Reference a third-party package in DataWorks

A DataWorks PyODPS node provides built-in third-party packages. To use a package that is not built in, call the load_resource_package method. For details, see Use a third-party package.

Manually upload and reference a third-party package

Follow these instructions only for existing projects that already use manually uploaded WHL dependencies, or for environments running an early MaxCompute version that does not support binary packages. For new projects, use pyodps-pack instead.

This example uses python-dateutil in the map method.

  1. Download python-dateutil and its dependencies to a local directory. Run this command in Linux to make sure the packages are compatible with the Linux operating system.

    pip download python-dateutil -d /to/path/

    Two packages are downloaded: six-1.10.0-py2.py3-none-any.whl and python_dateutil-2.5.3-py2.py3-none-any.whl.

  2. Upload the packages to MaxCompute.

    • Method 1: Use code.

      # o is the ODPS entry object, created as in the earlier examples.
      # Make sure the file name extensions are valid.
      o.create_resource('six.whl', 'file', file_obj=open('six-1.10.0-py2.py3-none-any.whl', 'rb'))
      o.create_resource('python_dateutil.whl', 'file', file_obj=open('python_dateutil-2.5.3-py2.py3-none-any.whl', 'rb'))
    • Method 2: Use DataWorks. See Step 1: Create a resource or upload an existing resource.

  3. Reference the packages in your code. This example parses date strings from a DataFrame column.

    • Set libraries globally:

      from odps import options
      
      def get_year(t):
          from dateutil.parser import parse
          return parse(t).strftime('%Y')
      
      options.df.libraries = ['six.whl', 'python_dateutil.whl']
      df.datestr.map(get_year).execute()

      Output:

         datestr
      0     2016
      1     2015
    • Pass libraries per call:

      def get_year(t):
          from dateutil.parser import parse
          return parse(t).strftime('%Y')
      
      df.datestr.map(get_year).execute(libraries=['six.whl', 'python_dateutil.whl'])

      Output:

         datestr
      0     2016
      1     2015
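
Step 2 uploads each downloaded wheel under a shorter resource name, and step 3 passes those names in libraries. As a rough sketch, the short names can be derived from the downloaded file names. This relies on the wheel naming convention, where the file name starts with the distribution name followed by a hyphen and the version; the helper name `wheel_resource_name` is hypothetical.

```python
def wheel_resource_name(wheel_filename):
    """Derive a short resource name such as 'six.whl' from a wheel file name."""
    if not wheel_filename.endswith(".whl"):
        raise ValueError("not a wheel file: %s" % wheel_filename)
    # Wheel file names start with '<distribution>-<version>-...'; hyphens in
    # the distribution name are written as underscores, so the first '-'
    # marks the end of the distribution part.
    distribution = wheel_filename.split("-", 1)[0]
    return distribution + ".whl"


downloaded = [
    "six-1.10.0-py2.py3-none-any.whl",
    "python_dateutil-2.5.3-py2.py3-none-any.whl",
]
libraries = [wheel_resource_name(name) for name in downloaded]
```

The resulting list matches the resource names used in the examples above.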

Binary package compatibility

By default, PyODPS supports only pure-Python libraries that perform no file operations. Later versions of MaxCompute also support libraries that contain binary code or perform file operations. The package file names of these libraries must include a platform-specific suffix.

The following table lists the supported suffixes by platform and Python version.

| Platform | Python version | Supported suffix |
| --- | --- | --- |
| RHEL 5 x86_64 | Python 2.7 | cp27-cp27m-manylinux1_x86_64 |
| RHEL 5 x86_64 | Python 3.7 | cp37-cp37m-manylinux1_x86_64 |
| RHEL 7 x86_64 | Python 2.7 | cp27-cp27m-manylinux1_x86_64, cp27-cp27m-manylinux2010_x86_64, cp27-cp27m-manylinux2014_x86_64 |
| RHEL 7 x86_64 | Python 3.7 | cp37-cp37m-manylinux1_x86_64, cp37-cp37m-manylinux2010_x86_64, cp37-cp37m-manylinux2014_x86_64 |
| RHEL 7 Arm64 | Python 3.7 | cp37-cp37m-manylinux2014_aarch64 |
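
The table can be turned into a quick pre-upload check. The sketch below is an illustration only: the dictionary copies the rows above, the function name `is_supported_wheel` is hypothetical, and the check is a plain suffix comparison that does not handle compound platform tags.

```python
# Supported suffixes keyed by (platform, Python version), copied from the table.
SUPPORTED_SUFFIXES = {
    ("RHEL 5 x86_64", "2.7"): ["cp27-cp27m-manylinux1_x86_64"],
    ("RHEL 5 x86_64", "3.7"): ["cp37-cp37m-manylinux1_x86_64"],
    ("RHEL 7 x86_64", "2.7"): [
        "cp27-cp27m-manylinux1_x86_64",
        "cp27-cp27m-manylinux2010_x86_64",
        "cp27-cp27m-manylinux2014_x86_64",
    ],
    ("RHEL 7 x86_64", "3.7"): [
        "cp37-cp37m-manylinux1_x86_64",
        "cp37-cp37m-manylinux2010_x86_64",
        "cp37-cp37m-manylinux2014_x86_64",
    ],
    ("RHEL 7 Arm64", "3.7"): ["cp37-cp37m-manylinux2014_aarch64"],
}


def is_supported_wheel(wheel_filename, platform, python_version):
    """Return True if the wheel file name ends with a suffix listed for the target."""
    if not wheel_filename.endswith(".whl"):
        return False
    stem = wheel_filename[: -len(".whl")]
    suffixes = SUPPORTED_SUFFIXES.get((platform, python_version), [])
    return any(stem.endswith("-" + suffix) for suffix in suffixes)


scipy_wheel = "scipy-0.19.0-cp27-cp27m-manylinux1_x86_64.whl"
ok_rhel5_py27 = is_supported_wheel(scipy_wheel, "RHEL 5 x86_64", "2.7")
ok_arm64_py37 = is_supported_wheel(scipy_wheel, "RHEL 7 Arm64", "3.7")
```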

All WHL packages must be uploaded to MaxCompute as archive resources. Before uploading, rename each WHL file to a ZIP file by changing its extension. Also set odps.isolation.session.enable to True for the job or your project.
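
A wheel is a ZIP archive, so changing the extension does not alter the file contents. The following sketch copies a wheel to a .zip file and confirms that the copy is a valid ZIP archive. The helper name `copy_wheel_as_zip` and the package name demo_pkg are hypothetical, and the demo uses a placeholder wheel instead of a real download.

```python
import os
import shutil
import tempfile
import zipfile


def copy_wheel_as_zip(wheel_path, out_dir):
    """Copy a .whl file to out_dir with a .zip extension and return the new path."""
    stem, ext = os.path.splitext(os.path.basename(wheel_path))
    if ext != ".whl":
        raise ValueError("not a wheel file: %s" % wheel_path)
    zip_path = os.path.join(out_dir, stem + ".zip")
    shutil.copyfile(wheel_path, zip_path)
    return zip_path


# Placeholder wheel for the demo; a real wheel would come from pip download.
work_dir = tempfile.mkdtemp()
wheel_path = os.path.join(work_dir, "demo_pkg-1.0-cp37-cp37m-manylinux1_x86_64.whl")
with zipfile.ZipFile(wheel_path, "w") as zf:
    zf.writestr("demo_pkg/__init__.py", "")

zip_path = copy_wheel_as_zip(wheel_path, work_dir)
is_zip = zipfile.is_zipfile(zip_path)
```

Copying rather than renaming keeps the original wheel available for local installation.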

The following example uploads and uses SciPy as a binary package.

from odps import options

# o is the ODPS entry object, created as in the earlier examples.
# Binary packages must be uploaded as archive resources.
# The resource name uses the .zip extension in place of .whl.
o.create_resource('scipy.zip', 'archive', file_obj=open('scipy-0.19.0-cp27-cp27m-manylinux1_x86_64.whl', 'rb'))

# Skip this setting if isolation is already enabled for your project.
options.sql.settings = {'odps.isolation.session.enable': True}

def my_psi(value):
    # Place the import statement inside the function to avoid runtime errors
    # caused by structural differences in binary packages across operating systems.
    from scipy.special import psi
    return float(psi(value))

# df is a PyODPS DataFrame with a numeric column named float_col.
df.float_col.map(my_psi).execute(libraries=['scipy.zip'])

To build a WHL file for a binary package that is distributed only as source code, run the following command on Linux from the package's source directory. WHL files built on macOS or Windows cannot be used in MaxCompute.

python setup.py bdist_wheel