MaxCompute allows you to reference third-party packages in Python user-defined functions
(UDFs), such as NumPy packages, third-party packages that need to be compiled, and
third-party packages that are dependent on dynamic-link libraries (DLLs). This topic
describes how to reference third-party packages in Python UDFs.
Background information
You can use one of the following methods to reference third-party packages in a Python
UDF:
- Reference NumPy packages in Python 3 UDFs
Change the file name extension of the NumPy wheel package to .zip, use the MaxCompute
client to upload the package as a resource, and then create a UDF that references the
package. After the UDF is created, you can call it.
- Reference third-party packages that need to be compiled
In an environment that is compatible with MaxCompute, run the setup.py script of the
third-party package to build a wheel package, and then change the file name extension
of the wheel package to .zip. Then, use the MaxCompute client to upload the package as
a resource and create a UDF that references it. After the UDF is created, you can call
it. We recommend that you build the package on a Linux operating system. If you use a
Windows operating system, we recommend that you build the package in a Docker container.
- Reference third-party packages that are dependent on DLLs
Compile the .so library files from the source code of the third-party package, build a
wheel package, and then change the file name extension of the wheel package to .zip.
Then, use the MaxCompute client to upload the wheel package and the .so library files
as resources and create a UDF that references them. After the UDF is created, you can
call it.
Prerequisites
Make sure that the following prerequisites are met:
- Python is installed. We recommend that you install Python 3.
- The MaxCompute client is installed and configured. For more information, see Install and configure the MaxCompute client.
- pip, setuptools, and wheel are installed if you want to use Python UDFs to reference
third-party packages that need to be compiled. To install setuptools, run pip install setuptools.
To install wheel, run pip install wheel.
- PROJ 6 is installed if you use a third-party package of GDAL 3.0 or later.
- Docker is installed if you use Docker to compile third-party packages. For more information,
see Docker documentation.
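Before you build wheels locally, a quick check of the Python-side prerequisites can save a failed build. The following is a minimal stdlib-only sketch, not part of the MaxCompute tooling: the function name check_prerequisites is hypothetical, and it only reports the running Python version and whether pip, setuptools, and wheel can be imported. It does not check the MaxCompute client, PROJ 6, or Docker.

```python
import importlib.util
import sys


def check_prerequisites(packages=("pip", "setuptools", "wheel")):
    """Report the local Python version and whether each build tool is importable.

    check_prerequisites is a hypothetical helper for illustration only.
    """
    report = {"python": "%d.%d" % sys.version_info[:2]}
    for name in packages:
        # find_spec returns None when a top-level module cannot be located.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, value in check_prerequisites().items():
        print(key, value)
```

If setuptools or wheel is reported as False, install it with the pip commands listed above.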
Reference NumPy packages in Python 3 UDFs
NumPy is installed by default in the Python 2 runtime of MaxCompute, so you do not
need to upload NumPy packages for Python 2 UDFs. To reference a NumPy package in a
Python 3 UDF, upload the package yourself by performing the following steps:
- In the Download files section of the NumPy page on PyPI, click the package whose name ends with cp37-cp37m-manylinux1_x86_64.whl to download the package. In this example, NumPy 1.19.2 is used.
Note If you download a package whose name ends with other characters, the operation may
fail. To select another version of the NumPy package, click Release history in the
Navigation section in the upper-left corner of the PyPI page to view the historical versions.
- Change the file name extension of the downloaded NumPy package to .zip.
Example: numpy-1.19.2-cp37-cp37m-manylinux1_x86_64.zip.
- Use the MaxCompute client to upload the NumPy package to your MaxCompute project. For more information about
how to upload the package, see Resource operations.
Sample command:
ADD ARCHIVE D:\Downloads\numpy-1.19.2-cp37-cp37m-manylinux1_x86_64.zip -f;
- Write a Python UDF script and save it as a PY file.
In this example, the saved file is named import_numpy.py. The following code shows
the Python UDF script:
from odps.udf import annotate


@annotate("->string")
class TryImport(object):  # The class name is TryImport.
    def __init__(self):
        import sys
        sys.path.insert(0, 'work/numpy-1.19.2-cp37-cp37m-manylinux1_x86_64.zip')  # The NumPy package. Only the package name after work/ needs to change.

    def evaluate(self):
        import numpy
        return "import succeed"
- Use the MaxCompute client to upload the import_numpy.py script to your MaxCompute project as a resource.
Sample command:
ADD PY D:\Desktop\import_numpy.py -f;
- Use the uploaded import_numpy.py script and NumPy package to create a UDF on the MaxCompute client. For more information about how to create a UDF, see Function operations.
In this example, the created UDF is named numpy. Sample command:
CREATE FUNCTION numpy AS 'import_numpy.TryImport' USING 'doc_test_dev/resources/import_numpy.py,numpy-1.19.2-cp37-cp37m-manylinux1_x86_64.zip';
Note When you create a UDF, you must add a NumPy package, such as numpy-1.19.2-cp37-cp37m-manylinux1_x86_64.zip, to the resource list.
- After you create the UDF, you can call it in SQL statements. Make sure that Python 3
is enabled for the SQL statements that call the UDF, for example by running
set odps.sql.python.version=cp37; before the query. For more information, see Python 3 UDFs.
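The first two steps above (pick a wheel whose name ends with cp37-cp37m-manylinux1_x86_64.whl, then change its extension to .zip) can be scripted. The following is a minimal sketch that assumes the wheel has already been downloaded; the helper names wheel_tags and prepare_wheel are hypothetical. It relies on the standard wheel file name layout name-version[-build]-python-abi-platform.whl, so the last three dash-separated fields are the tags.

```python
from pathlib import Path

# Tags required by the steps above for Python 3 UDFs.
REQUIRED_TAGS = ("cp37", "cp37m", "manylinux1_x86_64")


def wheel_tags(filename):
    """Return the (python, abi, platform) tags parsed from a wheel file name."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: %s" % filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) < 5:
        raise ValueError("unexpected wheel file name: %s" % filename)
    return tuple(parts[-3:])


def prepare_wheel(path, required=REQUIRED_TAGS):
    """Check the wheel tags, rename *.whl to *.zip, and return the new path.

    prepare_wheel is a hypothetical helper for illustration only.
    """
    path = Path(path)
    tags = wheel_tags(path.name)
    if tags != required:
        raise ValueError("%s has tags %s, expected %s" % (path.name, tags, required))
    target = path.with_suffix(".zip")  # replaces only the final .whl suffix
    path.rename(target)
    return target
```

For example, prepare_wheel("numpy-1.19.2-cp37-cp37m-manylinux1_x86_64.whl") renames the file to numpy-1.19.2-cp37-cp37m-manylinux1_x86_64.zip, which you can then upload with ADD ARCHIVE. A wheel built for another Python version is rejected instead of silently renamed.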
Reference third-party packages that need to be compiled
If a third-party package is a TAR.GZ package downloaded from PyPI or a source code package downloaded from GitHub, the setup.py file is usually
located in the root directory of the decompressed package. To use this type of
third-party package, you must run the setup.py file to build a wheel package in an
environment that is compatible with MaxCompute. Then, upload the package as a resource
and create a UDF that references it. After the UDF is created, you can call the
third-party package in Python UDFs. For more information about how to upload a resource
and create a UDF, see Reference NumPy packages in Python 3 UDFs.
Notice
- Third-party packages run in a Linux operating system. We recommend that you compile
third-party packages in a Linux operating system. If you compile third-party packages
in a Windows operating system, compatibility issues may occur.
- If you use a Windows operating system, we recommend that you run the setup.py file and
build the wheel package with the Python interpreter of the required version in a Docker
container created from the quay.io/pypa/manylinux2010_x86_64 image. In the container,
the interpreters are located at /opt/python/cp27-cp27m/bin/python (Python 2.7) and
/opt/python/cp37-cp37m/bin/python3 (Python 3.7).
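A one-off container run is one way to apply the recommendation above without working inside an interactive shell. The following is a minimal sketch, not an official build script: the helper names docker_build_command and run_build and the /io mount point are hypothetical choices, and it assumes a POSIX-style absolute host path. It only uses standard docker run options (--rm, -v, -w) and the interpreter paths listed above.

```python
import os
import shlex
import subprocess

IMAGE = "quay.io/pypa/manylinux2010_x86_64"
# Interpreter paths inside the manylinux2010 image, as listed above.
INTERPRETERS = {
    "2.7": "/opt/python/cp27-cp27m/bin/python",
    "3.7": "/opt/python/cp37-cp37m/bin/python3",
}


def docker_build_command(source_dir, python_version="3.7"):
    """Return the argv for a container that runs setup.py bdist_wheel.

    source_dir (the directory that contains setup.py) is mounted at /io,
    so the wheel is written to source_dir/dist on the host.
    """
    host_dir = os.path.abspath(source_dir)
    return [
        "docker", "run", "--rm",
        "-v", "%s:/io" % host_dir,  # bind-mount the source tree into the container
        "-w", "/io",                # run setup.py from the mounted directory
        IMAGE,
        INTERPRETERS[python_version], "setup.py", "bdist_wheel",
    ]


def run_build(source_dir, python_version="3.7"):
    """Print and execute the command; requires a working local Docker installation."""
    cmd = docker_build_command(source_dir, python_version)
    print(" ".join(shlex.quote(part) for part in cmd))
    subprocess.run(cmd, check=True)
```

Mounting the source tree instead of copying it means the built wheel appears directly in the dist folder on the host, so no docker cp step is needed afterwards.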
If you use a Linux operating system, make sure that the following requirements are
met:
If all requirements are met, perform the following steps to compile the setup.py file
and generate a wheel package:
- Decompress the third-party package on your on-premises machine, and then go to the
directory that contains the setup.py file.
For example, the GDAL-3.2.0.zip package is downloaded. After you decompress the package,
the setup.py file is stored in D:\Downloads\GDAL-3.2.0. Sample command:
cd D:\Downloads\GDAL-3.2.0
- Run the following command and check whether bdist_wheel appears in the list of available commands.
Sample command:
python setup.py --help-commands
- If bdist_wheel is listed, go to the next step.
- If bdist_wheel is not listed, change from distutils.core import setup to
from setuptools import setup in the setup.py file, and then go to the next step.
- Run the following command to run the setup.py file and build a wheel package:
python setup.py bdist_wheel
Note The wheel package is stored in the dist folder.
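The three steps above can be combined into a small script. The following is a minimal sketch under stated assumptions: patch_setup_source and build_wheel are hypothetical helper names, and instead of reading the --help-commands output it simply switches the distutils import to the setuptools import whenever the distutils form is present, which is the change the second step asks for when bdist_wheel is missing.

```python
import glob
import os
import re
import subprocess
import sys

# Matches "from distutils.core import setup" (also "... import setup, Extension").
DISTUTILS_IMPORT = re.compile(r"^from\s+distutils\.core\s+import\s+setup\b", re.MULTILINE)


def patch_setup_source(text):
    """Replace the distutils setup import with the setuptools one."""
    return DISTUTILS_IMPORT.sub("from setuptools import setup", text)


def build_wheel(source_dir, python=sys.executable):
    """Patch setup.py if needed, run bdist_wheel, and return the wheels in dist/."""
    setup_py = os.path.join(source_dir, "setup.py")
    with open(setup_py) as f:
        original = f.read()
    patched = patch_setup_source(original)
    if patched != original:
        with open(setup_py, "w") as f:
            f.write(patched)
    # Equivalent to running "python setup.py bdist_wheel" in source_dir.
    subprocess.run([python, "setup.py", "bdist_wheel"], cwd=source_dir, check=True)
    return sorted(glob.glob(os.path.join(source_dir, "dist", "*.whl")))
```

The python parameter lets you point the build at a specific interpreter, for example one of the /opt/python/... interpreters when the script runs inside the manylinux container.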
Reference third-party packages that are dependent on DLLs
Some third-party Python packages depend on other Python libraries and on DLLs. This
section describes how to use a Docker container created from the
quay.io/pypa/manylinux2010_x86_64 image to compile the .so library files and build a
wheel package that can be used in MaxCompute. GDAL 3.0.4 is used in this example. You
must upload the generated .so library files, the wheel package, and the NumPy package
as resources and create a UDF that references them. After the UDF is created, you can
call the third-party packages in Python UDFs. For more information about how to upload
a resource and create a UDF, see Reference NumPy packages in Python 3 UDFs.
Note Docker must be installed before you can reference third-party packages that depend
on DLLs in Python UDFs. For more information, see Docker documentation.
To reference third-party packages that are dependent on DLLs in Python UDFs, perform
the following steps:
- View the dependencies in the Dependencies section of the PyPI page.
The following figure shows the dependencies of GDAL 3.0.4.

Note In the preceding figure, the dependencies include libgdal and numpy. To obtain
libgdal, compile the GDAL source code in the Docker container. To obtain numpy, download
the NumPy package from the PyPI page or from within the Docker container.
- Obtain the NumPy package.
You can use one of the following methods to obtain the NumPy package:
- In the Download files section of the NumPy page on PyPI, click the package whose name ends with cp37-cp37m-manylinux1_x86_64.whl to download the package.
Note If Python 2 is used, click Release history in the Navigation section of the PyPI
page, select version 1.16.6 or earlier, and then click the package whose name ends with
cp27-cp27m-manylinux1_x86_64.whl.
- In the Docker container, run /opt/python/cp37-cp37m/bin/pip download numpy -d ./ to
download the NumPy package to the current directory.
- Compile the .so library file.
- Download the GDAL 3.0.4 source code file and decompress it to your on-premises machine.
- Pull the quay.io/pypa/manylinux2010_x86_64 image and start an interactive shell in a
container created from it.
Sample commands:
docker pull quay.io/pypa/manylinux2010_x86_64
docker run -it quay.io/pypa/manylinux2010_x86_64 /bin/bash
- Copy the GDAL 3.0.4 source code to the Docker container.
Sample command:
docker cp ./gdal-3.0.4 <CONTAINER ID>:/opt/source/
For more information about how to obtain the CONTAINER ID, see docker ps.
- Compile the GDAL 3.0.4 source code in the container. For more information, see BuildingOnUnix.
Sample commands (run them in the directory that contains the GDAL source code):
# --prefix sets the GDAL installation directory. --with-proj points to the PROJ 6 installation directory.
./configure --prefix=/path/to/install/prefix --with-proj=/path/to/install/proj6/prefix
make
make install
export PATH=/path/to/install/prefix/bin:$PATH
export LD_LIBRARY_PATH=/path/to/install/prefix/lib:$LD_LIBRARY_PATH
export GDAL_DATA=/path/to/install/prefix/share/gdal
# Test
gdalinfo --version
The following errors may occur during compilation:
- configure: error: PROJ 6 symbols not found: PROJ 6 is missing. Install PROJ 6, which
GDAL 3.0 and later require.
- fatal error: zlib.h: No such file or directory: the zlib development headers are
missing. Run yum install zlib-devel to install them, and then compile again.
- Use the docker cp command to copy the two .so library files (the actual files, not
symbolic links) from the container to your on-premises machine: libgdal.so from the lib folder in the installation directory of GDAL, and libproj.so from the lib folder in the installation directory of PROJ 6.
- Generate a GDAL wheel package in the Docker container. For more information, see BuildingOnUnix.
Sample commands:
# If NumPy is required, install NumPy first.
/opt/python/cp37-cp37m/bin/pip install numpy
# Go to the swig/python directory of the GDAL source code.
cd swig/python
# Generate a wheel package and save it in the dist folder. Example: GDAL-3.0.4-cp37-cp37m-linux_x86_64.whl
/opt/python/cp37-cp37m/bin/python setup.py bdist_wheel
- Change the file name extensions of the NumPy and GDAL wheel packages to .zip. Then,
upload the .so library files and the packages as resources and create a UDF that
references them. After the UDF is created, you can call the third-party packages in
Python UDFs. For more information about how to upload a resource and create a UDF, see
Reference NumPy packages in Python 3 UDFs.
Take note of the following items when you upload a resource and create a UDF:
- When you upload resources, you must upload libgdal.so and libproj.so as file resources and numpy-1.19.2-cp37-cp37m-manylinux1_x86_64.zip and GDAL-3.0.4-cp37-cp37m-linux_x86_64.zip as archive resources.
- When you create functions, you must add libgdal.so, libproj.so, numpy-1.19.2-cp37-cp37m-manylinux1_x86_64.zip, and GDAL-3.0.4-cp37-cp37m-linux_x86_64.zip to the resource list of the functions.
Sample code for a Python UDF:
# coding: utf-8
from odps.udf import annotate
from odps.distcache import get_cache_file


def include_file(file_name):
    # Read the file resource from the distributed cache and write a copy of it
    # to the current working directory under the same name.
    so_file = get_cache_file(file_name, 'b')
    with open(so_file.name, 'rb') as fp:
        content = fp.read()
    with open(file_name, 'wb') as so:
        so.write(content)


@annotate("->string")
class TryImport(object):
    def __init__(self):
        import sys
        include_file('libgdal.so.26')
        include_file('libproj.so.15')
        sys.path.insert(0, 'work/GDAL-3.0.4-cp37-cp37m-linux_x86_64.zip')  # The compiled GDAL package. Only the package name after work/ needs to change.
        sys.path.insert(0, 'work/numpy-1.19.2-cp37-cp37m-manylinux1_x86_64.zip')  # The NumPy package. Only the package name after work/ needs to change.

    def evaluate(self):
        from osgeo import gdal
        from osgeo import ogr
        from osgeo import osr
        from osgeo import gdal_array
        from osgeo import gdalconst
        return "import succeed"
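Note If an error indicates that libgdal.so.26 or libproj.so.15 cannot be found, rename libgdal.so to libgdal.so.26 and libproj.so to libproj.so.15, and use the new names when you upload the file resources, add them to the resource list of the function, and call include_file.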
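Because the GDAL UDF needs several resources of different types, it can help to generate the MaxCompute client statements instead of typing them. The following is a minimal sketch that reuses the statement forms shown earlier in this topic (ADD PY, ADD ARCHIVE, CREATE FUNCTION ... USING) plus ADD FILE for the .so files. The helper udf_statements, the function name gdal_import, the script name import_gdal.py, and all local paths are hypothetical, and the .so files are assumed to have already been renamed as described in the preceding note.

```python
from pathlib import PureWindowsPath


def resource_name(path):
    """Resource name = local file name. PureWindowsPath accepts both \\ and / separators."""
    return PureWindowsPath(path).name


def udf_statements(function_name, script, so_files, archives, class_name="TryImport"):
    """Return client statements that upload the resources and create the UDF."""
    module = PureWindowsPath(script).stem  # import_gdal.py -> import_gdal
    statements = ["ADD PY %s -f;" % script]
    statements += ["ADD FILE %s -f;" % p for p in so_files]  # .so files: file resources
    statements += ["ADD ARCHIVE %s -f;" % p for p in archives]  # .zip packages: archive resources
    resources = ",".join(resource_name(p) for p in [script, *so_files, *archives])
    statements.append(
        "CREATE FUNCTION %s AS '%s.%s' USING '%s';"
        % (function_name, module, class_name, resources)
    )
    return statements


if __name__ == "__main__":
    for statement in udf_statements(
        "gdal_import",
        r"D:\Desktop\import_gdal.py",
        [r"D:\Downloads\libgdal.so.26", r"D:\Downloads\libproj.so.15"],
        [
            r"D:\Downloads\numpy-1.19.2-cp37-cp37m-manylinux1_x86_64.zip",
            r"D:\Downloads\GDAL-3.0.4-cp37-cp37m-linux_x86_64.zip",
        ],
    ):
        print(statement)
```

The printed statements can be pasted into the MaxCompute client. Keeping the upload list and the USING list derived from the same inputs avoids the common mistake of uploading a resource but forgetting to add it to the resource list of the function.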