MaxCompute: Python 3 UDFs

Last Updated: Mar 26, 2026

MaxCompute supports user-defined functions (UDFs) written in Python 3, letting you extend SQL with custom business logic. When you call a UDF, MaxCompute passes the function name and arguments to the Python runtime. The runtime executes your evaluate method and returns the result to your query.

Quick start

The following minimal example adds two integers and handles NULL inputs:

from odps.udf import annotate

@annotate("bigint,bigint->bigint")
class MyPlus(object):
    def evaluate(self, arg0, arg1):
        if None in (arg0, arg1):
            return None
        return arg0 + arg1

To run a Python 3 UDF, add the following session flag before your SQL statement:

SET odps.sql.python.version=cp37;
SELECT my_plus(col_a, col_b) FROM my_table;
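
To build intuition for the call flow, the following sketch simulates it locally: it creates one instance of the class and calls evaluate once per input row. The annotate stand-in, the run_udf helper, and the sample rows are assumptions for illustration. Inside MaxCompute you import the real decorator from odps.udf, and the runtime, not your code, drives the loop.

```python
# Local stand-in for odps.udf.annotate so the sketch runs outside MaxCompute.
# Assumption: it only records the signature string on the class.
def annotate(signature):
    def wrap(cls):
        cls.signature = signature
        return cls
    return wrap


@annotate("bigint,bigint->bigint")
class MyPlus(object):
    def evaluate(self, arg0, arg1):
        if None in (arg0, arg1):
            return None
        return arg0 + arg1


def run_udf(udf_class, rows):
    """Hypothetical helper: call evaluate once per row on a single instance."""
    udf = udf_class()
    return [udf.evaluate(*row) for row in rows]


# Hypothetical rows standing in for (col_a, col_b); None models SQL NULL.
rows = [(1, 2), (10, None), (None, None), (-5, 5)]
results = run_udf(MyPlus, rows)
print(results)  # [3, None, None, 0]
```

A local loop like this is only a convenience for checking logic before you upload the file; it does not validate the signature against real column types.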

UDF code structure

Every Python 3 UDF requires four components:

| Component | Description |
| --- | --- |
| Module import | `from odps.udf import annotate` imports the `@annotate` decorator used to declare the function signature. To reference files or tables inside UDF code, also import `from odps.distcache import get_cache_file` or `from odps.distcache import get_cache_table`. |
| Function signature | `@annotate(<signature>)` declares the input and return types. MaxCompute validates type consistency during semantic parsing and returns an error if types do not match. See Function signatures and data types. |
| Custom Python class | The class is the organizational unit of your UDF. It defines the variables and methods that implement your business logic. Classes can also reference third-party libraries pre-installed in MaxCompute, or external files and tables. See Third-party libraries and Reference resources. |
| `evaluate` method | Defined inside the class, `evaluate` specifies the input parameters and return value of the UDF. Each class can have only one `evaluate` method. |

Limitations

Internet access (enforced at runtime)

UDFs cannot access the internet by default. To enable internet access, submit a Network Connection Request Form. The MaxCompute technical support team will contact you to complete the setup. For instructions, see Network Access Process.

VPC access (enforced at runtime)

UDFs cannot access virtual private clouds (VPCs) by default. To access VPC resources from a UDF, first create a network connection between your MaxCompute project and the target VPC. For more information, see Access resources in a VPC using a UDF.

Reading table data (enforced at runtime)

UDFs, user-defined aggregate functions (UDAFs), and user-defined table-valued functions (UDTFs) cannot read data from the following table types:

  • Tables with modified schemas (Schema Evolution)

  • Tables that contain complex data types

  • Tables that contain the JSON data type

  • Transactional tables

Usage notes

Python 2 and Python 3 are not compatible. Do not mix Python 2 and Python 3 UDFs in the same SQL statement.

Python 2 reached end of life (EOL) in early 2020. For guidance on migrating existing UDFs, see Migrate Python 2 UDFs.

NULL handling

Handle NULLs explicitly inside your evaluate method:

def evaluate(self, arg):
    if arg is None:
        return None
    return arg.upper()
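
When a function takes several arguments, repeating the None checks gets noisy. One option is a small helper that reports whether any argument is NULL. The any_null helper and the ConcatUpper class below are hypothetical names for illustration, not part of odps.udf, and the @annotate line is left as a comment so the sketch runs locally.

```python
def any_null(*args):
    """Return True if at least one argument is None (SQL NULL)."""
    return any(arg is None for arg in args)


class ConcatUpper(object):
    # In MaxCompute this class would carry @annotate("string,string->string").
    def evaluate(self, left, right):
        # Propagate NULL instead of raising TypeError on None + str.
        if any_null(left, right):
            return None
        return (left + right).upper()


udf = ConcatUpper()
print(udf.evaluate("ab", "cd"))  # ABCD
print(udf.evaluate("ab", None))  # None
```

The helper uses an identity check (is None), so falsy values such as 0 or an empty string are still treated as real inputs.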

Develop a UDF

MaxCompute supports UDF development with MaxCompute Studio, DataWorks, and the MaxCompute client (odpscmd). All three tools follow the same workflow:

  1. Write UDF code

  2. Upload the Python file and register the function

  3. Call the UDF in SQL

The following sections walk through the workflow for each tool, using the same example function GetUrlChar that extracts a URL segment by position.

Use MaxCompute Studio

Prerequisites

Before you begin, ensure that you have:

Write UDF code

  1. In the Project panel, right-click scripts under the MaxCompute script module and choose New > MaxCompute Python.

  2. In the Create new MaxCompute python class dialog, enter a class name in Name, select python UDF from the Kind drop-down list, and click OK.

  3. Write your UDF code in the editor. For local UDF testing, see Test UDFs. Example:

    from odps.udf import annotate
    
    @annotate("string,bigint->string")
    class GetUrlChar(object):
    
        def evaluate(self, url, n):
            if n == 0:
                return ""
            try:
                index = url.find(".htm")
                if index < 0:
                    return ""
                a = url[:index]
                index = a.rfind("/")
                b = a[index + 1:]
                c = b.split("-")
                if len(c) < n:
                    return ""
                return c[-n]
            except Exception:
                return "Internal error"

Upload the file and register the function

Right-click the Python file in the scripts folder and select Deploy to server.... In the Submit resource and register function dialog, enter the function name and click OK. For details, see Upload a Python program and create a MaxCompute UDF.

Call the UDF

In the Project Explorer tab, right-click your MaxCompute project, select Open Console, and run:

SET odps.sql.python.version=cp37;
SELECT UDF_GET_URL_CHAR("http://www.taobao.com/a.htm", 1);

Result:

+-----+
| _c0 |
+-----+
|  a  |
+-----+
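
To see how the position argument n selects a segment, you can run the same logic locally before deploying. The sketch below repeats the GetUrlChar logic without the @annotate decorator so that it runs without the odps package; the URLs are hypothetical test inputs.

```python
class GetUrlChar(object):
    # Same logic as the example above; @annotate("string,bigint->string")
    # is omitted so the class can be exercised outside MaxCompute.
    def evaluate(self, url, n):
        if n == 0:
            return ""
        try:
            index = url.find(".htm")
            if index < 0:
                return ""
            a = url[:index]              # drop ".htm" and everything after it
            index = a.rfind("/")
            b = a[index + 1:]            # last path segment, e.g. "item-123-abc"
            c = b.split("-")
            if len(c) < n:
                return ""
            return c[-n]                 # n counts from the right
        except Exception:
            return "Internal error"


udf = GetUrlChar()
url = "http://www.example.com/item-123-abc.htm"  # hypothetical URL
print(udf.evaluate(url, 1))   # abc
print(udf.evaluate(url, 3))   # item
print(udf.evaluate(url, 4))   # empty string: fewer than 4 parts
print(udf.evaluate(None, 1))  # Internal error
```

Note that n counts from the end of the hyphen-separated segment, and that a NULL url falls through to the except branch rather than returning NULL.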

Use DataWorks

Prerequisites

Before you begin, ensure that you have activated DataWorks and associated a DataWorks workspace with your MaxCompute project. For setup instructions, see DataWorks.

Write UDF code

Write the UDF code in any Python editor. Example:

from odps.udf import annotate

@annotate("string,bigint->string")
class GetUrlChar(object):

    def evaluate(self, url, n):
        if n == 0:
            return ""
        try:
            index = url.find(".htm")
            if index < 0:
                return ""
            a = url[:index]
            index = a.rfind("/")
            b = a[index + 1:]
            c = b.split("-")
            if len(c) < n:
                return ""
            return c[-n]
        except Exception:
            return "Internal error"

Upload the file and register the function

Upload the packaged code in the DataWorks console and create the UDF. See:

Call the UDF

Create an ODPS SQL node in the DataWorks console, then run:

SET odps.sql.python.version=cp37;
SELECT UDF_GET_URL_CHAR("http://www.taobao.com/a.htm", 1);

For more information about ODPS SQL nodes, see Develop a MaxCompute SQL task.

Use the MaxCompute client (odpscmd)

Prerequisites

Before you begin, ensure that you have downloaded, installed, and configured the MaxCompute client (odpscmd). For setup instructions, see MaxCompute client (odpscmd).

Write UDF code

Write the UDF code in any Python editor. Example:

from odps.udf import annotate

@annotate("string,bigint->string")
class GetUrlChar(object):

    def evaluate(self, url, n):
        if n == 0:
            return ""
        try:
            index = url.find(".htm")
            if index < 0:
                return ""
            a = url[:index]
            index = a.rfind("/")
            b = a[index + 1:]
            c = b.split("-")
            if len(c) < n:
                return ""
            return c[-n]
        except Exception:
            return "Internal error"

Upload the file and register the function

Upload the Python file and register the UDF using the following commands:

Call the UDF

Run the following SQL in the client:

SET odps.sql.python.version=cp37;
SELECT UDF_GET_URL_CHAR("http://www.taobao.com/a.htm", 1);

Third-party libraries

The built-in Python 3 runtime in MaxCompute does not include NumPy. To use NumPy, manually upload the NumPy wheel package. Download the package from PyPI or a mirror. The filename follows the pattern numpy-<version>-cp37-cp37m-manylinux1_x86_64.whl.

For upload instructions, see Resource operations or Use third-party packages in Python UDFs.
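
Before uploading, it can help to confirm that a downloaded wheel matches the filename pattern above. The following sketch splits a wheel filename into its tags using the standard wheel naming convention ({name}-{version}[-{build}]-{python}-{abi}-{platform}.whl). The helper names and the version number in the sample filename are illustrative assumptions, not part of MaxCompute.

```python
def wheel_tags(filename):
    """Split a wheel filename into (name, version, python, abi, platform)."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: %r" % filename)
    parts = filename[:-len(".whl")].split("-")
    # 5 parts without a build tag, 6 parts with one.
    if len(parts) not in (5, 6):
        raise ValueError("unexpected wheel filename: %r" % filename)
    name, version = parts[0], parts[1]
    python_tag, abi_tag, platform_tag = parts[-3:]
    return name, version, python_tag, abi_tag, platform_tag


def matches_cp37_pattern(filename):
    """True if the tags are cp37, cp37m, and manylinux1_x86_64."""
    _, _, python_tag, abi_tag, platform_tag = wheel_tags(filename)
    return (python_tag, abi_tag, platform_tag) == (
        "cp37", "cp37m", "manylinux1_x86_64")


# Illustrative filenames; substitute the file you actually downloaded.
print(matches_cp37_pattern("numpy-1.19.2-cp37-cp37m-manylinux1_x86_64.whl"))  # True
print(matches_cp37_pattern("numpy-1.19.2-cp38-cp38-manylinux1_x86_64.whl"))   # False
```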

For a full list of standard libraries available in the Python 3.7 runtime, see The Python Standard Library.

Function signatures and data types

Before you write your UDF code, decide:

  • Which input types your function accepts and which type it returns

  • How your function handles NULL inputs (MaxCompute can pass NULLs to any UDF)

The function signature uses the @annotate decorator:

@annotate(<signature>)

The signature string format is:

'arg_type_list -> type'

Input types (`arg_type_list`)

Separate multiple input types with commas. The following types are supported:

BIGINT, STRING, DOUBLE, BOOLEAN, DATETIME, DECIMAL, FLOAT, BINARY, DATE, DECIMAL(precision,scale), CHAR, VARCHAR, and complex types ARRAY, MAP, STRUCT (including nested complex types).

Two special values for arg_type_list:

| Value | Meaning |
| --- | --- |
| `*` | Accepts any number of arguments |
| `''` (empty string) | Accepts no arguments |

Return type (`type`)

UDFs return a single column. The supported return types are:

BIGINT, STRING, DOUBLE, BOOLEAN, DATETIME, DECIMAL, FLOAT, BINARY, DATE, DECIMAL(precision,scale), and complex types ARRAY, MAP, STRUCT (including nested complex types).

The available types depend on the MaxCompute data type edition used by your project. For details, see Data type editions.

Signature examples

| Signature | Description |
| --- | --- |
| `'bigint,double->string'` | Takes BIGINT and DOUBLE inputs, returns STRING |
| `'*->string'` | Takes any number of inputs, returns STRING |
| `'->double'` | Takes no inputs, returns DOUBLE |
| `'array<bigint>->struct<x:string, y:int>'` | Takes ARRAY<BIGINT>, returns STRUCT<x:STRING, y:INT> |
| `'->map<bigint, string>'` | Takes no inputs, returns MAP<BIGINT, STRING> |
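
To make the signature format concrete, the sketch below splits a signature string into its argument types and return type. It is a local illustration only and is not the parser MaxCompute uses; split_signature is a hypothetical helper. Commas inside angle brackets or parentheses, as in map<bigint, string> or decimal(10,2), are treated as part of a single type.

```python
def split_signature(signature):
    """Split 'arg_type_list->type' into (list_of_arg_types, return_type)."""
    if "->" not in signature:
        raise ValueError("signature must contain '->': %r" % signature)
    args_part, _, return_part = signature.partition("->")

    args, current, depth = [], [], 0
    for ch in args_part:
        if ch in "<(":
            depth += 1
        elif ch in ">)":
            depth -= 1
        if ch == "," and depth == 0:
            # Top-level comma: end of one argument type.
            args.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    last = "".join(current).strip()
    if last:
        args.append(last)
    return args, return_part.strip()


print(split_signature("bigint,double->string"))
# (['bigint', 'double'], 'string')
print(split_signature("->map<bigint, string>"))
# ([], 'map<bigint, string>')
print(split_signature("decimal(10,2),map<bigint,string>->bigint"))
# (['decimal(10,2)', 'map<bigint,string>'], 'bigint')
```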

MaxCompute SQL to Python 3 type mappings

Write your UDF code using these type mappings to ensure consistency:

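| MaxCompute SQL type | Python 3 type |
| --- | --- |
| BIGINT | int |
| STRING | str (Unicode text) |
| DOUBLE | float |
| BOOLEAN | bool |
| DATETIME | datetime.datetime |
| FLOAT | float |
| CHAR | str (Unicode text) |
| VARCHAR | str (Unicode text) |
| BINARY | bytes |
| DATE | datetime.date |
| DECIMAL | decimal.Decimal |
| ARRAY | list |
| MAP | dict |
| STRUCT | collections.namedtuple |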
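
When you test a UDF locally, a quick check that your sample values use the mapped Python types can catch mistakes early. The following sketch encodes the mapping above as a dictionary; matches_sql_type is a hypothetical helper, and treating None as valid for every type is an assumption that models SQL NULL.

```python
import collections
import datetime
import decimal

# Python 3 classes for each MaxCompute SQL type, following the table above.
SQL_TO_PYTHON = {
    "BIGINT": int, "STRING": str, "DOUBLE": float, "BOOLEAN": bool,
    "DATETIME": datetime.datetime, "FLOAT": float, "CHAR": str,
    "VARCHAR": str, "BINARY": bytes, "DATE": datetime.date,
    "DECIMAL": decimal.Decimal, "ARRAY": list, "MAP": dict,
    "STRUCT": tuple,  # namedtuple instances are tuples with a _fields attribute
}


def matches_sql_type(sql_type, value):
    """Return True if value has the Python type mapped to sql_type."""
    if value is None:  # NULL is allowed for any type
        return True
    sql_type = sql_type.upper()
    expected = SQL_TO_PYTHON[sql_type]
    # bool subclasses int, and datetime subclasses date: reject those mix-ups.
    if expected is int and isinstance(value, bool):
        return False
    if expected is datetime.date and isinstance(value, datetime.datetime):
        return False
    if sql_type == "STRUCT":
        return isinstance(value, tuple) and hasattr(value, "_fields")
    return isinstance(value, expected)


Point = collections.namedtuple("Point", ["x", "y"])  # hypothetical STRUCT
print(matches_sql_type("STRUCT", Point("a", 1)))  # True
print(matches_sql_type("STRING", b"raw bytes"))   # False
print(matches_sql_type("BIGINT", True))           # False
```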

Reference resources

Reference files or tables inside UDF code using the odps.distcache module.

Reference a file

odps.distcache.get_cache_file(resource_name, mode) returns the content of a file resource.

| Parameter | Description |
| --- | --- |
| resource_name | Name of an existing file resource in your MaxCompute project. Raises an error if the name is invalid or the resource does not exist. |
| mode | Open mode. 't' (default) for text, 'b' for binary. |

The return value is a file-like object. Call close() on it when done to release the file handle.

from odps.udf import annotate
from odps.distcache import get_cache_file

@annotate('bigint->string')
class DistCacheExample(object):
    def __init__(self):
        cache_file = get_cache_file('test_distcache.txt')
        kv = {}
        for line in cache_file:
            line = line.strip()
            if not line:
                continue
            k, v = line.split()
            kv[int(k)] = v
        cache_file.close()
        self.kv = kv

    def evaluate(self, arg):
        return self.kv.get(arg)
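
The parsing loop in this example implies a file with one whitespace-separated key and value per line. The sketch below runs the same loop locally against an in-memory file to show that format and to make sure close() is always called. The io.StringIO object is a stand-in for the object returned by get_cache_file, and the file contents and the load_kv helper are hypothetical.

```python
import contextlib
import io


def load_kv(cache_file):
    """Parse 'key value' lines from a file-like object, then close it."""
    kv = {}
    # closing() calls cache_file.close() even if a line fails to parse.
    with contextlib.closing(cache_file):
        for line in cache_file:
            line = line.strip()
            if not line:
                continue
            k, v = line.split()
            kv[int(k)] = v
    return kv


# Stand-in for get_cache_file('test_distcache.txt'); contents are made up.
fake_file = io.StringIO("1 apple\n2 banana\n\n3 cherry\n")
kv = load_kv(fake_file)
print(kv.get(2))         # banana
print(kv.get(99))        # None, which the UDF returns as NULL
print(fake_file.closed)  # True
```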

Reference a table

odps.distcache.get_cache_table(resource_name) returns the content of a table resource.

| Parameter | Description |
| --- | --- |
| resource_name | Name of an existing table resource in the current MaxCompute project. Raises an exception if the name is invalid or the resource does not exist. |

The return value is a generator. Each iteration yields one record from the table as an array of column values. Supported column types: BIGINT, STRING, DOUBLE, BOOLEAN, DATETIME, FLOAT, CHAR, VARCHAR, BINARY, DATE, DECIMAL, ARRAY, MAP, STRUCT.

from odps.udf import annotate
from odps.distcache import get_cache_table

@annotate('->string')
class DistCacheTableExample(object):
    def __init__(self):
        self.records = list(get_cache_table('udf_test'))
        self.counter = 0
        self.ln = len(self.records)

    def evaluate(self):
        if self.counter > self.ln - 1:
            return None
        ret = self.records[self.counter]
        self.counter += 1
        return str(ret)
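
The example above loads the whole table once in __init__ and then hands out one record per evaluate call until the records run out. The following local sketch reproduces that behavior with a generator standing in for get_cache_table. The rows, the fake_cache_table function, and the TableReplay class are hypothetical, and representing each record as a Python list is an assumption about the record shape.

```python
def fake_cache_table():
    """Stand-in for get_cache_table('udf_test'); yields one record per row."""
    yield [1, "alice"]
    yield [2, "bob"]


class TableReplay(object):
    def __init__(self, record_source):
        # Materialize the generator once; it cannot be iterated twice.
        self.records = list(record_source)
        self.counter = 0

    def evaluate(self):
        if self.counter >= len(self.records):
            return None  # no more records: return NULL
        ret = self.records[self.counter]
        self.counter += 1
        return str(ret)


udf = TableReplay(fake_cache_table())
outputs = [udf.evaluate() for _ in range(3)]
print(outputs)  # ["[1, 'alice']", "[2, 'bob']", None]
```

Because the instance keeps a counter, the result of each call depends on how many rows that instance has already processed.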

Call a UDF

Enable Python 3

MaxCompute projects use Python 2 for UDFs by default. To run a Python 3 UDF, add the following line before your SQL statement:

SET odps.sql.python.version=cp37;

Call within the same project

Call the UDF the same way as a built-in function:

SET odps.sql.python.version=cp37;
SELECT my_udf(column1, column2) FROM my_table;

Call across projects

To use a UDF from another project (for example, to use a UDF from Project B in Project A), prefix the function call with the source project name:

SELECT B:udf_in_other_project(arg0, arg1) AS res FROM table_t;

For more information, see Access resources across projects using packages.

Migrate Python 2 UDFs

Python 2 reached end of life (EOL) in early 2020.

For new projects, write all Python UDFs in Python 3.

For existing projects with Python 2 UDFs, proceed with caution when switching to Python 3. Two approaches are available:

  • Write new UDFs in Python 3 and enable Python 3 at the session level for new jobs. For details, see Enable Python 3.

  • Rewrite existing Python 2 UDFs to be compatible with both Python 2 and Python 3. For guidance, see Porting Python 2 code to Python 3.

For UDFs shared across multiple projects, write code that is compatible with both Python 2 and Python 3.
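
As a rough illustration of code that runs on both interpreters, the sketch below avoids the two most common differences: integer division and text versus bytes. The to_text helper and the AverageLabel class are hypothetical, and the @annotate line is left as a comment so the sketch runs without the odps package.

```python
import sys

PY2 = sys.version_info[0] == 2
# The name unicode is only evaluated on Python 2, where it exists.
text_type = unicode if PY2 else str  # noqa: F821


def to_text(value, encoding="utf-8"):
    """Return value as text on both Python 2 and Python 3."""
    if value is None:
        return None
    if isinstance(value, bytes):
        return value.decode(encoding)
    return text_type(value)


class AverageLabel(object):
    # In MaxCompute: @annotate("bigint,bigint->string")
    def evaluate(self, total, count):
        if total is None or count is None or count == 0:
            return None
        # float() forces true division on Python 2, where 7 / 2 == 3.
        ratio = float(total) / count
        return to_text("avg=%.2f" % ratio)


udf = AverageLabel()
print(udf.evaluate(7, 2))  # avg=3.50
print(udf.evaluate(7, 0))  # None
print(to_text(b"abc"))     # abc
```

Running the same file under both interpreters in a local test is a cheap way to confirm compatibility before you register the function in a shared project.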
