MaxCompute: Python 3 user-defined table-valued function (UDTF)

Last Updated: Mar 26, 2026

MaxCompute supports Python 3 using CPython 3.7.3. Python 2 has reached End of Life (EOL), so write all new user-defined table-valued functions (UDTFs) in Python 3.

Enable Python 3

By default, MaxCompute projects use Python 2 for UDFs. To use Python 3, add the following session-level command before your SQL statement and submit them together:

set odps.sql.python.version=cp37;
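
As a rough illustration of submitting the flag and the query together, the following Python helper prepends the session-level command to a statement so that both travel as one script. The helper name `with_python3`, the UDTF name `my_explode`, and the table name `my_table` are hypothetical and used only for this sketch.

```python
# Session-level flag from this section; it must precede the SQL statement.
PYTHON3_FLAG = "set odps.sql.python.version=cp37;"


def with_python3(sql):
    """Return a script that enables Python 3 and then runs the statement."""
    return PYTHON3_FLAG + "\n" + sql.strip()


# Hypothetical UDTF and table names, for illustration only.
script = with_python3("SELECT my_explode(tags) AS tag FROM my_table;")
print(script)
```

The resulting text can then be submitted as a single job through whichever client you use.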

UDTF code structure

Use MaxCompute Studio to write UDTF code in Python 3. A UDTF has four components:

| Component | Required | Description |
| --- | --- | --- |
| Module imports | Required | Must include `from odps.udf import annotate` and `from odps.udf import BaseUDTF`. To reference files or tables, also add `from odps.distcache import get_cache_file` or `from odps.distcache import get_cache_table`. |
| Function signature | Optional | Declared with `@annotate(<signature>)`. Defines the data types of input parameters and return values. Without a signature, any input data type is accepted and all return values default to STRING. |
| Custom Python class | Required | A derived class of `BaseUDTF`. Defines the variables and methods for your business logic. |
| Class methods | Required | Implement the required methods described in the table below. |

Class methods

| Method | Required | When called | Description |
| --- | --- | --- | --- |
| `BaseUDTF.__init__()` | Optional | Once, before the first record | Initialization method. When overriding, call `super().__init__()` at the start. Use this to set up internal state that persists across records. |
| `BaseUDTF.process([args, ...])` | Required | Once per SQL record | Processes each input row. Its parameters are the arguments passed to the UDTF in the SQL statement. |
| `BaseUDTF.forward([args, ...])` | Required | Called by your code | Outputs one row per call. Its arguments are the output columns of the UDTF. Without a function signature, convert all values to STRING before calling `forward`. |
| `BaseUDTF.close()` | Optional | Once, after the last record | Cleanup method. Use this to release resources when the UDTF terminates. |

The following example shows a minimal UDTF that splits a comma-separated string into individual rows:

# Import the function signature module and the base class.
from odps.udf import annotate
from odps.udf import BaseUDTF

# Function signature: takes a STRING, returns a STRING.
# The decorated class is a custom Python class derived from BaseUDTF.
@annotate('string -> string')
class Explode(BaseUDTF):

    def process(self, arg):
        props = arg.split(',')
        for p in props:
            self.forward(p)

Python 2 UDTFs and Python 3 UDTFs run on different underlying Python versions. Write each UDTF according to the syntax and capabilities of the Python version it targets.
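
To see the call order of the class methods without a MaxCompute runtime, the Explode example can be driven by hand. The sketch below is only an approximation: `annotate` and `BaseUDTF` are local stand-ins written for this illustration (they are not the real `odps.udf` objects), and the stand-in `forward` simply collects rows in a list.

```python
# Stand-ins so the sketch runs locally. In a real UDTF, use
# `from odps.udf import annotate, BaseUDTF` instead of these definitions.
def annotate(signature):
    """Stand-in decorator: records the signature on the class."""
    def decorator(cls):
        cls.signature = signature
        return cls
    return decorator


class BaseUDTF(object):
    """Stand-in base class: forward() appends each output row to a list."""
    def __init__(self):
        self.rows = []

    def forward(self, *args):
        self.rows.append(args)

    def close(self):
        pass


@annotate('string -> string')
class Explode(BaseUDTF):
    def process(self, arg):
        for p in arg.split(','):
            self.forward(p)


udtf = Explode()               # __init__: once, before the first record
for value in ['a,b', 'c']:
    udtf.process(value)        # process: once per input record
udtf.close()                   # close: once, after the last record
print(udtf.rows)               # [('a',), ('b',), ('c',)]
```

Two input records produce three output rows, which is the defining behavior of a table-valued function.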

Limitations

Python 3 is not compatible with Python 2. A single SQL statement cannot mix Python 2 UDTFs and Python 3 UDTFs.

Migrate Python 2 UDTFs

Python 2 has reached EOL. Migrate your existing Python 2 UDTFs based on your project situation:

  • New project or first Python UDTF: Write all Python UDTFs in Python 3 from the start.

  • Existing project with many Python 2 UDTFs: Migrate gradually to avoid disruption. Choose one of the following approaches:

    • Write new UDTFs in Python 3 and enable Python 3 at the session level for jobs that use those new UDTFs. For details, see Enable Python 3.

    • Rewrite existing Python 2 UDTFs to be compatible with both Python 2 and Python 3. See Porting Python 2 Code to Python 3 for guidance.

If a UDTF is shared across multiple MaxCompute projects, make it compatible with both Python 2 and Python 3 to avoid breaking projects that still use Python 2.
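
String handling is one of the differences that commonly needs attention when the same code must run under both versions. The following is a minimal sketch of a version-neutral helper; `to_text` and `split_tags` are hypothetical names, and the sketch makes no claim about which string type the MaxCompute runtime passes in under each version.

```python
import sys

PY2 = sys.version_info[0] == 2
# On Python 2 the text type is `unicode`; on Python 3 it is `str`.
text_type = unicode if PY2 else str  # noqa: F821 (unicode exists only on Python 2)


def to_text(value, encoding='utf-8'):
    """Return value as a text string on both Python 2 and Python 3."""
    if value is None:
        return None
    if isinstance(value, bytes):
        # On Python 2, `bytes` is an alias of `str`, so this branch also
        # covers ordinary Python 2 strings.
        return value.decode(encoding)
    return text_type(value)


def split_tags(arg):
    """Version-neutral body for a process() method: split on commas."""
    text = to_text(arg)
    if text is None:
        return []
    return [part.strip() for part in text.split(u',') if part.strip()]


print(split_tags('a, b,,c'))
print(split_tags(b'x,y'))
```

Keeping the version check in one helper means the `process` method itself does not need separate Python 2 and Python 3 branches.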

Third-party libraries

NumPy is not included in the MaxCompute Python 3 runtime environment. To use NumPy in a UDTF, manually upload a NumPy wheel package as a resource. The expected filename of the wheel, whether downloaded from the Python Package Index (PyPI) or from a mirror, is:

numpy-<Version>-cp37-cp37m-manylinux1_x86_64.whl

For instructions on uploading the package, see Resource operations or Reference third-party packages in Python UDFs.
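
Before uploading, it can help to confirm that a downloaded wheel carries the tags shown above. The check below is a small sketch based on the standard wheel naming layout, in which the last three dash-separated fields are the Python tag, the ABI tag, and the platform tag. The function name and the version number in the sample filenames are illustrative only.

```python
EXPECTED_TAGS = ['cp37', 'cp37m', 'manylinux1_x86_64']


def is_cp37_manylinux1_wheel(filename):
    """Return True if the wheel filename ends with the expected three tags."""
    if not filename.endswith('.whl'):
        return False
    parts = filename[:-len('.whl')].split('-')
    # A wheel name has at least: distribution, version, python, abi, platform.
    return len(parts) >= 5 and parts[-3:] == EXPECTED_TAGS


# Sample filenames; the version number is only an example.
print(is_cp37_manylinux1_wheel('numpy-1.19.2-cp37-cp37m-manylinux1_x86_64.whl'))
print(is_cp37_manylinux1_wheel('numpy-1.19.2-cp38-cp38-manylinux1_x86_64.whl'))
```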

Function signatures and data types

A function signature declares the data types of a UDTF's input parameters and return values. MaxCompute validates the signature during semantic analysis and returns an error if the actual types do not match.

Signature format

@annotate('arg_type_list -> type_list')
  • arg_type_list: comma-separated input parameter types. Set to * to accept any number of parameters, or leave blank to accept no parameters.

  • type_list: return value types. A UDTF can return multiple columns.

Supported types for `type_list`: BIGINT, STRING, DOUBLE, BOOLEAN, DATETIME, DECIMAL, FLOAT, BINARY, DATE, DECIMAL(precision,scale), and complex types (ARRAY, MAP, STRUCT), including nested complex types.

Supported types for `arg_type_list`: all types listed above, plus CHAR and VARCHAR.

Select data types based on the data type edition of your MaxCompute project.

Signature examples

| Signature | Description |
| --- | --- |
| `@annotate('bigint,boolean->string,datetime')` | Two input parameters (BIGINT, BOOLEAN); two return values (STRING, DATETIME). |
| `@annotate('*->string,datetime')` | Any number of input parameters; two return values (STRING, DATETIME). |
| `@annotate('->double,bigint,string')` | No input parameters; three return values (DOUBLE, BIGINT, STRING). |
| `@annotate("array<string>,struct<a1:bigint,b1:string>,string->map<string,bigint>,struct<b1:bigint>")` | Complex type inputs and outputs. |
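
As an illustration of a signature with several inputs and several output columns, the sketch below declares `string,bigint->string,bigint` and passes one value per output column to `forward`. The class `RepeatWord` is hypothetical, `annotate` and `BaseUDTF` are local stand-ins so that the code runs outside MaxCompute, and the `None` check assumes that SQL NULL values reach Python as `None`.

```python
# Stand-ins so the sketch runs locally. In a real UDTF, use
# `from odps.udf import annotate, BaseUDTF` instead of these definitions.
def annotate(signature):
    def decorator(cls):
        cls.signature = signature
        return cls
    return decorator


class BaseUDTF(object):
    def __init__(self):
        self.rows = []

    def forward(self, *args):
        self.rows.append(args)


@annotate('string,bigint->string,bigint')
class RepeatWord(BaseUDTF):
    """Hypothetical UDTF: emit (word, index) once for each repetition."""

    def process(self, word, times):
        # Assumption: NULL inputs arrive as None; skip such rows.
        if word is None or times is None:
            return
        for index in range(times):
            # One argument per output column, in signature order.
            self.forward(word, index)


udtf = RepeatWord()
udtf.process('x', 2)
udtf.process(None, 3)
print(udtf.rows)  # [('x', 0), ('x', 1)]
```

The number and order of arguments to `forward` follow the return types on the right side of `->`.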

Data type mappings

Write Python UDTFs using the Python types that correspond to MaxCompute SQL types:

| MaxCompute SQL type | Python 3 type |
| --- | --- |
| BIGINT | int |
| STRING | str |
| DOUBLE | float |
| BOOLEAN | bool |
| DATETIME | datetime.datetime |
| FLOAT | float |
| CHAR | str |
| VARCHAR | str |
| BINARY | bytes |
| DATE | datetime.date |
| DECIMAL | decimal.Decimal |
| ARRAY | list |
| MAP | dict |
| STRUCT | collections.namedtuple |
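
The sketch below shows how complex types look from inside `process`, assuming the mappings above: an ARRAY argument is handled as a Python list and a MAP argument as a dict. The class `LookupWeights` and its sample data are hypothetical, and `annotate` and `BaseUDTF` are local stand-ins so that the code runs outside MaxCompute.

```python
# Stand-ins so the sketch runs locally. In a real UDTF, use
# `from odps.udf import annotate, BaseUDTF` instead of these definitions.
def annotate(signature):
    def decorator(cls):
        cls.signature = signature
        return cls
    return decorator


class BaseUDTF(object):
    def __init__(self):
        self.rows = []

    def forward(self, *args):
        self.rows.append(args)


@annotate('array<string>,map<string,bigint>->string,bigint')
class LookupWeights(BaseUDTF):
    """Hypothetical UDTF: emit one (name, weight) row per array element."""

    def process(self, names, weights):
        # names is treated as a list, weights as a dict.
        for name in names:
            # STRING output is a str, BIGINT output is an int.
            self.forward(name, weights.get(name, 0))


udtf = LookupWeights()
udtf.process(['a', 'b'], {'a': 3})
print(udtf.rows)  # [('a', 3), ('b', 0)]
```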

Reference resources

Reference files and tables in a Python UDTF using the odps.distcache module.

`odps.distcache.get_cache_file(resource_name)`

Returns the content of a file resource.

  • resource_name: the name of an existing file resource in your MaxCompute project. Returns an error if the name is invalid or the file does not exist.

  • Returns a file-like object. Call close() on the object when done to release the file handle.

  • Declare the file resource when creating the UDTF. If you omit this declaration, calling the UDTF returns an error.

`odps.distcache.get_cache_table(resource_name)`

Returns the content of a table resource.

  • resource_name: the name of an existing table resource in your MaxCompute project. Returns an error if the name is invalid or the table does not exist.

  • Returns a generator. Iterating over it yields one record per row; each record is an array whose fields are read by position, for example record[0].

The following example reads data from a JSON file and a table resource, then outputs rows based on a lookup:

from odps.udf import annotate
from odps.udf import BaseUDTF
from odps.distcache import get_cache_file
from odps.distcache import get_cache_table

@annotate('string -> string, bigint')
class UDTFExample(BaseUDTF):

    def __init__(self):
        # Initialize the base class first, as required when overriding __init__.
        super().__init__()
        import json
        # Load the JSON file resource into a dict.
        cache_file = get_cache_file('test_json.txt')
        self.my_dict = json.load(cache_file)
        cache_file.close()

        # Append records from the table resource into the dict.
        records = list(get_cache_table('table_resource1'))
        for record in records:
            self.my_dict[record[0]] = record[1]

    def process(self, pageid):
        # For each input pageid, forward all associated adid values.
        for adid in self.my_dict[pageid]:
            self.forward(pageid, adid)
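
The lookup logic of the example above can be tried locally by replacing the resource functions with stand-ins. Everything in the sketch below that is not in the original example is an assumption: the stand-in `get_cache_file`, `get_cache_table`, `annotate`, and `BaseUDTF`, the sample data, and the use of `dict.get` so that an unknown `pageid` produces no rows instead of raising a `KeyError`.

```python
import io
import json


# Stand-ins for odps.distcache, returning invented sample data.
def get_cache_file(resource_name):
    """Stand-in: return a file-like object holding JSON text."""
    return io.StringIO(json.dumps({'page1': [101, 102]}))


def get_cache_table(resource_name):
    """Stand-in: yield one indexable record per table row."""
    yield ('page2', [201])


# Stand-ins for odps.udf.
def annotate(signature):
    def decorator(cls):
        cls.signature = signature
        return cls
    return decorator


class BaseUDTF(object):
    def __init__(self):
        self.rows = []

    def forward(self, *args):
        self.rows.append(args)


@annotate('string -> string, bigint')
class UDTFExample(BaseUDTF):
    def __init__(self):
        super().__init__()
        cache_file = get_cache_file('test_json.txt')
        try:
            self.my_dict = json.load(cache_file)
        finally:
            cache_file.close()  # release the handle even if parsing fails
        for record in get_cache_table('table_resource1'):
            self.my_dict[record[0]] = record[1]

    def process(self, pageid):
        # Assumption: an unknown pageid should yield no rows.
        for adid in self.my_dict.get(pageid, []):
            self.forward(pageid, adid)


udtf = UDTFExample()
for pageid in ['page1', 'page2', 'missing']:
    udtf.process(pageid)
print(udtf.rows)  # [('page1', 101), ('page1', 102), ('page2', 201)]
```

Loading the resources once in `__init__` keeps the per-record work in `process` down to a dictionary lookup.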

Call a Python 3 UDTF

After developing a Python 3 UDTF following the development process, call it from MaxCompute SQL.

  • Within a project: Call the UDTF the same way as a built-in function.

  • Across projects: Reference a UDTF from another project by prefixing the function name with that project's name and a colon. In the following statement, B is the name of the other project:

    SELECT B:udf_in_other_project(arg0, arg1) AS res FROM table_t;

    For cross-project resource sharing setup, see Package-based resource sharing across projects.

To develop and test a Python 3 UDTF in MaxCompute Studio, see Develop a Python UDF.