User-defined table-valued functions (UDTFs) let you extend MaxCompute's built-in capabilities by writing custom functions in Java or Python. A UDTF reads one row of input and returns multiple rows — the output is treated as a table.
MaxCompute includes some built-in UDTFs, such as EXPLODE. For more information, see Other functions and Complex type functions.
When to use UDTFs
Use a UDTF when built-in functions cannot express your logic — for example, when you need to emit a variable number of output rows per input row, or when your transformation depends on state accumulated across rows in a partition.
Built-in functions are optimized for MaxCompute's distributed engine and outperform UDTFs at scale. Use built-ins whenever they cover your use case.
Supported languages
| Language | Version |
|---|---|
| Java | — |
| Python 2 | 2.7 |
| Python 3 | CPython 3.7.3 |
For implementation details, see Java UDTFs, Python 2 UDTF, and Python 3 UDTFs.
Limitations
-
Internet access: UDFs cannot access the Internet by default. To enable Internet access, fill in the network connection application form. After approval, the MaxCompute technical support team will contact you to establish the connection. For details, see Network connection process.
-
No other columns in the same `SELECT`: A
SELECTstatement that calls a UDTF cannot reference other columns or expressions. The following statement is invalid:-- Invalid: mixes a UDTF with another column select value, user_udtf(key) as mycol ... -
No nesting: UDTFs cannot be nested inside other UDTFs. The following statement is invalid:
-- Invalid: user_udtf2 is nested inside user_udtf1 select user_udtf1(user_udtf2(key)) as mycol...; -
Incompatible with `GROUP BY`, `DISTRIBUTE BY`, and `SORT BY`: A UDTF cannot appear in the same
SELECTstatement as these clauses. The following statement is invalid:-- Invalid: UDTF used with GROUP BY select user_udtf(key) as mycol ... group by mycol;
Usage notes
Memory: If a UDF job processes large volumes of data with data skew, the job may exceed the default JVM memory allocation. Run the following command at the session level to increase memory:
set odps.sql.udf.joiner.jvm.memory=xxxx;
For a full list of UDF FAQ topics, see FAQ about MaxCompute UDFs.
Name conflicts: If a UDF shares its name with a built-in function, MaxCompute calls the UDF. To explicitly call the built-in function, prefix it with :::
select ::concat('ab', 'c');
Development process
The steps differ slightly between Java and Python.
Java
| Step | Required | Description | Tool |
|---|---|---|---|
| 1 | No | Add the Maven SDK dependency to your POM file. Search for odps-sdk-udf in Maven repositories to get the latest version. Example snippet: <dependency><groupId>com.aliyun.odps</groupId><artifactId>odps-sdk-udf</artifactId><version>0.29.10-public</version></dependency> |
IntelliJ IDEA (Maven) |
| 2 | Yes | Write the UDTF code. | IntelliJ IDEA (Maven) or MaxCompute Studio |
| 3 | Yes | Debug the UDTF on your on-premises machine or through unit testing. | — |
| 4 | Yes | Package the UDTF into a JAR file. | — |
| 5 | Yes | Upload the JAR file as a resource to your MaxCompute project. | MaxCompute client, MaxCompute Studio, or DataWorks |
| 6 | Yes | Register the UDTF using the uploaded JAR file. | — |
| 7 | No | Call the UDTF in a SQL query. | — |
For step 2, see Develop a UDF in Java. For step 5, see Add resources and Create a function. To package and register in one flow using MaxCompute Studio, see Package a Java program, upload the package, and create a MaxCompute UDF.
Python
| Step | Required | Description | Tool |
|---|---|---|---|
| 1 | Yes | Write the UDTF code. | MaxCompute Studio |
| 2 | Yes | Debug the UDTF on your on-premises machine or through unit testing. | — |
| 3 | Yes | Upload Python files or required resources (file resources, table resources, third-party packages) to your MaxCompute project. | MaxCompute client, MaxCompute Studio, or DataWorks |
| 4 | Yes | Register the UDTF based on the uploaded files. | — |
| 5 | No | Call the UDTF in a SQL query. | — |
For step 1, see Develop a Python UDF. For step 3, see Add resources and Create a function. To upload and register in one flow using MaxCompute Studio, see Upload a Python program and create a MaxCompute UDF.
Call a UDTF
Within the same project: Call a UDTF the same way you call a built-in function.
Across projects: To call a UDTF from project B inside project A, use the following syntax:
select B:udf_in_other_project(arg0, arg1) as res from table_t;
For more information about cross-project resource sharing, see Package-based resource sharing across projects.