User-defined function development-UDTFs - MaxCompute

User-defined table-valued functions (UDTFs) let you extend MaxCompute's built-in capabilities by writing custom functions in Java or Python. A UDTF reads one row of input and returns multiple rows — the output is treated as a table.

MaxCompute includes some built-in UDTFs, such as EXPLODE. For more information, see Other functions and Complex type functions.

When to use UDTFs

Use a UDTF when built-in functions cannot express your logic — for example, when you need to emit a variable number of output rows per input row, or when your transformation depends on state accumulated across rows in a partition.

Built-in functions are optimized for MaxCompute's distributed engine and outperform UDTFs at scale. Use built-ins whenever they cover your use case.

Supported languages

Language	Version
Java	—
Python 2	2.7
Python 3	CPython 3.7.3

For implementation details, see Java UDTFs, Python 2 UDTF, and Python 3 UDTFs.

Limitations

Internet access: UDFs cannot access the Internet by default. To enable Internet access, fill in the network connection application form. After approval, the MaxCompute technical support team will contact you to establish the connection. For details, see Network connection process.
No other columns in the same `SELECT`: A SELECT statement that calls a UDTF cannot reference other columns or expressions. The following statement is invalid:
```
-- Invalid: mixes a UDTF with another column
select value, user_udtf(key) as mycol ...
```

No nesting: UDTFs cannot be nested inside other UDTFs. The following statement is invalid:

-- Invalid: user_udtf2 is nested inside user_udtf1
select user_udtf1(user_udtf2(key)) as mycol...;

Incompatible with `GROUP BY`, `DISTRIBUTE BY`, and `SORT BY`: A UDTF cannot appear in the same SELECT statement as these clauses. The following statement is invalid:
```
-- Invalid: UDTF used with GROUP BY
select user_udtf(key) as mycol ... group by mycol;
```

Usage notes

Memory: If a UDF job processes large volumes of data with data skew, the job may exceed the default JVM memory allocation. Run the following command at the session level to increase memory:

set odps.sql.udf.joiner.jvm.memory=xxxx;

For a full list of UDF FAQ topics, see FAQ about MaxCompute UDFs.

Name conflicts: If a UDF shares its name with a built-in function, MaxCompute calls the UDF. To explicitly call the built-in function, prefix it with :::

select ::concat('ab', 'c');

Development process

The steps differ slightly between Java and Python.

Java

Step	Required	Description	Tool
1	No	Add the Maven SDK dependency to your POM file. Search for `odps-sdk-udf` in Maven repositories to get the latest version. Example snippet: `<dependency><groupId>com.aliyun.odps</groupId><artifactId>odps-sdk-udf</artifactId><version>0.29.10-public</version></dependency>`	IntelliJ IDEA (Maven)
2	Yes	Write the UDTF code.	IntelliJ IDEA (Maven) or MaxCompute Studio
3	Yes	Debug the UDTF on your on-premises machine or through unit testing.	—
4	Yes	Package the UDTF into a JAR file.	—
5	Yes	Upload the JAR file as a resource to your MaxCompute project.	MaxCompute client, MaxCompute Studio, or DataWorks
6	Yes	Register the UDTF using the uploaded JAR file.	—
7	No	Call the UDTF in a SQL query.	—

For step 2, see Develop a UDF in Java. For step 5, see Add resources and Create a function. To package and register in one flow using MaxCompute Studio, see Package a Java program, upload the package, and create a MaxCompute UDF.

Python

Step	Required	Description	Tool
1	Yes	Write the UDTF code.	MaxCompute Studio
2	Yes	Debug the UDTF on your on-premises machine or through unit testing.	—
3	Yes	Upload Python files or required resources (file resources, table resources, third-party packages) to your MaxCompute project.	MaxCompute client, MaxCompute Studio, or DataWorks
4	Yes	Register the UDTF based on the uploaded files.	—
5	No	Call the UDTF in a SQL query.	—

For step 1, see Develop a Python UDF. For step 3, see Add resources and Create a function. To upload and register in one flow using MaxCompute Studio, see Upload a Python program and create a MaxCompute UDF.

Call a UDTF

Within the same project: Call a UDTF the same way you call a built-in function.

Across projects: To call a UDTF from project B inside project A, use the following syntax:

select B:udf_in_other_project(arg0, arg1) as res from table_t;

For more information about cross-project resource sharing, see Package-based resource sharing across projects.

MaxCompute:UDTF overview