A user-defined aggregate function (UDAF) maps multiple input records to a single output value. Use UDAFs when no built-in aggregate function covers your aggregation logic. MaxCompute supports UDAFs written in Java or Python.
When to use a UDAF
Built-in functions are optimized for distributed processing and outperform UDAFs. Use a built-in function whenever one covers your use case.
Write a UDAF when your aggregation logic cannot be expressed with built-in functions — for example, custom statistical models, domain-specific scoring, or multi-field weighted aggregations.
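As an illustration of the multi-field weighted aggregation case, the following is a minimal plain-Python sketch of a weighted average written in the buffer style that UDAFs typically follow. The class name `WeightedAverage`, the helper `aggregate`, and the sample data are hypothetical. The method names (`new_buffer`, `iterate`, `merge`, `terminate`) are assumed to mirror the Python UDAF interface, and the sketch deliberately does not import the MaxCompute SDK, so it runs locally but is not a registered function.

```python
class WeightedAverage(object):
    """Plain-Python stand-in for a UDAF (hypothetical, not tied to the MaxCompute runtime)."""

    def new_buffer(self):
        # Intermediate state: [sum of value * weight, sum of weights].
        return [0.0, 0.0]

    def iterate(self, buffer, value, weight):
        # Called once per input row; rows with a NULL (None) field are skipped.
        if value is None or weight is None:
            return
        buffer[0] += value * weight
        buffer[1] += weight

    def merge(self, buffer, pbuffer):
        # Fold a partial buffer built elsewhere into this buffer.
        buffer[0] += pbuffer[0]
        buffer[1] += pbuffer[1]

    def terminate(self, buffer):
        # Produce the single output value; None when no weight was accumulated.
        if buffer[1] == 0:
            return None
        return buffer[0] / buffer[1]


def aggregate(udaf, partitions):
    """Simulate partial aggregation: one buffer per partition, merged into a final buffer."""
    final = udaf.new_buffer()
    for rows in partitions:
        partial = udaf.new_buffer()
        for row in rows:
            udaf.iterate(partial, *row)
        udaf.merge(final, partial)
    return udaf.terminate(final)


# Hypothetical (value, weight) rows split across two partitions.
partitions = [[(80.0, 1.0), (90.0, 3.0)], [(70.0, 1.0), (None, 5.0)]]
result = aggregate(WeightedAverage(), partitions)
print(result)
```

Keeping the state as a pair of running sums, rather than a running average, is what allows partial results from different partitions to be merged without changing the outcome.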
UDAF types
| Type | Language | Runtime |
|---|---|---|
| Java UDAF | Java | JVM |
| Python 2 UDAF | Python | Python 2.7 |
| Python 3 UDAF | Python | CPython 3.7.3 |
For implementation details, see Java UDAFs, Python 2 UDAF, and Python 3 UDAF.
Development process
Java UDAF
The following figure shows the Java UDAF development workflow.

| Step | Required | Description | Platform | References |
|---|---|---|---|---|
| 1 | No | Add the odps-sdk-udf SDK dependency to your POM file. Example: <dependency><groupId>com.aliyun.odps</groupId><artifactId>odps-sdk-udf</artifactId><version>0.29.10-public</version></dependency>. Search for the latest version on Maven repositories. | IntelliJ IDEA (Maven) | — |
| 2 | Yes | Write the UDAF. | IntelliJ IDEA (Maven) and MaxCompute Studio | Develop a UDF in Java |
| 3 | Yes | Debug the UDAF on your local machine or with unit tests. | — | — |
| 4 | Yes | Package the code into a JAR file. | — | — |
| 5 | Yes | Upload the JAR file as a resource to your MaxCompute project. | MaxCompute client (odpscmd), MaxCompute Studio, DataWorks | MaxCompute client: Add resources, Create a UDF; MaxCompute Studio: Package a Java program, upload the package, and create a MaxCompute UDF; DataWorks: Create and use MaxCompute resources, Create and use a MaxCompute function |
| 6 | Yes | Create a UDF based on the uploaded JAR file. | — | — |
| 7 | No | Call the UDAF in a query. | — | — |
Python UDAF
The following figure shows the Python UDAF development workflow.

| Step | Required | Description | Platform | References |
|---|---|---|---|---|
| 1 | Yes | Write the UDAF. | MaxCompute Studio | Develop a Python UDF |
| 2 | Yes | Debug the UDAF on your local machine or with unit tests. | — | — |
| 3 | Yes | Upload Python files and any required resources (file resources, table resources, third-party packages) to your MaxCompute project. | MaxCompute client (odpscmd), MaxCompute Studio, DataWorks | MaxCompute client: Add resources, Create a UDF; MaxCompute Studio: Upload a Python program and create a MaxCompute UDF; DataWorks: Use resources to register functions |
| 4 | Yes | Create a UDF based on the uploaded Python files or required resources. | — | — |
| 5 | No | Call the UDAF in a query. | — | — |
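The local debugging step with unit tests could look like the sketch below. It assumes the aggregation logic is kept in a plain class that can be exercised without the MaxCompute runtime. `SumOfSquares`, `run`, and the test names are hypothetical and exist only for this illustration. The main property checked is that the result does not depend on how rows are split into partial buffers.

```python
import unittest


class SumOfSquares(object):
    """Hypothetical aggregator used only to illustrate local testing."""

    def new_buffer(self):
        return [0]

    def iterate(self, buffer, x):
        # Ignore NULL (None) inputs.
        if x is not None:
            buffer[0] += x * x

    def merge(self, buffer, pbuffer):
        buffer[0] += pbuffer[0]

    def terminate(self, buffer):
        return buffer[0]


def run(udaf, partitions):
    """Drive the aggregator over partitions: iterate per row, then merge partial buffers."""
    final = udaf.new_buffer()
    for rows in partitions:
        partial = udaf.new_buffer()
        for x in rows:
            udaf.iterate(partial, x)
        udaf.merge(final, partial)
    return udaf.terminate(final)


class SumOfSquaresTest(unittest.TestCase):
    def test_single_partition(self):
        self.assertEqual(run(SumOfSquares(), [[1, 2, 3]]), 14)

    def test_partitioning_does_not_change_result(self):
        whole = run(SumOfSquares(), [[1, 2, 3, 4]])
        split = run(SumOfSquares(), [[1], [2, 3], [4]])
        self.assertEqual(whole, split)

    def test_nulls_and_empty_partitions(self):
        self.assertEqual(run(SumOfSquares(), [[], [None, 2]]), 4)
```

If the code is saved as a module, the tests can be run with `python -m unittest`, the standard-library test runner.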
Limits
UDAFs cannot access the Internet. To enable Internet access, submit a network connection application. After approval, the MaxCompute technical support team will help you set up the connection. For details, see Network connection process.
Usage notes
Performance: Built-in functions outperform UDAFs. Use a built-in function whenever one covers your use case.
Memory: If a SQL statement using a UDAF exceeds the default memory allocation due to large data volumes or data skew, run the following command at the session level to increase the limit:
set odps.sql.udf.joiner.jvm.memory=xxxx;
For more information, see FAQ about MaxCompute UDFs.
Name collision: If a UDAF shares a name with a built-in function, MaxCompute calls the UDAF. To call the built-in function instead, prefix it with ::, for example:
select ::concat('ab', 'c');
Call a UDAF
Within the same project: Call the UDAF the same way you call a built-in function.
Across projects: Use the project name as a prefix. The following example calls udf_in_other_project from project B inside project A:
select B:udf_in_other_project(arg0, arg1) as res from table_t;
For cross-project resource sharing setup, see Cross-project resource access based on packages.