User-defined aggregate function (UDAF) development - MaxCompute

A user-defined aggregate function (UDAF) maps multiple input records to a single output value. Use UDAFs when no built-in aggregate function covers your aggregation logic. MaxCompute supports UDAFs written in Java or Python.

When to use a UDAF

Built-in functions are optimized for distributed processing and outperform UDAFs. Use a built-in function whenever one covers your use case.

Write a UDAF when your aggregation logic cannot be expressed with built-in functions — for example, custom statistical models, domain-specific scoring, or multi-field weighted aggregations.
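As an illustration of a multi-field weighted aggregation, the following is a minimal sketch of a Python UDAF that computes a weighted average. It assumes the `odps.udf` interface (`annotate`, `BaseUDAF`, and the `new_buffer`, `iterate`, `merge`, and `terminate` methods) described in the Python UDAF topics. The class name `WeightedAvg` and the fallback stand-ins in the `except` branch are hypothetical and exist only so that the sketch also runs outside MaxCompute.

```python
try:
    # Provided by the MaxCompute Python runtime.
    from odps.udf import annotate, BaseUDAF
except ImportError:
    # Assumption: local stand-ins so the sketch runs without MaxCompute.
    class BaseUDAF(object):
        pass

    def annotate(signature):
        def decorator(cls):
            return cls
        return decorator


@annotate('double,double->double')
class WeightedAvg(BaseUDAF):
    """Hypothetical UDAF: sum(value * weight) / sum(weight)."""

    def new_buffer(self):
        # Mutable partial state: [weighted sum, total weight].
        return [0.0, 0.0]

    def iterate(self, buffer, value, weight):
        # Called once per input row; NULL inputs arrive as None and are skipped.
        if value is None or weight is None:
            return
        buffer[0] += value * weight
        buffer[1] += weight

    def merge(self, buffer, pbuffer):
        # Fold a partial buffer produced elsewhere into this buffer.
        buffer[0] += pbuffer[0]
        buffer[1] += pbuffer[1]

    def terminate(self, buffer):
        # Return the final value; None is returned when no weight was seen.
        if buffer[1] == 0:
            return None
        return buffer[0] / buffer[1]


# Local usage: two rows, (10.0, weight 1.0) and (20.0, weight 3.0).
agg = WeightedAvg()
buf = agg.new_buffer()
agg.iterate(buf, 10.0, 1.0)
agg.iterate(buf, 20.0, 3.0)
print(agg.terminate(buf))  # 17.5
```

The buffer holds only two running totals, so partial results from different workers can be merged in any order and the state stays the same size regardless of how many rows are aggregated.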

UDAF types

Type	Language	Runtime
Java UDAF	Java	JVM
Python 2 UDAF	Python	Python 2.7
Python 3 UDAF	Python	CPython 3.7.3

For implementation details, see Java UDAFs, Python 2 UDAF, and Python 3 UDAF.

Development process

Java UDAF

The following table lists the steps in the Java UDAF development workflow.

Step	Required	Description	Platform	References
1	No	Add the `odps-sdk-udf` SDK dependency to your POM file. Example: `<dependency><groupId>com.aliyun.odps</groupId><artifactId>odps-sdk-udf</artifactId><version>0.29.10-public</version></dependency>`. Search for the latest version on Maven repositories.	IntelliJ IDEA (Maven)	—
2	Yes	Write the UDAF.	IntelliJ IDEA (Maven) and MaxCompute Studio	Develop a UDF in Java
3	Yes	Debug the UDAF on your local machine or with unit tests.	—	—
4	Yes	Package the code into a JAR file.	—	—
5	Yes	Upload the JAR file as a resource to your MaxCompute project.	MaxCompute client (odpscmd), MaxCompute Studio, DataWorks	MaxCompute client: Add resources, Create a UDF; MaxCompute Studio: Package a Java program, upload the package, and create a MaxCompute UDF; DataWorks: Create and use MaxCompute resources, Create and use a MaxCompute function
6	Yes	Create the UDAF as a MaxCompute function based on the uploaded JAR file.	—	—
7	No	Call the UDAF in a query.	—	—

Python UDAF

The following table lists the steps in the Python UDAF development workflow.

Step	Required	Description	Platform	References
1	Yes	Write the UDAF.	MaxCompute Studio	Develop a Python UDF
2	Yes	Debug the UDAF on your local machine or with unit tests.	—	—
3	Yes	Upload Python files and any required resources (file resources, table resources, third-party packages) to your MaxCompute project.	MaxCompute client (odpscmd), MaxCompute Studio, DataWorks	MaxCompute client: Add resources, Create a UDF; MaxCompute Studio: Upload a Python program and create a MaxCompute UDF; DataWorks: Use resources to register functions
4	Yes	Create the UDAF as a MaxCompute function based on the uploaded Python files and any required resources.	—	—
5	No	Call the UDAF in a query.	—	—
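Step 2 (local debugging with unit tests) can be sketched as follows. This is one possible approach, not a MaxCompute tool: the helper `run_udaf_locally` and the example class `MaxAbs` are hypothetical names. The helper splits the input rows into partitions, aggregates each partition into its own buffer, merges the partial buffers, and then calls `terminate`, which exercises the `merge` path that a single-buffer test would miss. `MaxAbs` is written as a plain class so that the sketch runs without MaxCompute; a deployed UDAF would instead subclass `BaseUDAF` from `odps.udf`.

```python
import unittest


def run_udaf_locally(udaf, rows, num_partitions=2):
    """Simulate a distributed run: partition, iterate, merge, terminate."""
    # Distribute rows round-robin across the simulated workers.
    partitions = [rows[i::num_partitions] for i in range(num_partitions)]

    # Each simulated worker aggregates its rows into its own buffer.
    partials = []
    for part in partitions:
        buf = udaf.new_buffer()
        for row in part:
            udaf.iterate(buf, *row)
        partials.append(buf)

    # Merge all partial buffers into a fresh buffer, then finalize.
    final = udaf.new_buffer()
    for buf in partials:
        udaf.merge(final, buf)
    return udaf.terminate(final)


class MaxAbs(object):
    """Hypothetical UDAF-shaped class: largest absolute value, ignoring None."""

    def new_buffer(self):
        return [None]

    def iterate(self, buffer, x):
        if x is None:
            return
        if buffer[0] is None or abs(x) > buffer[0]:
            buffer[0] = abs(x)

    def merge(self, buffer, pbuffer):
        if pbuffer[0] is None:
            return
        if buffer[0] is None or pbuffer[0] > buffer[0]:
            buffer[0] = pbuffer[0]

    def terminate(self, buffer):
        return buffer[0]


class MaxAbsTest(unittest.TestCase):
    def test_result_is_independent_of_partitioning(self):
        rows = [(3,), (-7,), (None,), (5,)]
        results = set(run_udaf_locally(MaxAbs(), rows, n) for n in (1, 2, 3, 4))
        self.assertEqual(results, {7})

    def test_empty_input_returns_none(self):
        self.assertIsNone(run_udaf_locally(MaxAbs(), [], 2))
```

Saved to a file, the tests can be run with `python -m unittest <module name>`. Checking that the result does not change with the partition count is a quick way to catch `merge` implementations that silently depend on row order or on a single buffer.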

Limits

By default, UDAFs cannot access the Internet. To enable Internet access, submit a network connection application. After it is approved, the MaxCompute technical support team helps you set up the connection. For details, see Network connection process.

Usage notes

Performance: Built-in functions outperform UDAFs. Use a built-in function whenever one covers your use case.
Memory: If a SQL statement that uses a UDAF exceeds the default memory allocation because of large data volumes or data skew, run the following command at the session level to raise the limit, replacing xxxx with the required memory size:
```
set odps.sql.udf.joiner.jvm.memory=xxxx;
```
For more information, see FAQ about MaxCompute UDFs.
Name collision: If a UDAF shares a name with a built-in function, MaxCompute calls the UDAF. To call the built-in function instead, prefix it with `::`. For example: `select ::concat('ab', 'c');`

Call a UDAF

Within the same project: Call the UDAF the same way you call a built-in function.

Across projects: Prefix the function name with the name of the project that owns it, followed by a colon. The following example, run in project A, calls the function udf_in_other_project that belongs to project B:

select B:udf_in_other_project(arg0, arg1) as res from table_t;

For cross-project resource sharing setup, see Cross-project resource access based on packages.