MaxCompute: UDAF overview

Last Updated: Mar 25, 2026

A user-defined aggregate function (UDAF) maps multiple input records to a single output value. Use UDAFs when no built-in aggregate function covers your aggregation logic. MaxCompute supports UDAFs written in Java or Python.

When to use a UDAF

Built-in functions are optimized for distributed processing and outperform UDAFs. Use a built-in function whenever one covers your use case.

Write a UDAF when your aggregation logic cannot be expressed with built-in functions — for example, custom statistical models, domain-specific scoring, or multi-field weighted aggregations.
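As an illustration of the multi-field weighted aggregation case, the following is a minimal plain-Python sketch of how a UDAF reduces many rows to one value through partial buffers. It assumes the new_buffer / iterate / merge / terminate lifecycle described in the Python UDAF reference pages. The class name WeightedAvg and the helper weighted_avg_demo are hypothetical, and no MaxCompute SDK is imported, so this is not a deployable function as written.

```python
# Plain-Python sketch of a UDAF lifecycle; nothing here talks to MaxCompute.
# Assumption: a real Python UDAF would extend odps.udf.BaseUDAF and declare
# its signature with @annotate, as covered in the Python UDAF reference pages.


class WeightedAvg(object):
    """Hypothetical weighted average: sum(value * weight) / sum(weight)."""

    def new_buffer(self):
        # Mutable partial state: [weighted_sum, weight_total].
        return [0.0, 0.0]

    def iterate(self, buffer, value, weight):
        # Called once per input row; rows with a NULL (None) field are skipped.
        if value is None or weight is None:
            return
        buffer[0] += value * weight
        buffer[1] += weight

    def merge(self, buffer, pbuffer):
        # Fold in a partial buffer produced elsewhere (for example, another worker).
        buffer[0] += pbuffer[0]
        buffer[1] += pbuffer[1]

    def terminate(self, buffer):
        # Produce the single output value; None when no weight was accumulated.
        if buffer[1] == 0:
            return None
        return buffer[0] / buffer[1]


def weighted_avg_demo():
    agg = WeightedAvg()
    # Two partial buffers stand in for two workers processing different rows.
    left, right = agg.new_buffer(), agg.new_buffer()
    for value, weight in [(10.0, 1.0), (20.0, 3.0)]:
        agg.iterate(left, value, weight)
    for value, weight in [(30.0, 1.0), (None, 5.0)]:
        agg.iterate(right, value, weight)
    agg.merge(left, right)
    return agg.terminate(left)
```

The buffer keeps only two numbers regardless of input size, which is what lets partial results from different workers be merged cheaply.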

UDAF types

| Type | Language | Runtime |
| --- | --- | --- |
| Java UDAF | Java | JVM |
| Python 2 UDAF | Python | Python 2.7 |
| Python 3 UDAF | Python | CPython 3.7.3 |

For implementation details, see Java UDAFs, Python 2 UDAF, and Python 3 UDAF.

Development process

Java UDAF

The following figure shows the Java UDAF development workflow.

[Figure: Write a UDF in Java]

| Step | Required | Description | Platform | References |
| --- | --- | --- | --- | --- |
| 1 | No | Add the odps-sdk-udf SDK dependency to your POM file. Example: <dependency><groupId>com.aliyun.odps</groupId><artifactId>odps-sdk-udf</artifactId><version>0.29.10-public</version></dependency>. Search for the latest version on Maven repositories. | IntelliJ IDEA (Maven) | |
| 2 | Yes | Write the UDAF. | IntelliJ IDEA (Maven) and MaxCompute Studio | Develop a UDF in Java |
| 3 | Yes | Debug the UDAF on your local machine or with unit tests. | | |
| 4 | Yes | Package the code into a JAR file. | | |
| 5 | Yes | Upload the JAR file as a resource to your MaxCompute project. | MaxCompute client (odpscmd), MaxCompute Studio, DataWorks | MaxCompute client: Add resources, Create a UDF; MaxCompute Studio: Package a Java program, upload the package, and create a MaxCompute UDF; DataWorks: Create and use MaxCompute resources, Create and use a MaxCompute function |
| 6 | Yes | Create a UDF based on the uploaded JAR file. | | |
| 7 | No | Call the UDAF in a query. | | |

Python UDAF

The following figure shows the Python UDAF development workflow.

[Figure: Write a UDF in Python]

| Step | Required | Description | Platform | References |
| --- | --- | --- | --- | --- |
| 1 | Yes | Write the UDAF. | MaxCompute Studio | Develop a Python UDF |
| 2 | Yes | Debug the UDAF on your local machine or with unit tests. | | |
| 3 | Yes | Upload Python files and any required resources (file resources, table resources, third-party packages) to your MaxCompute project. | MaxCompute client (odpscmd), MaxCompute Studio, DataWorks | MaxCompute client: Add resources, Create a UDF; MaxCompute Studio: Upload a Python program and create a MaxCompute UDF; DataWorks: Use resources to register functions |
| 4 | Yes | Create a UDF based on the uploaded Python files or required resources. | | |
| 5 | No | Call the UDAF in a query. | | |
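The local debugging step in the Python workflow can be approximated without a cluster. The sketch below is one possible unit-test harness, not a MaxCompute tool. The names run_locally, MaxAbs, and check_partition_invariance are hypothetical, and the harness assumes the UDAF under test exposes new_buffer, iterate, merge, and terminate. It splits the input into several partitions, aggregates each one separately, merges the partial buffers, and checks that the result does not depend on how the rows were split.

```python
def run_locally(udaf, rows, num_partitions):
    """Simulate a distributed run: iterate per partition, merge, then terminate."""
    # Round-robin split of the rows; the split strategy is an arbitrary choice.
    partitions = [rows[i::num_partitions] for i in range(num_partitions)]
    partials = []
    for part in partitions:
        buf = udaf.new_buffer()
        for row in part:
            udaf.iterate(buf, *row)
        partials.append(buf)
    # Merge every partial buffer into a fresh one, as a final stage would.
    final = udaf.new_buffer()
    for buf in partials:
        udaf.merge(final, buf)
    return udaf.terminate(final)


class MaxAbs(object):
    """Hypothetical UDAF under test: largest absolute value, None if no input."""

    def new_buffer(self):
        return [None]

    def iterate(self, buffer, value):
        if value is None:
            return
        if buffer[0] is None or abs(value) > buffer[0]:
            buffer[0] = abs(value)

    def merge(self, buffer, pbuffer):
        if pbuffer[0] is not None and (buffer[0] is None or pbuffer[0] > buffer[0]):
            buffer[0] = pbuffer[0]

    def terminate(self, buffer):
        return buffer[0]


def check_partition_invariance():
    rows = [(3,), (-7,), (None,), (5,), (-2,)]
    results = [run_locally(MaxAbs(), rows, n) for n in (1, 2, 3, 5)]
    # The aggregate must not change with the number of partitions.
    assert all(r == results[0] for r in results)
    return results[0]
```

A merge method that is wrong or order-dependent tends to pass a single-partition test and fail only on real data, so varying the partition count locally is a cheap way to catch it early.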

Limits

UDAFs cannot access the Internet. To enable Internet access, submit a network connection application. After approval, the MaxCompute technical support team will help you set up the connection. For details, see Network connection process.

Usage notes

  • Performance: Built-in functions outperform UDAFs. Use a built-in function whenever one covers your use case.

  • Memory: If a SQL statement that uses a UDAF exceeds the default memory allocation because of large data volumes or data skew, run the following command at the session level to raise the limit, replacing xxxx with the new value:

    set odps.sql.udf.joiner.jvm.memory=xxxx;

    For more information, see FAQ about MaxCompute UDFs.

  • Name collision: If a UDAF shares a name with a built-in function, MaxCompute calls the UDAF. To call the built-in function instead, prefix its name with ::, for example:

    select ::concat('ab', 'c');

Call a UDAF

Within the same project: Call the UDAF the same way you call a built-in function.

Across projects: Prefix the function name with the name of the project that owns it, followed by a colon. The following example, run in project A, calls udf_in_other_project, which is defined in project B:

select B:udf_in_other_project(arg0, arg1) as res from table_t;

For cross-project resource sharing setup, see Cross-project resource access based on packages.