MaxCompute offers a variety of built-in functions to meet your computing requirements. You can also use user-defined functions (UDFs).

In a broad sense, the term UDF covers user-defined scalar functions (UDFs), user-defined aggregation functions (UDAFs), and user-defined table-valued functions (UDTFs). In a narrow sense, it refers only to user-defined scalar functions. Alibaba Cloud documentation uses the term in both senses, and the context indicates which one is meant.

UDFs work in a way similar to built-in functions. For the mapping between Java and MaxCompute data types, see Java UDF.
Note: If a UDF has the same name as a built-in function, MaxCompute calls the UDF by default. To call the built-in function instead, prefix the function name with two colons, as in select ::function_name(expression);. For example, if both a UDF and a built-in function are named CONCAT, MaxCompute calls the UDF rather than the built-in function. To call the built-in CONCAT function, execute select ::concat('ab', 'c');

UDF types

The following table describes the three types of UDFs that MaxCompute supports.
UDF type | Description
UDF | A user-defined scalar function. Its input and output have a one-to-one relationship: the function returns one value for each row of data that it reads.
UDTF | A user-defined table-valued function. It is used when a single function call must return multiple rows of data. It is the only type of function that can return multiple fields. A UDTF is not the same as a user-defined type (UDT).
UDAF | A user-defined aggregation function. Its input and output have a many-to-one relationship: multiple input records are aggregated into one output value. It can be used with the GROUP BY clause of SQL. For more information, see Aggregate functions.
Note: When you use a UDF in an SQL statement, the system may report insufficient memory. This happens when the memory that the computing task requires exceeds the default memory size. In this case, execute the set odps.sql.udf.joiner.jvm.memory=xxxx; statement to adjust the memory size.
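
The input-to-output relationships of the three UDF types can be illustrated with a minimal plain-Java sketch. This sketch does not use the MaxCompute SDK, and every class and method name in it (UdfKindsSketch, scalarUpper, tableSplit, aggregateSum) is hypothetical. It only shows the shape of each function type: one row to one value, one row to multiple rows with multiple fields, and multiple rows to one value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: plain Java, NOT the MaxCompute UDF SDK.
// All names in this class are hypothetical.
class UdfKindsSketch {

    // Scalar UDF shape: one input row produces exactly one output value.
    static String scalarUpper(String value) {
        return value == null ? null : value.toUpperCase();
    }

    // UDTF shape: one input row produces zero or more output rows,
    // and each output row can carry several fields (here: index, item).
    static List<String[]> tableSplit(String csv) {
        List<String[]> rows = new ArrayList<>();
        if (csv == null || csv.isEmpty()) {
            return rows;
        }
        String[] parts = csv.split(",");
        for (int i = 0; i < parts.length; i++) {
            rows.add(new String[] { String.valueOf(i), parts[i] });
        }
        return rows;
    }

    // UDAF shape: many input rows are aggregated into one output value.
    static long aggregateSum(List<Long> values) {
        long sum = 0;
        for (Long v : values) {
            if (v != null) {
                sum += v;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(scalarUpper("ab"));                        // one value per row
        System.out.println(tableSplit("a,b,c").size());               // several rows from one row
        System.out.println(aggregateSum(Arrays.asList(1L, 2L, 3L)));  // one value from many rows
    }
}
```

A real MaxCompute UDF, UDTF, or UDAF is written against the classes in the UDF SDK for Java rather than as static methods like these.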

MaxCompute UDFs can be shared across projects: a UDF in one project can be called from another project. To call a UDF that belongs to another project, prefix the function name with the name of that project and a colon, as in select other_project:udf_in_other_project(arg0, arg1) as res from table_t; For more information, see Authorize users.

UDF SDK dependency installation

If you use Maven, search for odps-sdk-udf in the Maven repository to obtain the required version of the SDK for Java. The configuration information is as follows: