Summary

Last Updated: May 09, 2018

MaxCompute provides many built-in functions for common computing needs. When these are not sufficient, you can create user-defined functions to meet other computing needs. A user-defined function (UDF) is used in much the same way as an ordinary built-in function.

You can search for “odps-sdk-udf” in the Maven repository to find the available versions of the Java SDK. The dependency configuration is as follows:

  <dependency>
    <groupId>com.aliyun.odps</groupId>
    <artifactId>odps-sdk-udf</artifactId>
    <version>0.20.7-public</version>
  </dependency>

In MaxCompute, you can create three kinds of UDF:

UDF Class | Description
UDF (User Defined Scalar Function) | Input and output are in a one-to-one relationship: each row of data read produces one output value.
UDTF (User Defined Table Valued Function) | A UDTF is called once and can output multiple rows of data. It is the only kind of user-defined function that can return multiple fields, whereas a scalar UDF returns only a single value.
UDAF (User Defined Aggregation Function) | Input and output are in a many-to-one relationship: multiple input records are aggregated into one output value. It can be used together with the GROUP BY clause. For more information, see Aggregation Functions.
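
The three input/output relationships in the table above can be illustrated with a minimal plain-Java sketch. It does not use the MaxCompute SDK; the class name UdfShapes and its methods are hypothetical and exist only to show the shape of each kind of function (one row to one value, one row to many rows, many rows to one value).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Plain-Java illustration of the three input/output shapes.
// Hypothetical names; nothing here uses the MaxCompute SDK.
public class UdfShapes {

    // Scalar shape: one input row produces exactly one output value.
    public static String scalarLower(String s) {
        return s == null ? null : s.toLowerCase(Locale.ROOT);
    }

    // Table-valued shape: one input row produces zero or more output rows.
    public static List<String> tableSplit(String s) {
        List<String> rows = new ArrayList<>();
        if (s == null) {
            return rows;
        }
        for (String part : s.split(",")) {
            rows.add(part.trim());
        }
        return rows;
    }

    // Aggregation shape: many input rows are reduced to one output value.
    public static Double aggregateAverage(List<Double> values) {
        double sum = 0.0;
        int count = 0;
        for (Double v : values) {
            if (v != null) {
                sum += v;
                count++;
            }
        }
        if (count == 0) {
            return null; // no non-null input, so no average
        }
        return sum / count;
    }

    public static void main(String[] args) {
        System.out.println(scalarLower("ABC"));                             // abc
        System.out.println(tableSplit("a, b,c"));                           // [a, b, c]
        System.out.println(aggregateAverage(Arrays.asList(1.0, 2.0, 3.0))); // 2.0
    }
}
```

In an actual MaxCompute function, these shapes would be implemented against the classes provided by the odps-sdk-udf artifact instead of static methods.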

Notes:

  • In the broad sense, UDF refers to all user-defined functions, including User Defined Scalar Functions, User Defined Aggregation Functions, and User Defined Table Valued Functions. In the narrow sense, it refers only to User Defined Scalar Functions. This document uses the term frequently; the intended meaning can be determined from the context.
  • If the system reports insufficient memory when a SQL statement involves a UDF, configure set odps.sql.udf.joiner.jvm.memory=xxxx; to resolve it. The cause is that the data volume is too large and the data is skewed, so the memory used by the task exceeds the default memory size.

Parameter and Return Value Type

MaxCompute SQL supports the following data types for UDF parameters and return values: bigint, string, double, boolean, and datetime.
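
As a rough sketch of how these SQL types might appear in Java method signatures, the class below assumes the commonly documented mapping: bigint to Long, string to String, double to Double, boolean to Boolean, and datetime to java.util.Date. The class name TypeMappingSketch is hypothetical. A real UDF would extend com.aliyun.odps.udf.UDF from the odps-sdk-udf artifact; that base class is left out here so the example compiles on its own. Boxed types are used so that a SQL NULL can be passed and returned as a Java null.

```java
import java.util.Date;

// Hypothetical sketch. A real MaxCompute UDF would extend com.aliyun.odps.udf.UDF
// from the odps-sdk-udf artifact; that is omitted so this compiles on its own.
// The SQL-to-Java type mapping in the comments is an assumption of this sketch.
public class TypeMappingSketch {

    // bigint, bigint -> bigint (assumed to map to java.lang.Long)
    public Long evaluate(Long a, Long b) {
        if (a == null || b == null) {
            return null; // propagate NULL
        }
        return a + b;
    }

    // string -> bigint (string assumed to map to java.lang.String)
    public Long evaluate(String s) {
        if (s == null) {
            return null;
        }
        return (long) s.length();
    }

    // double -> boolean (assumed to map to Double and Boolean)
    public Boolean evaluate(Double d) {
        if (d == null) {
            return null;
        }
        return d > 0;
    }

    // datetime -> bigint (datetime assumed to map to java.util.Date)
    public Long evaluate(Date d) {
        if (d == null) {
            return null;
        }
        return d.getTime(); // milliseconds since the epoch
    }

    public static void main(String[] args) {
        TypeMappingSketch f = new TypeMappingSketch();
        System.out.println(f.evaluate(2L, 3L));        // 5
        System.out.println(f.evaluate("abc"));         // 3
        System.out.println(f.evaluate(1.5));           // true
        System.out.println(f.evaluate(new Date(0L)));  // 0
    }
}
```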

MaxCompute UDFs support cross-project sharing: a UDF in project_b can be used in project_a. For example, you can write the following SQL statement:

  select other_project:udf_in_other_project(arg0, arg1) as res from table_t;

For more information about authorization, see Authorization in the Security Guide documentation.

UDF Example

For an example, see UDF Example in the Quick Start volume.
