This topic describes how to write a user-defined function (UDF) in Java.
UDF code structure
- Java package: optional.
You can organize the Java classes that you define into a package and build them into a JAR file, which makes the classes easier to find and use.
- Base UDF class: required.
The required base UDF class is com.aliyun.odps.udf.UDF. If you want to use other UDF classes or complex data types, follow the instructions provided in MaxCompute SDK to add the required classes. For example, the class that corresponds to the STRUCT data type is com.aliyun.odps.data.Struct.
- @Resolve annotation: optional.
The annotation is in the @Resolve(<signature>) format. The signature parameter defines the data types of the input parameters and return value of a UDF. If a UDF uses the STRUCT data type, the reflection feature cannot obtain the names and types of the fields of the com.aliyun.odps.data.Struct class. In this case, you must add the @Resolve annotation to the UDF class. This annotation affects only the overloading of UDFs whose input parameters or return value contain the com.aliyun.odps.data.Struct class. Example: @Resolve("struct<a:string>,string->string"). A sketch that uses this annotation appears after the examples below. For more information about how to use complex data types in Java UDFs, see Use complex data types in Java UDFs.
- Custom Java class: required.
A custom Java class is the organizational unit of UDF code. This class defines the variables and methods that meet your business requirements.
- evaluate method: required.
The evaluate method is a non-static public method that is contained in the custom Java class. The data types of its input parameters and return value are used as the function signature of the UDF in SQL statements. The function signature defines the data types of the input parameters and return value of the UDF. You can implement multiple evaluate methods in a UDF. When you call the UDF, MaxCompute matches an evaluate method based on the data types of the input parameters. When you write a Java UDF, you can use Java data types or Java writable data types. For more information about the mappings between the data types supported in MaxCompute projects, Java data types, and Java writable data types, see Data types.
- UDF initialization or termination code: optional.
You can use the void setup(ExecutionContext ctx) method to initialize a UDF and the void close() method to terminate it. The void setup(ExecutionContext ctx) method is called once before the evaluate method and is used to initialize the resources that are required for data computing or to initialize the members of a class. The void close() method is called after the evaluate method and is used to clean up data, such as closing files. A sketch that combines these lifecycle methods with overloaded evaluate methods follows this list.
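The following sketch combines overloaded evaluate methods with the setup and close lifecycle methods. The package name, class name, and repeat logic are illustrative assumptions, not part of the MaxCompute SDK.
package com.example.udf; // hypothetical package name

import com.aliyun.odps.udf.ExecutionContext;
import com.aliyun.odps.udf.UDF;
import com.aliyun.odps.udf.UDFException;

public class TrimOrRepeat extends UDF {
    private StringBuilder buffer;

    // Called once before any evaluate call. Initializes a reusable buffer.
    @Override
    public void setup(ExecutionContext ctx) throws UDFException {
        buffer = new StringBuilder();
    }

    // Overload 1: matched when the UDF is called with one STRING argument.
    public String evaluate(String s) {
        return s == null ? null : s.trim();
    }

    // Overload 2: matched when the UDF is called with STRING and BIGINT arguments.
    public String evaluate(String s, Long times) {
        if (s == null || times == null) {
            return null;
        }
        buffer.setLength(0);
        for (long i = 0; i < times; i++) {
            buffer.append(s);
        }
        return buffer.toString();
    }

    // Called once after all evaluate calls. Releases held resources.
    @Override
    public void close() throws UDFException {
        buffer = null;
    }
}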
- Use Java data types
// The package that contains the defined Java class.
package org.alidata.odps.udf.examples;
// Import the base UDF class.
import com.aliyun.odps.udf.UDF;
// The custom Java class.
public final class Lower extends UDF {
    // The evaluate method. Both the input parameter and the return value are of the String type.
    public String evaluate(String s) {
        if (s == null) {
            return null;
        }
        return s.toLowerCase();
    }
}
- Use Java writable data types
// The package that contains the defined Java class.
package com.aliyun.odps.udf.example;
// Import the class that corresponds to a Java writable data type.
import com.aliyun.odps.io.Text;
// Import the base UDF class.
import com.aliyun.odps.udf.UDF;
// The custom Java class.
public class MyConcat extends UDF {
    private Text ret = new Text();
    // The evaluate method. Both the input parameters and the return value are of the Text type.
    public Text evaluate(Text a, Text b) {
        if (a == null || b == null) {
            return null;
        }
        ret.clear();
        ret.append(a.getBytes(), 0, a.getLength());
        ret.append(b.getBytes(), 0, b.getLength());
        return ret;
    }
}
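The following sketch applies the @Resolve annotation described above to a UDF whose input contains a STRUCT value. The package and class names are illustrative assumptions; the index-based getFieldValue accessor is assumed to be available on com.aliyun.odps.data.Struct.
package com.example.udf; // hypothetical package name

import com.aliyun.odps.data.Struct;
import com.aliyun.odps.udf.UDF;
import com.aliyun.odps.udf.annotation.Resolve;

// The field names and types of a STRUCT cannot be obtained through reflection,
// so the full signature is declared explicitly with @Resolve.
@Resolve("struct<a:string>,string->string")
public class StructConcat extends UDF {
    public String evaluate(Struct s, String suffix) {
        if (s == null || suffix == null) {
            return null;
        }
        // Read the first field (a) of the struct and append the suffix.
        return s.getFieldValue(0) + suffix;
    }
}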
MaxCompute also allows you to use Hive UDFs that are developed on a Hive version compatible with MaxCompute. For more information, see Hive UDFs.
Limits
You cannot access the Internet by using UDFs. If you want to access the Internet by using UDFs, fill in the network connection application form based on your business requirements and submit the application. After the application is approved, the MaxCompute technical support team will contact you and help you establish the network connection. For more information about how to fill in the network connection application form, see Network connection process.
Usage notes
- We recommend that the JAR files of different UDFs do not contain classes that have the same name but different logic. For example, the JAR file of UDF 1 is named udf1.jar and the JAR file of UDF 2 is named udf2.jar. Both files contain a class named com.aliyun.UserFunction.class, but the class implements different logic in each file. If UDF 1 and UDF 2 are called in the same SQL statement, MaxCompute loads com.aliyun.UserFunction.class from only one of the two files. As a result, the UDFs cannot run as expected and a compilation error may occur.
- The data types of the input parameters and return value of a Java UDF are objects. The first letter of each data type name must be capitalized, such as String.
- NULL values in MaxCompute SQL are represented by null in Java. Java primitive data types cannot represent null and therefore cannot be used for the input parameters or return value of a UDF, as shown in the sketch after this list.
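The following sketch shows why boxed types are required. The class name AddOne is an illustrative assumption.
import com.aliyun.odps.udf.UDF;

// Uses java.lang.Long instead of the primitive long so that a NULL input
// from SQL arrives as null and can be handled explicitly.
public class AddOne extends UDF {
    public Long evaluate(Long value) {
        if (value == null) {
            return null; // propagate SQL NULL
        }
        return value + 1;
    }
}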
Data types
In MaxCompute, different data type editions support different data types. In MaxCompute V2.0 and later, more data types and complex data types, such as ARRAY, MAP, and STRUCT, are supported. For more information about MaxCompute data type editions, see Data type editions.
The following table describes the mappings between the data types supported in MaxCompute projects, Java data types, and Java writable data types. You must write Java UDFs based on the mappings to ensure the consistency of the data types.
MaxCompute data type | Java data type | Java writable data type |
---|---|---|
TINYINT | java.lang.Byte | ByteWritable |
SMALLINT | java.lang.Short | ShortWritable |
INT | java.lang.Integer | IntWritable |
BIGINT | java.lang.Long | LongWritable |
FLOAT | java.lang.Float | FloatWritable |
DOUBLE | java.lang.Double | DoubleWritable |
DECIMAL | java.math.BigDecimal | BigDecimalWritable |
BOOLEAN | java.lang.Boolean | BooleanWritable |
STRING | java.lang.String | Text |
VARCHAR | com.aliyun.odps.data.Varchar | VarcharWritable |
BINARY | com.aliyun.odps.data.Binary | BytesWritable |
DATE | java.sql.Date | DateWritable |
DATETIME | java.util.Date | DatetimeWritable |
TIMESTAMP | java.sql.Timestamp | TimestampWritable |
INTERVAL_YEAR_MONTH | N/A | IntervalYearMonthWritable |
INTERVAL_DAY_TIME | N/A | IntervalDayTimeWritable |
ARRAY | java.util.List | N/A |
MAP | java.util.Map | N/A |
STRUCT | com.aliyun.odps.data.Struct | N/A |
For more information about how to use complex data types in Java UDFs, see Use complex data types in Java UDFs.
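As a minimal illustration of the mappings above, the following sketch receives an ARRAY<STRING> input through java.util.List. The class name JoinStrings and the joining logic are illustrative assumptions.
import java.util.List;

import com.aliyun.odps.udf.UDF;

// ARRAY<STRING> maps to java.util.List, as listed in the mapping table above.
public class JoinStrings extends UDF {
    public String evaluate(List<String> parts, String sep) {
        if (parts == null || sep == null) {
            return null;
        }
        return String.join(sep, parts);
    }
}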
Instructions
- Use a UDF in a MaxCompute project: The method is similar to that of using built-in functions.
- Use a UDF across projects: Use a UDF of Project B in Project A. The following statement shows an example:
select B:udf_in_other_project(arg0, arg1) as res from table_t;
For more information about resource sharing across projects, see Cross-project resource access based on packages.
For more information about how to use MaxCompute Studio to develop and call a Java UDF, see Example.
Hive UDFs
If your MaxCompute project uses the MaxCompute V2.0 data type edition and supports Hive UDFs, you can directly use Hive UDFs whose Hive version is compatible with MaxCompute.
The Hive version that is compatible with MaxCompute is 2.1.0, which corresponds to Hadoop 2.7.2. If a UDF is developed on another Hive or Hadoop version, you must use Hive 2.1.0 or Hadoop 2.7.2 to recompile the JAR file of the UDF.
For more information about how to use Hive UDFs in MaxCompute, see Write a Hive UDF in Java.
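A minimal sketch of a Hive-style UDF, assuming a compile-time dependency on Hive 2.1.0 as noted above; the class name HiveLower is an illustrative assumption.
import org.apache.hadoop.hive.ql.exec.UDF;

// Extends the Hive UDF base class instead of com.aliyun.odps.udf.UDF.
public class HiveLower extends UDF {
    public String evaluate(String s) {
        return s == null ? null : s.toLowerCase();
    }
}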
Example
This example demonstrates how to use MaxCompute Studio to develop and call a Java UDF that is used to convert all letters to lowercase letters.
- Make the following preparations in IntelliJ IDEA:
- Write UDF code.
- In the left-side navigation pane of the Project tab, choose src > main > java, right-click java, and then choose New > MaxCompute Java.
- In the Create new MaxCompute java class dialog box, click UDF, enter a class name in the Name field, and then press Enter. The class is named Lower in this example.
Name: the name of the MaxCompute Java class. If you have not created a package, enter packagename.classname. The system automatically generates a package.
- Write code in the code editor.
Sample UDF code:
package <packagename>;
import com.aliyun.odps.udf.UDF;
public final class Lower extends UDF {
    public String evaluate(String s) {
        if (s == null) {
            return null;
        }
        return s.toLowerCase();
    }
}
Note Debug the Java UDF on your on-premises machine if necessary. For more information, see Develop and debug UDFs.
- Create the Java UDF. Right-click the Java file of the UDF and select Deploy to server.... In the Package a jar, submit resource and register function dialog box, configure the parameters and click OK.
- MaxCompute project: the name of the MaxCompute project to which the UDF belongs. Retain the default value, which indicates that the connection to the MaxCompute project is established when you write the UDF.
- Resource file: the path of the resource file on which the UDF depends. Retain the default value.
- Resource name: the name of the resource on which the UDF depends. Retain the default value.
- Function name: the name of the UDF that you want to create. This name is used in the SQL statements that are used to call this UDF. Example: Lower_test.
- In the left-side navigation pane, click the Project Explorer tab. Right-click the MaxCompute project to which the UDF belongs, select Open in Console, enter the SQL statement for calling the UDF, and then press Enter to execute the SQL statement.
Sample statement:
select lower_test('ABC');
Returned result:
+-----+
| _c0 |
+-----+
| abc |
+-----+