This topic describes how to write a user-defined function (UDF) in Java.

UDF code structure

You can use IntelliJ IDEA (Maven) or MaxCompute Studio to write a UDF in Java. Java UDF code consists of the following parts:
  • Java package: optional.

    You can group the Java classes that you define into a package. This makes the classes easier to find and use later.

  • Base UDF class: required.

    The base class com.aliyun.odps.udf.UDF is required. If you want to use other UDF classes or complex data types, add the required classes as described in the MaxCompute SDK documentation. For example, the class that corresponds to the STRUCT data type is com.aliyun.odps.data.Struct.

  • @Resolve annotation: optional.

    The annotation uses the @Resolve(<signature>) format. The signature defines the data types of the input parameters and the return value of the UDF. If a UDF uses the STRUCT data type, the names and types of the fields cannot be obtained through reflection on the com.aliyun.odps.data.Struct class. In this case, you must manually add the @Resolve annotation to the custom Java class of the UDF. The annotation affects only the overloads of the UDF whose input parameters or return value contain com.aliyun.odps.data.Struct. Example: @Resolve("struct<a:string>,string->string"). For more information about how to use complex data types in Java UDFs, see Use complex data types in Java UDFs.

  • Custom Java class: required.

    A custom Java class is the organizational unit of UDF code. This class defines the variables and methods that implement your business logic.

  • evaluate method: required.

    The evaluate method is a non-static public method of the custom Java class. The data types of its input parameters and return value serve as the function signature of the UDF in SQL statements.

    You can implement multiple evaluate methods in a UDF. When you call a UDF, MaxCompute matches an evaluate method based on the data types of the UDF input parameters.

    When you write a Java UDF, you can use Java data types or Java writable data types. For more information about the mappings between the data types supported in MaxCompute projects, Java data types, and Java writable data types, see Data types.

  • UDF initialization or termination code: optional. You can implement the void setup(ExecutionContext ctx) method to initialize a UDF and the void close() method to terminate it. The setup method is called only once, before the evaluate method is called, and is used to initialize the resources that are required for data computing or the members of the class. The close method is called after the evaluate method and is used for cleanup, such as closing files.
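The following plain-Java sketch illustrates the call order described above: setup once, evaluate once per row, close once. So that it compiles without the MaxCompute SDK, LocalContext is a hypothetical empty stand-in for ExecutionContext, TrimUdf is a hypothetical UDF that does not extend com.aliyun.odps.udf.UDF, and runLocally only simulates the order in which the methods are called; it is not the MaxCompute runtime.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for com.aliyun.odps.udf.ExecutionContext so the sketch compiles without the SDK.
final class LocalContext {
}

// Hypothetical UDF that records which lifecycle methods were called.
final class TrimUdf {
    private StringBuilder buffer;                 // resource initialized in setup()
    final List<String> calls = new ArrayList<>(); // call log, for illustration only

    // Called once, before the first evaluate() call.
    public void setup(LocalContext ctx) {
        buffer = new StringBuilder();
        calls.add("setup");
    }

    // Called once per input row.
    public String evaluate(String s) {
        calls.add("evaluate");
        if (s == null) {
            return null;                          // SQL NULL in, SQL NULL out
        }
        buffer.setLength(0);                      // reuse the buffer allocated in setup()
        buffer.append(s.trim());
        return buffer.toString();
    }

    // Called once, after the last evaluate() call; release resources here.
    public void close() {
        buffer = null;
        calls.add("close");
    }

    // Simulates the call order only; this is not how MaxCompute schedules UDFs.
    static List<String> runLocally(TrimUdf udf, List<String> rows) {
        List<String> out = new ArrayList<>();
        udf.setup(new LocalContext());
        for (String row : rows) {
            out.add(udf.evaluate(row));
        }
        udf.close();
        return out;
    }
}
```

Allocating the buffer in setup rather than in evaluate means it is created once per UDF instance instead of once per row.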
Sample UDF code:
  • Use Java data types
    // Group the Java classes that you define into a package named org.alidata.odps.udf.examples.
    package org.alidata.odps.udf.examples;
    // Import the UDF base class that the custom class inherits from.
    import com.aliyun.odps.udf.UDF;
    // The custom Java class.
    public final class Lower extends UDF {
        // The evaluate method. The input parameter and the return value are both of the String type.
        public String evaluate(String s) {
            if (s == null) {
                // Return NULL for a NULL input.
                return null;
            }
            return s.toLowerCase();
        }
    }
  • Use Java writable data types
    // Group the Java classes that you define into a package named com.aliyun.odps.udf.example.
    package com.aliyun.odps.udf.example;
    // Import the class that corresponds to the Java writable data type.
    import com.aliyun.odps.io.Text;
    // Import the UDF base class that the custom class inherits from.
    import com.aliyun.odps.udf.UDF;
    // The custom Java class.
    public class MyConcat extends UDF {
        // Reused across calls so that a new Text object is not allocated for every row.
        private Text ret = new Text();
        // The evaluate method. The input parameters and the return value are of the Text type.
        public Text evaluate(Text a, Text b) {
            if (a == null || b == null) {
                return null;
            }
            ret.clear();
            ret.append(a.getBytes(), 0, a.getLength());
            ret.append(b.getBytes(), 0, b.getLength());
            return ret;
        }
    }
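Because MaxCompute matches an evaluate method based on the data types of the arguments, one UDF class can serve several signatures. The sketch below uses a hypothetical class MyAdd with two overloads; it omits extends UDF only so that it compiles without the SDK, and a real UDF would extend com.aliyun.odps.udf.UDF as in the samples above. The SQL calls in the comments are illustrative, assuming BIGINT maps to Long and STRING maps to String as listed in Data types.

```java
// Hypothetical UDF with two evaluate overloads. A real UDF would also
// extend com.aliyun.odps.udf.UDF; that is omitted so the sketch compiles without the SDK.
final class MyAdd {
    // Intended for a call such as my_add(1, 2), where both arguments are BIGINT.
    public Long evaluate(Long a, Long b) {
        if (a == null || b == null) {
            return null;                   // SQL NULL in, SQL NULL out
        }
        return a + b;
    }

    // Intended for a call such as my_add('1', '2'), where both arguments are STRING.
    public String evaluate(String a, String b) {
        if (a == null || b == null) {
            return null;
        }
        return a + b;                      // string concatenation
    }
}
```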

MaxCompute also allows you to use Hive UDFs whose Hive version is compatible with MaxCompute. For more information, see Hive UDFs.

Limits

By default, UDFs cannot access the Internet. If your UDFs need Internet access, fill in the network connection application form based on your business requirements and submit it. After the application is approved, the MaxCompute technical support team contacts you to help establish the network connection. For more information about how to fill in the form, see Network connection process.

Usage notes

Before you write a Java UDF, take note of the following points:
  • We recommend that the JAR files of different UDFs do not contain classes that have the same name but different logic. For example, the JAR file of UDF 1 is named udf1.jar and the JAR file of UDF 2 is named udf2.jar. Both files contain a class named com.aliyun.UserFunction.class, but the class has different logic in each file. If UDF 1 and UDF 2 are called in the same SQL statement, MaxCompute loads com.aliyun.UserFunction.class from only one of the two files. As a result, the UDFs may not behave as expected, or a compilation error may occur.
  • The input parameters and return value of a Java UDF must be of object types, such as String, Integer, or Long, instead of primitive types such as int or long. Object type names start with an uppercase letter.
  • NULL values in MaxCompute SQL are represented by null in Java. Java primitive types cannot represent NULL values, so they cannot be used for the input parameters or return value of a UDF.
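To see why primitive types cannot carry SQL NULL values, compare a boxed parameter with a primitive one. NullSafeAbs is a hypothetical class that again omits extends UDF so that it compiles without the SDK: its Long parameter can receive null and return null, whereas the primitive long in primitiveAbs has no value that represents NULL, and unboxing a null Long into it throws NullPointerException.

```java
// Hypothetical UDF body illustrating NULL handling with object (boxed) types.
final class NullSafeAbs {
    // Correct: Long can hold null, so a SQL NULL input yields a SQL NULL result.
    public Long evaluate(Long x) {
        if (x == null) {
            return null;
        }
        return Math.abs(x);
    }

    // Shown for contrast only: a primitive long cannot represent NULL.
    // Passing a null Long here fails with NullPointerException during unboxing.
    static long primitiveAbs(long x) {
        return Math.abs(x);
    }
}
```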

Data types

In MaxCompute, different data type editions support different data types. The MaxCompute V2.0 data type edition supports additional data types, including complex data types such as ARRAY, MAP, and STRUCT. For more information about MaxCompute data type editions, see Data type editions.

The following table describes the mappings between MaxCompute data types, Java data types, and Java writable data types. Declare the input parameters and return value of a Java UDF according to these mappings so that the Java types are consistent with the MaxCompute data types.

MaxCompute data type    Java data type                  Java writable data type
TINYINT                 java.lang.Byte                  ByteWritable
SMALLINT                java.lang.Short                 ShortWritable
INT                     java.lang.Integer               IntWritable
BIGINT                  java.lang.Long                  LongWritable
FLOAT                   java.lang.Float                 FloatWritable
DOUBLE                  java.lang.Double                DoubleWritable
DECIMAL                 java.math.BigDecimal            BigDecimalWritable
BOOLEAN                 java.lang.Boolean               BooleanWritable
STRING                  java.lang.String                Text
VARCHAR                 com.aliyun.odps.data.Varchar    VarcharWritable
BINARY                  com.aliyun.odps.data.Binary     BytesWritable
DATETIME                java.util.Date                  DatetimeWritable
TIMESTAMP               java.sql.Timestamp              TimestampWritable
INTERVAL_YEAR_MONTH     N/A                             IntervalYearMonthWritable
INTERVAL_DAY_TIME       N/A                             IntervalDayTimeWritable
ARRAY                   java.util.List                  N/A
MAP                     java.util.Map                   N/A
STRUCT                  com.aliyun.odps.data.Struct     N/A

For more information about how to use complex data types in Java UDFs, see Use complex data types in Java UDFs.
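As a sketch of the ARRAY and MAP rows in the table, a hypothetical UDF could receive array<string> as java.util.List<String> and map<string,bigint> as java.util.Map<String, Long>. The class name CollectionUdf and the behavior of both methods are illustrative assumptions, and extends UDF is omitted so that the sketch compiles without the SDK. Check the topic on complex data types for the exact signature rules, including when a @Resolve annotation is required.

```java
import java.util.List;
import java.util.Map;

// Hypothetical UDF body showing the ARRAY and MAP mappings.
final class CollectionUdf {
    // array<string> arrives as java.util.List<String>; returns the number of non-NULL elements.
    public Long evaluate(List<String> values) {
        if (values == null) {
            return null;
        }
        long count = 0;
        for (String v : values) {
            if (v != null) {       // array elements can themselves be NULL
                count++;
            }
        }
        return count;
    }

    // map<string,bigint> arrives as java.util.Map<String, Long>; returns the value for key, or NULL.
    public Long evaluate(Map<String, Long> m, String key) {
        if (m == null || key == null) {
            return null;
        }
        return m.get(key);         // null if the key is absent
    }
}
```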

Note The input parameters or return value of a UDF can be of Java writable data types only if your MaxCompute project uses the MaxCompute V2.0 data type edition.

Instructions

After you develop a Java UDF, you can call it in MaxCompute SQL. For more information about how to develop a Java UDF, see Development process. You can call the UDF in the following ways:
  • Use a UDF in the same MaxCompute project: call the UDF in the same way as a built-in function.
  • Use a UDF across projects: to use a UDF of Project B in Project A, prefix the function name with the name of Project B. Example: select B:udf_in_other_project(arg0, arg1) as res from table_t;. For more information about resource sharing across projects, see Package-based resource sharing across projects.

For more information about how to use MaxCompute Studio to develop and call a Java UDF, see Example.

Hive UDFs

If your MaxCompute project uses the MaxCompute V2.0 data type edition and supports Hive UDFs, you can directly use Hive UDFs whose Hive version is compatible with MaxCompute.

The Hive version that is compatible with MaxCompute is 2.1.0, which corresponds to Hadoop 2.7.2. If a UDF was developed on another Hive or Hadoop version, you must recompile its JAR file against Hive 2.1.0 and Hadoop 2.7.2.

For more information about how to use Hive UDFs in MaxCompute, see Write a Hive UDF in Java.

Example

This example demonstrates how to use MaxCompute Studio to develop and call a Java UDF that converts the letters in a string to lowercase.

  1. Make the following preparations in IntelliJ IDEA:
    1. Install MaxCompute Studio.
    2. Establish a connection to a MaxCompute project.
    3. Create a MaxCompute Java module.
  2. Write UDF code.
    1. In the left-side navigation pane of the Project tab, choose src > main > java, right-click java, and then choose New > MaxCompute Java.
    2. In the Create new MaxCompute java class dialog box, click UDF, enter a class name in the Name field, and then press Enter. In this example, the class is named Lower.

      Name: the name of the MaxCompute Java class. If you have not created a package, enter packagename.classname. The system automatically generates a package.

    3. Write the UDF code in the code editor. Sample UDF code:
      package <packagename>;
      import com.aliyun.odps.udf.UDF;
      public final class Lower extends UDF {
          public String evaluate(String s) {
              if (s == null) {
                  return null;
              }
              return s.toLowerCase();
          }
      }
      Note If necessary, debug the Java UDF on your local machine. For more information, see Develop and debug UDFs.
  3. Create the Java UDF.
    Right-click the JAR file of the UDF and select Deploy to server.... In the Package a jar, submit resource and register function dialog box, configure the following parameters and click OK.
    • MaxCompute project: the name of the MaxCompute project to which the UDF belongs. Retain the default value, which is the project that you connected to when you wrote the UDF.
    • Resource file: the path of the resource file on which the UDF depends. Retain the default value.
    • Resource name: the name of the resource on which the UDF depends. Retain the default value.
    • Function name: the name of the UDF that you want to create. This name is used in the SQL statements that are used to call this UDF. Example: Lower_test.
  4. In the left-side navigation pane, click the Project Explorer tab. Right-click the MaxCompute project to which the UDF belongs, select Open in Console, enter the SQL statement that calls the UDF, and then press Enter to execute it. Sample statement:
    select lower_test('ABC');
    Returned results:
    +-----+
    | _c0 |
    +-----+
    | abc |
    +-----+
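Before deploying, the evaluate logic can also be exercised as plain Java without a MaxCompute connection. The sketch below assumes that the body of Lower is copied into a class that does not extend UDF (LowerLogic is a hypothetical name) so that it compiles without the SDK; the check method mirrors the result of the SQL statement above. It does not replace debugging the actual UDF class as described in Develop and debug UDFs.

```java
// Hypothetical copy of the Lower evaluate logic without "extends UDF",
// so it can be checked locally without the MaxCompute SDK.
final class LowerLogic {
    public String evaluate(String s) {
        if (s == null) {
            return null;            // SQL NULL stays NULL
        }
        return s.toLowerCase();
    }

    // Minimal local check that mirrors: select lower_test('ABC'); which returns abc.
    static void check() {
        LowerLogic udf = new LowerLogic();
        if (!"abc".equals(udf.evaluate("ABC"))) {
            throw new AssertionError("expected abc");
        }
        if (udf.evaluate(null) != null) {
            throw new AssertionError("expected null for a NULL input");
        }
    }
}
```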