This topic describes how to use the MaxCompute client to write a Hive user-defined function (UDF) in Java. A Hive version that is compatible with MaxCompute is used.

Prerequisites

The MaxCompute client is installed. For more information, see Install and configure the MaxCompute client.

Usage notes

Before you use a Hive UDF, take note of the following points:

  • When you run the ADD JAR command to add resource packages for a Hive UDF on the MaxCompute client, you must specify all JAR packages that you want to add. This is because MaxCompute cannot automatically add all JAR packages to the classpath.
  • To call a Hive UDF, you must add the set odps.sql.hive.compatible=true; command before the SQL statement that you want to execute. Then, commit and execute the command and SQL statement together.
  • If a Java UDF is running in a distributed environment, the UDF is limited by the Java sandbox for MaxCompute. For more information, see Java sandbox.
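As an illustration of the first point, if a UDF depends on more than one JAR package, each package must be added explicitly (the file names below are placeholders for this example):

```sql
-- Add every JAR package that the UDF depends on. MaxCompute does not
-- automatically add dependent JAR packages to the classpath.
add jar udf-main.jar;
add jar udf-dependency.jar;
```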

Sample code

Sample code:
package com.aliyun.odps.compiler.hive;
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
public class Collect extends GenericUDF {
  @Override
  public ObjectInspector initialize(ObjectInspector[] objectInspectors) throws UDFArgumentException {
    if (objectInspectors.length == 0) {
      throw new UDFArgumentException("Collect: requires at least one argument");
    }
    for (int i = 1; i < objectInspectors.length; i++) {
      // Compare inspectors by type name rather than by object identity (!=),
      // because two inspectors of the same type are not guaranteed to be the
      // same instance.
      if (!Objects.equals(objectInspectors[i].getTypeName(), objectInspectors[0].getTypeName())) {
        throw new UDFArgumentException("Collect: all arguments must have the same type");
      }
    }
    return ObjectInspectorFactory.getStandardListObjectInspector(objectInspectors[0]);
  }
  @Override
  public Object evaluate(DeferredObject[] deferredObjects) throws HiveException {
    List<Object> objectList = new ArrayList<>(deferredObjects.length);
    for (DeferredObject deferredObject : deferredObjects) {
      objectList.add(deferredObject.get());
    }
    return objectList;
  }
  @Override
  public String getDisplayString(String[] strings) {
    return "Collect";
  }
}

The preceding code packages an arbitrary number of parameters of any data type into an array. In this example, the JAR file of the Hive UDF is named test.jar.
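The core of the evaluate method is simply packing the deferred arguments into a list. The following self-contained sketch illustrates the same logic without requiring the Hive libraries on the classpath; the Deferred interface is a stand-in for Hive's GenericUDF.DeferredObject and exists only for this illustration:

```java
import java.util.ArrayList;
import java.util.List;

public class CollectSketch {
  // Stand-in for Hive's GenericUDF.DeferredObject (hypothetical, for illustration only).
  interface Deferred {
    Object get();
  }

  // Mirrors Collect.evaluate: pack every argument value into a list.
  static List<Object> evaluate(Deferred[] args) {
    List<Object> out = new ArrayList<>(args.length);
    for (Deferred d : args) {
      out.add(d.get());
    }
    return out;
  }

  public static void main(String[] args) {
    // Byte values correspond to the TINYINT literals (4y, 5y, 6y) used later in this topic.
    Deferred[] input = {() -> (byte) 4, () -> (byte) 5, () -> (byte) 6};
    System.out.println(evaluate(input)); // prints [4, 5, 6]
  }
}
```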

Procedure

  1. Start the MaxCompute client.
  2. Compile the code in Sample code and package it into a JAR file on the Hive platform. Then, run the following command to add the JAR file as a MaxCompute resource:
    -- Add the JAR file as a MaxCompute resource. 
    add jar test.jar;
    For more information about how to add resources, see Add resources.
  3. Run the following command to register the UDF:
    -- Register the UDF. 
    create function hive_collect as 'com.aliyun.odps.compiler.hive.Collect' using 'test.jar';
    For more information about how to register UDFs, see Create a function.
  4. Run the following command and SQL statement to call the UDF:
    -- Set odps.sql.hive.compatible to true to ensure the compatibility with Hive. 
    set odps.sql.hive.compatible=true;
    -- Call the UDF. 
    select hive_collect(4y, 5y, 6y);
    The following result is returned:
    +------+
    | _c0  |
    +------+
    | [4, 5, 6] |
    +------+