MaxCompute: Example: Use a Hive UDF whose Hive version is compatible with MaxCompute

Last Updated: Mar 26, 2026

This example shows how to run a Hive user-defined function (UDF) directly on the MaxCompute client without rewriting it, provided that the Hive version of the UDF is compatible with MaxCompute.

Prerequisites

Before you begin, ensure that you have:

Sample code

The following Java class Collect extends GenericUDF to collect multiple arguments of the same data type into an array. Use this as the basis for your Hive UDF.

package com.aliyun.odps.compiler.hive;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;

import java.util.ArrayList;
import java.util.List;

public class Collect extends GenericUDF {
  // Called once before evaluate(): validates the argument types and declares the return type.
  @Override
  public ObjectInspector initialize(ObjectInspector[] objectInspectors) throws UDFArgumentException {
    if (objectInspectors.length == 0) {
      throw new UDFArgumentException("Collect: at least one argument is required");
    }
    // Every argument must use the same ObjectInspector as the first one (compared by reference).
    for (int i = 1; i < objectInspectors.length; i++) {
      if (objectInspectors[i] != objectInspectors[0]) {
        throw new UDFArgumentException("Collect: all arguments must have the same ObjectInspector");
      }
    }
    // The return type is a list whose elements have the type of the arguments.
    return ObjectInspectorFactory.getStandardListObjectInspector(objectInspectors[0]);
  }

  // Evaluates each argument and returns the values, in argument order, as a list.
  @Override
  public Object evaluate(DeferredObject[] deferredObjects) throws HiveException {
    List<Object> objectList = new ArrayList<>(deferredObjects.length);
    for (DeferredObject deferredObject : deferredObjects) {
      objectList.add(deferredObject.get());
    }
    return objectList;
  }

  // Name displayed for this function in query plans.
  @Override
  public String getDisplayString(String[] strings) {
    return "Collect";
  }
}

In this example, the JAR file compiled from this class is named test.jar.
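To see what Collect computes without a Hive installation, the following is a minimal plain-Java sketch of the same logic. CollectSketch and its collect method are hypothetical names introduced only for illustration. The sketch also assumes that checking the runtime class of each value is an acceptable stand-in for the ObjectInspector check, which the real UDF performs on argument types in initialize before any row is evaluated.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, Hive-free illustration of what the Collect UDF does.
// It is not part of Hive or MaxCompute.
final class CollectSketch {

  // Collects the arguments, in order, into a list.
  // Rejects an empty argument list and arguments of differing types,
  // mirroring the two checks in Collect.initialize().
  static List<Object> collect(Object... args) {
    if (args == null || args.length == 0) {
      throw new IllegalArgumentException("collect: at least one argument is required");
    }
    Class<?> expected = null; // type of the first non-null argument
    List<Object> out = new ArrayList<>(args.length);
    for (Object arg : args) {
      if (arg != null) {
        if (expected == null) {
          expected = arg.getClass();
        } else if (arg.getClass() != expected) {
          throw new IllegalArgumentException("collect: all arguments must have the same type");
        }
      }
      out.add(arg);
    }
    return out;
  }

  public static void main(String[] args) {
    // Three TINYINT-like values, analogous to hive_collect(4y, 5y, 6y).
    System.out.println(collect((byte) 4, (byte) 5, (byte) 6)); // prints [4, 5, 6]
  }
}
```

Calling collect with mixed types, for example a byte and a string, throws IllegalArgumentException in this sketch, much as the UDF throws UDFArgumentException during initialization.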

Register and call the UDF

  1. Install and start the MaxCompute client.

  2. Compile and package the sample code into a JAR file on your Hive platform, then add the JAR file as a MaxCompute resource:

    -- Add the JAR file as a MaxCompute resource.
    add jar test.jar;

    Important

    Specify all JAR files explicitly in the ADD JAR command. MaxCompute does not automatically add JAR files to the classpath.

    For details on adding resources, see Add resources.

  3. Create a function that maps to your UDF class:

    -- Create a UDF.
    create function hive_collect as 'com.aliyun.odps.compiler.hive.Collect' using 'test.jar';

    For details, see Create a UDF.

  4. Enable Hive compatibility mode and call the UDF:

    Note

    Add set odps.sql.hive.compatible=true; before every SQL statement that calls a Hive UDF, and submit the two statements together in a single execution.

    -- Enable the Hive-compatible data type edition at the session level.
    set odps.sql.hive.compatible=true;
    -- Call the UDF.
    select hive_collect(4y, 5y, 6y);

    The expected output:

    +-----------+
    | _c0       |
    +-----------+
    | [4, 5, 6] |
    +-----------+
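If you generate the SQL from a program rather than typing it in the client, one way to keep the compatibility flag and the query together is to build them as a single script string. The sketch below is only an illustration: HiveCompatScript and withHiveCompat are hypothetical names, not part of any MaxCompute SDK, and how you submit the resulting script is left to your own tooling.

```java
// Hypothetical helper that keeps the Hive compatibility flag and the
// query in one script so that they are always submitted together.
final class HiveCompatScript {

  // Session-level flag taken from step 4 above.
  static final String HIVE_COMPAT_FLAG = "set odps.sql.hive.compatible=true;";

  // Returns a two-line script: the flag, then the query terminated by ';'.
  static String withHiveCompat(String query) {
    String trimmed = query.trim();
    if (trimmed.isEmpty()) {
      throw new IllegalArgumentException("query must not be empty");
    }
    if (!trimmed.endsWith(";")) {
      trimmed = trimmed + ";";
    }
    return HIVE_COMPAT_FLAG + "\n" + trimmed;
  }

  public static void main(String[] args) {
    // Prints the flag line followed by the query line.
    System.out.println(withHiveCompat("select hive_collect(4y, 5y, 6y)"));
  }
}
```

Building the script in one place avoids the case where the query is submitted on its own and the flag is silently missing.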

Usage notes

  • Java sandbox: Hive UDFs running in a distributed environment are subject to the MaxCompute Java sandbox. See Java sandbox.

  • Process startup timeout: Each UDF call starts a new process. If cluster resources are insufficient, the UDF can occasionally fail because the process does not start before the startup timeout expires.