Use a Hive user-defined function (UDF) that is compatible with MaxCompute's Hive version directly on the MaxCompute client, with no rewrite required.
Prerequisites
Before you begin, ensure that you have:
-
The MaxCompute client installed and configured. See Install and configure the MaxCompute client.
Sample code
The following Java class Collect extends GenericUDF to collect multiple arguments of the same data type into an array. Use this as the basis for your Hive UDF.
package com.aliyun.odps.compiler.hive;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;

import java.util.ArrayList;
import java.util.List;

public class Collect extends GenericUDF {

    // Validates the argument types and declares the return type (a list of the input type).
    @Override
    public ObjectInspector initialize(ObjectInspector[] objectInspectors) throws UDFArgumentException {
        if (objectInspectors.length == 0) {
            throw new UDFArgumentException("Collect: input args should >= 1");
        }
        // All arguments must share the same ObjectInspector instance (reference comparison).
        for (int i = 1; i < objectInspectors.length; i++) {
            if (objectInspectors[i] != objectInspectors[0]) {
                throw new UDFArgumentException("Collect: input oi should be the same for all args");
            }
        }
        return ObjectInspectorFactory.getStandardListObjectInspector(objectInspectors[0]);
    }

    // Collects the argument values of one call, in order, into a list.
    @Override
    public Object evaluate(DeferredObject[] deferredObjects) throws HiveException {
        List<Object> objectList = new ArrayList<>(deferredObjects.length);
        for (DeferredObject deferredObject : deferredObjects) {
            objectList.add(deferredObject.get());
        }
        return objectList;
    }

    // Returns the name shown for this function in query plans.
    @Override
    public String getDisplayString(String[] strings) {
        return "Collect";
    }
}
In this example, the JAR file compiled from this class is named test.jar.
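To reason about what Collect does without a Hive dependency on the classpath, the same two steps (validate that all arguments share one type, then copy them into a list) can be sketched in plain Java. This is only an illustrative stand-in: the class CollectSketch and its methods are hypothetical names, and the runtime class of each argument is assumed to play the role that the ObjectInspector plays in the real UDF.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for local experimentation; not part of Hive or MaxCompute.
class CollectSketch {

    // Mirrors initialize(): require at least one argument, all of the same type.
    // Assumption: the runtime class stands in for the Hive ObjectInspector.
    // Null arguments are not handled in this sketch.
    static void checkArguments(Object... args) {
        if (args.length == 0) {
            throw new IllegalArgumentException("Collect: input args should >= 1");
        }
        for (int i = 1; i < args.length; i++) {
            if (args[i].getClass() != args[0].getClass()) {
                throw new IllegalArgumentException("Collect: input type should be the same for all args");
            }
        }
    }

    // Mirrors evaluate(): copy the arguments, in order, into a list.
    static List<Object> collect(Object... args) {
        checkArguments(args);
        List<Object> result = new ArrayList<>(args.length);
        for (Object arg : args) {
            result.add(arg);
        }
        return result;
    }

    public static void main(String[] cliArgs) {
        // Byte values stand in for TINYINT arguments.
        System.out.println(collect((byte) 4, (byte) 5, (byte) 6)); // prints [4, 5, 6]
    }
}
```

Calling collect with mixed types, for example collect(1, "a"), throws an IllegalArgumentException, which roughly corresponds to the UDFArgumentException raised by the real class during initialization.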
Register and call the UDF
-
Package the sample code into a JAR file using Hive, then add it as a MaxCompute resource:
-- Add the JAR file as a MaxCompute resource.
add jar test.jar;
Important: Specify all JAR files explicitly in the ADD JAR command. MaxCompute does not automatically add JAR files to the classpath.
For details on adding resources, see Add resources.
-
Create a function that maps to your UDF class:
-- Create a UDF.
create function hive_collect as 'com.aliyun.odps.compiler.hive.Collect' using 'test.jar';
For details, see Create a UDF.
-
Enable Hive compatibility mode and call the UDF. Submit both commands together in a single execution:
Note: Add set odps.sql.hive.compatible=true; before any SQL statement that calls a Hive UDF, and submit both lines together.
-- Enable the Hive-compatible data type edition for the MaxCompute project.
set odps.sql.hive.compatible=true;
-- Call the UDF.
select hive_collect(4y, 5y, 6y);
The expected output:
+------+
| _c0  |
+------+
| [4, 5, 6] |
+------+
Usage notes
-
Java sandbox: Hive UDFs running in a distributed environment are subject to the MaxCompute Java sandbox. See Java sandbox.
-
Process startup timeout: Each UDF call starts a new process. If cluster resources are insufficient, the UDF can occasionally fail because the process does not start before the timeout.