This topic describes how to use the MaxCompute client to write a Hive user-defined function (UDF) in Java. A Hive version that is compatible with MaxCompute is used.
Prerequisites
The MaxCompute client is installed. For more information, see Install and configure the MaxCompute client.
Usage notes
Before you use a Hive UDF, take note of the following points:
- When you run the ADD JAR command to add resource packages for a Hive UDF on the MaxCompute client, you must specify every JAR package that you want to add, because MaxCompute cannot automatically add all JAR packages to the classpath.
- To call a Hive UDF, you must add the set odps.sql.hive.compatible=true; command before the SQL statement that you want to execute, and then commit and execute the command and the SQL statement together.
- If a Java UDF runs in a distributed environment, the UDF is limited by the Java sandbox for MaxCompute. For more information, see Java sandbox.
Sample code
The following sample code defines a Hive UDF named Collect by extending GenericUDF:
package com.aliyun.odps.compiler.hive;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;

import java.util.ArrayList;
import java.util.List;

public class Collect extends GenericUDF {

    // Validates the arguments and declares the return type: a list of the argument type.
    @Override
    public ObjectInspector initialize(ObjectInspector[] objectInspectors) throws UDFArgumentException {
        if (objectInspectors.length == 0) {
            throw new UDFArgumentException("Collect: input args should >= 1");
        }
        // Every argument must use the same ObjectInspector instance as the first one.
        for (int i = 1; i < objectInspectors.length; i++) {
            if (objectInspectors[i] != objectInspectors[0]) {
                throw new UDFArgumentException("Collect: input oi should be the same for all args");
            }
        }
        return ObjectInspectorFactory.getStandardListObjectInspector(objectInspectors[0]);
    }

    // Evaluates each deferred argument and returns the values as a list, in call order.
    @Override
    public Object evaluate(DeferredObject[] deferredObjects) throws HiveException {
        List<Object> objectList = new ArrayList<>(deferredObjects.length);
        for (DeferredObject deferredObject : deferredObjects) {
            objectList.add(deferredObject.get());
        }
        return objectList;
    }

    // Returns the name that is shown for this function in query plans.
    @Override
    public String getDisplayString(String[] strings) {
        return "Collect";
    }
}
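The preceding code packs an arbitrary number of parameters into an array. The parameters can be of any data type, but initialize rejects a call whose arguments do not all share the same ObjectInspector. The JAR file of the Hive UDF is named test.jar in this example.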
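To see the behavior of the UDF without a Hive or MaxCompute runtime, the following plain-Java sketch mirrors the argument check in initialize and the collection loop in evaluate. It is an illustration only, not part of the UDF: CollectSketch, checkArguments, and collect are hypothetical names, and the runtime class of each non-null argument is assumed to stand in for the ObjectInspector that Hive would pass to the real function.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the Collect UDF. It has no Hive dependency.
final class CollectSketch {

    private CollectSketch() {
    }

    // Mirrors initialize(): requires at least one argument and one shared type.
    // Assumption: arguments are non-null, and the runtime class of each
    // argument plays the role of its ObjectInspector.
    static void checkArguments(Object... args) {
        if (args.length == 0) {
            throw new IllegalArgumentException("Collect: input args should >= 1");
        }
        for (int i = 1; i < args.length; i++) {
            if (args[i].getClass() != args[0].getClass()) {
                throw new IllegalArgumentException(
                        "Collect: input type should be the same for all args");
            }
        }
    }

    // Mirrors evaluate(): copies the arguments into a list in call order.
    static List<Object> collect(Object... args) {
        checkArguments(args);
        List<Object> result = new ArrayList<>(args.length);
        for (Object arg : args) {
            result.add(arg);
        }
        return result;
    }

    public static void main(String[] unused) {
        System.out.println(collect(4, 5, 6));   // prints [4, 5, 6]
        System.out.println(collect("a", "b"));  // prints [a, b]
    }
}
```

Calling collect with mixed types, such as collect(1, "a"), throws an IllegalArgumentException in this sketch. This corresponds to the UDFArgumentException that the real UDF raises when its arguments do not share the same ObjectInspector.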