When you write a UDF that processes structured data, you need to map MaxCompute complex types — ARRAY, MAP, and STRUCT — to the corresponding Java or Python types in your handler code. This tutorial shows you how to implement, deploy, and call a UDF that converts timestamps across all three complex types.
Java UDFs support method overloading, so a single UDF name handles all three types. Python UDFs require a separate function for each type.
Prerequisites
Before you begin, ensure that you have:
- A MaxCompute project
- A Java development environment with Maven, or a Python 3 environment
Type mappings
Each MaxCompute complex data type maps to a specific Java or Python type in your UDF handler:
| MaxCompute type | Java type | Python type |
|---|---|---|
| ARRAY | java.util.List | list |
| MAP | java.util.Map | dict |
| STRUCT | com.aliyun.odps.data.Struct | collections.namedtuple |
STRUCT requires the @Resolve annotation in Java. Because reflection cannot read field names and types from com.aliyun.odps.data.Struct, the annotation declares them explicitly. The annotation affects only overloaded methods whose input parameters or return value include Struct.
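To make the Python column concrete, the following sketch builds the kinds of values a Python handler receives for each complex type and classifies them. It runs without the MaxCompute SDK; the namedtuple `InputStruct` and the helper `complex_type_of` are hypothetical names that only illustrate the mapping in the table.

```python
import collections

# Hypothetical stand-in for a STRUCT<input_name:STRING, input_timestamp:BIGINT> value.
InputStruct = collections.namedtuple("InputStruct", ["input_name", "input_timestamp"])

array_value = [1554047999, 1554047989]                  # ARRAY<BIGINT>      -> list
map_value = {"date1": 1554047989, "date2": 1554047999}  # MAP<STRING,BIGINT> -> dict
struct_value = InputStruct("date", 1554047989)          # STRUCT<...>        -> namedtuple


def complex_type_of(value):
    """Return the MaxCompute complex type a Python value maps to (illustrative only)."""
    # A namedtuple is a tuple subclass that also exposes _fields.
    if isinstance(value, tuple) and hasattr(value, "_fields"):
        return "STRUCT"
    if isinstance(value, list):
        return "ARRAY"
    if isinstance(value, dict):
        return "MAP"
    raise TypeError("not a complex type: %s" % type(value).__name__)


for v in (array_value, map_value, struct_value):
    print(complex_type_of(v))  # ARRAY, then MAP, then STRUCT

# STRUCT fields are read by name, as attributes of the namedtuple.
print(struct_value.input_name, struct_value.input_timestamp)  # date 1554047989
```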
Step 1: Write the UDF
Java UDF
The Java class defines three overloaded evaluate methods, all registered under a single UDF name, UDF_COMPLEX_DATA. MaxCompute selects the method whose parameter type matches the argument type, so each method handles one complex type.
The UDF signatures in SQL map directly to the Java method signatures:
array<string> UDF_COMPLEX_DATA(array<bigint> as)
map<string, string> UDF_COMPLEX_DATA(map<string,bigint> ms)
map<string,string> UDF_COMPLEX_DATA(struct<input_name:string,input_timestamp:bigint> st)
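For readers coming from Python, selecting an overload by argument type resembles `functools.singledispatch`, which picks an implementation from the type of the first argument. The sketch below is only an analogy in plain Python. It is not how MaxCompute resolves Java overloads, and Python UDFs cannot use it to share one UDF name. The function name `udf_complex_data` is hypothetical.

```python
import collections
from functools import singledispatch


@singledispatch
def udf_complex_data(value):
    # Fallback when no registered implementation matches the argument type.
    raise TypeError("unsupported argument type: %s" % type(value).__name__)


@udf_complex_data.register(list)
def _(value):
    # Counterpart of the ARRAY overload.
    return "evaluate(List<Long>)"


@udf_complex_data.register(dict)
def _(value):
    # Counterpart of the MAP overload.
    return "evaluate(Map<String, Long>)"


@udf_complex_data.register(tuple)
def _(value):
    # Counterpart of the STRUCT overload; a namedtuple is a tuple subclass.
    return "evaluate(Struct)"


Point = collections.namedtuple("Point", ["input_name", "input_timestamp"])
print(udf_complex_data([1554047999]))                # evaluate(List<Long>)
print(udf_complex_data({"date1": 1554047989}))       # evaluate(Map<String, Long>)
print(udf_complex_data(Point("date", 1554047989)))   # evaluate(Struct)
```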
Add the following dependency to pom.xml:
<dependency>
<groupId>com.aliyun.odps</groupId>
<artifactId>odps-sdk-udf</artifactId>
<version>0.29.10-public</version>
</dependency>
Then implement the UDF class:
package com.aliyun; // Specify a package name.
import com.aliyun.odps.data.Struct;
import com.aliyun.odps.udf.UDF;
import com.aliyun.odps.udf.annotation.Resolve;
import java.text.SimpleDateFormat;
import java.util.*;
// Reflection cannot read STRUCT field names and types, so @Resolve declares the Struct overload's signature.
@Resolve("struct<input_name:string, input_timestamp:bigint>->map<string,string>")
public class ComplexDataTypeExample extends UDF {
    private static final String PATTERN = "yyyy-MM-dd HH:mm:ss";
    /**
     * Converts a list of timestamps into a list of formatted time strings.
     * @param timestamps timestamps in seconds or milliseconds
     * @return formatted time strings, or null if the input is null
     */
    public List<String> evaluate(List<Long> timestamps) {
        if (timestamps == null) {
            return null;
        }
        List<String> result = new ArrayList<>(timestamps.size());
        SimpleDateFormat formatter = new SimpleDateFormat(PATTERN); // Uses the JVM default time zone.
        for (Long timestamp : timestamps) {
            if (timestamp == null) {
                result.add(null); // Propagate null elements instead of throwing NullPointerException.
                continue;
            }
            // Values below 9999999999 are seconds; larger values are already milliseconds.
            Date date = new Date(timestamp < 9999999999L ? timestamp * 1000 : timestamp);
            result.add(formatter.format(date));
        }
        return result;
    }
    /**
     * Converts the timestamp values of a map into formatted time strings, keeping the keys.
     * @param timestamps map whose values are timestamps in seconds or milliseconds
     * @return map with the same keys and formatted time strings as values, or null if the input is null
     */
    public Map<String, String> evaluate(Map<String, Long> timestamps) {
        if (timestamps == null) {
            return null;
        }
        Map<String, String> result = new HashMap<>(timestamps.size());
        SimpleDateFormat formatter = new SimpleDateFormat(PATTERN);
        for (Map.Entry<String, Long> entry : timestamps.entrySet()) {
            Long timestamp = entry.getValue();
            if (timestamp == null) {
                result.put(entry.getKey(), null);
                continue;
            }
            Date date = new Date(timestamp < 9999999999L ? timestamp * 1000 : timestamp);
            result.put(entry.getKey(), formatter.format(date));
        }
        return result;
    }
    /**
     * Converts the timestamp field of a STRUCT into a formatted time string.
     * @param input STRUCT with the fields input_name and input_timestamp
     * @return MAP with the keys output_name and output_time, or null if the input is null
     */
    public Map<String, String> evaluate(Struct input) {
        if (input == null) {
            return null;
        }
        SimpleDateFormat formatter = new SimpleDateFormat(PATTERN);
        String nameValue = (String) input.getFieldValue("input_name");
        Long timestampValue = (Long) input.getFieldValue("input_timestamp");
        String dateString = null;
        if (timestampValue != null) {
            Date date = new Date(timestampValue < 9999999999L ? timestampValue * 1000 : timestampValue);
            dateString = formatter.format(date);
        }
        Map<String, String> result = new HashMap<>(8);
        result.put("output_name", nameValue);
        result.put("output_time", dateString);
        return result;
    }
}
Each evaluate method returns null when its input is null. MaxCompute does not guarantee the evaluation order of SQL subexpressions, so null checks inside the UDF are the reliable way to handle null inputs.
For other code requirements, see Java UDFs.
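All three Java methods apply the same rule: a value below 9999999999 is treated as seconds and multiplied by 1000, and anything larger is treated as milliseconds. 9999999999 seconds falls in the year 2286, while every millisecond timestamp after late April 1970 exceeds the threshold, so the rule is unambiguous for realistic data. The Python sketch below reproduces the rule. The helper names are hypothetical, and pinning the zone to UTC+8 is an assumption chosen because it reproduces the expected outputs in Step 3 (SimpleDateFormat itself uses the JVM default time zone).

```python
from datetime import datetime, timedelta, timezone

PATTERN = "%Y-%m-%d %H:%M:%S"  # Python equivalent of "yyyy-MM-dd HH:mm:ss"
# Assumption: UTC+8, which matches the expected outputs in this tutorial.
UTC_PLUS_8 = timezone(timedelta(hours=8))


def to_millis(timestamp):
    """Apply the Java code's rule: values below 9999999999 are seconds."""
    return timestamp * 1000 if timestamp < 9999999999 else timestamp


def format_timestamp(timestamp, tz=UTC_PLUS_8):
    """Format a second- or millisecond-precision timestamp; None stays None."""
    if timestamp is None:
        return None
    return datetime.fromtimestamp(to_millis(timestamp) / 1000, tz).strftime(PATTERN)


print(format_timestamp(1554047999))     # 2019-03-31 23:59:59 (seconds)
print(format_timestamp(1554047999000))  # 2019-03-31 23:59:59 (milliseconds, same instant)
print(format_timestamp(None))           # None
```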
Python UDF
Python UDFs do not support method overloading, so each complex type requires a separate UDF with its own name.
MaxCompute projects run Python 2 by default. To use Python 3, run set odps.sql.python.version=cp37 at the session level before calling a Python 3 UDF.
For other Python 3 UDF requirements, see Python 3 UDFs.
UDF_COMPLEX_DATA_ARRAY — handles ARRAY<BIGINT> input:
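from odps.udf import annotate
import datetime
@annotate('array<bigint>->array<datetime>')
class ArrayExample:
    def evaluate(self, input_list):
        if input_list is None:
            return None  # NULL in, NULL out.
        output_list = list()
        for item in input_list:
            # fromtimestamp() reads item as seconds and uses the local time zone.
            t = None if item is None else datetime.datetime.fromtimestamp(item)
            output_list.append(t)
        return output_list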
UDF_COMPLEX_DATA_MAP — handles MAP<STRING, BIGINT> input:
from odps.udf import annotate
import datetime
@annotate('map<string,bigint>->map<string,datetime>')
class MapExample:
    def evaluate(self, input_dict):
        if input_dict is None:
            return None  # NULL in, NULL out.
        output_dict = dict()
        for key, value in input_dict.items():
            # Keep each key; convert its value from seconds to a datetime.
            output_dict[key] = None if value is None else datetime.datetime.fromtimestamp(value)
        return output_dict
UDF_COMPLEX_DATA_STRUCT — handles STRUCT<input_name:STRING, input_timestamp:BIGINT> input:
from odps.udf import annotate
import datetime, collections
# Define the output STRUCT type once at module level instead of on every call.
OutputNamedTuple = collections.namedtuple('output_namedtuple', ['output_name', 'output_time'])
@annotate('struct<input_name:string,input_timestamp:bigint>->struct<output_name:string,output_time:datetime>')
class StructExample:
    def evaluate(self, input_namedtuple):
        if input_namedtuple is None:
            return None  # NULL in, NULL out.
        name_val = input_namedtuple.input_name
        ts = input_namedtuple.input_timestamp
        time_val = None if ts is None else datetime.datetime.fromtimestamp(ts)
        return OutputNamedTuple(name_val, time_val)
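Because each Python handler only walks its container and converts values, the three classes can share one conversion helper, and their logic can be exercised locally before upload. The sketch below assumes the three classes live in one module and uses a hypothetical no-op `annotate` stand-in so that it runs without the MaxCompute runtime; in a real resource file, import `annotate` from `odps.udf` as shown above. Like the handlers above, it converts seconds using the local time zone.

```python
import collections
import datetime


def annotate(signature):
    """Hypothetical local stand-in for odps.udf.annotate: records the signature only."""
    def decorator(cls):
        cls.signature = signature
        return cls
    return decorator


def to_datetime(ts):
    """Shared helper: seconds since the epoch -> local datetime; None stays None."""
    return None if ts is None else datetime.datetime.fromtimestamp(ts)


OutputNamedTuple = collections.namedtuple('output_namedtuple', ['output_name', 'output_time'])


@annotate('array<bigint>->array<datetime>')
class ArrayExample:
    def evaluate(self, input_list):
        return None if input_list is None else [to_datetime(x) for x in input_list]


@annotate('map<string,bigint>->map<string,datetime>')
class MapExample:
    def evaluate(self, input_dict):
        return None if input_dict is None else {k: to_datetime(v) for k, v in input_dict.items()}


@annotate('struct<input_name:string,input_timestamp:bigint>->struct<output_name:string,output_time:datetime>')
class StructExample:
    def evaluate(self, st):
        if st is None:
            return None
        return OutputNamedTuple(st.input_name, to_datetime(st.input_timestamp))


# Local checks with the same inputs as Step 3 (hypothetical input namedtuple).
InputStruct = collections.namedtuple('input_struct', ['input_name', 'input_timestamp'])
print(ArrayExample().evaluate([1554047999, 1554047989]))
print(MapExample().evaluate({'date1': 1554047989, 'date2': 1554047999}))
print(StructExample().evaluate(InputStruct('date', 1554047989)))
```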
Step 2: Upload resources and create the UDF
After writing and debugging your UDF code, upload it to MaxCompute and register the UDF.
- Java UDF: Package the compiled class into a JAR, upload it as a resource, and create the UDF named UDF_COMPLEX_DATA. See Package a Java program, upload the package, and create a MaxCompute UDF.
- Python UDF: Upload each .py file as a resource, then create three UDFs: UDF_COMPLEX_DATA_ARRAY, UDF_COMPLEX_DATA_MAP, and UDF_COMPLEX_DATA_STRUCT. See Upload a Python program and create a MaxCompute UDF.
Step 3: Call the UDF
ARRAY
-- Java UDF
SELECT UDF_COMPLEX_DATA(array(1554047999, 1554047989));
-- Python UDF (enable Python 3 first)
set odps.sql.python.version=cp37;
SELECT UDF_COMPLEX_DATA_ARRAY(array(1554047999, 1554047989));
Expected output (the formatted times in this and the following examples correspond to the UTC+8 time zone):
+---------------------------------------------+
| _c0 |
+---------------------------------------------+
| [2019-03-31 23:59:59, 2019-03-31 23:59:49] |
+---------------------------------------------+
MAP
-- Java UDF
SELECT UDF_COMPLEX_DATA(map('date1', 1554047989, 'date2', 1554047999));
-- Python UDF (enable Python 3 first)
set odps.sql.python.version=cp37;
SELECT UDF_COMPLEX_DATA_MAP(map('date1', 1554047989, 'date2', 1554047999));
Expected output:
+----------------------------------------------------------------+
| _c0 |
+----------------------------------------------------------------+
| {"date1":"2019-03-31 23:59:49","date2":"2019-03-31 23:59:59"} |
+----------------------------------------------------------------+
STRUCT
-- Java UDF
SELECT UDF_COMPLEX_DATA(struct('date', 1554047989));
-- Python UDF (enable Python 3 first)
set odps.sql.python.version=cp37;
SELECT UDF_COMPLEX_DATA_STRUCT(struct('date', 1554047989));
Expected output:
+-------------------------------------------------------------+
| _c0 |
+-------------------------------------------------------------+
| {"output_name":"date","output_time":"2019-03-31 23:59:49"} |
+-------------------------------------------------------------+
See also
- Java UDFs: full reference for Java UDF specifications
- Python 3 UDFs: Python 3 UDF specifications and limitations