Write Java and Python UDFs for the complex data types ARRAY, MAP, and STRUCT - MaxCompute

When you write a UDF that processes structured data, you need to map MaxCompute complex types — ARRAY, MAP, and STRUCT — to the corresponding Java or Python types in your handler code. This tutorial shows you how to implement, deploy, and call a UDF that converts timestamps across all three complex types.

Java UDFs support method overloading, so a single UDF name handles all three types. Python UDFs require a separate function for each type.

Prerequisites

Before you begin, ensure that you have:

A MaxCompute project
A Java development environment with Maven, or a Python 3 environment

Type mappings

Each MaxCompute complex data type maps to a specific Java or Python type in your UDF handler:

MaxCompute type	Java type	Python type
ARRAY	`java.util.List`	`list`
MAP	`java.util.Map`	`dict`
STRUCT	`com.aliyun.odps.data.Struct`	`collections.namedtuple`
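
As a rough illustration of the Python column, the sketch below builds plain Python values shaped like the three inputs used later in this tutorial. It runs outside MaxCompute, and the name `InputStruct` is hypothetical, chosen only for this example.

```python
import collections

# ARRAY<BIGINT> is represented as a list of ints.
array_value = [1554047999, 1554047989]

# MAP<STRING, BIGINT> is represented as a dict.
map_value = {"date1": 1554047989, "date2": 1554047999}

# STRUCT<input_name:STRING, input_timestamp:BIGINT> is represented as a
# namedtuple whose fields are read by attribute name.
# "InputStruct" is a hypothetical name used only in this sketch.
InputStruct = collections.namedtuple("InputStruct", ["input_name", "input_timestamp"])
struct_value = InputStruct(input_name="date", input_timestamp=1554047989)

print(type(array_value).__name__, type(map_value).__name__, struct_value.input_name)
# prints: list dict date
```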

STRUCT requires the @Resolve annotation in Java. Because reflection cannot read field names and types from com.aliyun.odps.data.Struct, the annotation supplies that information at compile time. The annotation affects only overloaded methods whose input parameters or return value includes Struct.

Step 1: Write the UDF

Java UDF

The Java class defines three overloaded evaluate methods, one per complex type. All three are exposed through a single UDF named UDF_COMPLEX_DATA, and MaxCompute dispatches each call to the method that matches the argument type.

The UDF signatures in SQL map directly to the Java method signatures:

array<string>                                    UDF_COMPLEX_DATA(array<bigint> as)
map<string, string>                              UDF_COMPLEX_DATA(map<string,bigint> ms)
map<string, string>                              UDF_COMPLEX_DATA(struct<input_name:string,input_timestamp:bigint> st)

Add the following dependency to pom.xml:

<dependency>
  <groupId>com.aliyun.odps</groupId>
  <artifactId>odps-sdk-udf</artifactId>
  <version>0.29.10-public</version>
</dependency>

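Then implement the handler class. Its three evaluate overloads correspond to the three signatures above:

package com.aliyun; // Specify a package name.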

import com.aliyun.odps.data.Struct;
import com.aliyun.odps.udf.UDF;
import com.aliyun.odps.udf.annotation.Resolve;

import java.text.SimpleDateFormat;
import java.util.*;

@Resolve("struct<input_name:string, input_timestamp:bigint>->map<string,string>")
public class ComplexDataTypeExample extends UDF{
    private static final String PATTERN = "yyyy-MM-dd HH:mm:ss";

    /**
     * Converts a list of timestamps into a list of time strings.
     * @param timestamps timestamps in seconds or milliseconds since the epoch
     * @return the formatted time strings, or null if the input is null
     */
    public List<String> evaluate(List<Long> timestamps) {
        if (timestamps == null) {
            return null;
        }
        List<String> result = new ArrayList<>(timestamps.size());
        SimpleDateFormat formatter = new SimpleDateFormat(PATTERN);
        for (Long timestamp : timestamps) {
            if (timestamp == null) {
                // Keep null elements as null instead of failing on unboxing.
                result.add(null);
                continue;
            }
            // Values below 9999999999 are treated as seconds and scaled to milliseconds.
            Date date = new Date(timestamp < 9999999999L ? timestamp * 1000 : timestamp);
            result.add(formatter.format(date));
        }
        return result;
    }

    /**
     * Converts a map whose values are timestamps into a map whose values are time strings.
     * @param timestamps a map from keys to timestamps in seconds or milliseconds
     * @return a map from the same keys to formatted time strings, or null if the input is null
     */
    public Map<String, String> evaluate(Map<String, Long> timestamps) {
        if (timestamps == null) {
            return null;
        }
        Map<String, String> result = new HashMap<>(timestamps.size());
        SimpleDateFormat formatter = new SimpleDateFormat(PATTERN);
        for (Map.Entry<String, Long> entry : timestamps.entrySet()) {
            Long timestamp = entry.getValue();
            if (timestamp == null) {
                // Keep null values as null instead of failing on unboxing.
                result.put(entry.getKey(), null);
                continue;
            }
            Date date = new Date(timestamp < 9999999999L ? timestamp * 1000 : timestamp);
            result.put(entry.getKey(), formatter.format(date));
        }
        return result;
    }

    /**
     * Converts a struct that holds a name and a timestamp into a map of output fields.
     * @param input a struct with the fields input_name and input_timestamp
     * @return a map with the keys output_name and output_time, or null if the input is null
     */
    public Map<String, String> evaluate(Struct input) {
        if (input == null) {
            return null;
        }
        SimpleDateFormat formatter = new SimpleDateFormat(PATTERN);
        String nameValue = (String) input.getFieldValue("input_name");
        Long timestampValue = (Long) input.getFieldValue("input_timestamp");
        String dateString = null;
        if (timestampValue != null) {
            // Values below 9999999999 are treated as seconds and scaled to milliseconds.
            Date date = new Date(timestampValue < 9999999999L ? timestampValue * 1000 : timestampValue);
            dateString = formatter.format(date);
        }
        Map<String, String> result = new HashMap<>(8);
        result.put("output_name", nameValue);
        result.put("output_time", dateString);
        return result;
    }
}

Each evaluate method returns null when its input is null. MaxCompute does not guarantee the evaluation order of SQL subexpressions, so null checks inside the UDF are the reliable way to handle null inputs.
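
The Java methods combine two behaviors: a null input passes through as null, and any value below 9999999999 is treated as seconds and scaled to milliseconds. The sketch below mirrors that logic in plain Python so it can be tried outside MaxCompute. The names `to_millis` and `format_timestamps` are hypothetical, and the fixed UTC+8 offset is an assumption made only to match the sample output later in this tutorial; the Java code itself uses the default time zone of the JVM.

```python
from datetime import datetime, timedelta, timezone

SECONDS_THRESHOLD = 9999999999              # same cutoff as the Java code
UTC_PLUS_8 = timezone(timedelta(hours=8))   # assumption: matches the sample output


def to_millis(timestamp):
    """Treat values below the cutoff as seconds; otherwise assume milliseconds."""
    return timestamp * 1000 if timestamp < SECONDS_THRESHOLD else timestamp


def format_timestamps(timestamps):
    """Return formatted time strings, or None when the whole input is None."""
    if timestamps is None:
        return None
    result = []
    for ts in timestamps:
        moment = datetime.fromtimestamp(to_millis(ts) / 1000, tz=UTC_PLUS_8)
        result.append(moment.strftime("%Y-%m-%d %H:%M:%S"))
    return result


print(format_timestamps([1554047999, 1554047999000]))
# prints: ['2019-03-31 23:59:59', '2019-03-31 23:59:59']
print(format_timestamps(None))
# prints: None
```

One consequence of the cutoff is that millisecond values earlier than late April 1970 fall below it and are read as seconds.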

For other code requirements, see Java UDFs.

Python UDF

Python UDFs do not support method overloading, so each complex type requires a separate UDF with its own name.

MaxCompute projects run Python 2 by default. To use Python 3, run set odps.sql.python.version=cp37 at the session level before calling a Python 3 UDF.

For other Python 3 UDF requirements, see Python 3 UDFs.

UDF_COMPLEX_DATA_ARRAY — handles ARRAY<BIGINT> input:

from odps.udf import annotate
import datetime


@annotate('array<bigint>->array<datetime>')
class ArrayExample:
    def evaluate(self, input_list):
        # Pass a NULL input through unchanged.
        if input_list is None:
            return None
        # Interpret each element as seconds since the epoch.
        return [datetime.datetime.fromtimestamp(item) for item in input_list]

UDF_COMPLEX_DATA_MAP — handles MAP<STRING, BIGINT> input:

from odps.udf import annotate
import datetime


@annotate('map<string,bigint>->map<string,datetime>')
class MapExample:
    def evaluate(self, input_dict):
        # Pass a NULL input through unchanged.
        if input_dict is None:
            return None
        # Keep the keys and convert each value from epoch seconds to a datetime.
        return {key: datetime.datetime.fromtimestamp(value)
                for key, value in input_dict.items()}

UDF_COMPLEX_DATA_STRUCT — handles STRUCT<input_name:STRING, input_timestamp:BIGINT> input:

from odps.udf import annotate
import collections
import datetime

# Build the output type once instead of on every call.
OutputNamedTuple = collections.namedtuple('output_namedtuple', ['output_name', 'output_time'])


@annotate('struct<input_name:string,input_timestamp:bigint>->struct<output_name:string,output_time:datetime>')
class StructExample:
    def evaluate(self, input_namedtuple):
        # Pass a NULL input through unchanged.
        if input_namedtuple is None:
            return None
        # Read the struct fields by attribute name.
        name_val = input_namedtuple.input_name
        time_val = datetime.datetime.fromtimestamp(input_namedtuple.input_timestamp)
        return OutputNamedTuple(name_val, time_val)
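
Before uploading, it can help to exercise the handler logic on a local machine. The sketch below is one way to do that, assuming the `odps.udf` module may not be installed locally: it falls back to a hypothetical no-op `annotate` so the class still imports, then calls `evaluate` with a hand-built namedtuple. The `Input` type is also hypothetical. The printed time depends on the local time zone, so no fixed output is shown.

```python
import collections
import datetime

try:
    from odps.udf import annotate  # provided by the MaxCompute runtime
except ImportError:
    def annotate(signature):
        """Hypothetical no-op stand-in used only for local runs."""
        def wrap(cls):
            return cls
        return wrap


@annotate('struct<input_name:string,input_timestamp:bigint>->struct<output_name:string,output_time:datetime>')
class StructExample:
    def evaluate(self, input_namedtuple):
        OutputNamedTuple = collections.namedtuple('output_namedtuple', ['output_name', 'output_time'])
        name_val = input_namedtuple.input_name
        time_val = datetime.datetime.fromtimestamp(input_namedtuple.input_timestamp)
        return OutputNamedTuple(name_val, time_val)


# Hand-built input that mimics the STRUCT argument (hypothetical type name).
Input = collections.namedtuple('Input', ['input_name', 'input_timestamp'])
row = StructExample().evaluate(Input('date', 1554047989))
print(row.output_name, row.output_time)
```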

Step 2: Upload resources and create the UDF

After writing and debugging your UDF code, upload it to MaxCompute and register the UDF.

Java UDF: Package the compiled class into a JAR, upload it as a resource, and create the UDF named UDF_COMPLEX_DATA. See Package a Java program, upload the package, and create a MaxCompute UDF.
Python UDF: Upload each .py file as a resource, then create three UDFs: UDF_COMPLEX_DATA_ARRAY, UDF_COMPLEX_DATA_MAP, and UDF_COMPLEX_DATA_STRUCT. See Upload a Python program and create a MaxCompute UDF.
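
The linked topics describe the standard procedure. As an alternative illustration only, the sketch below shows how the Python upload and registration might be scripted with PyODPS, assuming `o` is an already authenticated PyODPS `ODPS` object. The helper name `register_python_udf` and the file name in the usage comment are hypothetical and do not come from this tutorial.

```python
import os


def register_python_udf(o, function_name, py_file, class_name):
    """Upload one .py file as a resource and register a UDF backed by it.

    Assumption: `o` is a PyODPS ODPS entry object that exposes
    create_resource() and create_function().
    """
    resource_name = os.path.basename(py_file)            # for example "array_example.py"
    module_name = os.path.splitext(resource_name)[0]     # for example "array_example"
    with open(py_file, "rb") as source:
        resource = o.create_resource(resource_name, "py", file_obj=source)
    # The class path has the form <module name>.<class name>.
    return o.create_function(
        function_name,
        class_type="{}.{}".format(module_name, class_name),
        resources=[resource],
    )


# Hypothetical usage, with one call per Python UDF:
# register_python_udf(o, "UDF_COMPLEX_DATA_ARRAY", "array_example.py", "ArrayExample")
```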

Step 3: Call the UDF

ARRAY

-- Java UDF
SELECT UDF_COMPLEX_DATA(array(1554047999, 1554047989));

-- Python UDF (enable Python 3 first)
set odps.sql.python.version=cp37;
SELECT UDF_COMPLEX_DATA_ARRAY(array(1554047999, 1554047989));

Expected output (the time strings correspond to a UTC+8 time zone; other time zone settings shift them):

+---------------------------------------------+
| _c0                                         |
+---------------------------------------------+
| [2019-03-31 23:59:59, 2019-03-31 23:59:49]  |
+---------------------------------------------+
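
The values in this output depend on the time zone used for formatting. As a quick check, the sketch below renders the same two epoch values in UTC and in UTC+8 with the Python standard library; only the UTC+8 rendering matches the table above. The helper name `render` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

FORMAT = "%Y-%m-%d %H:%M:%S"
EPOCH_SECONDS = [1554047999, 1554047989]


def render(seconds_list, tz):
    """Format each epoch-seconds value in the given time zone."""
    return [datetime.fromtimestamp(s, tz=tz).strftime(FORMAT) for s in seconds_list]


print("UTC  ", render(EPOCH_SECONDS, timezone.utc))
# prints: UTC   ['2019-03-31 15:59:59', '2019-03-31 15:59:49']
print("UTC+8", render(EPOCH_SECONDS, timezone(timedelta(hours=8))))
# prints: UTC+8 ['2019-03-31 23:59:59', '2019-03-31 23:59:49']
```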

MAP

-- Java UDF
SELECT UDF_COMPLEX_DATA(map('date1', 1554047989, 'date2', 1554047999));

-- Python UDF (enable Python 3 first)
set odps.sql.python.version=cp37;
SELECT UDF_COMPLEX_DATA_MAP(map('date1', 1554047989, 'date2', 1554047999));

Expected output:

+----------------------------------------------------------------+
| _c0                                                            |
+----------------------------------------------------------------+
| {"date1":"2019-03-31 23:59:49","date2":"2019-03-31 23:59:59"}  |
+----------------------------------------------------------------+

STRUCT

-- Java UDF
SELECT UDF_COMPLEX_DATA(struct('date', 1554047989));

-- Python UDF (enable Python 3 first)
set odps.sql.python.version=cp37;
SELECT UDF_COMPLEX_DATA_STRUCT(struct('date', 1554047989));

Expected output:

+-------------------------------------------------------------+
| _c0                                                         |
+-------------------------------------------------------------+
| {"output_name":"date","output_time":"2019-03-31 23:59:49"}  |
+-------------------------------------------------------------+
