All Products
Search
Document Center

MaxCompute:UDF example: Get a value from a string that contains a separator

Last Updated:Mar 26, 2026

When a key-value string contains delimiter characters within its keys or values, the built-in KEYVALUE function cannot parse it correctly. Use the UDF_EXTRACT_KEY_VALUE_WITH_SPLIT user-defined function (UDF) to extract values from these strings.

This topic walks you through writing the UDF in Java or Python, registering it in MaxCompute, and calling it in SQL.

Steps:

  1. Write the UDF

  2. Upload resources and register the UDF

  3. Use the UDF

Function signature

string UDF_EXTRACT_KEY_VALUE_WITH_SPLIT(string <s>, string <split1>, string <split2>, string <keyname>)

The function splits s into key-value pairs using split1, then splits each pair into a key and value using split2, and returns the value for the key specified by keyname.

Parameters (all STRING type, all required):

Parameter Description
s The string to parse
split1 The delimiter that separates key-value pairs
split2 The delimiter that separates keys from values
keyname The key whose value to return

The function returns null if any parameter is null or if the key is not found.

Prerequisites

Before you begin, make sure you have:

  • A MaxCompute project with UDF development permissions

  • A Java or Python development environment for writing and testing the UDF code

1. Write the UDF

Choose a language for your UDF implementation. All three implementations produce the same result.

Java

package com.aliyun.rewrite; // The package name, which is user-defined.
import com.aliyun.odps.udf.UDF;

public class ExtractKeyValueWithSplit extends UDF{
    /**
     * Use split1 to split the string and obtain key-value pairs. Then, use split2 to split the key-value pairs and obtain the keys and values.
     * @param str    The source string.
     * @param split1  The delimiter that is used to split a string and obtain key-value pairs.
     * @param split2  The delimiter that is used to split key-value pairs and obtain the keys and values.
     * @param keyname The name of the key whose value you want to obtain.
     * @return The returned value.
     */
    public String evaluate(String str, String split1, String split2, String keyname) {
        if(str==null || split1==null || split2==null || keyname==null){
            return null;
        }
        try {
            // Combine keyname and split2.
            String keySplit = keyname + split2;

            // Traverse the string. Use split1 to split the string and obtain key-value pairs.
            for(String subStr: str.split(split1)){
                // Use split2 to split the key-value pairs and obtain the keys and values. Then, obtain the value that corresponds to a specific key.
                if (subStr.startsWith(keySplit)){
                    return subStr.substring(keySplit.length());
                }
            }
        } catch (Exception e) {
            return null;
        }
        return null;
    }
}

The Java UDF must extend the UDF class. The evaluate method defines the function signature used in SQL statements: four STRING input parameters and a STRING return value. For other code specifications, see Java UDFs.

Python 3

Important

MaxCompute projects run Python 2 by default. To use this Python 3 UDF, run the following command at the session level before calling it: set odps.sql.python.version=cp37.

from odps.udf import annotate


@annotate("string,string,string,string->string")
class ExtractKeyValueWithSplit(object):
    def evaluate(self, s, split1, split2, keyname):
        if not s:
            return None
        key_split = keyname + split2
        # Traverse the string. Use split1 to split the string and obtain key-value pairs.
        for subStr in s.split(split1):
            # Use split2 to split the key-value pairs and obtain the keys and values. Then, obtain the value that corresponds to a specific key.
            if subStr.startswith(key_split):
                return subStr[len(key_split):]

For Python 3 UDF specifications, see Python 3 UDFs.

Python 2

#coding:utf-8
from odps.udf import annotate


@annotate("string,string,string,string->string")
class ExtractKeyValueWithSplit(object):
    def evaluate(self, s, split1, split2, keyname):
        if not s:
            return None
        key_split = keyname + split2
        # Traverse the string. Use split1 to split the string and obtain key-value pairs.
        for subStr in s.split(split1):
            # Use split2 to split the key-value pairs and obtain the keys and values. Then, obtain the value that corresponds to a specific key.
            if subStr.startswith(key_split):
                return subStr[len(key_split):]

If your Python 2 UDF code contains Chinese characters, add an encoding declaration at the top of the file: #coding:utf-8 or # -*- coding: utf-8 -*-. Without this declaration, MaxCompute returns an error when running the UDF.

For Python 2 UDF specifications, see Python 2 UDFs.

2. Upload resources and register the UDF

After writing and testing your UDF, upload the code to MaxCompute and register it as UDF_EXTRACT_KEY_VALUE_WITH_SPLIT.

3. Use the UDF

After registering the UDF, call it in a SQL statement. The following example extracts the value for the key name from a string where the value itself contains the delimiter :.

set odps.sql.python.version=cp37; -- Required only for the Python 3 UDF. Omit this line for Java or Python 2.
SELECT UDF_EXTRACT_KEY_VALUE_WITH_SPLIT('name:zhangsang:man;age:2;', ';', ':', 'name');

In this example:

  • split1 is ;, which splits the string into pairs: name:zhangsang:man and age:2

  • split2 is :, which identifies the key by prefix match

  • The function returns everything after name: in the first matching pair

Expected output:

+--------------+
| _c0          |
+--------------+
| zhangsan:man |
+--------------+

What's next