This example shows how to write a user-defined function (UDF) named UDF_EXTRACT_KEY_VALUE to extract a value from a key-value pair string that uses custom delimiters — for example, name:zhangsan;age:21;.
This UDF is designed for strings whose delimiters are not fixed in advance, so you pass both delimiters as arguments. To parse strings that already have standard delimiters, see Example: Obtain the values of strings that have delimiters.
How it works
UDF_EXTRACT_KEY_VALUE splits a string in two passes:
- Use split1 to split the string into key-value pairs.
- Use split2 to split each pair into a key and a value.
- Return the value for the key specified by keyname.
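The two passes can be sketched in plain Python, independent of MaxCompute. This is only an illustration of the steps above, not the UDF itself: the function name extract_key_value is hypothetical, and skipping pairs that do not split into exactly one key and one value mirrors the length check in the Java implementation on this page.

```python
def extract_key_value(s, split1, split2, keyname):
    """Return the value stored under keyname, or None if it is absent."""
    if not s:
        return None
    result = {}
    # Pass 1: split the source string into key-value pairs.
    for pair in s.split(split1):
        # Pass 2: split each pair into a key and a value.
        parts = pair.split(split2)
        # Skip empty or malformed pairs: only keep pairs that yield
        # exactly one key and one value.
        if len(parts) == 2:
            result[parts[0]] = parts[1]
    return result.get(keyname)


print(extract_key_value("name:zhangsan;age:21;", ";", ":", "name"))  # zhangsan
print(extract_key_value("name:zhangsan;age:21;", ";", ":", "city"))  # None
```

The trailing delimiter in name:zhangsan;age:21; produces an empty final pair, which the length check discards.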
Syntax
STRING UDF_EXTRACT_KEY_VALUE(STRING <s>, STRING <split1>, STRING <split2>, STRING <keyname>)
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| s | STRING | Yes | The source string |
| split1 | STRING | Yes | The delimiter that separates key-value pairs |
| split2 | STRING | Yes | The delimiter that separates keys from values within each pair |
| keyname | STRING | Yes | The key whose value you want to retrieve |
Return value: STRING — the value of the specified key, or NULL if the key is not found or the input is empty.
Prerequisites
Before you begin, ensure that you have:
- A MaxCompute project with UDF creation permissions
- The MaxCompute client or DataWorks DataStudio for uploading resources and registering the UDF
Write the UDF
Choose a language and implement the UDF logic.
Java
package com.aliyun.rewrite; // Specify a package name.

import com.aliyun.odps.udf.UDF;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class ExtractKeyValue extends UDF {
    private static final int KEY_VALUE_LENGTH = 2;

    /**
     * Extracts the value for a given key from a delimited string.
     * @param str The source string.
     * @param split1 The delimiter that separates key-value pairs.
     * @param split2 The delimiter that separates keys from values.
     * @param keyname The key whose value you want to retrieve.
     * @return The value of the specified key, or null if not found.
     */
    public String evaluate(String str, String split1, String split2, String keyname) {
        try {
            // Return null for empty or null input.
            if (str == null || "".equals(str)) {
                return null;
            }
            Map<String, String> keyValueCache = new HashMap<>(8);
            // String.split treats its argument as a regular expression, so quote the
            // delimiter to match it literally (for example, "|" or ".").
            String[] extractedKeyValues = str.split(Pattern.quote(split1));
            // Split each key-value pair and store the result.
            for (String keyValue : extractedKeyValues) {
                storeKeyValue(keyValueCache, keyValue, split2);
            }
            // Return the value for the specified key.
            return keyValueCache.get(keyname);
        } catch (Exception e) {
            // Any failure, such as a null delimiter, yields NULL instead of failing the query.
            return null;
        }
    }

    /**
     * Splits a key-value pair and stores it in the cache.
     * @param keyValueCache The map to store the parsed key-value pairs.
     * @param keyValue A single key-value pair string.
     * @param split The delimiter between the key and value.
     */
    private void storeKeyValue(Map<String, String> keyValueCache, String keyValue, String split) {
        if (keyValue == null || "".equals(keyValue)) {
            return;
        }
        // Quote the delimiter so that it is matched literally, not as a regular expression.
        String[] keyValueArr = keyValue.split(Pattern.quote(split));
        // Ignore pairs that do not consist of exactly one key and one value.
        if (keyValueArr.length == KEY_VALUE_LENGTH) {
            keyValueCache.put(keyValueArr[0], keyValueArr[1]);
        }
    }
}
The evaluate method defines four STRING input parameters and returns a STRING value. This signature becomes the UDF's SQL signature. For Java UDF specifications, see Java UDFs.
Python 3
from odps.udf import annotate

@annotate("string,string,string,string->string")
class ExtractKeyValue(object):
    def evaluate(self, s, split1, split2, keyname):
        if not s:
            return None
        # Split the string into key-value pairs, then split each pair into a key and value.
        pairs = (kv.split(split2) for kv in s.split(split1) if kv)
        # Keep only pairs that have exactly one key and one value, as the Java version does.
        key_value_cache = dict(pair for pair in pairs if len(pair) == 2)
        # Return the value for the specified key.
        return key_value_cache.get(keyname)
MaxCompute projects run Python 2 by default. To use this Python 3 UDF, run the following command at the session level before calling the UDF:
set odps.sql.python.version=cp37;
For Python 3 UDF specifications, see Python 3 UDFs.
Python 2
#coding:utf-8
from odps.udf import annotate

@annotate("string,string,string,string->string")
class ExtractKeyValue(object):
    def evaluate(self, s, split1, split2, keyname):
        if not s:
            return None
        # Split the string into key-value pairs, then split each pair into a key and value.
        pairs = (kv.split(split2) for kv in s.split(split1) if kv)
        # Keep only pairs that have exactly one key and one value, as the Java version does.
        key_value_cache = dict(pair for pair in pairs if len(pair) == 2)
        # Return the value for the specified key.
        return key_value_cache.get(keyname)
If your UDF code contains Chinese characters, add an encoding declaration (#coding:utf-8 or # -*- coding: utf-8 -*-) at the top of the file to avoid runtime errors. For Python 2 UDF specifications, see Python 2 UDFs.
Upload resources and register the UDF
After developing and testing your UDF code, upload it to MaxCompute and register the UDF as UDF_EXTRACT_KEY_VALUE.
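For the testing step, one option is to exercise the Python logic locally before uploading. The sketch below assumes the odps package is not installed on the local machine, so it defines a no-op stand-in for annotate. The stand-in, the run_smoke_tests helper, and the test cases are hypothetical and are not part of MaxCompute. The class body is a condensed variant of the Python UDF on this page.

```python
def annotate(signature):
    """Hypothetical local stand-in for odps.udf.annotate; leaves the class unchanged."""
    def decorator(cls):
        return cls
    return decorator


@annotate("string,string,string,string->string")
class ExtractKeyValue(object):
    def evaluate(self, s, split1, split2, keyname):
        if not s:
            return None
        # Split into pairs, then keep only pairs with exactly one key and one value.
        pairs = (kv.split(split2) for kv in s.split(split1) if kv)
        return dict(p for p in pairs if len(p) == 2).get(keyname)


def run_smoke_tests():
    """Check a few inputs against the documented return values."""
    udf = ExtractKeyValue()
    cases = [
        (("name:zhangsan;age:21;", ";", ":", "name"), "zhangsan"),
        (("name:zhangsan;age:21;", ";", ":", "age"), "21"),
        (("name:zhangsan;age:21;", ";", ":", "city"), None),  # key not found
        (("", ";", ":", "name"), None),  # empty input
        (("k1=v1|k2=v2", "|", "=", "k2"), "v2"),  # different delimiters
    ]
    for args, expected in cases:
        actual = udf.evaluate(*args)
        assert actual == expected, (args, actual, expected)
    return len(cases)


print("passed", run_smoke_tests(), "cases")
```

A local check like this only covers the parsing logic. It does not replace running the registered function in MaxCompute.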
Run the UDF
After registering the UDF, run the following SQL to extract the value of the name key from name:zhangsan;age:21;:
-- To use a Python 3 UDF, run this line first.
set odps.sql.python.version=cp37;
SELECT UDF_EXTRACT_KEY_VALUE('name:zhangsan;age:21;', ';', ':', 'name');
Expected output:
+----------+
| _c0      |
+----------+
| zhangsan |
+----------+
Next steps
- To parse strings that have standard built-in delimiters, see Example: Obtain the values of strings that have delimiters.
- To learn more about the UDF development lifecycle, see Java UDFs and Python 3 UDFs.