When a key-value string contains delimiter characters within its keys or values, the built-in KEYVALUE function cannot parse it correctly. Use the UDF_EXTRACT_KEY_VALUE_WITH_SPLIT user-defined function (UDF) to extract values from these strings.
This topic walks you through writing the UDF in Java or Python, registering it in MaxCompute, and calling it in SQL.
Steps:
Function signature
string UDF_EXTRACT_KEY_VALUE_WITH_SPLIT(string <s>, string <split1>, string <split2>, string <keyname>)
The function splits s into key-value pairs using split1, then splits each pair into a key and value using split2, and returns the value for the key specified by keyname.
Parameters (all STRING type, all required):
| Parameter | Description |
|---|---|
s |
The string to parse |
split1 |
The delimiter that separates key-value pairs |
split2 |
The delimiter that separates keys from values |
keyname |
The key whose value to return |
The function returns null if any parameter is null or if the key is not found.
Prerequisites
Before you begin, make sure you have:
-
A MaxCompute project with UDF development permissions
-
A Java or Python development environment for writing and testing the UDF code
1. Write the UDF
Choose a language for your UDF implementation. All three implementations produce the same result.
Java
package com.aliyun.rewrite; // The package name, which is user-defined.
import com.aliyun.odps.udf.UDF;
public class ExtractKeyValueWithSplit extends UDF{
/**
* Use split1 to split the string and obtain key-value pairs. Then, use split2 to split the key-value pairs and obtain the keys and values.
* @param str The source string.
* @param split1 The delimiter that is used to split a string and obtain key-value pairs.
* @param split2 The delimiter that is used to split key-value pairs and obtain the keys and values.
* @param keyname The name of the key whose value you want to obtain.
* @return The returned value.
*/
public String evaluate(String str, String split1, String split2, String keyname) {
if(str==null || split1==null || split2==null || keyname==null){
return null;
}
try {
// Combine keyname and split2.
String keySplit = keyname + split2;
// Traverse the string. Use split1 to split the string and obtain key-value pairs.
for(String subStr: str.split(split1)){
// Use split2 to split the key-value pairs and obtain the keys and values. Then, obtain the value that corresponds to a specific key.
if (subStr.startsWith(keySplit)){
return subStr.substring(keySplit.length());
}
}
} catch (Exception e) {
return null;
}
return null;
}
}
The Java UDF must extend the UDF class. The evaluate method defines the function signature used in SQL statements: four STRING input parameters and a STRING return value. For other code specifications, see Java UDFs.
Python 3
MaxCompute projects run Python 2 by default. To use this Python 3 UDF, run the following command at the session level before calling it: set odps.sql.python.version=cp37.
from odps.udf import annotate
@annotate("string,string,string,string->string")
class ExtractKeyValueWithSplit(object):
def evaluate(self, s, split1, split2, keyname):
if not s:
return None
key_split = keyname + split2
# Traverse the string. Use split1 to split the string and obtain key-value pairs.
for subStr in s.split(split1):
# Use split2 to split the key-value pairs and obtain the keys and values. Then, obtain the value that corresponds to a specific key.
if subStr.startswith(key_split):
return subStr[len(key_split):]
For Python 3 UDF specifications, see Python 3 UDFs.
Python 2
#coding:utf-8
from odps.udf import annotate
@annotate("string,string,string,string->string")
class ExtractKeyValueWithSplit(object):
def evaluate(self, s, split1, split2, keyname):
if not s:
return None
key_split = keyname + split2
# Traverse the string. Use split1 to split the string and obtain key-value pairs.
for subStr in s.split(split1):
# Use split2 to split the key-value pairs and obtain the keys and values. Then, obtain the value that corresponds to a specific key.
if subStr.startswith(key_split):
return subStr[len(key_split):]
If your Python 2 UDF code contains Chinese characters, add an encoding declaration at the top of the file: #coding:utf-8 or # -*- coding: utf-8 -*-. Without this declaration, MaxCompute returns an error when running the UDF.
For Python 2 UDF specifications, see Python 2 UDFs.
2. Upload resources and register the UDF
After writing and testing your UDF, upload the code to MaxCompute and register it as UDF_EXTRACT_KEY_VALUE_WITH_SPLIT.
-
Java UDF: See Package a Java program, upload the package, and create a MaxCompute UDF.
-
Python UDF: See Upload a Python program and create a MaxCompute UDF.
3. Use the UDF
After registering the UDF, call it in a SQL statement. The following example extracts the value for the key name from a string where the value itself contains the delimiter :.
set odps.sql.python.version=cp37; -- Required only for the Python 3 UDF. Omit this line for Java or Python 2.
SELECT UDF_EXTRACT_KEY_VALUE_WITH_SPLIT('name:zhangsang:man;age:2;', ';', ':', 'name');
In this example:
-
split1is;, which splits the string into pairs:name:zhangsang:manandage:2 -
split2is:, which identifies the key by prefix match -
The function returns everything after
name:in the first matching pair
Expected output:
+--------------+
| _c0 |
+--------------+
| zhangsan:man |
+--------------+
What's next
-
To parse key-value strings where keys and values do not contain delimiters, see Obtain the values of strings that have no delimiters.
-
To learn more about UDF development, see Java UDFs and Python 3 UDFs.