MaxCompute: Example: Replace a string by using a regular expression

Last Updated: Mar 26, 2026

Unlike the built-in REGEXP_REPLACE function, a user-defined function (UDF) lets you supply the regular expression as a variable rather than as a fixed pattern. This topic shows how to implement a UDF named UDF_REPLACE_BY_REGEXP in Java or Python that accepts the regex pattern as an argument.

UDF signature

Syntax:

string UDF_REPLACE_BY_REGEXP(string <s>, string <regex>, string <replacement>)

Parameters:

All three parameters are required and accept STRING values.

Parameter      Description
s              The source string
regex          The regular expression to match against s
replacement    The string that replaces each match

Return value: STRING
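
As a rough local illustration of these semantics, the sketch below mimics the UDF with Python's standard re module. The function name replace_by_regexp and the sample rows are hypothetical and exist only for this example. The point is that each row can carry its own pattern.

```python
import re


def replace_by_regexp(s, regex, replacement):
    """Hypothetical local stand-in for UDF_REPLACE_BY_REGEXP (not the deployed UDF)."""
    return re.sub(regex, replacement, s)


# Illustrative rows: (s, regex, replacement). The pattern varies per row,
# which is the case the UDF is meant to handle.
rows = [
    ("user_001", r"_\d+", "_N"),
    ("2024-01-15", r"-", "/"),
    ("a  b   c", r"\s+", " "),
]

results = [replace_by_regexp(s, regex, repl) for s, regex, repl in rows]
for row, out in zip(rows, results):
    print(row, "->", out)
```

Every match in s is replaced, not only the first one, which matches the behavior of both the Java and Python implementations in Step 1.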

Prerequisites

Before you begin, ensure that you have:

  • A MaxCompute project with UDF development permissions

  • A Java or Python development environment

Step 1: Write the UDF

Choose Java or Python based on your development environment.

Java UDF

package com.aliyun.rewrite; // Specify a package name.
import com.aliyun.odps.udf.UDF;
import com.aliyun.odps.udf.annotation.UdfProperty;

import java.util.Objects;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

@UdfProperty(isDeterministic=true)
public class ReplaceByRegExp extends UDF {
    /**
     * The regular expression from the most recent call and its compiled pattern,
     * cached to avoid recompiling the same expression for every row.
     */
    private String lastRegex = "";
    private Pattern pattern = null;

    /**
     * @param s The source string.
     * @param regex The regular expression.
     * @param replacement The string that replaces each match.
     */
    public String evaluate(String s, String regex, String replacement) {
        Objects.requireNonNull(s, "The source string cannot be null");
        Objects.requireNonNull(regex, "The regular expression cannot be null");
        Objects.requireNonNull(replacement, "The replacement string cannot be null");

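        // Recompile on the first call or when the regular expression changes.
        // The null check prevents a NullPointerException when the first regex is an empty string.
        if (pattern == null || !regex.equals(lastRegex)) {
            lastRegex = regex;
            pattern = Pattern.compile(regex);
        }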
        Matcher m = pattern.matcher(s);
        StringBuffer sb = new StringBuffer();

        // Perform text replacement.
        while (m.find()) {
            m.appendReplacement(sb, replacement);
        }
        m.appendTail(sb);
        return sb.toString();
    }
}

A Java UDF must extend the UDF class. The evaluate method signature — three STRING input parameters and a STRING return value — defines the UDF's signature in SQL statements. For full Java UDF specifications, see Java UDFs.

Python 3 UDF

from odps.udf import annotate
import re

@annotate("string,string,string->string")
class ReplaceByRegExp(object):
    def __init__(self):
        # Cache the most recent regular expression and its compiled pattern.
        self.lastRegex = None
        self.pattern = None

    def evaluate(self, s, regex, replacement):
        # Reject NULL inputs only; empty strings are valid arguments.
        if s is None or regex is None or replacement is None:
            raise ValueError("Arguments with None")
        # Recompile only when the regular expression changes.
        if regex != self.lastRegex:
            self.lastRegex = regex
            self.pattern = re.compile(regex)
        return self.pattern.sub(replacement, s)

MaxCompute projects run UDFs with Python 2 by default. To use Python 3, run set odps.sql.python.version=cp37 at the session level before calling the UDF. For full Python 3 UDF specifications, see Python 3 UDFs.
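
Because the Python UDF passes replacement directly to the compiled pattern's sub method, capture groups in the replacement are referenced with Python syntax (\1 or \g<1>). The Java version relies on Matcher.appendReplacement, which expects $1 instead. The sketch below shows the Python form only; the pattern and sample strings are made up for illustration.

```python
import re

pattern = re.compile(r"(\w+)@(\w+)\.com")

# \1 and \2 refer to the first and second capture groups.
swapped = pattern.sub(r"\2 at \1", "alice@example.com")

# \g<1> is the unambiguous form, useful when a digit follows the reference.
tagged = pattern.sub(r"\g<1>0", "bob@example.com")

# A dollar sign has no special meaning in a Python replacement string.
literal = pattern.sub("$1", "carol@example.com")

print(swapped)
print(tagged)
print(literal)
```

If the same SQL statement has to work with either implementation, the simplest option is probably to avoid group references and literal $ or \ characters in replacement.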

Python 2 UDF

#coding:utf-8
from odps.udf import annotate
import re

@annotate("string,string,string->string")
class ReplaceByRegExp(object):
    def __init__(self):
        # Cache the most recent regular expression and its compiled pattern.
        self.lastRegex = None
        self.pattern = None

    def evaluate(self, s, regex, replacement):
        # Reject NULL inputs only; empty strings are valid arguments.
        if s is None or regex is None or replacement is None:
            raise ValueError("Arguments with None")
        # Recompile only when the regular expression changes.
        if regex != self.lastRegex:
            self.lastRegex = regex
            self.pattern = re.compile(regex)
        return self.pattern.sub(replacement, s)

If your Python 2 UDF code contains non-ASCII characters, such as Chinese characters, add an encoding declaration at the top of the file. Both #coding:utf-8 and # -*- coding: utf-8 -*- are valid. For full Python 2 UDF specifications, see Python 2 UDFs.

Step 2: Upload resources and register the UDF

After writing and testing your UDF code, upload it to MaxCompute and register it as UDF_REPLACE_BY_REGEXP.
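
One way to test the Python logic before uploading is to run it outside MaxCompute. The sketch below assumes the odps package is not installed locally, so it copies the evaluate logic into a plain class without the @annotate decorator. The class name LocalReplaceByRegExp and the compile_count counter are hypothetical additions for this test only, and the None checks reflect the assumption that only NULL inputs should be rejected.

```python
import re


class LocalReplaceByRegExp(object):
    """Local copy of the UDF logic without the odps dependency (test-only sketch)."""

    def __init__(self):
        self.last_regex = None
        self.pattern = None
        self.compile_count = 0  # Test-only counter, not part of the UDF.

    def evaluate(self, s, regex, replacement):
        if s is None or regex is None or replacement is None:
            raise ValueError("Arguments with None")
        # Recompile only when the regular expression changes between calls.
        if self.pattern is None or regex != self.last_regex:
            self.last_regex = regex
            self.pattern = re.compile(regex)
            self.compile_count += 1
        return self.pattern.sub(replacement, s)


udf = LocalReplaceByRegExp()
first = udf.evaluate("id 7 and 8", r"\d+", "#")
second = udf.evaluate("order 42", r"\d+", "#")    # same pattern: cache is reused
third = udf.evaluate("order 42", r"[a-z]+", "*")  # new pattern: recompiled

print(first, "|", second, "|", third, "|", udf.compile_count)
```

The counter makes the caching visible: three calls with two distinct patterns should trigger two compilations.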

Step 3: Use the UDF

Run the following SQL to replace all digit sequences in a string with #:

set odps.sql.python.version=cp37; -- Required only when the UDF is written in Python 3.
SELECT UDF_REPLACE_BY_REGEXP('abc 123 def 456', '\\d+', '#');

Expected output:

+--------------+
| _c0          |
+--------------+
| abc # def #  |
+--------------+
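
The same result can be reproduced locally with Python's re module, which is a quick way to sanity-check a pattern before running the query. This sketch assumes that the SQL literal '\\d+' reaches the UDF as the three-character pattern \d+, which is what the doubled backslash in the query suggests.

```python
import re

# In the SQL statement the backslash is doubled inside the string literal.
# The non-raw Python literal "\\d+" is assumed to mirror that: it denotes \d+.
regex = "\\d+"
assert regex == r"\d+" and len(regex) == 3

output = re.sub(regex, "#", "abc 123 def 456")
print(output)  # abc # def #
```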
