MaxCompute: UDF example: Get a character at a specified position in a URL

Last Updated: Mar 26, 2026

This example shows how to write and register a Java or Python user-defined function (UDF) named UDF_GET_URL_CHAR that extracts a segment from a URL path by position. Use this UDF when built-in MaxCompute string functions do not cover your URL parsing logic.

How it works

UDF_GET_URL_CHAR takes a URL string and an integer position n, then:

  1. Finds the first occurrence of .htm in the URL.

  2. Extracts the text between the last forward slash (/) before .htm and .htm itself.

  3. Splits that segment by hyphen (-).

  4. Returns the nth element from the right of the resulting array.
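
The four steps can be traced in plain Python outside MaxCompute. This is a minimal sketch rather than the registered UDF: the variable names (htm_index, prefix, segment, parts) are illustrative only, and the sample URL and position are taken from Example 3.

```python
# Illustrative trace of the four steps; these names are not part of the UDF.
url = "http://www.taobao.com/a-b-c-d.htm"
n = 3

# Step 1: find the first occurrence of ".htm".
htm_index = url.find(".htm")

# Step 2: keep the text between the last "/" before ".htm" and ".htm" itself.
prefix = url[:htm_index]
segment = prefix[prefix.rfind("/") + 1:]

# Step 3: split that segment on hyphens.
parts = segment.split("-")

# Step 4: take the nth element counted from the right (n = 1 is the last one).
result = parts[-n]

print(segment, parts, result)
```

Negative indexing does the right-to-left counting here. The trace leaves out the guards for a missing .htm, n equal to 0, and n larger than the number of segments, all of which the UDF code in Step 1 handles.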

Function signature

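string UDF_GET_URL_CHAR(string <url>, bigint <n>)

| Parameter | Type   | Required | Description |
|-----------|--------|----------|-------------|
| url       | STRING | Yes      | The source URL to parse |
| n         | BIGINT | Yes      | The position to retrieve, counted from right to left (1 is the last segment) |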

Return type: STRING.

Return value behavior

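| Condition | Return value |
|-----------|--------------|
| URL does not contain .htm | Empty string |
| n is 0 | Empty string |
| n exceeds the number of hyphen-delimited segments | Empty string |
| Valid input | The nth segment from the right |

For the first three conditions, the function returns an empty string rather than null. If parsing raises an unexpected exception, the code in Step 1 catches it and returns the string Internal error instead.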

Prerequisites

Before you begin, ensure that you have:

  • A MaxCompute project with permissions to create resources and register UDFs

  • The MaxCompute client or DataWorks installed and configured

Step 1: Write the UDF

Java UDF

The evaluate method accepts a String (maps to SQL STRING) and a Long (maps to SQL BIGINT), and returns a String. The class must extend com.aliyun.odps.udf.UDF.

package com.aliyun; // The package name, which is user-defined.
import com.aliyun.odps.udf.UDF;

public class GetUrlChar extends UDF {
    public String evaluate(String url, Long n) {
        if (n == 0) {
            return "";
        }
        try {
            // Find the index of the first occurrence of ".htm" in the URL.
            int index = url.indexOf(".htm");
            if (index < 0) {
                return "";
            }
            // Extract the prefix up to (but not including) ".htm".
            String a = url.substring(0, index);
            // Find the last forward slash in the prefix.
            index = a.lastIndexOf("/");
            // Extract the segment after the last slash.
            String b = a.substring(index + 1);
            // Split the segment by hyphen.
            String[] c = b.split("-");
            // Return empty string if n exceeds the number of segments.
            if (c.length < n) {
                return "";
            }
            // Return the nth element from the right.
            return c[c.length - n.intValue()];
        } catch (Exception e) {
            return "Internal error";
        }
    }
}

For code specifications and class requirements, see Java UDFs.

Python 3 UDF

from odps.udf import annotate


@annotate("string,bigint->string")
class GetUrlChar(object):

    def evaluate(self, url, n):
        if n == 0:
            return ""
        try:
            index = url.find(".htm")
            if index < 0:
                return ""
            a = url[:index]
            index = a.rfind("/")
            b = a[index + 1:]
            c = b.split("-")
            if len(c) < n:
                return ""
            return c[-n]
        except Exception:
            return "Internal error"

MaxCompute projects run Python 2 by default. To use a Python 3 UDF in a session, run:

set odps.sql.python.version=cp37;

For Python 3 UDF specifications, see Python 3 UDFs.

Python 2 UDF

#coding:utf-8
from odps.udf import annotate


@annotate("string,bigint->string")
class GetUrlChar(object):

    def evaluate(self, url, n):
        if n == 0:
            return ""
        try:
            index = url.find(".htm")
            if index < 0:
                return ""
            a = url[:index]
            index = a.rfind("/")
            b = a[index + 1:]
            c = b.split("-")
            if len(c) < n:
                return ""
            return c[-n]
        except Exception:
            return "Internal error"

If the code contains non-ASCII characters (for example, Chinese characters), add an encoding declaration at the top of the file: either #coding:utf-8 or # -*- coding: utf-8 -*-. The two forms are equivalent.

For Python 2 UDF specifications, see Python 2 UDFs.

Step 2: Upload and register the UDF

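After writing and testing the UDF code, upload it to MaxCompute as a resource and register it as a function named UDF_GET_URL_CHAR. With the MaxCompute client, this typically means adding the JAR or .py file as a resource (add jar or add py) and then running create function with the class name and the resource name.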
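
One way to test the Python UDF before uploading it is to run the class locally against the documented return values. The sketch below is a convenience under stated assumptions, not a MaxCompute feature: it installs a hypothetical stand-in for the odps.udf module whose annotate decorator does nothing, so the class from Step 1 can be imported on a machine without the MaxCompute runtime.

```python
import sys
import types

# Hypothetical local stub (an assumption for testing only): a no-op
# replacement for odps.udf.annotate. It overrides any installed odps package.
_odps = types.ModuleType("odps")
_udf = types.ModuleType("odps.udf")
_udf.annotate = lambda signature: (lambda cls: cls)
_odps.udf = _udf
sys.modules["odps"] = _odps
sys.modules["odps.udf"] = _udf

from odps.udf import annotate  # resolves to the stub above


@annotate("string,bigint->string")
class GetUrlChar(object):
    # Same body as the Python UDF in Step 1.
    def evaluate(self, url, n):
        if n == 0:
            return ""
        try:
            index = url.find(".htm")
            if index < 0:
                return ""
            a = url[:index]
            index = a.rfind("/")
            b = a[index + 1:]
            c = b.split("-")
            if len(c) < n:
                return ""
            return c[-n]
        except Exception:
            return "Internal error"


# (url, n, expected) rows taken from the return-value table and the examples.
cases = [
    ("http://www.taobao.com", 1, ""),
    ("http://www.taobao.com/a.htm", 1, "a"),
    ("http://www.taobao.com/a-b-c-d.htm", 3, "b"),
    ("http://www.taobao.com/a-b-c-d.htm", 0, ""),
    ("http://www.taobao.com/a-b-c-d.htm", 5, ""),
]
udf = GetUrlChar()
for url, n, expected in cases:
    assert udf.evaluate(url, n) == expected, (url, n)
print("all cases passed")
```

Keep the stub and the test cases out of the file you upload. On MaxCompute, the runtime provides the real odps.udf module.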

Step 3: Call the UDF

Call UDF_GET_URL_CHAR with a SQL SELECT statement. If you registered a Python 3 UDF, prepend set odps.sql.python.version=cp37; to your query.

Example 1: URL that does not contain .htm

set odps.sql.python.version=cp37; -- Required only for Python 3 UDFs.
SELECT UDF_GET_URL_CHAR("http://www.taobao.com", 1);

Result:

+-----+
| _c0 |
+-----+
|     |
+-----+

The URL http://www.taobao.com contains no .htm, so the function returns an empty string.

Example 2: URL with a single-segment path

SELECT UDF_GET_URL_CHAR("http://www.taobao.com/a.htm", 1);

Result:

+-----+
| _c0 |
+-----+
| a   |
+-----+

The segment before .htm is a. Splitting by hyphen yields one element; n=1 from the right returns a.

Example 3: URL with a multi-segment path

SELECT UDF_GET_URL_CHAR("http://www.taobao.com/a-b-c-d.htm", 3);

Result:

+-----+
| _c0 |
+-----+
| b   |
+-----+

The segment a-b-c-d splits into ["a", "b", "c", "d"]. Counting from the right, position 3 is b.