Java user-defined functions (UDFs) let you extend StarRocks with custom logic that built-in functions cannot express. EMR Serverless StarRocks supports four UDF types:
| Type | What it does |
|---|---|
| Scalar UDF | Takes one row as input, returns one value. Equivalent to built-in functions like UPPER or ROUND. |
| UDAF (user-defined aggregate function) | Takes multiple rows as input, returns one value per group. Equivalent to built-in functions like SUM or COUNT. |
| UDWF (user-defined window function) | Operates over a window of rows defined by an OVER clause, returns one value per row. |
| UDTF (user-defined table-valued function) | Takes one row as input, returns multiple rows in a single column. Commonly used for row-to-column conversion. |
StarRocks 2.2.0 and later support Java UDFs. StarRocks 3.0 and later support global UDFs — add the GLOBAL keyword to CREATE, SHOW, and DROP statements to make a UDF accessible across all databases without a catalog or database prefix.
Prerequisites
Before you begin, make sure you have:
- Apache Maven installed (for building the Java project)
- Java Development Kit (JDK) 1.8 installed on the server
- The UDF feature enabled: on the Instance Configuration tab of your EMR Serverless StarRocks instance details page, go to the FE section, set enable_udf to TRUE, and restart the instance
Data type mappings
All parameter and return types in your Java class must map to a supported SQL type. The table below shows the supported mappings.
| SQL type | Java type |
|---|---|
| BOOLEAN | java.lang.Boolean |
| TINYINT | java.lang.Byte |
| SMALLINT | java.lang.Short |
| INT | java.lang.Integer |
| BIGINT | java.lang.Long |
| FLOAT | java.lang.Float |
| DOUBLE | java.lang.Double |
| STRING/VARCHAR | java.lang.String |
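As a minimal sketch of how these mappings shape a method signature, the hypothetical scalar UDF below would be registered in SQL with parameter types (INT, BIGINT) and return type BIGINT. The class name AddIntLong is illustrative and does not come from the original examples, and the package declaration is omitted for brevity. The null check assumes that a SQL NULL is passed to the method as a Java null.

```java
// Hypothetical scalar UDF: SQL signature would be (INT, BIGINT) -> BIGINT.
// Declare the class public in its own file when packaging it as a UDF.
class AddIntLong {
    // INT maps to java.lang.Integer and BIGINT maps to java.lang.Long.
    public Long evaluate(Integer a, Long b) {
        // Assumption: a SQL NULL arrives as a Java null, so guard before unboxing.
        if (a == null || b == null) {
            return null;
        }
        return a.longValue() + b;
    }
}
```

Using the wrapper classes rather than primitives is what makes the null check possible.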
Develop and deploy a UDF
The workflow has seven steps: create a Maven project, add dependencies, implement the UDF class, package the JAR, upload it to OSS, register the UDF in StarRocks, and call it in a query.
Step 1: Create a Maven project
Create a Maven project with the following directory structure:
project
|--pom.xml
|--src
| |--main
| | |--java
| | |--resources
| |--test
|--target
Step 2: Add dependencies
Add the following content to pom.xml:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>org.example</groupId>
    <artifactId>udf</artifactId>
    <version>1.0-SNAPSHOT</version>
    <properties>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
    </properties>
    <dependencies>
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>1.2.76</version>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-dependency-plugin</artifactId>
                <version>2.10</version>
                <executions>
                    <execution>
                        <id>copy-dependencies</id>
                        <phase>package</phase>
                        <goals>
                            <goal>copy-dependencies</goal>
                        </goals>
                        <configuration>
                            <outputDirectory>${project.build.directory}/lib</outputDirectory>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>3.3.0</version>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
                <configuration>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>
Step 3: Implement the UDF class
Scalar UDF
A scalar UDF must implement the evaluate method as a public member method. The method signature determines the SQL parameter and return types — they must match the types you declare in the CREATE FUNCTION statement (see Data type mappings).
| Method | Description |
|---|---|
| TYPE1 evaluate(TYPE2, ...) | Invocation entry point. Must be a public member method. |
The example below implements MY_UDF_JSON_GET, which extracts a nested JSON value using a dotted path expression. It replaces the nested GET_JSON_STRING(GET_JSON_STRING(...)) pattern with a single call: MY_UDF_JSON_GET('{"key":"{\\"k0\\":\\"v0\\"}"}', "$.key.k0").
package com.starrocks.udf.sample;

import com.alibaba.fastjson.JSONPath;

public class UDFJsonGet {
    public final String evaluate(String jsonObj, String key) {
        if (jsonObj == null || key == null) return null;
        try {
            // JSONPath.read fully expands nested JSON strings
            return JSONPath.read(jsonObj, key).toString();
        } catch (Exception e) {
            return null;
        }
    }
}
UDAF
A UDAF aggregates multiple rows per group into a single result. It uses a State inner class to hold intermediate results, which StarRocks serializes and deserializes when transmitting data between execution nodes.
Required methods — implement all six for every UDAF:
| Method | Required | Description |
|---|---|---|
| State create() | Always | Allocate a new State object. |
| void destroy(State) | Always | Release resources held by the State. |
| void update(State, ...) | Always | Accumulate one input row into the State. The first parameter is State; the remaining parameters are the declared function inputs. |
| void serialize(State, ByteBuffer) | Always | Write the State into the buffer for inter-node transmission. |
| void merge(State, ByteBuffer) | Always | Deserialize a State from the buffer and merge it into the given State. |
| TYPE finalize(State) | Always | Extract the final aggregate result from the State. |
Intermediate state buffer — use java.nio.ByteBuffer to store intermediate results:
| Item | Description |
|---|---|
| java.nio.ByteBuffer | Holds the serialized State during inter-node transmission. |
| serializeLength() | Returns the byte length of the serialized State (data type: INT). Must exactly match the number of bytes you write in serialize. For an int counter, return 4; for a long counter, return 8. |
Do not call remaining() on the ByteBuffer to deserialize a State, and do not call clear() on it. If serializeLength does not match the bytes actually written in serialize, aggregation produces incorrect results.
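When a State holds more than one field, the same rule applies to the total: serializeLength returns the sum of the field sizes, and merge reads the fields back in the order serialize wrote them. The fragment below is a minimal sketch under those assumptions. The names AvgLongSketch and bytesWritten are hypothetical, only the buffer-related methods are shown, and the package declaration is omitted.

```java
import java.nio.ByteBuffer;

// Hypothetical UDAF fragment: a State with two long fields (sum and count).
class AvgLongSketch {
    public static class State {
        long sum = 0L;
        long count = 0L;
        // serialize writes two longs, so the length is 2 * 8 = 16 bytes.
        public int serializeLength() { return 2 * Long.BYTES; }
    }

    public void serialize(State state, ByteBuffer buff) {
        buff.putLong(state.sum);
        buff.putLong(state.count);
    }

    // Reads the fields back in the same order they were written.
    public void merge(State state, ByteBuffer buffer) {
        state.sum += buffer.getLong();
        state.count += buffer.getLong();
    }

    // Local check, not part of the UDAF interface: bytes that serialize writes.
    static int bytesWritten(AvgLongSketch f, State s) {
        ByteBuffer buff = ByteBuffer.allocate(64);
        f.serialize(s, buff);
        return buff.position();
    }
}
```

Comparing bytesWritten with serializeLength in a unit test is one way to catch a mismatch before the JAR is uploaded.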
The example below implements MY_SUM_INT, an INT-in/INT-out sum (unlike the built-in SUM, which returns BIGINT for integer inputs):
package com.starrocks.udf.sample;

public class SumInt {
    public static class State {
        int counter = 0;
        public int serializeLength() { return 4; } // INT = 4 bytes
    }

    public State create() {
        return new State();
    }

    public void destroy(State state) {
    }

    public final void update(State state, Integer val) {
        if (val != null) {
            state.counter += val;
        }
    }

    public void serialize(State state, java.nio.ByteBuffer buff) {
        buff.putInt(state.counter);
    }

    public void merge(State state, java.nio.ByteBuffer buffer) {
        int val = buffer.getInt();
        state.counter += val;
    }

    public Integer finalize(State state) {
        return state.counter;
    }
}
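To see how the six methods fit together, the sketch below mimics a two-stage aggregation in plain Java: each partition is accumulated into a partial State, passed through a ByteBuffer, and merged into a final State. The SumIntLifecycle harness and its aggregate method are hypothetical and are not a StarRocks API. The call order is an assumption based on the method descriptions above, and the nested SumInt is a condensed copy of the example so that the sketch compiles on its own.

```java
import java.nio.ByteBuffer;

// Local harness (hypothetical, not a StarRocks API) that mimics two-stage aggregation.
class SumIntLifecycle {
    // Condensed copy of the SumInt example.
    static class SumInt {
        static class State {
            int counter = 0;
            public int serializeLength() { return 4; }
        }
        public State create() { return new State(); }
        public void destroy(State state) { }
        public void update(State state, Integer val) {
            if (val != null) { state.counter += val; }
        }
        public void serialize(State state, ByteBuffer buff) { buff.putInt(state.counter); }
        public void merge(State state, ByteBuffer buffer) { state.counter += buffer.getInt(); }
        public Integer finalize(State state) { return state.counter; }
    }

    // Builds one partial State per partition, ships it through a buffer, then merges it.
    static Integer aggregate(Integer[][] partitions) {
        SumInt udaf = new SumInt();
        SumInt.State total = udaf.create();
        for (Integer[] rows : partitions) {
            SumInt.State partial = udaf.create();
            for (Integer v : rows) {
                udaf.update(partial, v);
            }
            ByteBuffer buff = ByteBuffer.allocate(partial.serializeLength());
            udaf.serialize(partial, buff);
            buff.flip(); // the harness, not the UDAF, prepares the buffer for reading
            udaf.merge(total, buff);
            udaf.destroy(partial);
        }
        Integer result = udaf.finalize(total);
        udaf.destroy(total);
        return result;
    }
}
```

For the partitions {1, 2, NULL} and {3, 4}, aggregate returns 10.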
UDWF
A UDWF (user-defined window function) is a special UDAF that returns one result per input row rather than one result per group. It uses an OVER clause to define the partition and window frame, and adds the reset and windowUpdate methods to the standard UDAF interface.
Implement all six UDAF methods plus the following two:
| Method | Description |
|---|---|
| void reset(State state) | Reset the State when the window frame changes. |
| void windowUpdate(State state, int peer_group_start, int peer_group_end, int frame_start, int frame_end, TYPE[] inputs) | Update the State for the current row's window frame. |
`windowUpdate` parameters:
| Parameter | Description |
|---|---|
| peer_group_start | Start index of the current partition (rows sharing the same PARTITION BY key). |
| peer_group_end | End index of the current partition. |
| frame_start | Start index of the current window frame (e.g., ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING). |
| frame_end | End index of the current window frame. |
| inputs | Input column values for the window as a wrapper-class array. Use Integer[] for INT inputs. |
The example below implements MY_WINDOW_SUM_INT, an INT window sum:
package com.starrocks.udf.sample;

public class WindowSumInt {
    public static class State {
        int counter = 0;
        public int serializeLength() { return 4; } // INT = 4 bytes
    }

    public State create() {
        return new State();
    }

    public void destroy(State state) {
    }

    public void update(State state, Integer val) {
        if (val != null) {
            state.counter += val;
        }
    }

    public void serialize(State state, java.nio.ByteBuffer buff) {
        buff.putInt(state.counter);
    }

    public void merge(State state, java.nio.ByteBuffer buffer) {
        int val = buffer.getInt();
        state.counter += val;
    }

    public Integer finalize(State state) {
        return state.counter;
    }

    public void reset(State state) {
        state.counter = 0;
    }

    public void windowUpdate(State state,
                             int peer_group_start, int peer_group_end,
                             int frame_start, int frame_end,
                             Integer[] inputs) {
        // Sum the rows in the current frame, skipping NULL inputs
        for (int i = frame_start; i < frame_end; ++i) {
            if (inputs[i] != null) {
                state.counter += inputs[i];
            }
        }
    }
}
For more information about window function syntax, see Window functions.
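To make the index parameters concrete, the sketch below emulates ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING over a single partition in plain Java. The WindowFrameSketch harness and its slidingSum method are hypothetical and are not part of StarRocks. The order in which the engine calls reset and windowUpdate, and the treatment of frame_end as an exclusive bound, are assumptions taken from the loop in the example above. Static methods are used only to keep the sketch short.

```java
// Local harness (hypothetical, not a StarRocks API) showing how frame indexes select inputs.
class WindowFrameSketch {
    static class State {
        int counter = 0;
    }

    static void reset(State state) {
        state.counter = 0;
    }

    // Same accumulation loop as the example: frame_end is treated as exclusive.
    static void windowUpdate(State state, int peer_group_start, int peer_group_end,
                             int frame_start, int frame_end, Integer[] inputs) {
        for (int i = frame_start; i < frame_end; ++i) {
            if (inputs[i] != null) {
                state.counter += inputs[i];
            }
        }
    }

    // Emulates ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING over one partition.
    static int[] slidingSum(Integer[] inputs) {
        int[] out = new int[inputs.length];
        State state = new State();
        for (int row = 0; row < inputs.length; ++row) {
            int frameStart = Math.max(0, row - 1);
            int frameEnd = Math.min(inputs.length, row + 2); // exclusive upper bound
            reset(state);
            windowUpdate(state, 0, inputs.length, frameStart, frameEnd, inputs);
            out[row] = state.counter;
        }
        return out;
    }
}
```

With the inputs {1, 2, 3, 4}, slidingSum returns {3, 6, 9, 7}: each output is the sum of the row and its immediate neighbors inside the partition.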
UDTF
A UDTF reads one input row and returns multiple rows, all in a single column. It must implement the process method, which returns an array.
UDTFs support returning multiple rows in a single column only.
| Method | Description |
|---|---|
| TYPE[] process() | Invocation entry point. Returns an array; each element becomes a separate output row. |
The example below implements MY_UDF_SPLIT, which splits a string on spaces:
package com.starrocks.udf.sample;

public class UDFSplit {
    public String[] process(String in) {
        if (in == null) return null;
        return in.split(" ");
    }
}
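Because in.split(" ") splits on every single space, an input with consecutive spaces produces empty strings, which would become empty output rows. The variant below is a minimal sketch that splits on any run of whitespace instead. The class name UDFSplitWords is hypothetical, the package declaration is omitted, and returning an empty array to produce no rows is an assumption about how the engine treats it.

```java
// Hypothetical UDTF variant: splits on any run of whitespace and skips empty tokens.
// Declare the class public in its own file when packaging it as a UDF.
class UDFSplitWords {
    public String[] process(String in) {
        if (in == null) {
            return null;
        }
        String trimmed = in.trim();
        if (trimmed.isEmpty()) {
            // Assumption: an empty array yields no output rows.
            return new String[0];
        }
        // "\\s+" matches one or more whitespace characters.
        return trimmed.split("\\s+");
    }
}
```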
Step 4: Package the project
Run the following command to build the JAR:
mvn package
This generates two files in the target directory:
- udf-1.0-SNAPSHOT.jar
- udf-1.0-SNAPSHOT-jar-with-dependencies.jar
Step 5: Upload the JAR to OSS
Upload udf-1.0-SNAPSHOT-jar-with-dependencies.jar to an Object Storage Service (OSS) bucket and set the bucket ACL to allow public reads. For upload instructions, see Simple upload and Bucket ACLs.
The frontend (FE) node verifies the JAR and computes its checksum. The backend (BE) node downloads and executes the JAR. The file property in Step 6 must use the OSS internal endpoint URL.
Step 6: Register the UDF in StarRocks
StarRocks supports two UDF namespaces: global and database-level.
- Global UDF: Callable by name from any database without a catalog.database prefix. Use this for shared utility functions.
- Database-level UDF: Callable by name within its own database. From a different database, use the catalog.database.function_name format. Use this when you need the same function name in multiple databases.
Required permissions:
- Create a global UDF: system-level CREATE GLOBAL FUNCTION permission
- Create a database-level UDF: database-level CREATE FUNCTION permission
- Call a UDF: USAGE permission on the UDF
For permission setup, see GRANT.
Syntax
CREATE [GLOBAL] [AGGREGATE | TABLE] FUNCTION function_name(arg_type [, ...])
RETURNS return_type
PROPERTIES ("key" = "value" [, ...])
Parameters
| Parameter | Required | Description |
|---|---|---|
| GLOBAL | No | Creates a global UDF. Supported in StarRocks 3.0 and later. |
| AGGREGATE | No | Required for UDAFs and UDWFs. |
| TABLE | No | Required for UDTFs. |
| function_name | Yes | The function name. Include a database name to create the UDF in a specific database (e.g., db1.my_func). A function with the same name and identical parameter types cannot be created twice in the same database; different parameter types are allowed. |
| arg_type | Yes | Parameter type(s). See Data type mappings. |
| return_type | Yes | Return type. See Data type mappings. |
| PROPERTIES | Yes | Function properties. See the sub-sections below. |
PROPERTIES parameters
| Property | Required | Description |
|---|---|---|
| symbol | Yes | Fully qualified class name in <package_name>.<class_name> format. |
| type | Yes | Set to StarrocksJar for Java-based UDFs. |
| file | Yes | HTTP URL of the JAR using the OSS internal endpoint: http://<YourBucketName>.oss-cn-xxxx-internal.aliyuncs.com/<YourPath>/<jar_package_name> |
| analytic | No | Set to true for UDWFs. Not required for other UDF types. |
Create a scalar UDF
CREATE [GLOBAL] FUNCTION MY_UDF_JSON_GET(string, string)
RETURNS string
PROPERTIES (
"symbol" = "com.starrocks.udf.sample.UDFJsonGet",
"type" = "StarrocksJar",
"file" = "http://<YourBucketName>.oss-cn-xxxx-internal.aliyuncs.com/<YourPath>/udf-1.0-SNAPSHOT-jar-with-dependencies.jar"
);
Create a UDAF
CREATE [GLOBAL] AGGREGATE FUNCTION MY_SUM_INT(INT)
RETURNS INT
PROPERTIES (
"symbol" = "com.starrocks.udf.sample.SumInt",
"type" = "StarrocksJar",
"file" = "http://<YourBucketName>.oss-cn-xxxx-internal.aliyuncs.com/<YourPath>/udf-1.0-SNAPSHOT-jar-with-dependencies.jar"
);
Create a UDWF
CREATE [GLOBAL] AGGREGATE FUNCTION MY_WINDOW_SUM_INT(INT)
RETURNS INT
PROPERTIES (
"analytic" = "true",
"symbol" = "com.starrocks.udf.sample.WindowSumInt",
"type" = "StarrocksJar",
"file" = "http://<YourBucketName>.oss-cn-xxxx-internal.aliyuncs.com/<YourPath>/udf-1.0-SNAPSHOT-jar-with-dependencies.jar"
);
Create a UDTF
CREATE [GLOBAL] TABLE FUNCTION MY_UDF_SPLIT(string)
RETURNS string
PROPERTIES (
"symbol" = "com.starrocks.udf.sample.UDFSplit",
"type" = "StarrocksJar",
"file" = "http://<YourBucketName>.oss-cn-xxxx-internal.aliyuncs.com/<YourPath>/udf-1.0-SNAPSHOT-jar-with-dependencies.jar"
);
Step 7: Call the UDF
Scalar UDF
SELECT MY_UDF_JSON_GET('{"key":"{\\"in\\":2}"}', '$.key.in');
UDAF
SELECT MY_SUM_INT(col1) FROM <table_name>;
UDWF
SELECT MY_WINDOW_SUM_INT(intcol)
OVER (PARTITION BY intcol2
ORDER BY intcol3
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
FROM test_basic;
UDTF
-- Assume table t1 has columns a, b, and c1
SELECT t1.a, t1.b, t1.c1 FROM t1;
-- Output:
-- 1, 2.1, "hello world"
-- 2, 2.2, "hello UDTF."
-- Split c1 into one word per row
SELECT t1.a, t1.b, MY_UDF_SPLIT FROM t1, MY_UDF_SPLIT(t1.c1);
-- Output:
-- 1, 2.1, "hello"
-- 1, 2.1, "world"
-- 2, 2.2, "hello"
-- 2, 2.2, "UDTF."
MY_UDF_SPLIT in the SELECT list is the column alias generated when you call the function. You cannot use AS t2(f1) to assign a table alias or column alias to a UDTF result.
View UDFs
SHOW [GLOBAL] FUNCTIONS;
Delete a UDF
DROP [GLOBAL] FUNCTION <function_name>(arg_type [, ...]);
FAQ
Can I use static variables in a UDF? Do static variables from different UDFs affect each other?
Yes, you can use static variables in a UDF, and they do not affect each other. Static variables are isolated per UDF class: they do not interfere with static variables from other UDF classes, even if two classes share the same name.
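A common use of a static variable is to hold an object that is expensive to build, such as a compiled regular expression. The scalar UDF below is a minimal sketch of that pattern. The class name UDFDigitsOnly is hypothetical and the package declaration is omitted.

```java
import java.util.regex.Pattern;

// Hypothetical scalar UDF that keeps a compiled regex in a static field.
// Declare the class public in its own file when packaging it as a UDF.
class UDFDigitsOnly {
    // Compiled once when the class is loaded and reused by every evaluate call.
    private static final Pattern NON_DIGITS = Pattern.compile("[^0-9]");

    public String evaluate(String in) {
        if (in == null) {
            return null;
        }
        // Remove every character that is not an ASCII digit.
        return NON_DIGITS.matcher(in).replaceAll("");
    }
}
```

java.util.regex.Pattern is immutable and safe to share between threads. A mutable static field, such as a cache map, would need its own synchronization.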