If the built-in functions in Spark SQL do not meet your needs, you can create user-defined functions (UDFs) to extend Spark's capabilities. This topic guides you through the process of creating and using Python and Java/Scala UDFs.
Supported versions
Only the following database engine versions support the examples in this topic:
esr-5.x: esr-5.0.0 and later.
esr-4.x: esr-4.6.0 and later.
esr-3.x: esr-3.5.0 and later.
esr-2.x: esr-2.9.0 and later.
Python UDF
This example shows a main function, my_adds.my_add. It imports two separate Python modules, module1 and module2, which add 0.5 and 0.3 to the input, respectively. The function then combines and returns the results.
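For example, an input of 1 returns (1 + 0.5) + (1 + 0.3) = 2.8.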
Step 1: Download and upload files
For quick testing, the required code files are provided. Follow these steps to download and upload the files.
Download the example files. Click the following links to download the three files required for testing:
module1.tgz: Adds 0.5 to an integer and returns a floating-point number.
module2.tgz: Adds 0.3 to an integer and returns a floating-point number.
my_adds.py: The main function file. It calls PythonUDF.your_add(a) and PythonUDF2.add2(a), adds the results, and returns the final value. A sketch of the code inside the archives follows this list.
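The code inside the two archives is not included in this topic. For orientation only, the following is a minimal sketch that is consistent with the descriptions above. The package layout (a module1 directory containing PythonUDF.py and a module2 directory containing PythonUDF2.py) is an assumption based on the import statements in my_adds.py.

### module1/PythonUDF.py (hypothetical layout inside module1.tgz)
def your_add(a: int) -> float:
    # Add 0.5 to an integer and return a floating-point number.
    return a + 0.5

### module2/PythonUDF2.py (hypothetical layout inside module2.tgz)
def add2(a: int) -> float:
    # Add 0.3 to an integer and return a floating-point number.
    return a + 0.3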
Upload the files. Upload the following files to an Object Storage Service (OSS) bucket that you are authorized to access. For more information, see Simple upload.
Dependency package: module1.tgz
Dependency package: module2.tgz
Main logic file: my_adds.py
If your UDF references custom packages, you must add the paths of the extracted dependency packages to the Python module search path (sys.path) before importing them. The following code from my_adds.py is an example:
### my_adds.py content
import sys

def my_add(a: int) -> float:
    # Make sure the extracted dependency paths are on the module search path.
    if "./module1.tgz/" not in sys.path:
        sys.path.insert(0, "./module1.tgz/")
    from module1 import PythonUDF
    if "./module2.tgz/" not in sys.path:
        sys.path.insert(0, "./module2.tgz/")
    from module2 import PythonUDF2
    # Combine the results from both modules.
    b = PythonUDF.your_add(a) + PythonUDF2.add2(a)
    return b
Step 2: Register the UDF
Register the function in Spark SQL so that you can call it in the same way as a built-in function. You can register it as a permanent or temporary function, as needed.
Go to your workspace and create a SparkSQL task. For more information, see Getting started with SparkSQL development.
On the new Spark SQL tab, use the CREATE FUNCTION statement to register the UDF.
Register as a permanent function: The function's information is saved to the data catalog, which allows it to be reused across all SQL sessions. This method is recommended for production environments because it simplifies sharing and management.
-- The format after AS is "[python_file_name].[function_name]".
-- Use a public-read or authorized OSS bucket.
CREATE OR REPLACE FUNCTION adds AS "my_adds.my_add"
USING FILE "oss://<bucket>/demo/udf/my_adds.py",
FILE "oss://<bucket>/demo/udf/module1.tgz",
FILE "oss://<bucket>/demo/udf/module2.tgz";
Register as a temporary function: The function is valid only for the current SQL session. This method is recommended for development and testing.
-- Use the TEMPORARY keyword to create a temporary function.
CREATE TEMPORARY FUNCTION adds AS "my_adds.my_add"
USING FILE "oss://<bucket>/demo/udf/my_adds.py",
FILE "oss://<bucket>/demo/udf/module1.tgz",
FILE "oss://<bucket>/demo/udf/module2.tgz";
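Optionally, you can check that the function is visible in the current session by running DESC FUNCTION adds, a standard Spark SQL statement.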
Step 3: Use the UDF
After the function is registered, you can call it in SQL statements.
On the Spark SQL tab, run the following SQL statements to test the function.
-- Prepare test data.
CREATE TABLE IF NOT EXISTS test_tbl (id INT, name STRING);
TRUNCATE TABLE test_tbl;
INSERT INTO test_tbl VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Charlie'), (4, 'David');
-- Call the registered Python UDF.
SELECT id, adds(id) AS result FROM test_tbl;
View the results.
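Based on the module descriptions above, each call returns (id + 0.5) + (id + 0.3), that is, 2 × id + 0.8. The result column should therefore show 2.8 for Alice (id 1), 4.8 for Bob (id 2), 6.8 for Charlie (id 3), and 8.8 for David (id 4).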

Java/Scala UDF
This example uses a precompiled Java UDF sample JAR package. You can register and call the function without writing code or building a project. The process for Scala UDFs is similar.
Step 1: Download and upload the file
For quick testing, a sample JAR package is provided. Follow these steps to download and upload the file.
Download the example file. Click the following link to download the file required for testing:
udf-1.0-SNAPSHOT.jar: Appends ":HelloWorld" to an input string and returns the result.
Upload the file. Upload the downloaded file to an OSS bucket that you are authorized to access. For more information, see Simple upload.
Step 2: Register the UDF
In Spark SQL, you must register a UDF before you can call it. You can register it as a permanent function or a temporary function, depending on your scenario.
Go to your workspace and create a SparkSQL task. For more information, see Getting started with SparkSQL development.
On the new Spark SQL tab, use the CREATE FUNCTION statement to register the UDF. Use USING JAR to specify the OSS path to the JAR file.
Register as a permanent function: The function's information is saved to the data catalog, which allows it to be reused across all SQL sessions. This method is recommended for production environments because it simplifies sharing and management.
-- Register the function in DLF, DLF 1.0, or HMS.
-- The fully qualified class name of the UDF follows AS.
CREATE FUNCTION myfunc AS "org.example.MyUDF"
USING JAR "oss://path/to/udf-1.0-SNAPSHOT.jar";
Register as a temporary function: The function is valid only for the current SQL session. This method is recommended for development and testing.
-- The temporary function is valid only in the current session.
CREATE TEMPORARY FUNCTION myfunc AS "org.example.MyUDF"
USING JAR "oss://path/to/udf-1.0-SNAPSHOT.jar";
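If you prefer to test from a PySpark session instead of the SQL editor, you can issue the same statements through spark.sql. The following is a minimal sketch, not part of the sample files: it assumes a SparkSession running in an environment that can already read the OSS path, and it reuses the class name and JAR path from the statements above.

from pyspark.sql import SparkSession

# Assumes an environment that is already configured with access to the OSS path.
spark = SparkSession.builder.appName("udf-check").getOrCreate()

# Register the temporary function for the current session only.
spark.sql(
    'CREATE TEMPORARY FUNCTION myfunc AS "org.example.MyUDF" '
    'USING JAR "oss://path/to/udf-1.0-SNAPSHOT.jar"'
)

# Call the UDF and print the result.
spark.sql('SELECT myfunc("abc") AS result').show()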
Step 3: Use the UDF
After the function is registered, you can call it in SQL statements.
In the task editor, run the following SQL statement to test the function.
SELECT myfunc("abc");
View the results.
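Because the sample UDF appends ":HelloWorld" to its input, the query should return abc:HelloWorld.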
