All Products
Search
Document Center

E-MapReduce:Use UDFs

Last Updated:Mar 26, 2026

When Spark SQL's built-in functions don't cover your logic, create a user-defined function (UDF) and register it in Spark SQL. This topic walks you through registering and calling Python and Java/Scala UDFs, using prebuilt sample files so you can test end-to-end without writing code from scratch.

Supported versions

The following E-MapReduce (EMR) Serverless Spark runtime versions support the examples in this topic:

  • esr-5.x: esr-5.0.0 and later

  • esr-4.x: esr-4.6.0 and later

  • esr-3.x: esr-3.5.0 and later

  • esr-2.x: esr-2.9.0 and later

Prerequisites

Before you begin, ensure that you have:

Python UDF

This example uses a main function, my_adds.my_add, that imports two dependency packages — module1 and module2. module1 adds 0.5 to the input integer and module2 adds 0.3. The function combines both results and returns a floating-point number.

Step 1: Upload files to OSS

Download the three sample files and upload them to your OSS bucket.

  1. Download the example files:

    • module1.tgz: Adds 0.5 to an integer and returns a floating-point number.

    • module2.tgz: Adds 0.3 to an integer and returns a floating-point number.

    • my_adds.py: The main function file. Calls PythonUDF.your_add(a) and PythonUDF2.add2(a), adds the results, and returns the final value.

  2. Upload all three files to your OSS bucket. For more information, see Simple upload.

If your UDF references custom packages, extract them into the Spark Python environment via sys.path. The my_adds.py file demonstrates this pattern:
### my_adds.py content
import sys

def my_add(a: int) -> float:
    # Make sure the dependency path is added to sys.path
    if not sys.path.__contains__("./module1.tgz"):
        sys.path.insert(0, "./module1.tgz/")
        from module1 import PythonUDF
    if not sys.path.__contains__("./module2.tgz"):
        sys.path.insert(0, "./module2.tgz/")
        from module2 import PythonUDF2

    b = PythonUDF.your_add(a) + PythonUDF2.add2(a)
    return b

Step 2: Register the UDF

Register the function in Spark SQL using CREATE FUNCTION. Choose permanent registration for production environments or temporary registration for development and testing.

  1. In your workspace, create a Spark SQL task.

  2. On the Spark SQL tab, run one of the following statements. Replace <bucket> with your OSS bucket name. Permanent function — saved to the data catalog and reusable across all SQL sessions:

    -- The format after AS is "[python_file_name].[function_name]"
    -- Use a public-read or authorized OSS bucket.
    CREATE OR REPLACE FUNCTION adds AS "my_adds.my_add"
    USING FILE "oss://<bucket>/demo/udf/my_adds.py",
          FILE "oss://<bucket>/demo/udf/module1.tgz",
          FILE "oss://<bucket>/demo/udf/module2.tgz";

    Temporary function — valid only for the current SQL session:

    -- Use the TEMPORARY keyword to create a temporary function.
    CREATE TEMPORARY FUNCTION adds AS "my_adds.my_add"
    USING FILE "oss://<bucket>/demo/udf/my_adds.py",
          FILE "oss://<bucket>/demo/udf/module1.tgz",
          FILE "oss://<bucket>/demo/udf/module2.tgz";

Step 3: Call the UDF

Run the following SQL statements on the Spark SQL tab to test the function.

-- Prepare test data
CREATE TABLE IF NOT EXISTS test_tbl (id INT, name STRING);
TRUNCATE TABLE test_tbl;
INSERT INTO test_tbl VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Charlie'), (4, 'David');

-- Call the registered Python UDF
SELECT id, adds(id) AS result FROM test_tbl;

The expected output is similar to:

image

Java/Scala UDF

This example uses a precompiled Java UDF JAR file. The function takes a string, appends ":HelloWorld", and returns the result. The process for Scala UDFs is the same.

Step 1: Upload the JAR file to OSS

  1. Download the example JAR file: udf-1.0-SNAPSHOT.jar: Appends ":HelloWorld" to an input string and returns the result.

  2. Upload the file to your OSS bucket. For more information, see Simple upload.

Step 2: Register the UDF

Register the function using CREATE FUNCTION with USING JAR to point to the OSS path of the JAR file.

  1. In your workspace, create a Spark SQL task.

  2. On the Spark SQL tab, run one of the following statements. Replace the OSS path with the actual location of your JAR file. Permanent function — saved to the data catalog (DLF, DLF 1.0, or HMS) and reusable across all SQL sessions:

    -- The class created in the UDF follows AS.
    CREATE FUNCTION myfunc AS "org.example.MyUDF"
    USING JAR "oss://path/to/udf-1.0-SNAPSHOT.jar";

    Temporary function — valid only for the current SQL session:

    -- The temporary function is valid only in the current session.
    CREATE TEMPORARY FUNCTION myfunc AS "org.example.MyUDF"
    USING JAR "oss://path/to/udf-1.0-SNAPSHOT.jar";

Step 3: Call the UDF

In the task editor, run the following statement to test the function:

SELECT myfunc("abc");

The expected output is similar to:

image