
E-MapReduce: Use UDFs

Last Updated: Dec 01, 2025

If the built-in functions in Spark SQL do not meet your needs, you can create user-defined functions (UDFs) to extend Spark's capabilities. This topic guides you through the process for creating and using Python and Java/Scala UDFs.

Supported versions

Only the following engine versions support the examples in this topic:

  • esr-5.x: esr-5.0.0 and later.

  • esr-4.x: esr-4.6.0 and later.

  • esr-3.x: esr-3.5.0 and later.

  • esr-2.x: esr-2.9.0 and later.

Python UDF

This example implements a main function, my_adds.my_add, which imports two separate Python modules, module1 and module2. The modules add 0.5 and 0.3 to the input, respectively, and the function returns the sum of the two results.

Step 1: Download and upload files

For quick testing, the required code files are provided. Follow these steps to download and upload the files.

  1. Download the example files. Click the following links to download the three files required for testing:

    • module1.tgz: Adds 0.5 to an integer and returns a floating-point number.

    • module2.tgz: Adds 0.3 to an integer and returns a floating-point number.

    • my_adds.py: The main function file. It calls PythonUDF.your_add(a) and PythonUDF2.add2(a), adds the results, and returns the final value.

  2. Upload the files. Upload the following files to an Object Storage Service (OSS) bucket that you are authorized to access. For more information, see Simple upload.

    • Dependency package: module1.tgz

    • Dependency package: module2.tgz

    • Main logic file: my_adds.py
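The archives above are provided ready-made, but a minimal sketch of what the packaged modules might contain (hypothetical; the actual downloadable archives may differ in structure) helps clarify what my_adds.py imports:

```python
# Hypothetical contents of the dependency packages. Each archive extracts
# to a Python package; the file comments below indicate the layout.

# module1/PythonUDF.py
def your_add(a: int) -> float:
    """Add 0.5 to an integer and return a floating-point number."""
    return a + 0.5

# module2/PythonUDF2.py
def add2(a: int) -> float:
    """Add 0.3 to an integer and return a floating-point number."""
    return a + 0.3
```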

Note

If your UDF references custom packages, you must extract the dependency packages to the Spark Python environment. The following code from my_adds.py is an example:

### my_adds.py content
import sys

def my_add(a: int) -> float:
    # Make sure the extracted dependency archives are on sys.path
    if "./module1.tgz" not in sys.path:
        sys.path.insert(0, "./module1.tgz")
    if "./module2.tgz" not in sys.path:
        sys.path.insert(0, "./module2.tgz")

    # Import after the paths are registered so the names are bound on
    # every call, not only on the first one.
    from module1 import PythonUDF
    from module2 import PythonUDF2

    return PythonUDF.your_add(a) + PythonUDF2.add2(a)
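Outside of Spark, you can sanity-check the combined logic by stubbing the two dependency modules. The stubs below are hypothetical stand-ins for the archive contents, mirroring only the behavior described above:

```python
import sys
import types

# Hypothetical stand-ins for the extracted archives, for local testing only.
mod1 = types.ModuleType("module1")
mod1.PythonUDF = types.SimpleNamespace(your_add=lambda a: a + 0.5)
mod2 = types.ModuleType("module2")
mod2.PythonUDF2 = types.SimpleNamespace(add2=lambda a: a + 0.3)
sys.modules["module1"] = mod1
sys.modules["module2"] = mod2

# Mirror of my_adds.my_add; the sys.path bootstrapping is unnecessary here
# because the stub modules are registered directly in sys.modules.
from module1 import PythonUDF
from module2 import PythonUDF2

def my_add(a: int) -> float:
    return PythonUDF.your_add(a) + PythonUDF2.add2(a)

print(my_add(1))  # (1 + 0.5) + (1 + 0.3), approximately 2.8
```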

Step 2: Register the UDF

Register the function in Spark SQL so that you can call it in the same way as a built-in function. You can register it as a permanent or temporary function, as needed.

  1. Go to your workspace and create a SparkSQL task. For more information, see Getting started with SparkSQL development.

  2. On the new Spark SQL tab, use the CREATE FUNCTION statement to register the UDF.

    • Register as a permanent function: The function's information is saved to the data catalog, which allows it to be reused across all SQL sessions. This method is recommended for production environments because it simplifies sharing and management.

      -- The format after AS is "[python_file_name].[function_name]"
      -- Use a public-read or authorized OSS bucket.
      CREATE OR REPLACE FUNCTION adds AS "my_adds.my_add"
      USING FILE "oss://<bucket>/demo/udf/my_adds.py",
            FILE "oss://<bucket>/demo/udf/module1.tgz",
            FILE "oss://<bucket>/demo/udf/module2.tgz";
    • Register as a temporary function: The function is valid only for the current SQL session. This method is recommended for development and testing.

      -- Use the TEMPORARY keyword to create a temporary function.
      CREATE TEMPORARY FUNCTION adds AS "my_adds.my_add"
      USING FILE "oss://<bucket>/demo/udf/my_adds.py",
            FILE "oss://<bucket>/demo/udf/module1.tgz",
            FILE "oss://<bucket>/demo/udf/module2.tgz";

Step 3: Use the UDF

After the function is registered, you can call it in SQL statements.

  1. On the Spark SQL tab, run the following SQL statements to test the function.

    -- Prepare test data
    CREATE TABLE IF NOT EXISTS test_tbl (id INT, name STRING);
    TRUNCATE TABLE test_tbl;
    INSERT INTO test_tbl VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Charlie'), (4, 'David');
    
    -- Call the registered Python UDF
    SELECT id, adds(id) AS result FROM test_tbl;
  2. View the results.
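Based on the module behavior described above, each result should be (id + 0.5) + (id + 0.3), that is, 2 * id + 0.8. A quick local computation of the expected result column for the sample rows:

```python
# Expected output of: SELECT id, adds(id) AS result FROM test_tbl;
rows = [(1, 'Alice'), (2, 'Bob'), (3, 'Charlie'), (4, 'David')]
expected = [(row_id, round((row_id + 0.5) + (row_id + 0.3), 1))
            for row_id, _name in rows]
print(expected)  # [(1, 2.8), (2, 4.8), (3, 6.8), (4, 8.8)]
```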

Java/Scala UDF

This example uses a precompiled Java UDF sample JAR package. You can register and call the function without writing code or building a project. The process for Scala UDFs is similar.

Step 1: Download and upload the file

For quick testing, a sample JAR package is provided. Follow these steps to download and upload the file.

  1. Download the example file. Click the following link to download the file required for testing:

    udf-1.0-SNAPSHOT.jar: Appends ":HelloWorld" to an input string and returns the result.

  2. Upload the file. Upload the downloaded file to an OSS bucket that you are authorized to access. For more information, see Simple upload.

Step 2: Register the UDF

In Spark SQL, you must register a UDF before you can call it. You can register it as a permanent function or a temporary function, depending on your scenario.

  1. Go to your workspace and create a SparkSQL task. For more information, see Getting started with SparkSQL development.

  2. On the new Spark SQL tab, use the CREATE FUNCTION statement to register the UDF. Use USING JAR to specify the OSS path to the JAR file.

    • Register as a permanent function: The function's information is saved to the data catalog, which allows it to be reused across all SQL sessions. This method is recommended for production environments because it simplifies sharing and management.

      -- Permanent functions are registered in the data catalog (DLF, DLF 1.0, or HMS).
      -- Specify the fully qualified class name of the UDF after AS.
      CREATE FUNCTION myfunc AS "org.example.MyUDF"
      USING JAR "oss://path/to/udf-1.0-SNAPSHOT.jar";
    • Register as a temporary function: The function is valid only for the current SQL session. This method is recommended for development and testing.

      -- The temporary function is valid only in the current session.
      CREATE TEMPORARY FUNCTION myfunc AS "org.example.MyUDF"
      USING JAR "oss://path/to/udf-1.0-SNAPSHOT.jar";

Step 3: Use the UDF

After the function is registered, you can call it in SQL statements.

  1. In the task editor, run the following SQL statement to test the function.

    SELECT myfunc("abc");
  2. View the results.
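For reference, the sample JAR's behavior as described above (appending ":HelloWorld" to the input string) can be sketched in plain Python:

```python
# Plain-Python equivalent of the sample Java UDF's described behavior;
# the actual implementation lives in org.example.MyUDF inside the JAR.
def my_udf(s: str) -> str:
    return s + ":HelloWorld"

print(my_udf("abc"))  # abc:HelloWorld
```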