
Spark UDFs

Last Updated: Apr 22, 2021

This topic describes how to manage and use user-defined functions (UDFs) in the serverless Spark engine of Data Lake Analytics (DLA).

Manage UDFs

  • Create a UDF

    The metadata service of the serverless Spark engine allows you to create UDFs based on Hive 1.2.1. Syntax:

    CREATE FUNCTION function_name AS class_name USING resource_location_list;

    Before you create a UDF, configure a task to access the metadata service of DLA. For more information, see Spark SQL.




    Parameters:

      • function_name: the name of the UDF that you want to create. Before you create the UDF, execute the USE DatabaseName statement to specify the database for which the UDF is created. You can also explicitly specify the database in the function name.

      • class_name: the name of the UDF class. A complete class name must include the package information.

      • resource_location_list: the directories in which the JAR packages or files required for creating the UDF are stored. You must use this parameter to explicitly specify the required JAR packages and files and their uniform resource identifiers (URIs), for example, USING JAR 'oss://test/function.jar', FILE 'oss://test/model.csv'.

  • Query all UDFs of a database

    USE databasename;
    SHOW USER FUNCTIONS;

    If the USER keyword is not added, the default UDFs of the serverless Spark engine are also returned. The default UDFs cannot be dropped.

  • Drop a UDF

    USE databasename;
    DROP FUNCTION functionname;

    The metadata service of DLA does not support ALTER statements for UDFs. If you want to modify metadata configurations, drop the UDFs and create them again.
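The statements above can be combined into one management lifecycle. The following sketch uses placeholder database, function, class, and OSS names, not values from your environment:

```sql
USE mydb;
-- Create a UDF backed by a JAR package in OSS.
CREATE FUNCTION myfunc AS 'org.example.MyFunc' USING JAR 'oss://mybucket/udf.jar';
-- List the user-defined functions in the current database.
SHOW USER FUNCTIONS;
-- ALTER is not supported. To modify the UDF, drop it and create it again.
DROP FUNCTION myfunc;
```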

Use a UDF

  1. Develop a UDF.

    Initialize a Maven project and add the following dependency to the pom.xml file. The UDF class org.apache.hadoop.hive.ql.exec.UDF is provided by hive-exec, in the version that matches Hive 1.2.1:

    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-exec</artifactId>
        <version>1.2.1</version>
    </dependency>

    Implement the Ten class in the org.test.udf package. The UDF returns the input value plus 10.

    package org.test.udf;

    import org.apache.hadoop.hive.ql.exec.UDF;

    // Returns the input value plus 10.
    public class Ten extends UDF {
      public long evaluate(long value) {
        return value + 10;
      }
    }
  2. Register the UDF with the serverless Spark engine.

    Compile the code of the Maven project, package the code into a udf.jar file, and then upload the file to OSS. After that, you can register the UDF with the serverless Spark engine and access the UDF by executing SQL statements.

    from pyspark.sql import SparkSession

    if __name__ == "__main__":
      spark = SparkSession.builder.appName("UDF Example").getOrCreate()
      # Create a database.
      spark.sql("CREATE DATABASE db LOCATION 'oss://test/db'")
      # Set the database as the current database.
      spark.sql("USE db")
      # Register the UDF.
      spark.sql("CREATE FUNCTION addten AS 'org.test.udf.Ten' USING JAR 'oss://test/udf.jar'")
      # Check that the UDF is registered.
      spark.sql("SHOW USER FUNCTIONS").show()
      # Create a table.
      spark.sql("CREATE TABLE tb(col1 int, col2 string) USING parquet LOCATION 'oss://test/db/tb'")
      spark.sql("INSERT INTO tb VALUES (1, 'a'), (2, 'b')")
      # Call the UDF.
      spark.sql("SELECT addten(col1), * FROM db.tb").show()
      spark.stop()
  3. Call the UDF in Spark SQL.

    After you register the UDF with the serverless Spark engine, it is stored in the metadata service, and subsequent jobs can call it directly without registering it again. Sample job configuration:

    {
      "sqls": [
        "use db",
        "show user functions",
        "select addten(7)"
      ],
      "name": "UDF Example"
    }

    The addten UDF has been registered with the db database. After the preceding Spark job is complete, you can view the value 17 in the log data.
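As a quick sanity check on that result, the arithmetic of the Ten UDF can be mirrored in plain Python. This local function is only an illustration; the real UDF executes as Java inside the serverless Spark engine:

```python
# Local Python mirror of the Java Ten.evaluate method, for illustration only.
def addten(value: int) -> int:
    # Same logic as `return value + 10;` in org.test.udf.Ten.
    return value + 10

print(addten(7))  # prints 17, matching the value seen in the job log
```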