This topic describes how to manage and use user-defined functions (UDFs) in the serverless Spark engine of Data Lake Analytics (DLA).
DLA is discontinued. AnalyticDB for MySQL Data Lakehouse Edition provides the features of DLA with better performance and additional capabilities. For more information about Spark SQL in AnalyticDB for MySQL, see Overview.
Manage UDFs by using the metadata service
Create a UDF
The metadata service of the serverless Spark engine allows you to create UDFs based on Hive 1.2.1. The following sample code shows the syntax:
CREATE FUNCTION function_name AS class_name USING resource_location_list;
Parameter: function_name
Description: The name of the UDF that you want to create. Before you create the UDF, execute the USE DatabaseName statement to specify the database in which the UDF is created. You can also explicitly specify the database.

Parameter: class_name
Description: The name of the UDF class. A complete class name must include the package information. For information about the format of a class name, see the development standards of Spark and Hive built-in functions.

Parameter: resource_location_list
Description: The directories in which the JAR packages or files required for creating the UDF are stored. You must use this parameter to explicitly specify the required JAR packages and files. Example: USING JAR 'oss://test/function.jar', FILE 'oss://test/model.csv'.
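For example, the following statement creates a UDF that is backed by a JAR package and an auxiliary file. This is a minimal sketch; the database name, function name, class name, and OSS paths are hypothetical.

USE test_db;
CREATE FUNCTION my_udf AS 'org.example.udf.MyUdf' USING JAR 'oss://test/function.jar', FILE 'oss://test/model.csv';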
Query all UDFs of a database
USE databasename; SHOW USER FUNCTIONS;
Note: If the USER keyword is not added, the built-in functions of the serverless Spark engine are also returned. The built-in functions cannot be dropped.

Drop a UDF
USE databasename; DROP FUNCTION functionname;
Note: The metadata service of DLA does not support ALTER statements for UDFs. If you want to modify the metadata configuration of a UDF, drop the UDF and create it again.
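The drop-and-recreate workflow looks like the following sketch. The function name, class name, and JAR path are hypothetical; the new definition takes effect only after the old one is dropped.

USE test_db;
DROP FUNCTION my_udf;
CREATE FUNCTION my_udf AS 'org.example.udf.MyUdfV2' USING JAR 'oss://test/function_v2.jar';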
Use UDFs
Develop a UDF.
Initialize a Maven project and add the following dependency to the pom.xml file:
<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-exec</artifactId>
    <version>1.2.1</version>
</dependency>
Implement the UDF class. In this example, the Ten.java file is implemented in the org.test.udf package. The UDF returns the result of the input value plus 10.

package org.test.udf;

import org.apache.hadoop.hive.ql.exec.UDF;

public class Ten extends UDF {
    public long evaluate(long value) {
        return value + 10;
    }
}
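Hive resolves the evaluate method by reflection, so a UDF class can declare several overloads for different argument types and handle NULL inputs explicitly. The following class is a hypothetical sketch of this pattern, not part of the original example.

package org.test.udf;

import org.apache.hadoop.hive.ql.exec.UDF;

// Hypothetical UDF that prepends a prefix to a string value.
// Hive picks the evaluate() overload that matches the argument types.
public class AddPrefix extends UDF {
    // Single-argument form: uses a fixed prefix.
    public String evaluate(String value) {
        if (value == null) {
            return null; // propagate NULL inputs
        }
        return "prefix_" + value;
    }

    // Two-argument form: the caller supplies the prefix.
    public String evaluate(String value, String prefix) {
        if (value == null || prefix == null) {
            return null;
        }
        return prefix + value;
    }
}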
Create the UDF in the serverless Spark engine and use this UDF.
Compile the code of the Maven project, package the code into a udf.jar file, and then upload the file to Object Storage Service (OSS). After the file is uploaded, you can create the UDF and execute SQL statements in the serverless Spark engine to access the UDF.

-- here is the spark conf
set spark.driver.resourceSpec=medium;
set spark.executor.instances=5;
set spark.executor.resourceSpec=medium;
set spark.app.name=sparksqltest;
set spark.sql.hive.metastore.version=dla;
set spark.dla.connectors=oss;
-- here is your sql statement
use db;
CREATE FUNCTION addten AS 'org.test.udf.Ten' USING JAR 'oss://path/to/your/udf.jar';
select addten(7);
Check the result.
After the UDF is created, it is registered with the serverless Spark engine. You can directly invoke the UDF by using the metadata service without registering it again.
In this example, the addten UDF is created in the db database. After a Spark job that invokes the UDF is run, you can view the result 17 in the logs of the job.
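Because the function definition is persisted in the metadata service, a later job can invoke addten without executing CREATE FUNCTION again. A minimal sketch, assuming the job uses the same metastore and connector settings as the job above:

set spark.sql.hive.metastore.version=dla;
set spark.dla.connectors=oss;
use db;
select addten(32); -- returns 42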