This topic describes how to manage and use user-defined functions (UDFs) in the serverless Spark engine of Data Lake Analytics (DLA).
Manage UDFs
Create a UDF
The metadata service of the serverless Spark engine allows you to create UDFs based on Hive 1.2.1. Syntax:
CREATE FUNCTION function_name AS class_name USING resource_location_list;
Parameter descriptions:

function_name: The name of the UDF that you want to create. Before you create the UDF, execute the USE DatabaseName statement to specify the database in which the UDF is created. You can also explicitly specify the database in the function name.

class_name: The name of the UDF class. A complete class name must include the package information.

resource_location_list: The directories in which the JAR packages or files required to create the UDF are stored. You must use this parameter to explicitly specify the required JAR packages or files and their uniform resource identifiers (URIs), for example, USING JAR 'oss://test/function.jar', FILE 'oss://test/model.csv'.
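For example, the following statements show both ways to specify the database when you create a UDF. The names my_db, my_udf, org.example.MyUdf, and the OSS paths are hypothetical placeholders, and the database-qualified form is a sketch based on the Hive 1.2.1 CREATE FUNCTION syntax that the metadata service is based on:

-- Specify the database with USE, then create the UDF in it.
USE my_db;
CREATE FUNCTION my_udf AS 'org.example.MyUdf' USING JAR 'oss://test/function.jar', FILE 'oss://test/model.csv';

-- Alternatively, specify the database explicitly in the function name.
CREATE FUNCTION my_db.my_udf AS 'org.example.MyUdf' USING JAR 'oss://test/function.jar';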
Query all UDFs of a database
USE databasename;
SHOW USER FUNCTIONS;
Note: If the USER keyword is not added, the default UDFs of the serverless Spark engine are returned. The default UDFs cannot be dropped.
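Based on the note above, the following sketch shows the difference between the two statements (databasename is a placeholder):

USE databasename;
-- Without the USER keyword, the default UDFs of the serverless Spark engine are returned.
SHOW FUNCTIONS;
-- With the USER keyword, only the UDFs that you created in databasename are returned.
SHOW USER FUNCTIONS;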
Drop a UDF
USE databasename;
DROP FUNCTION functionname;
Note: The metadata service of DLA does not support ALTER statements for UDFs. If you want to modify the metadata configuration of a UDF, drop the UDF and create it again.
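Because ALTER statements are not supported, modifying a UDF means dropping it and creating it again. The following is a minimal sketch, assuming the addten UDF from the example later in this topic and a placeholder path to an updated JAR package:

USE db;
-- Drop the existing UDF first; ALTER is not supported.
DROP FUNCTION addten;
-- Create the UDF again with the updated configuration.
CREATE FUNCTION addten AS 'org.test.udf.Ten' USING JAR 'oss://path/to/your/updated-udf.jar';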
Use UDFs
Develop a UDF.
Initialize a Maven project and add the following dependency to the pom.xml file.
<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-exec</artifactId>
    <version>1.2.1</version>
</dependency>
Implement the Ten.java file in the org.test.udf package. The UDF returns the input value plus 10.

package org.test.udf;

import org.apache.hadoop.hive.ql.exec.UDF;

public class Ten extends UDF {
    public long evaluate(long value) {
        return value + 10;
    }
}
Create the UDF in the serverless Spark engine and use this UDF.
Compile the code of the Maven project, package the code into a udf.jar file, and then upload the file to OSS. After that, you can create the UDF and execute SQL statements in the serverless Spark engine to access it.

-- here is the spark conf
set spark.driver.resourceSpec=medium;
set spark.executor.instances=5;
set spark.executor.resourceSpec=medium;
set spark.app.name=sparksqltest;
set spark.sql.hive.metastore.version=dla;
set spark.dla.connectors=oss;

-- here is your sql statement
use db;
CREATE FUNCTION addten AS 'org.test.udf.Ten' USING JAR 'oss://path/to/your/udf.jar';
select addten(7);
Check the result.
After you create the UDF, you can call this UDF directly.
The addten UDF has been created in the db database. After the Spark job that calls the UDF is run, you can view the value 17 in the log data.
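After the UDF is registered, you can call it in queries like any other function. The following is a hedged example, assuming a hypothetical table named orders with a BIGINT column named amount exists in the db database:

USE db;
-- Apply the UDF to a column; each value is returned plus 10.
SELECT addten(amount) FROM orders;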