Hive provides many built-in functions. If the built-in functions cannot meet your computing requirements, you can create user-defined functions (UDFs). You can use UDFs in a similar way to common built-in functions. This topic describes how to develop and use UDFs.
Background information
The following table describes the types of UDFs.
| UDF type | Description |
| --- | --- |
| User-defined scalar function (UDF) | The input and output of a UDF have a one-to-one mapping. Each time a UDF is called, it reads a single row of data and returns a single data value. |
| User-defined table-valued function (UDTF) | Each time a UDTF is called, it returns multiple rows of data. Among all types of UDFs, only UDTFs can return multiple fields. |
| User-defined aggregate function (UDAF) | The input and output of a UDAF have a many-to-one mapping. Each time a UDAF is called, it aggregates multiple input records into one output value. You can use a UDAF with the GROUP BY clause in an SQL statement. |
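The three call patterns can be illustrated with built-in functions. The table `t` and its columns `name` and `arr` in the following statements are hypothetical:

```sql
-- UDF: one row in, one value out
SELECT upper(name) FROM t;

-- UDTF: one call can return multiple rows (used with LATERAL VIEW)
SELECT name, item FROM t LATERAL VIEW explode(arr) e AS item;

-- UDAF: many rows in, one value out per group
SELECT name, count(*) FROM t GROUP BY name;
```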
Develop UDF code
- Use an integrated development environment (IDE) to create a Maven project. The following snippet shows the basic coordinates of the project. You can set the groupId and artifactId values based on your business requirements.
<groupId>org.example</groupId>
<artifactId>hiveudf</artifactId>
<version>1.0-SNAPSHOT</version>
- Add the following dependency to the pom.xml file:
<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-exec</artifactId>
    <version>2.3.7</version>
    <exclusions>
        <exclusion>
            <groupId>org.pentaho</groupId>
            <artifactId>*</artifactId>
        </exclusion>
    </exclusions>
</dependency>
- Create a class that extends the Hive UDF class. You can name the class as you want. In this example, the class name is MyUDF.
package org.example;

import org.apache.hadoop.hive.ql.exec.UDF;

/**
 * A UDF that appends ":HelloWorld" to the input string.
 */
public class MyUDF extends UDF {
    public String evaluate(final String s) {
        if (s == null) {
            return null;
        }
        return s + ":HelloWorld";
    }
}
- Package the code into a JAR file. In the directory that contains the pom.xml file, run the following command:
mvn clean package -DskipTests
The hiveudf-1.0-SNAPSHOT.jar file is generated in the target directory. The code development is complete.
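Before packaging, you can sanity-check the evaluate logic locally. The following standalone sketch duplicates the string handling of MyUDF; it does not extend Hive's UDF class, so it needs no Hive dependency, and the class name here is illustrative:

```java
// Standalone check that mirrors the evaluate() logic of MyUDF.
// Illustrative only; does not depend on hive-exec.
public class MyUDFLogicCheck {

    // Same behavior as MyUDF.evaluate(String).
    static String evaluate(final String s) {
        if (s == null) {
            return null;
        }
        return s + ":HelloWorld";
    }

    public static void main(String[] args) {
        System.out.println(evaluate("abc"));  // abc:HelloWorld
        System.out.println(evaluate(null));   // null
    }
}
```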
Create and use a UDF
- Use SSH Secure File Transfer Client to upload the generated JAR file to the root directory of your E-MapReduce (EMR) cluster. Make sure that SSH Secure File Transfer Client is installed on your computer before you perform this step.
- Upload the JAR file to Hadoop Distributed File System (HDFS).
- Create a UDF in Hive that references the uploaded JAR file.
- Run the following command to call the UDF. You call it by the name you specified when you created it, in the same way as a built-in function.
select myfunc("abc");
The following information is returned:
OK
abc:HelloWorld
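The upload and creation steps above can be sketched as follows. The HDFS directory and the function name myfunc are assumptions; adjust them to your environment:

```sql
-- Assumed HDFS target directory; upload the JAR first from the cluster shell:
--   hdfs dfs -mkdir -p /user/hive/udf
--   hdfs dfs -put hiveudf-1.0-SNAPSHOT.jar /user/hive/udf/

-- Register the function in Hive, referencing the uploaded JAR:
CREATE FUNCTION myfunc AS 'org.example.MyUDF'
  USING JAR 'hdfs:///user/hive/udf/hiveudf-1.0-SNAPSHOT.jar';
```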