
E-MapReduce: User-defined functions (UDFs)

Last Updated: Mar 26, 2026

Apache Hive provides many built-in functions for data processing. When the built-in functions cannot express your logic, such as custom string transformations, data encryption, or domain-specific calculations, you can write a user-defined function (UDF) instead.

UDF types

Apache Hive supports three types of UDFs:

Type | Full name | Behavior
---- | --------- | --------
UDF | User-defined scalar function | One-to-one mapping: reads one row and returns one value
UDTF | User-defined table-generating function | One-to-many mapping: reads one row and can return multiple rows, each with one or more columns
UDAF | User-defined aggregate function | Many-to-one mapping: aggregates multiple rows into one output value; typically used with GROUP BY
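The three mapping shapes can be illustrated in plain Java, without Hive on the classpath. This is a conceptual sketch only: the class and method names (UdfShapes, scalar, explodeWords, sum) are hypothetical and are not Hive's UDF, UDTF, or UDAF APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Conceptual sketch of the three UDF shapes in plain Java.
// These methods are hypothetical illustrations, NOT Hive's UDF/UDTF/UDAF APIs.
public class UdfShapes {

    // UDF (scalar): one input value -> one output value.
    static String scalar(String s) {
        return s == null ? null : s.toUpperCase();
    }

    // UDTF (table-generating): one input row -> zero or more output rows,
    // each of which can have several columns (here: word, position).
    static List<Object[]> explodeWords(String line) {
        List<Object[]> rows = new ArrayList<>();
        if (line == null) {
            return rows;
        }
        String[] words = line.trim().split("\\s+");
        for (int i = 0; i < words.length; i++) {
            if (!words[i].isEmpty()) {
                rows.add(new Object[] {words[i], i});
            }
        }
        return rows;
    }

    // UDAF (aggregate): many input rows -> one output value.
    static long sum(List<Long> values) {
        long total = 0;
        for (Long v : values) {
            if (v != null) { // skip NULLs, as SQL aggregates such as SUM do
                total += v;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(scalar("abc"));                          // ABC
        System.out.println(explodeWords("hello hive udf").size());  // 3
        System.out.println(sum(Arrays.asList(1L, 2L, null, 4L)));   // 7
    }
}
```

In real Hive code, the same three shapes correspond to the separate base classes and interfaces Hive provides for each function type.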

Prerequisites

Before you begin, make sure you have:

  • An E-MapReduce (EMR) cluster with SSH access. For more information, see Log on to a cluster.

  • Java Development Kit (JDK) installed

  • Apache Maven installed and configured

  • An integrated development environment (IDE) for Java development

Develop UDF code

This section walks through creating a simple UDF that appends :HelloWorld to a string input.

1. Create a Maven project

In your IDE, create a new Maven project with the following coordinates. Adjust groupId and artifactId to match your organization and project name.

<groupId>org.example</groupId>
<artifactId>hiveudf</artifactId>
<version>1.0-SNAPSHOT</version>

2. Add the Hive dependency

Add the following dependency to your pom.xml. Set the version to match the Hive version on your EMR cluster; this example uses 2.3.7.

<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-exec</artifactId>
    <version>2.3.7</version>
    <!-- provided: the cluster's Hive installation supplies these classes at runtime -->
    <scope>provided</scope>
    <exclusions>
        <exclusion>
            <groupId>org.pentaho</groupId>
            <artifactId>*</artifactId>
        </exclusion>
    </exclusions>
</dependency>

3. Implement the UDF class

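Create a class that extends org.apache.hadoop.hive.ql.exec.UDF and implements an evaluate() method. Hive calls evaluate() once for each input row and selects the method by its argument types, so you can overload evaluate() to accept different inputs. The class name can be anything; this example uses MyUDF.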

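package org.example;

import org.apache.hadoop.hive.ql.exec.UDF;

// Simple Hive UDF: Hive finds evaluate() by reflection and calls it once per row.
public class MyUDF extends UDF {
    // Appends ":HelloWorld" to the input string; returns NULL for NULL input.
    public String evaluate(final String s) {
        if (s == null) {
            return null;
        }
        return s + ":HelloWorld";
    }
}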
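Because MyUDF extends a Hive class, compiling and running it locally requires hive-exec on the classpath. One common pattern (a suggestion, not something this procedure requires) is to keep the transformation in a plain static method that evaluate() delegates to, so you can check it with ordinary Java. The class name HelloWorldLogic is hypothetical.

```java
// Hypothetical helper: holds the UDF's string logic with no Hive dependency,
// so it can be checked locally with plain Java.
public final class HelloWorldLogic {

    private static final String SUFFIX = ":HelloWorld";

    private HelloWorldLogic() {
        // static utility class; no instances
    }

    // Same behavior as MyUDF.evaluate(): NULL in, NULL out; otherwise append the suffix.
    public static String append(String s) {
        return s == null ? null : s + SUFFIX;
    }

    public static void main(String[] args) {
        System.out.println(append("abc")); // abc:HelloWorld
        System.out.println(append(null));  // null
    }
}
```

With this helper, the body of MyUDF.evaluate() could be reduced to `return HelloWorldLogic.append(s);`, and the Hive-specific class stays a thin wrapper.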

4. Build the JAR file

In the directory containing pom.xml, run:

mvn clean package -DskipTests

The output JAR file hiveudf-1.0-SNAPSHOT.jar appears in the target directory.

Deploy and register the UDF

1. Transfer the JAR to your cluster

Use SSH Secure File Transfer Client, or another file transfer tool such as scp, to upload hiveudf-1.0-SNAPSHOT.jar to the root directory of the master node of your EMR cluster.

2. Upload the JAR to HDFS

  1. Log on to your cluster over SSH.

  2. From the directory that contains the JAR, upload it to Hadoop Distributed File System (HDFS):

    hadoop fs -put hiveudf-1.0-SNAPSHOT.jar /user/hive/warehouse/

  3. Verify that the upload succeeded:

    hadoop fs -ls /user/hive/warehouse/

    The output is similar to the following:

    Found 1 items
    -rw-r--r--   1 xx xx 2668 2021-06-09 14:13 /user/hive/warehouse/hiveudf-1.0-SNAPSHOT.jar

3. Register the UDF in Hive

Open the Hive CLI:

hive

Register the UDF as a permanent function:

create function myfunc as "org.example.MyUDF" using jar "hdfs:///user/hive/warehouse/hiveudf-1.0-SNAPSHOT.jar";

In this command:

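Parameter | Description
--------- | -----------
myfunc | The function name used in queries. The function is created in the current database (default, unless you switched databases with USE).
org.example.MyUDF | The fully qualified class name (package plus class) inside the JAR
hdfs:///user/hive/warehouse/hiveudf-1.0-SNAPSHOT.jar | The HDFS path of the JAR file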

If registration succeeds, the output is similar to:

Added [/private/var/folders/2s/wzzsgpn13rn8rl_0fc4xxkc00000gp/T/40608d4a-a0e1-4bf5-92e8-b875fa6a1e53_resources/hiveudf-1.0-SNAPSHOT.jar] to class path
Added resources: [hdfs:///user/hive/warehouse/hiveudf-1.0-SNAPSHOT.jar]

4. Test the UDF

Call the UDF in a query the same way as any built-in function:

select myfunc("abc");

Expected output:

OK
abc:HelloWorld

To confirm the function is registered, run:

SHOW FUNCTIONS LIKE '*myfunc*';

Troubleshooting

The `create function` command fails with a class not found error

Make sure the class name in the create function statement matches the fully qualified class name in your JAR, including the package prefix. For example, if your class is MyUDF in package org.example, the reference must be org.example.MyUDF, not MyUDF.
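To confirm the exact fully qualified name before you run create function, you can list the classes packaged in the JAR. The following sketch uses only java.util.jar from the JDK; the class name ListJarClasses is hypothetical. The JDK's jar tf command lists the same entries as file paths.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper: prints the fully qualified names of the top-level
// classes in a JAR, e.g. org.example.MyUDF, for use in CREATE FUNCTION.
public class ListJarClasses {

    // Converts "org/example/MyUDF.class" to "org.example.MyUDF".
    // Returns null for resources, nested/anonymous classes, and metadata classes.
    static String toClassName(String entryName) {
        if (!entryName.endsWith(".class")
                || entryName.contains("$")
                || entryName.endsWith("module-info.class")
                || entryName.endsWith("package-info.class")) {
            return null;
        }
        String path = entryName.substring(0, entryName.length() - ".class".length());
        return path.replace('/', '.');
    }

    // Reads every entry in the JAR and collects the class names.
    static List<String> listClasses(String jarPath) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                String name = toClassName(entries.nextElement().getName());
                if (name != null) {
                    names.add(name);
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: java ListJarClasses <path-to-jar>");
            return;
        }
        for (String name : listClasses(args[0])) {
            System.out.println(name);
        }
    }
}
```

For example, on JDK 11 or later you can run `java ListJarClasses.java target/hiveudf-1.0-SNAPSHOT.jar`; for the example project, the output should include org.example.MyUDF.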

The UDF returns `null` for all inputs

Check the evaluate() method's null-handling logic and the input data. The example implementation returns null only when its input is null, so a column that contains only nulls produces only nulls. If your function should handle null differently, update the method accordingly.
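For example, if you want NULL inputs to produce a value instead of NULL, the logic might look like the following plain-Java sketch. The class name NullPolicyExample and the "treat NULL as an empty string" policy are hypothetical; propagating NULL, as MyUDF does, is the more conventional SQL behavior, so choose the policy that fits your data.

```java
// Hypothetical comparison of two null-handling policies for the MyUDF logic.
public final class NullPolicyExample {

    // Original behavior (as in MyUDF): NULL in -> NULL out.
    static String propagateNull(String s) {
        return s == null ? null : s + ":HelloWorld";
    }

    // Alternative behavior: NULL is treated as an empty string.
    static String nullAsEmpty(String s) {
        return (s == null ? "" : s) + ":HelloWorld";
    }

    public static void main(String[] args) {
        System.out.println(propagateNull(null)); // null
        System.out.println(nullAsEmpty(null));   // :HelloWorld
    }
}
```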

References