E-MapReduce: Basic operations on Hive UDFs in Spark SQL

Last Updated: Sep 11, 2023

Spark SQL provides built-in functions for common computations. If the built-in functions do not meet your requirements, you can create user-defined functions (UDFs) and call them in the same way as built-in functions. This topic describes how to use Hive UDFs in Spark SQL.

Prerequisites

A Hive UDF is developed and packaged into a JAR package. For more information, see Develop UDF code.

Use Hive UDFs

  1. Use a file transfer tool to upload the generated JAR package to a directory in your cluster. In this example, the /test directory is used.

  2. Upload the JAR package to Hadoop Distributed File System (HDFS) or Object Storage Service (OSS). In this example, HDFS is used.

    1. Log on to your cluster in SSH mode. For more information, see Log on to a cluster.

    2. Run the following command to upload the JAR package to HDFS:

      hadoop fs -put /test/hiveudf-1.0-SNAPSHOT.jar /user/hive/warehouse/

      To check whether the upload succeeded, run the hadoop fs -ls /user/hive/warehouse/ command. Output similar to the following indicates that the upload succeeded:

      Found 1 items
      -rw-r--r--   1 xx xx 2668 2021-06-09 14:13 /user/hive/warehouse/hiveudf-1.0-SNAPSHOT.jar

  3. Create a UDF.

    1. Run the following command to open the Spark SQL CLI:

      spark-sql

    2. Run the following command to create a UDF based on the uploaded JAR package:

      create function myfunc as "org.example.MyUDF"
      using jar "hdfs:///user/hive/warehouse/hiveudf-1.0-SNAPSHOT.jar";

      Note

      In the preceding command, myfunc is the name of the UDF, org.example.MyUDF is the class that was created when the UDF was developed, and hdfs:///user/hive/warehouse/hiveudf-1.0-SNAPSHOT.jar is the HDFS path to which the JAR package was uploaded.

  4. Use the UDF.

    Call the UDF by name in the same way as you call a built-in function:

    select myfunc("abc");

    The following information is returned:

    OK
    abc:HelloWorld
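
The steps above can also be run non-interactively. The following is a minimal sketch, not an official procedure: the variable names (JAR_LOCAL, JAR_HDFS_DIR, UDF_NAME, UDF_CLASS) and the create_function_sql helper are illustrative names introduced here, and the values are the ones used in this example. It assumes that the script runs on a cluster node where the hadoop and spark-sql clients are installed. If either client is missing, the script only prints the CREATE FUNCTION statement.

```shell
#!/bin/sh
# Sketch only: the variable names and the helper below are illustrative,
# not part of E-MapReduce. The values are taken from this topic.
JAR_LOCAL=/test/hiveudf-1.0-SNAPSHOT.jar
JAR_HDFS_DIR=/user/hive/warehouse
UDF_NAME=myfunc
UDF_CLASS=org.example.MyUDF

# Build the CREATE FUNCTION statement from a function name,
# a class name, and the URI of the JAR package.
create_function_sql() {
  printf 'create function %s as "%s" using jar "%s";' "$1" "$2" "$3"
}

# hdfs:// followed by an absolute path yields the hdfs:///... form used above.
JAR_URI="hdfs://${JAR_HDFS_DIR}/$(basename "$JAR_LOCAL")"
CREATE_SQL=$(create_function_sql "$UDF_NAME" "$UDF_CLASS" "$JAR_URI")

# Only touch the cluster when both clients are on the PATH.
if command -v hadoop >/dev/null 2>&1 && command -v spark-sql >/dev/null 2>&1; then
  hadoop fs -put "$JAR_LOCAL" "$JAR_HDFS_DIR/"   # step 2: upload the JAR package
  hadoop fs -ls "$JAR_HDFS_DIR/"                 # check that the upload succeeded
  spark-sql -e "$CREATE_SQL"                     # step 3: create the UDF
  spark-sql -e "select ${UDF_NAME}(\"abc\");"    # step 4: call the UDF
else
  echo "$CREATE_SQL"                             # dry run: print the statement only
fi
```

The -e option passes a single statement to the Spark SQL CLI, so each step runs in its own session. This works here because the function is created without the temporary keyword and is therefore not limited to the session that created it.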