Create UDF

Last Updated: Oct 31, 2017

About UDF

UDF is short for User-Defined Function. MaxCompute provides many built-in functions to meet common computing requirements, and you can also create UDFs to cover needs the built-in functions do not. A UDF is used in the same way as a built-in function. A Java UDF is a UDF implemented in Java.

The general Java UDF workflow on the big data platform consists of four steps: (1) prepare and debug the Java code in a local environment, then compile it into a JAR package; (2) create a resource in DataWorks and upload the JAR package; (3) create and register a function in DataWorks and associate it with the JAR resource; (4) use the Java UDF.

Use case

Implement a UDF that converts uppercase letters to lowercase.

Step 1: Coding

Follow the instructions in the MaxCompute UDF framework to write the Java code for the function in your local Java environment (the UDF development plug-in must be added; for details, refer to UDF), and compile it into a JAR package.

The Java code for this example is as follows. The compiled JAR package is my_lower.jar.

  package test_UDF;

  import com.aliyun.odps.udf.UDF;

  // A Java UDF extends com.aliyun.odps.udf.UDF and implements an evaluate() method.
  public class test_UDF extends UDF {
      // Called once per input value by the UDF framework.
      public String evaluate(String s) {
          if (s == null) {
              return null; // preserve SQL NULL
          }
          return s.toLowerCase();
      }
  }

Step 2: Add a JAR resource

Before running a UDF, you must specify the UDF code to be referenced. The Java code you write must be uploaded to the big data platform as a resource: a Java UDF is built into a JAR package and added to the platform as a JAR resource. The UDF framework automatically loads the JAR package when the UDF runs.

The specific steps are as follows:

  1. Navigate to the Alibaba Cloud DTplus platform > DataWorks > management console as a developer, and click Enter the work area in the action bar of the corresponding project.

  2. Create a resource file: right-click a folder in the file directory tree and select Upload Resource.

  3. Complete the configurations in the Upload resource pop-up window, and then click Submit.

The resource is created once the submission succeeds.
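
If you work from the MaxCompute console (odpscmd) rather than the DataWorks UI, the same JAR resource can be added with a single command. A minimal sketch, assuming the client is already configured for your project and my_lower.jar sits in the current directory:

  add jar my_lower.jar;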

Step 3: Register UDFs

In the preceding steps, you have written the Java UDF code and uploaded the JAR resource, which enables the big data platform to obtain and run your code automatically. However, the UDF is still unavailable on the platform at this point, because the platform has no information about it yet. You must therefore register a unique function name on the platform and map that name to a specific class in the JAR resource.

The specific steps are as follows:

  1. Create the function directory. Switch to Manage Functions in the file directory tree and create a new directory.

  2. Right-click the directory folder and select New Function, or click New at the upper right corner of the right-side work zone and then select New Function.

  3. Complete the configurations in the Create MaxCompute function pop-up window and then click Submit.

The function is created once the submission succeeds.
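
The same registration can also be performed from the MaxCompute console with the CREATE FUNCTION statement. A minimal sketch, assuming the resource added in Step 2 and the package and class names from Step 1:

  create function my_lower as 'test_UDF.test_UDF' using 'my_lower.jar';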

Step 4: Test the function in MaxCompute SQL

A Java UDF is used in the same way as a built-in function of the platform. The specific steps are as follows:

  1. Click New at the upper right corner of the work zone and then select New Script File to create a SQL script file.

  2. Write the SQL statement in the code editor.

    Sample code:

    select my_lower('A') from dual;
  3. Click the Run button.

    So far, we have registered the Java UDF and run a local test call in SQL. If you need the lower-case conversion to run on a daily schedule, create a MaxCompute SQL node in the workflow and configure the scheduling properties of the workflow.
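
Beyond this single-value test, the registered UDF can be applied to table columns exactly like a built-in function. A sketch, where my_table and its STRING column s are hypothetical names used only for illustration:

  -- my_table and column s are hypothetical, for illustration only
  select s, my_lower(s) from my_table;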
