
Create UDFs

Last Updated: Mar 26, 2018

What are UDFs

UDF is short for User-Defined Function. MaxCompute provides many built-in functions to meet your computing needs, and you can also create UDFs for computations the built-in functions do not cover. A UDF is used in the same way as a general built-in function. A Java UDF is a UDF implemented in Java.

The general Java UDF workflow on the big data platform consists of four steps: (1) write and debug the Java code in a local environment, then compile it into a JAR package; (2) create a resource in DataWorks and upload the JAR package; (3) create and register a function in DataWorks, associating it with the JAR resource; (4) use the Java UDF. See the following figure:

(Figure: the four-step Java UDF workflow)

Use case

Implement a UDF that converts upper-case letters to lower case.

Step 1: Coding

Follow the instructions in the MaxCompute UDF framework to write the Java code in your local Java environment (the UDF development plug-in must be added; for more information, see UDF), and compile it into a JAR package.

The Java code for this example is as follows. The compiled JAR package is my_lower.jar.

    package test_UDF;

    import com.aliyun.odps.udf.UDF;

    public class test_UDF extends UDF {
        public String evaluate(String s) {
            if (s == null) { return null; }
            return s.toLowerCase();
        }
    }
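
To sanity-check the logic before packaging, you can call evaluate directly from a plain Java main method. This is a minimal local sketch (the class name TestLower is illustrative and is not part of the JAR you upload); it only needs the MaxCompute UDF SDK on the classpath so that test_UDF compiles:

    package test_UDF;

    // Local sanity check for the UDF logic; illustrative only, not uploaded.
    public class TestLower {
        public static void main(String[] args) {
            test_UDF udf = new test_UDF();
            System.out.println(udf.evaluate("A"));          // prints: a
            System.out.println(udf.evaluate("MaxCompute")); // prints: maxcompute
            System.out.println(udf.evaluate(null));         // prints: null
        }
    }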

Step 2: Add a JAR resource

Before running a UDF, you must specify the code it references. The Java code you write must be uploaded to the big data platform as a resource: a Java UDF must be built into a JAR package and added to the platform as a JAR resource. The UDF framework then automatically loads the JAR package when the UDF runs. (An equivalent MaxCompute client command is sketched after these steps.)

Follow these steps:

  1. Log on to the Alibaba Cloud DTplus console as a developer. Select DataWorks > Management console. Click Enter the work area in the Actions column of the corresponding project.


  2. Create a resource file. Right-click a folder in the file directory tree and select Upload Resource.

  3. Complete the configurations in the Upload resource dialog box, and click Submit.


Once the submission succeeds, the resource is created.
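
If you prefer the MaxCompute client (odpscmd) to the DataWorks UI, adding the JAR resource is a single command. A minimal sketch, assuming my_lower.jar is in your current local directory:

    add jar my_lower.jar;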

Step 3: Register UDFs

In the preceding steps, we wrote the Java UDF code and uploaded the JAR resource, which lets the big data platform obtain and run the code automatically. However, the UDF is still unavailable on the big data platform at this point, because the platform has no information about it yet. You must therefore register a unique function name on the platform and map that name to a specific class in the JAR resource. (An equivalent MaxCompute client command is sketched after these steps.)

The specific steps are as follows:

  1. Create the function directory. Switch to Manage Functions in the file directory tree and create a new directory.

  2. Right-click the directory folder and select New Function, or click New in the upper-right corner and select New Function.

  3. Complete the required fields in the Create MaxCompute function dialog box and click Submit.


Once the submission succeeds, the function is created.
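
The dialog box maps the function name to a class in the JAR resource. From the MaxCompute client, the equivalent registration is roughly the following (the function name my_lower is our choice; the class test_UDF.test_UDF and resource my_lower.jar come from the earlier steps):

    create function my_lower as 'test_UDF.test_UDF' using 'my_lower.jar';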

Step 4: Test the function in MaxCompute SQL

A Java UDF is used in the same way as the platform's built-in functions. The specific steps are as follows (a variant that applies the UDF to a table column is sketched after these steps):

  1. Click New in the upper-right corner and select New Script File to create a SQL script file.

  2. Write the SQL statement in the code editor.

    Sample code:

    select my_lower('A') from dual;
  3. Click the Run button.

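Beyond the constant test above, the registered UDF can be applied to table columns like any built-in function. A sketch, assuming a hypothetical table t_users with a STRING column name:

    -- t_users and its name column are hypothetical; substitute your own table.
    select my_lower(name) as name_lower from t_users;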

At this point, the Java UDF has been registered and tested with an ad hoc SQL query. If you need the lower-case conversion to run on a daily schedule, create a new MaxCompute SQL node in the workflow and configure the workflow's scheduling properties.
