Create UDF

Last Updated: May 16, 2017

### About UDF

UDF is short for User Defined Function. MaxCompute provides many built-in functions to meet your computing requirements. You can also create UDFs tailored to specific computing scenarios. UDFs are used in the same way as built-in functions. A Java UDF is a UDF implemented in Java.

The general Java UDF usage process on the Data IDE consists of the following four steps: (1) Prepare and debug the Java code in a local environment, and then compile it into a JAR package; (2) Create resources on the IDE platform and upload the JAR package; (3) Create a new function on the IDE platform and associate the function with the JAR resources; (4) Use the Java UDF. Details are shown in the figure below:

[Figure: Java UDF usage process on the Data IDE]

Case study

Implement a UDF that converts letters to lower case.

Step 1: Coding

Write the Java code that implements the function in your local Java environment (the [UDF Development Extension](https://help.aliyun.com/document_detail/odps/tools/eclipse/udf.html) should be installed), following the MaxCompute UDF framework, and compile it into a JAR package.

The Java code of this example is below. The compiled JAR package is my_lower.jar.

    package test_UDF;

    import com.aliyun.odps.udf.UDF;

    // Converts the input string to lower case; a null input yields null.
    public class test_UDF extends UDF {
        public String evaluate(String s) {
            if (s == null) {
                return null;
            }
            return s.toLowerCase();
        }
    }
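Before packaging, the conversion logic can be checked locally without the MaxCompute SDK on the classpath. The following is a minimal sketch under that assumption: `MyLowerLocalCheck` is a hypothetical stand-in class, not part of the original example, that repeats the body of `evaluate` so it compiles with the JDK alone.

```java
// Hypothetical local check: mirrors the evaluate logic of test_UDF without
// extending com.aliyun.odps.udf.UDF, so only the JDK is required.
public class MyLowerLocalCheck {

    // Same behavior as the UDF: null in, null out; otherwise lower case.
    public static String evaluate(String s) {
        if (s == null) {
            return null;
        }
        return s.toLowerCase();
    }

    public static void main(String[] args) {
        System.out.println(evaluate("A"));          // prints: a
        System.out.println(evaluate("MaxCompute")); // prints: maxcompute
        System.out.println(evaluate(null));         // prints: null
    }
}
```

Note that `String.toLowerCase()` uses the default locale of the JVM. If the result must not depend on the locale, `toLowerCase(java.util.Locale.ROOT)` is one possible variant.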

Step 2: Add a JAR resource

Before a UDF can run, the platform must know which code it refers to, so the compiled Java code must be uploaded to the Data IDE as a resource. A Java UDF must be packaged as a JAR file and added to the platform as a JAR resource. The UDF framework then loads the JAR package automatically when the UDF runs.

The specific steps are as follows:

Step 1: As a developer, go to Alibaba Cloud Dataplus platform > Data IDE Kit > Console, and then click Enter Work Zone in the action bar of the corresponding project.


Step 2: Create a new resource file. Right-click the folder in the file directory tree and select “Upload Resources”.

Step 3: Fill in the configuration items in the Resource Upload pop-up box, and then click “Submit”.


Once the submission succeeds, the resource is created.

Step 3: Register UDFs

The steps above covered writing the Java UDF code and uploading the JAR resource, so the Data IDE can now retrieve the user code automatically at run time. However, the UDF is not yet available in the Data IDE, because the platform has no information about it. You therefore need to register a unique function name on the platform and map that name to the function in a specific resource.
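The idea of mapping a unique function name to an implementation can be illustrated with a toy registry. This is only a conceptual sketch, not how MaxCompute or the Data IDE works internally: `FunctionRegistrySketch` and `LowerImpl` are hypothetical names, and the lookup via reflection is an assumption made for illustration.

```java
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

// Toy illustration only: maps a unique function name to an implementing class.
public class FunctionRegistrySketch {

    // Hypothetical stand-in for the class packaged in my_lower.jar.
    public static class LowerImpl {
        public String evaluate(String s) {
            return s == null ? null : s.toLowerCase();
        }
    }

    // Function name -> implementing class.
    private final Map<String, Class<?>> functions = new HashMap<>();

    // Registers a name; rejects duplicates so every name stays unique.
    public void register(String name, Class<?> impl) {
        if (functions.putIfAbsent(name, impl) != null) {
            throw new IllegalArgumentException("Function already registered: " + name);
        }
    }

    // Resolves the name and calls the class's evaluate(String) method.
    public Object call(String name, String arg) {
        Class<?> impl = functions.get(name);
        if (impl == null) {
            throw new IllegalArgumentException("Unknown function: " + name);
        }
        try {
            Object instance = impl.getDeclaredConstructor().newInstance();
            Method evaluate = impl.getMethod("evaluate", String.class);
            return evaluate.invoke(instance, arg);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot invoke " + name, e);
        }
    }

    public static void main(String[] args) {
        FunctionRegistrySketch registry = new FunctionRegistrySketch();
        registry.register("my_lower", LowerImpl.class);
        System.out.println(registry.call("my_lower", "A")); // prints: a
    }
}
```

In the sketch, calling an unregistered name fails with an error, which corresponds to the situation described above: uploaded code cannot be used until a function name has been registered for it.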

The specific steps are as follows:

Step 1: Create the function directory. Switch to “Manage Functions” in the file directory tree and create a new directory.

Step 2: Right-click the directory folder and select New Function, or click New > New Function at the top right corner of the work zone.

Step 3: Fill in the configuration items in the New MaxCompute Function pop-up box and submit them.


Once the submission succeeds, the function is created.

Step 4: Test the function in MaxCompute SQL

A Java UDF is used in the same way as the platform's built-in functions. The specific steps are as follows:

Step 1: Click New > New Script File at the top right corner of the work zone to create an SQL script file.

Step 2: Write the SQL statement in the code editor. Sample code:

    select my_lower('A') from dual;

Step 3: Click the run button to execute the statement.


At this point, the Java UDF has been registered and tested with a call in SQL. If you need the lower-case conversion to run on a daily schedule, you can create a new MaxCompute SQL node in a workflow and configure the workflow's scheduling attributes.
