UDF

Last Updated: Jun 23, 2016

MaxCompute supports three kinds of user-defined functions: UDF (user-defined scalar function), UDAF (user-defined aggregate function) and UDTF (user-defined table-valued function). The three are usually referred to collectively as "UDF". Maven users can search for "odps-sdk-udf" in the Maven repository to obtain the Java SDK in its various versions. The dependency is configured as follows:

    <dependency>
        <groupId>com.aliyun.odps</groupId>
        <artifactId>odps-sdk-udf</artifactId>
        <version>0.20.7-public</version>
    </dependency>

Notes:

  • UDFs can currently be written only in Java. To deploy a UDF, upload the compiled code to the project with Add Resource, then register the function with Create Function.
  • Code examples for UDF, UDAF and UDTF are given separately in this section. To run a UDF, refer to UDF Eclipse Plug-in Introduction.
  • To use UDFs, submit an application through the work order system, stating the ODPS project name and the application scenario. UDFs become available once the application is approved.

UDF Example

A complete UDF development example is given below. To develop a UDF that converts characters to lowercase, follow these steps:

  • Coding: implement the function according to the ODPS UDF framework and compile it. A simple example follows:

    package org.alidata.odps.udf.examples;

    import com.aliyun.odps.udf.UDF;

    public final class Lower extends UDF {
        // Returns the lowercase form of s, or null when s is null.
        public String evaluate(String s) {
            if (s == null) { return null; }
            return s.toLowerCase();
        }
    }

  • Package the compiled class into a jar file named 'my_lower.jar'.
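
Before packaging, the conversion logic can be sanity-checked locally. The sketch below is a plain-Java stand-in: it copies the body of evaluate into a hypothetical class, LowerCheck, so that it runs without the ODPS SDK on the classpath. It is an illustration only and is not part of the UDF framework.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the Lower UDF above. It does not extend
// com.aliyun.odps.udf.UDF, so it can be run without the SDK.
class LowerCheck {
    // Same logic as Lower.evaluate: null in, null out; otherwise lowercase.
    static String evaluate(String s) {
        if (s == null) {
            return null;
        }
        return s.toLowerCase();
    }

    public static void main(String[] args) {
        List<String> inputs = Arrays.asList("A", "MaxCompute", null);
        for (String in : inputs) {
            System.out.println(in + " -> " + evaluate(in));
        }
    }
}
```

Note that String.toLowerCase() uses the JVM's default locale. If locale-independent results matter, toLowerCase(Locale.ROOT) is the usual alternative.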

Note:

  • For information about the SDK, refer to UDF SDK.
  • Add resource: The code that a UDF refers to must be specified before the UDF runs. User code is added to ODPS as a resource: a Java UDF must be packaged as a jar and added to ODPS as a jar resource. The UDF framework then loads the jar automatically and runs the UDF. MaxCompute MapReduce also describes the use of resources.

  • Execute the command:

    add jar my_lower.jar;
    -- If a resource with this name already exists, rename the jar package
    -- and update the jar name in the commands that follow,
    -- or use the -f option to overwrite the existing jar resource.
  • Register UDF: Once the jar package has been uploaded, MaxCompute can obtain and run the user's code. The UDF still cannot be used, however, because MaxCompute has no information about it. The user must register a unique function name in ODPS and specify which class in which jar resource that name corresponds to. For details on registering a UDF, refer to Create Function. Run the command:

    CREATE FUNCTION test_lower AS org.alidata.odps.udf.examples.Lower USING my_lower.jar;

Use the function in SQL:

    select test_lower('A') from my_test_table;

UDAF Example

A UDAF is registered in much the same way as a UDF, and it is used in the same way as the aggregation functions among the built-in functions. The following UDAF code example sums the lengths (as returned by Text.getLength()) of its string arguments:

    package org.alidata.odps.udf.examples;

    import com.aliyun.odps.io.LongWritable;
    import com.aliyun.odps.io.Text;
    import com.aliyun.odps.io.Writable;
    import com.aliyun.odps.udf.Aggregator;
    import com.aliyun.odps.udf.UDFException;

    /**
     * project: example_project
     * table: wc_in2
     * partitions: p2=1,p1=2
     * columns: colc,colb,cola
     */
    public class UDAFExample extends Aggregator {

        // Adds the length of every argument in the current row to the buffer.
        @Override
        public void iterate(Writable arg0, Writable[] arg1) throws UDFException {
            LongWritable result = (LongWritable) arg0;
            for (Writable item : arg1) {
                Text txt = (Text) item;
                result.set(result.get() + txt.getLength());
            }
        }

        // Folds a partial result into the buffer.
        @Override
        public void merge(Writable arg0, Writable arg1) throws UDFException {
            LongWritable result = (LongWritable) arg0;
            LongWritable partial = (LongWritable) arg1;
            result.set(result.get() + partial.get());
        }

        // Creates an empty buffer that starts at zero.
        @Override
        public Writable newBuffer() {
            return new LongWritable(0L);
        }

        // Returns the accumulated total as the final result.
        @Override
        public Writable terminate(Writable arg0) throws UDFException {
            return arg0;
        }
    }
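
In the example above, newBuffer creates the accumulator, iterate folds each input row into it, merge combines two partial accumulators, and terminate returns the final value. The plain-Java sketch below imitates that flow without the ODPS SDK. The class LengthSumSketch, the long[] buffer, and the split of the input into two slices are illustrative assumptions rather than MaxCompute APIs. The sketch also assumes that Text.getLength() reports the UTF-8 byte length.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical, SDK-free imitation of the UDAFExample flow above.
// A one-element long[] plays the role of the LongWritable buffer.
class LengthSumSketch {
    static long[] newBuffer() {
        return new long[] {0L};
    }

    // Adds the UTF-8 byte length of each argument in the row to the buffer
    // (assumed to mirror Text.getLength()).
    static void iterate(long[] buffer, String... row) {
        for (String item : row) {
            buffer[0] += item.getBytes(StandardCharsets.UTF_8).length;
        }
    }

    // Folds a partial buffer into the target buffer.
    static void merge(long[] buffer, long[] partial) {
        buffer[0] += partial[0];
    }

    static long terminate(long[] buffer) {
        return buffer[0];
    }

    // Aggregates one slice of single-column rows into a fresh partial buffer.
    static long[] aggregateSlice(List<String> slice) {
        long[] buffer = newBuffer();
        for (String value : slice) {
            iterate(buffer, value);
        }
        return buffer;
    }

    public static void main(String[] args) {
        long[] first = aggregateSlice(Arrays.asList("ab", "cde"));
        long[] second = aggregateSlice(Arrays.asList("f"));
        long[] total = newBuffer();
        merge(total, first);
        merge(total, second);
        System.out.println(terminate(total)); // prints 6
    }
}
```

Because merge only adds partial sums, the result does not depend on how the rows are divided into slices, which is the property an aggregator needs for partial results to be combined safely.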

UDTF Example

A UDTF is registered in much the same way as a UDF, and it is used in the same way as well. A code example follows:

    package org.alidata.odps.udtf.examples;

    import com.aliyun.odps.udf.UDTF;
    import com.aliyun.odps.udf.UDTFCollector;
    import com.aliyun.odps.udf.annotation.Resolve;
    import com.aliyun.odps.udf.UDFException;

    // Input and output types: (string, bigint) -> (string, bigint).
    @Resolve({"string,bigint->string,bigint"})
    public class MyUDTF extends UDTF {

        // Emits one output row per whitespace-separated token of the first
        // argument, pairing each token with the second argument.
        @Override
        public void process(Object[] args) throws UDFException {
            String a = (String) args[0];
            Long b = (Long) args[1];
            for (String t : a.split("\\s+")) {
                forward(t, b);
            }
        }
    }
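
To make the one-row-in, many-rows-out behavior concrete, the sketch below imitates process and forward in plain Java. SplitSketch and its in-memory output list are hypothetical stand-ins for the UDTF base class and the framework's collector. They are assumptions made for illustration, not SDK classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, SDK-free imitation of MyUDTF above.
class SplitSketch {
    // Rows emitted so far; stands in for the framework's output collector.
    final List<Object[]> output = new ArrayList<>();

    // Stand-in for forward in the UDTF example: records one output row.
    void forward(Object... row) {
        output.add(row);
    }

    // Same logic as MyUDTF.process: one output row per whitespace-separated token.
    void process(Object[] args) {
        String a = (String) args[0];
        Long b = (Long) args[1];
        for (String t : a.split("\\s+")) {
            forward(t, b);
        }
    }

    public static void main(String[] args) {
        SplitSketch udtf = new SplitSketch();
        // One input row of type (string, bigint).
        udtf.process(new Object[] {"hello big world", 7L});
        for (Object[] row : udtf.output) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

The single input row ("hello big world", 7) produces three output rows, each carrying one token and the unchanged number. Each forwarded row has a string and a long value, which matches the output side of the "string,bigint->string,bigint" signature.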