DataWorks allows you to create and manage MaxCompute functions in the DataWorks console. You can either execute MaxCompute SQL statements to create and manage MaxCompute functions, or register MaxCompute functions in the DataWorks console. This topic describes how to create and use MaxCompute user-defined functions (UDFs) in the DataWorks console.

Prerequisites

Before you can register a MaxCompute function, you must upload an existing resource or create and add a resource to DataWorks by using the DataWorks console.

Limits

In the DataWorks console, you can view and manage only the functions that were uploaded by using the DataWorks console. If you add functions to a MaxCompute compute engine by using other tools such as MaxCompute Studio, you must use the MaxCompute Functions feature in DataWorks DataStudio to manually load the functions to DataWorks. After the functions are loaded, you can view and manage them in DataWorks. For more information, see MaxCompute functions.

Register a function

  1. Go to the DataStudio page.
    1. Log on to the DataWorks console.
    2. In the left-side navigation pane, click Workspaces.
    3. In the top navigation bar, select the region where the workspace resides. On the Workspaces page, find the workspace in which you want to register the function, and click DataStudio in the Actions column.
  2. Create a workflow. For more information, see Create an auto triggered workflow.
  3. Create a Java Archive (JAR) or Python resource, and commit and deploy the resource. For more information, see Create and use MaxCompute resources.
  4. Create a function.
    1. Expand the desired workflow, right-click MaxCompute, and then select Create Function.
    2. In the Create Function dialog box, configure the Name and Path parameters.
    3. Click Create.
    4. In the Register Function section of the configuration tab that appears, configure the parameters that are described in the following list.
      Function Type: The type of the function. Valid values: Mathematical Operation Functions, Aggregate Functions, String Processing Functions, Date Functions, Window Functions, and Other Functions. For more information, see Functions.
      Engine Instance MaxCompute: The MaxCompute compute engine instance. The value of this parameter cannot be changed.
      Function Name: The name of the UDF. You can use this name to reference the function in SQL statements. The function name must be globally unique and cannot be changed after the function is registered.
      Owner: The owner of the function. The default owner is the account that is used to log on to the DataWorks console. You can change the value of this parameter.
      Class Name: The name of the class that implements the UDF. Configure this parameter in the Resource name.Class name format. The resource name can be the name of a Java or Python package.
        When you register UDFs in the DataWorks console, you can reference MaxCompute resources including JAR packages and Python resources. The value format of this parameter varies based on the resource type:
        • If the resource type is JAR, configure the Class Name parameter in the Java package name.Actual class name format. You can obtain the fully qualified class name by using the Copy Reference command in IntelliJ IDEA.
          For example, if com.aliyun.odps.examples.udf is the Java package name and UDAFExample is the class name, the value of the Class Name parameter is com.aliyun.odps.examples.udf.UDAFExample.
        • If the resource type is Python, configure the Class Name parameter in the Python resource name.Actual class name format.
          For example, if LcLognormDist_sh is the Python resource name and LcLognormDist_sh is the class name, the value of the Class Name parameter is LcLognormDist_sh.LcLognormDist_sh.
        Note
        • You do not need to include the .jar or .py suffix in the resource name.
        • You can use a resource only after the resource is committed and deployed. For information about how to create a MaxCompute resource, see Create and use MaxCompute resources.
      Resources: Required. The resource that has been uploaded or added to DataWorks by using the DataWorks console. You can perform a fuzzy match to search for the desired resource in the current workspace.
        Note
        • You do not need to specify the path of the added resources.
        • If multiple resources are referenced in the UDF, separate the resource names with commas (,).
      Description: The description of the UDF.
      Expression Syntax: The syntax of the UDF. Example: test.
      Parameter Description: The description of the input and output parameters that are supported.
      Return Value: Optional. The return value. Example: 1.
      Example: Optional. The example of the UDF.
  5. Click the Save icon in the top toolbar.
  6. Commit the UDF.
    1. Click the Submit icon in the top toolbar.
    2. In the Submit dialog box, enter your comments in the Change description field.
    3. Click Confirm.

For information about how to view the functions in a MaxCompute compute engine and the change history of the functions, and perform other operations, see MaxCompute functions.
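
The following Python snippet is a minimal sketch of how a Python resource relates to the value of the Class Name parameter. It assumes a hypothetical resource file named my_udfs.py that contains a hypothetical class named MyLower. Neither name comes from this topic. The annotate decorator is imported from odps.udf, which is available in the MaxCompute Python UDF runtime. The fallback definition is only a local stand-in so that the sketch can run outside MaxCompute. The python_class_name helper is also hypothetical and only illustrates the naming rule for Python resources.

```python
# Hypothetical content of a Python resource named my_udfs.py.
try:
    # Provided by the MaxCompute Python UDF runtime.
    from odps.udf import annotate
except ImportError:
    # Assumption: local stand-in so that the sketch runs without MaxCompute.
    def annotate(signature):
        def wrap(cls):
            return cls
        return wrap


@annotate("string->string")
class MyLower(object):
    """Hypothetical UDF that converts a string to lowercase."""

    def evaluate(self, value):
        # Pass NULL input through unchanged.
        if value is None:
            return None
        return value.lower()


def python_class_name(resource_file, class_name):
    """Build a Class Name value for a Python resource.

    The .py suffix is dropped because the resource name is used without it.
    """
    resource_name = resource_file
    if resource_name.endswith(".py"):
        resource_name = resource_name[:-len(".py")]
    return resource_name + "." + class_name


print(python_class_name("my_udfs.py", "MyLower"))
```

With these hypothetical names, the Class Name parameter would be set to my_udfs.MyLower, which follows the Python resource name.Actual class name format.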

View the function version and roll back the function

In the Scheduled Workflow pane, go to the MaxCompute folder in the Business Flow section, right-click a MaxCompute function, and then select View Earlier Versions. You can then view the earlier versions of the function or roll back the function.

Use functions in nodes

To use a function in a node, reference the name of the function directly in the code of the node. To insert the name, find the function in the Scheduled Workflow pane, right-click the function name, and then select Insert Function. The function name is then displayed in the code on the configuration tab of the node.
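
The following sketch shows one way that a registered function might be referenced from code, for example in a PyODPS node. The function name my_lower, the column name, and the table my_table are hypothetical. The run_udf_query helper assumes a PyODPS entry object, such as the global object that PyODPS nodes provide, and uses its execute_sql and open_reader methods. The helper is defined but not called here because it requires a MaxCompute connection.

```python
def build_udf_query(function_name, column, table):
    """Return a SELECT statement that applies a registered UDF to one column."""
    return "SELECT {0}({1}) AS result FROM {2}".format(function_name, column, table)


def run_udf_query(entry, sql):
    """Run the statement through a PyODPS entry object and collect the first column.

    Assumption: `entry` is a PyODPS ODPS object with valid credentials.
    """
    instance = entry.execute_sql(sql)
    with instance.open_reader() as reader:
        return [record[0] for record in reader]


# Hypothetical function, column, and table names.
sql = build_udf_query("my_lower", "name", "my_table")
print(sql)
```

In an ODPS SQL node, the same statement can be written directly in the node code and ended with a semicolon.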

Appendix 1: Run commands to query functions in a compute engine

show functions;
  Queries the functions in a compute engine.
desc function <function_name>;
  Queries the registration details of a function.
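
As an alternative to the SQL commands, functions can also be listed from Python code. The following sketch assumes a PyODPS entry object, such as the global object in a PyODPS node, and uses its list_functions method together with the name and class_type attributes of each returned function. The summarize_functions helper is hypothetical and is not part of DataWorks or PyODPS.

```python
def summarize_functions(entry):
    """Return sorted (name, class_type) pairs for the functions visible to `entry`.

    Assumption: `entry` is a PyODPS ODPS object with valid credentials.
    """
    return sorted((func.name, func.class_type) for func in entry.list_functions())


# Hypothetical usage in a PyODPS node, where `o` is the entry object:
# for name, class_type in summarize_functions(o):
#     print(name, class_type)
```

The class_type value corresponds to the Class Name parameter that is configured when the function is registered.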

Appendix 2: View the list of built-in functions

For information about the list of built-in functions, see Overview.

Appendix 3: Add functions of a compute engine to DataWorks for management

You can use the MaxCompute Functions feature to load functions from a MaxCompute compute engine to DataWorks for visualized management and usage. For more information, see Manage MaxCompute resources.