This topic describes how to create an E-MapReduce (EMR) function.

Prerequisites

  • An Alibaba Cloud EMR cluster is created. The security group to which the cluster belongs includes an inbound rule with the following settings:
    • Action: Allow
    • Protocol type: Custom TCP
    • Port range: 8898/8898
    • Authorization object: 100.104.0.0/16
  • An EMR compute engine instance is associated with the desired workspace. The EMR folder is displayed only after you associate an EMR compute engine instance with the workspace on the Workspace Management page. For more information, see Configure a workspace.
  • The required resources are uploaded.

Procedure

  1. Go to the DataStudio page.
    1. Log on to the DataWorks console.
    2. In the left-side navigation pane, click Workspaces.
    3. In the top navigation bar, select the region where your workspace resides, find the workspace, and then click Data Analytics in the Actions column.
  2. Create a workflow. For more information, see Create a workflow.
  3. Write the Java code for the function in a local (offline) Java environment, package the code into a JAR file, and then upload the file to DataWorks as a JAR resource. For more information, see Create and use an EMR JAR resource.
  4. Create a function.
    1. Click the workflow in the Business Flow section, right-click EMR, and then choose Create > Function.
    2. In the Create Function dialog box, set the Function Name, Engine Instance, and Location parameters.
    3. Click Create.
    4. In the Function information section of the configuration tab that appears, set the parameters.
      Parameters in the Function information section:
      • Function Type: The type of the function. Valid values: Mathematical Operation Functions, Aggregate Functions, String Processing Functions, Date Functions, Window Functions, and Other Functions.
      • Engine Instance: The EMR cluster that is associated with the current workspace. By default, you cannot change the engine instance.
      • Engine Type: The type of the compute engine. By default, you cannot change the engine type.
      • EMR database: The EMR database to which the function belongs. Select a database from the drop-down list. To create a database, click New Library. In the New Library dialog box, set the parameters and click OK.
      • Function Name: The name of the function. You can use this name to reference the function in SQL statements. The function name must be globally unique and cannot be changed after the function is created.
      • Owner: This parameter is automatically set.
      • Class Name: Required. The name of the class that implements the function.
      • Resource: Required. The resource to be used in the function. Select a resource that was created in the current workspace from the drop-down list. To create a resource, click Create Resource. In the Create Resource dialog box, set the parameters and click Create.
      • Description: The description of the function.
      • Expression Syntax: The syntax of the function. Example: test.
      • Parameter Description: The description of the input and output parameters that are supported.
      • Return Value: Optional. The return value. Example: 1.
      • Example: Optional. An example of how to use the function.
  5. Click the Save icon in the top toolbar.
  6. Commit the function.
    1. Click the Submit icon in the top toolbar.
      Note: You must select a resource group for scheduling when you commit the EMR function. We recommend that you use an exclusive resource group for scheduling. If no exclusive resource group for scheduling is available, you can purchase and configure one. For more information, see Create and use an exclusive resource group for scheduling.
    2. In the Commit Node dialog box, enter your comments in the Change description field.
    3. Click OK.
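
The procedure asks you to write Java code and package it into a JAR file before you create the function. The following is a minimal sketch of what such a class might look like. It assumes that the function is registered as a Hive-style UDF that exposes an evaluate method, which this topic does not state explicitly. The class name ToLowerCaseExample and its logic are hypothetical. In an actual Hive UDF project, the class would typically extend org.apache.hadoop.hive.ql.exec.UDF from the hive-exec library; that superclass is omitted here so that the sketch compiles without external dependencies.

```java
import java.util.Locale;

// Hypothetical example class; the name and logic are illustrative only.
// Assumption: in a real Hive UDF project this class would extend
// org.apache.hadoop.hive.ql.exec.UDF (hive-exec). The superclass is left out
// so that this sketch compiles on its own.
public class ToLowerCaseExample {

    // Hive-style evaluate method: returns the lowercase form of the input,
    // or null when the input is null.
    public String evaluate(String input) {
        if (input == null) {
            return null;
        }
        return input.toLowerCase(Locale.ROOT);
    }

    // Small local check that can be run before the class is packaged.
    public static void main(String[] args) {
        ToLowerCaseExample fn = new ToLowerCaseExample();
        System.out.println(fn.evaluate("Hello EMR"));
    }
}
```

Under these assumptions, the Class Name parameter would be set to ToLowerCaseExample (or to the fully qualified name if the class is placed in a package), and the Resource parameter would point to the uploaded JAR resource that contains the class.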