
DataWorks: EMR resources and functions

Last Updated: Mar 26, 2026

Data Studio lets you upload and manage E-MapReduce (EMR) JAR and File resources, then register them as user-defined functions (UDFs) for use in data development nodes and SQL queries.

Prerequisites

Before you begin, ensure that you have:

  • An EMR compute resource or an EMR Serverless Spark compute resource attached to your DataWorks workspace

  • The resource files to upload, available on your local computer or in an Object Storage Service (OSS) bucket

If you upload files from OSS, the following conditions must also be met:

Go to Resource Management

  1. Go to the Workspaces page in the DataWorks console. In the top navigation bar, select a region. Find your workspace and choose Shortcuts > Data Studio in the Actions column.

  2. In the left navigation pane, click the Resource Management icon to open the Resource Management page.

  3. Click the create icon to create a resource or function. To organize resources in folders, first click Create Directory, then right-click the folder and choose the type of resource or function to create.

Resource types

Data Studio supports the following EMR resource types:

Resource type | Description | Supported upload methods
EMR File | Any file type uploaded as a File resource. Whether the file can be used depends on the compute engine. | Local, OSS
EMR Jar | A compiled Java JAR package for running Java programs. The file must have a .jar extension. |
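Because an EMR Jar resource must be a compiled JAR with a .jar extension, it can help to sanity-check the file locally before uploading it. The following is a minimal sketch, not a DataWorks tool: the class JarResourceCheck and its problemWith method are hypothetical helpers. It checks the extension, confirms the file opens as a JAR archive with java.util.jar.JarFile, and confirms that it contains at least one .class file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.jar.JarFile;

// Hypothetical local pre-upload check for an EMR Jar resource (not part of DataWorks).
public class JarResourceCheck {

    /** Returns null if the file looks like a usable JAR, otherwise a short reason. */
    static String problemWith(Path file) {
        if (!Files.isRegularFile(file)) {
            return "not a regular file";
        }
        if (!file.getFileName().toString().endsWith(".jar")) {
            return "missing .jar extension";
        }
        // Opening the file as a JarFile fails with an IOException
        // (for example, a ZipException) if it is not a valid archive.
        try (JarFile jar = new JarFile(file.toFile())) {
            boolean hasClasses = jar.stream()
                    .anyMatch(entry -> entry.getName().endsWith(".class"));
            return hasClasses ? null : "archive contains no .class files";
        } catch (IOException e) {
            return "not a readable JAR: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Usage: java JarResourceCheck example.jar [more.jar ...]
        for (String arg : args) {
            String problem = problemWith(Paths.get(arg));
            System.out.println(arg + ": " + (problem == null ? "OK" : problem));
        }
    }
}
```

This only catches packaging mistakes. Whether the JAR actually runs still depends on the compute engine and its dependencies.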

Resources can be stored in OSS or in Hadoop Distributed File System (HDFS).

Important

Using EMR resources stored in or uploaded to OSS incurs standard OSS fees.

Limits

  • Publishing to production: In a standard mode workspace, a resource takes effect in the production environment only after you publish it. Data source configurations differ between the development and production environments, so confirm that you are using the correct configuration before you run queries.

  • Resource visibility: DataWorks only shows resources that were uploaded through the DataWorks interface.

Create a resource

  1. On the Resource Management page, click the create icon to open the Create Resource or Function dialog box. Set the Type, Path, and Name of the resource.

  2. After creating the resource, configure the upload source and storage. The key parameters are described below.

    Parameter | Description
    File source | The source of the file. Valid values: Local and OSS.
    File content | If you select Local, click Click to upload in the Upload File section. If you select OSS, choose a file from the Select File drop-down list.
    Storage path | Where the resource is stored. Valid values: OSS and HDFS. For OSS, grant the required permissions and then select a directory (authorization requires an Alibaba Cloud account). For HDFS, enter the path manually, for example, /user/admin/[specific path]. JAR packages can be stored on the master node of the EMR cluster or in OSS. OSS is recommended; see Store JAR packages in OSS.
    Data source | The data source to which the uploaded EMR resource belongs.
    Resource group | A Serverless resource group that can connect to the EMR data source. If no Serverless resource group is available, add one.
  3. In the toolbar, click Save and then Publish. Only published resources can be used in Data Studio.

    Note

    When a Serverless resource group submits a resource, DataWorks sends a task to the DPI engine and generates execution logs. Use these logs to troubleshoot any submission issues.

Use a resource

After creating a resource, reference it in a data development node:

  1. In the left navigation pane, click Resource Management.

  2. Find the target resource, right-click it, and select Reference Resource.

The following code is added to the node:

##@resource_reference{"Resource Name"}

For example, referencing example.jar from an EMR MR node adds ##@resource_reference{"example.jar"}. The exact format varies by node type.
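The reference line has a fixed shape, so it can be generated or detected mechanically, for example to list which resources a node's code depends on. The following is a minimal sketch built around hypothetical helpers (ResourceReferences, annotation, and referencedResources are not DataWorks APIs). It recognizes only the ##@resource_reference{"..."} form shown above. Because the exact format varies by node type, other node types may need a different pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helpers for the ##@resource_reference{"name"} line format.
public class ResourceReferences {

    // Matches a whole line of the form ##@resource_reference{"name"}
    // and captures the resource name between the double quotes.
    private static final Pattern REF = Pattern.compile(
            "^##@resource_reference\\{\"([^\"]+)\"\\}\\s*$", Pattern.MULTILINE);

    /** Builds the reference line for a resource name. */
    static String annotation(String resourceName) {
        return "##@resource_reference{\"" + resourceName + "\"}";
    }

    /** Returns the referenced resource names in order of appearance. */
    static List<String> referencedResources(String nodeCode) {
        List<String> names = new ArrayList<>();
        Matcher m = REF.matcher(nodeCode);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        String nodeCode = annotation("example.jar") + "\n" + annotation("lookup.txt") + "\n";
        System.out.println(referencedResources(nodeCode)); // prints [example.jar, lookup.txt]
    }
}
```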

Alternatively, register the resource as a UDF and use it as a function.

Create a function

Before creating a function, create the resource it will be based on.

Function description

In the Resource Management section of Data Studio, you can register a resource as an EMR function. In Data Studio or in SQL queries, you can use the built-in functions provided by Hive in addition to the user-defined functions that you create.
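The resource behind a UDF is typically a JAR that contains a compiled class. As an illustration, the following sketch shows the logic of a hypothetical STRING-type UDF, MaskEmail, which masks the local part of an email address. It assumes Hive's simple UDF API, in which the class extends org.apache.hadoop.hive.ql.exec.UDF and Hive calls its public evaluate method by reflection (newer Hive versions favor GenericUDF). The superclass is left out so the sketch compiles without Hive on the classpath. Add it back, along with the Hive dependency, before you package the class as an EMR Jar resource.

```java
// Hypothetical STRING UDF logic. In a real Hive UDF built on the simple API,
// this class would extend org.apache.hadoop.hive.ql.exec.UDF; the superclass
// is omitted here so the sketch compiles without Hive on the classpath.
public class MaskEmail {

    /** Masks the local part of an email, e.g. alice@example.com -> a****@example.com. */
    public String evaluate(String email) {
        if (email == null) {
            return null; // null input yields null (NULL in SQL)
        }
        int at = email.indexOf('@');
        if (at <= 1) {
            return email; // no '@', or a one-character local part: nothing to mask
        }
        StringBuilder masked = new StringBuilder();
        masked.append(email.charAt(0));
        for (int i = 1; i < at; i++) {
            masked.append('*');
        }
        masked.append(email.substring(at));
        return masked.toString();
    }

    public static void main(String[] args) {
        System.out.println(new MaskEmail().evaluate("alice@example.com")); // prints a****@example.com
    }
}
```

If the class were declared in a package such as com.example.udf (hypothetical), the Class Name to enter when you register the function would be com.example.udf.MaskEmail.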

Function types

When registering a function, select one of the following types:

Type | Description
MATH | Mathematical operations
AGGREGATE | Aggregate functions
STRING | String operations
DATE | Date operations
ANALYTIC | Window (analytic) functions
OTHER | Other function types

Create and publish a function

  1. On the Resource Management page, click the create icon to open the Create Resource or Function dialog box. Set the Type, Path, and Name of the function.

  2. Click Confirm, then configure the function parameters:

    Parameter | Description
    Function Type | The function category. See the function types table above.
    Data Source | The data source in which the function is registered.
    EMR Database | The EMR database in which the function is registered.
    Resource Group | A Serverless resource group that can connect to the EMR data source.
    Class Name | The fully qualified class name of the UDF, in the format <Java package name>.<class name>. It must match the class name in the JAR package. For example, if the Java package is com.aliyun.emr.examples.udf and the class is UDAFExample, set this parameter to com.aliyun.emr.examples.udf.UDAFExample. To get the fully qualified name, use the Copy Reference action in IntelliJ IDEA.
    Resource List | Required. Select a resource from the current workspace.
  3. In the toolbar, click Save and then Publish. Only published functions are available in Data Studio.
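The Class Name must match a class that actually exists in the selected JAR resource. As a local alternative to Copy Reference, the following Java sketch lists the top-level class names in a JAR. JarClassNames and its methods are hypothetical helpers, not part of DataWorks. The sketch converts entry paths such as com/aliyun/emr/examples/udf/UDAFExample.class into com.aliyun.emr.examples.udf.UDAFExample and skips nested classes (names containing $) and entries under META-INF/.

```java
import java.io.IOException;
import java.util.List;
import java.util.Objects;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

// Hypothetical helper that lists candidate values for the Class Name parameter.
public class JarClassNames {

    /**
     * Converts a JAR entry path into a fully qualified class name, for example
     * com/aliyun/emr/examples/udf/UDAFExample.class -> com.aliyun.emr.examples.udf.UDAFExample.
     * Returns null for non-class entries, META-INF entries, nested classes,
     * and package-info/module-info descriptors.
     */
    static String toClassName(String entryName) {
        if (!entryName.endsWith(".class") || entryName.startsWith("META-INF/")) {
            return null;
        }
        String path = entryName.substring(0, entryName.length() - ".class".length());
        if (path.contains("$") || path.endsWith("package-info") || path.endsWith("module-info")) {
            return null;
        }
        return path.replace('/', '.');
    }

    /** Returns the sorted top-level class names found in the JAR at jarPath. */
    static List<String> classNames(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.stream()
                    .map(entry -> toClassName(entry.getName()))
                    .filter(Objects::nonNull)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java JarClassNames example.jar
        for (String name : classNames(args[0])) {
            System.out.println(name);
        }
    }
}
```

Any name this prints can be pasted into Class Name as-is. A class that does not appear in the list is not in the JAR, so registration against that resource would not find it.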

Use a function

After a function is published, use it in a data development node or an SQL query.

In a data development node:

  1. In the left navigation pane, click Resource Management.

  2. Find the function, right-click it, and select Insert Function.

The function name is automatically inserted into the editor, for example, example_function().

In an SQL query:

Call the function directly by name:

SELECT example_function(column_name) FROM table_name;

Manage resources and functions

From the Resource Management page, you can manage existing resources and functions.

  • View version history: Click the version icon on the right side of the resource or function editing page to view and compare saved or submitted versions. Select at least two versions to compare.

  • Delete a resource or function: Right-click the target item and select Delete. The deletion takes effect in the production environment only after you publish it.