Upload a JAR file containing user-defined functions (UDFs) or open source MapReduce code as an E-MapReduce (EMR) resource in DataWorks. Once committed, reference the resource in EMR compute nodes to use it in task scheduling.
Prerequisites
Before you begin, ensure that you have:
- An EMR DataLake cluster or an EMR Hadoop cluster associated with your DataWorks workspace
- Completed the cluster-side and DataWorks-side setup for your cluster type:
  - DataLake cluster: Configure an EMR data lake cluster and Configure DataWorks
  - Hadoop cluster: Associate an EMR cluster with a DataWorks workspace as an EMR compute engine instance
Create an EMR resource
- Log on to the DataWorks console. In the top navigation bar, select the region. In the left-side navigation pane, choose Data Development and O&M > Data Development. Select the workspace from the drop-down list and click Go to Data Development.
- Move the pointer over the create icon and choose Create Resource > EMR > EMR JAR or Create Resource > EMR > EMR File. Alternatively, find the desired workflow, right-click the workflow name, and choose Create Resource > EMR > EMR JAR or Create Resource > EMR > EMR File.
- In the Create Resource dialog box, configure the parameters.

| Parameter | Description | Required |
| --- | --- | --- |
| Engine type | Fixed as EMR. Cannot be changed. | Yes |
| Engine instance | Select a compute engine from the drop-down list. The list shows all EMR compute engines associated with your workspace. | Yes |
| Resource type | The type of resource to create. Valid values: EMR File and EMR JAR. | Yes |
| Path | The workflow in which you want to create the resource. | Yes |
| Storage path | Where the resource file is stored. Valid values: OSS and HDFS (Hadoop Distributed File System). OSS: Click Authorize next to OSS to authorize DataWorks to access Object Storage Service (OSS). Use your Alibaba Cloud account to complete authorization, then select a folder. Store JAR packages in OSS for easier management. For details, see Operations in the OSS console. HDFS: Enter a storage path manually. JAR packages can also be stored on the master node of the EMR cluster. | Yes |
| File source | The source of the file to upload. Valid values: Local and OSS. Local: Click Upload in the File field to upload a file from your on-premises machine. OSS: Select an OSS object in Select File, or click Create in OSS to create a new object. | Yes |
| Name | The name of the EMR resource. For JAR files, include the .jar extension in the resource name. | Yes |

- Click Create.
- Click the save icon to save the resource, then click the commit icon to commit it. When committing, select a resource group for scheduling. If you use a serverless resource group, DataWorks issues a task to the compute engine and displays run logs. Use the run logs to troubleshoot any errors. If no serverless resource groups are available, purchase and configure one. For more information, see Use serverless resource groups.
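
The parameter constraints in the Create Resource dialog box can be sketched as a small local check, which may help if you collect these values in a script before entering them in the console. This is only an illustration: `EmrResourceSpec`, `validate_spec`, and the field names are hypothetical and are not part of any DataWorks API. The valid values are taken from the parameter descriptions in this topic.

```python
from dataclasses import dataclass
from typing import List

# Valid values as listed in the Create Resource parameter descriptions.
RESOURCE_TYPES = {"EMR JAR", "EMR File"}
STORAGE_TYPES = {"OSS", "HDFS"}
FILE_SOURCES = {"Local", "OSS"}


@dataclass
class EmrResourceSpec:
    """Hypothetical container for the Create Resource dialog fields."""
    engine_instance: str
    resource_type: str
    path: str          # the workflow in which the resource is created
    storage_type: str  # "OSS" or "HDFS"
    file_source: str   # "Local" or "OSS"
    name: str


def validate_spec(spec: EmrResourceSpec) -> List[str]:
    """Return a list of problems; an empty list means the values look valid."""
    problems = []
    # Every parameter in the dialog box is marked as required.
    for field_name in ("engine_instance", "path", "name"):
        if not getattr(spec, field_name).strip():
            problems.append(f"{field_name} is required")
    if spec.resource_type not in RESOURCE_TYPES:
        problems.append(f"resource_type must be one of {sorted(RESOURCE_TYPES)}")
    if spec.storage_type not in STORAGE_TYPES:
        problems.append(f"storage_type must be one of {sorted(STORAGE_TYPES)}")
    if spec.file_source not in FILE_SOURCES:
        problems.append(f"file_source must be one of {sorted(FILE_SOURCES)}")
    # For JAR files, the resource name must include the .jar extension.
    if spec.resource_type == "EMR JAR" and not spec.name.endswith(".jar"):
        problems.append("JAR resource names must include the .jar extension")
    return problems
```

For example, `validate_spec(EmrResourceSpec("emr_engine", "EMR JAR", "my_workflow", "OSS", "Local", "wordcount"))` reports the missing `.jar` extension. The engine, workflow, and file names here are placeholders.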
Use a resource to register functions
After uploading a resource to DataWorks, use it when registering functions.
Reference a resource in node code
After creating an EMR JAR resource, insert its path into your node code.
In the Resource panel, right-click the resource name and select Insert Resource Path. The resource path is inserted into the code in the following format:
@resource_reference{"Resource name"}
For a complete walkthrough of referencing a resource in a node, see Create an EMR MR node.
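
If you generate or review node code outside the console, the reference format shown above can be built and detected with a few lines of Python. This is a minimal sketch that assumes the reference line has exactly the format given in this topic. The helper names `resource_reference` and `referenced_resources` are hypothetical and are not provided by DataWorks.

```python
import re
from typing import List

# Matches the reference format shown in this topic and captures the resource name.
_REFERENCE_PATTERN = re.compile(r'@resource_reference\{"([^"]+)"\}')


def resource_reference(resource_name: str) -> str:
    """Build a reference line for the given resource name."""
    if not resource_name or '"' in resource_name:
        raise ValueError("resource name must be non-empty and contain no double quotes")
    return '@resource_reference{"' + resource_name + '"}'


def referenced_resources(node_code: str) -> List[str]:
    """List the resource names referenced in a piece of node code, in order."""
    return _REFERENCE_PATTERN.findall(node_code)
```

A check such as `referenced_resources(code)` can confirm that every JAR a node depends on is actually referenced before you commit the node. The node code passed in is whatever text you have in the editor.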
Manage resource versions
Each commit generates a new resource version. To view or download a specific version, right-click the resource name and select View Versions.