DataWorks lets you visually create E-MapReduce (EMR) JAR and EMR FILE resources. You can upload custom functions or open-source MapReduce (MR) sample code as resources and reference them in data development tasks that run on EMR compute nodes. This topic describes how to create, upload, and commit a resource.
Prerequisites
Prerequisites vary by engine type. You must complete the required preparations in both EMR and DataWorks.
- DataLake: For more information, see Configure a DataLake cluster and Configure DataWorks.
Create an EMR resource
- Log on to the DataWorks console. In the target region, click in the left-side navigation pane. Select a workspace from the drop-down list and click Go to Data Development.
- Move the pointer over the icon and click or . Alternatively, find the workflow, right-click it, and choose or .
- In the Create Resource dialog box, configure the parameters.
Parameter: Description
Engine Type: The engine type is EMR by default and cannot be changed.
Engine Instance: Select the engine instance from the drop-down list.
  Note: This list displays the EMR engines bound to the workspace in Data Development.
Resource Type: Only the EMR JAR and EMR FILE resource types are supported.
Path: The workflow path where the resource will be located.
Storage path: Select a storage path for the resource. Supported storage types are OSS and HDFS.
  - If you select OSS, you must first grant authorization and then select a directory.
    Note: You must use an Alibaba Cloud account to grant the permissions.
  - If you select HDFS, you must manually enter the storage path.
    Note: Task JAR packages can be stored only in the following locations:
    - The master node of the EMR cluster.
    - Object Storage Service (OSS). We recommend that you store JAR packages in OSS. For more information about how to store JAR packages in OSS, see Operations in the OSS console.
File Source: The source of the target file. Supported sources are Local and OSS.
  - If you select Local, click Click Upload in the Upload File field to upload a local file.
  - If you select OSS, select an OSS file from the Select file drop-down list, or click Create in OSS to create an OSS file.
Name: The name of the new EMR resource. If you upload a JAR resource, the name must include the .jar extension.
- In the Create Resource dialog box, click Create.
- Use the save and commit icons in the toolbar to save and commit the resource.
  Note: When you commit the resource, you must select a scheduling resource group. If you use a serverless resource group, DataWorks sends a task to the engine to create the resource and prints the execution logs. If a problem occurs during the commit, use the logs for troubleshooting. If you do not have an available serverless resource group, you must purchase and configure one. For more information, see Use serverless resource groups.
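The parameter rules above can be checked before you open the dialog box. The following is a minimal sketch, not part of DataWorks: the EmrResourceSpec class, the validate_spec function, and the example OSS path are hypothetical names introduced for illustration. Only the allowed values and the .jar naming rule come from the parameter descriptions.

```python
from dataclasses import dataclass
from typing import List

# Allowed values as listed in the parameter descriptions.
RESOURCE_TYPES = {"EMR JAR", "EMR FILE"}
STORAGE_TYPES = {"OSS", "HDFS"}
FILE_SOURCES = {"Local", "OSS"}


@dataclass
class EmrResourceSpec:
    """Hypothetical container for the Create Resource dialog values."""
    name: str
    resource_type: str
    storage_type: str
    storage_path: str
    file_source: str


def validate_spec(spec: EmrResourceSpec) -> List[str]:
    """Return a list of problems; an empty list means no rule was violated."""
    problems = []
    if spec.resource_type not in RESOURCE_TYPES:
        problems.append("resource type must be EMR JAR or EMR FILE")
    if spec.storage_type not in STORAGE_TYPES:
        problems.append("storage type must be OSS or HDFS")
    if spec.file_source not in FILE_SOURCES:
        problems.append("file source must be Local or OSS")
    # A JAR resource name must include the .jar extension.
    if spec.resource_type == "EMR JAR" and not spec.name.endswith(".jar"):
        problems.append("JAR resource names must end with .jar")
    # An HDFS storage path is entered manually, so it must not be empty.
    if spec.storage_type == "HDFS" and not spec.storage_path.strip():
        problems.append("an HDFS storage path must be entered")
    return problems


# Example values; the bucket path is a placeholder, not a real location.
spec = EmrResourceSpec(
    name="xc_ip2region-emr.jar",
    resource_type="EMR JAR",
    storage_type="OSS",
    storage_path="oss://example-bucket/jars/",
    file_source="Local",
)
print(validate_spec(spec))
```

A check like this only catches naming and value mistakes early; authorization and the commit itself still happen in the DataWorks UI.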
Use a resource to register a function
DataWorks provides a visual way to register a function by using a resource. After you upload the required resource, you can use it to register a function in the UI. In Data Development, open the Register Function form and configure the parameters. For example, you can set Function Type to Other Function, select a target EMR Engine Instance such as xc_emr2, set EMR Engine Type to Hive, and set EMR Database to default. Then, enter a Function Name such as xc_ip2region, and the full class name of the UDF, such as org.alidata.emr.udf.Ip2Region. Finally, for Resource List, associate the function with the uploaded JAR file from the resource tree on the left, such as xc_ip2region-emr.jar.
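For orientation, the form fields map closely onto what a Hive function registration needs: a database, a function name, a class name, and a JAR. The sketch below builds a standard Hive CREATE FUNCTION statement from the example values. It is an illustration only: this topic does not state which statement DataWorks issues when you register a function, build_create_function is a hypothetical helper, and the JAR URI is a placeholder.

```python
import re

# Conservative pattern for unquoted database and function identifiers.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_create_function(database, function_name, class_name, jar_uri):
    """Build a Hive CREATE FUNCTION statement for a UDF packaged in a JAR."""
    for label, value in (("database", database), ("function name", function_name)):
        if not _IDENTIFIER.match(value):
            raise ValueError(f"invalid {label}: {value!r}")
    # Reject single quotes so the string literals cannot be broken out of.
    for label, value in (("class name", class_name), ("JAR URI", jar_uri)):
        if "'" in value:
            raise ValueError(f"{label} must not contain a single quote")
    return (
        f"CREATE FUNCTION {database}.{function_name} "
        f"AS '{class_name}' USING JAR '{jar_uri}';"
    )


# Values from the example above; the JAR location is assumed for illustration.
print(build_create_function(
    "default",
    "xc_ip2region",
    "org.alidata.emr.udf.Ip2Region",
    "oss://example-bucket/jars/xc_ip2region-emr.jar",
))
```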
Use a resource in a node
After you create an EMR JAR resource, you can use it directly in a node. In the Resources folder, right-click the resource node and choose Insert Resource Path. You can also right-click the resource file in the resource tree on the left and choose Insert Resource Path.
After you insert the resource path, a line of code in the format @resource_reference{"resourcename"} is automatically added to the node, which references the resource.
For detailed steps, see Create an EMR MR node.
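If you review node code outside the UI, the reference format described above is easy to generate and to detect. The following is a minimal sketch based only on the format @resource_reference{"resourcename"} given in this topic; make_reference and find_references are hypothetical helpers, and the second line of the sample node code is a placeholder rather than a real command.

```python
import re

# Matches @resource_reference{"name"} and captures the resource name.
RESOURCE_REF = re.compile(r'@resource_reference\{"([^"]+)"\}')


def make_reference(resource_name):
    """Return the reference line for a resource name."""
    return '@resource_reference{"%s"}' % resource_name


def find_references(node_code):
    """Return all resource names referenced in the node code, in order."""
    return RESOURCE_REF.findall(node_code)


node_code = "\n".join([
    make_reference("xc_ip2region-emr.jar"),
    "placeholder command that uses the resource",
])
print(find_references(node_code))
```

Listing the referenced names this way can help confirm that every resource a node depends on has been uploaded and committed.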
Manage resource versions
A new resource version is generated each time you commit a resource. To view and download earlier versions, right-click the resource node and click View Versions. For example, in the resource directory on the left, right-click the target resource file, such as xc_ip2region.jar, and select View Historical Versions. The Version Information dialog box appears and displays the File ID, Version Number, Submitter, Submission Time, Change Type, and Status for each version. Click Download Code for a specific version to obtain its historical code, or select multiple versions and click Compare at the bottom to view the differences between them.