
DataWorks:Resource management

Last Updated:Feb 28, 2025

Data Studio allows you to reference custom resources and user-defined functions (UDFs) in your data analytics code. Before you can use a custom resource or UDF in a node, you must create the resource or UDF in your workspace or upload it to your workspace. Data Studio supports custom resources and UDFs for MaxCompute, E-MapReduce (EMR), CDH, and Flink. This topic describes how to create a resource or function in a visualized manner and use it in a node.

Prerequisites

Perform the following operations if you want to create a resource by uploading an Object Storage Service (OSS) object:

  • OSS is activated, an OSS bucket is created, and the file that you want to upload is stored in the OSS bucket. When you create the resource, you must specify the object in the OSS bucket. For more information, see Create a bucket and Simple upload.

  • The Alibaba Cloud account that you use to upload the OSS object is granted the permissions to access the OSS bucket. This helps prevent failures caused by insufficient permissions. For more information, see Access control.

Limits

  • Resource size

    The maximum size of a resource that you can upload is 500 MB.

  • Resource deployment

    If you use a workspace in standard mode, you must deploy resources to the production environment. This way, the resources can be used by projects in the production environment.

    Note

    The information about a data source may differ between the development environment and the production environment. Make sure that you know the data source information in the environment in which you want to query data. This ensures that valid table and resource data is queried in subsequent operations. For information about how to view MaxCompute data source information in different environments, see Add a MaxCompute data source.

  • Resource management

    You can view and manage resources that are uploaded by using DataWorks in the DataWorks console.

Supported resource and function types

Resource types

| Data source type | Resource type | Description | Supported creation methods |
| --- | --- | --- | --- |
| MaxCompute | MaxCompute Python | Python code that is used to register a Python UDF. The resource name must end with .py. | Upload an on-premises file or upload an OSS object |
| MaxCompute | MaxCompute Jar | A compiled JAR package that is used to run Java programs. The resource name must end with .jar. | Upload an on-premises file or upload an OSS object |
| MaxCompute | MaxCompute Archive | An archive file whose name ends with .zip, .tgz, .tar.gz, or .tar. The compression type is determined by the file name extension. | Upload an on-premises file or upload an OSS object |
| MaxCompute | MaxCompute File | A file of any type. Check whether the related compute engine supports the file type before you use the resource. | Upload an on-premises file or upload an OSS object |
| EMR | EMR File | A file of any type. Check whether the related compute engine supports the file type before you use the resource. | Upload an on-premises file or upload an OSS object |
| EMR | EMR Jar | A compiled JAR package that is used to run Java programs. The resource name must end with .jar. | Upload an on-premises file or upload an OSS object |
| CDH | CDH File | A file of any type. Check whether the related compute engine supports the file type before you use the resource. | Upload an on-premises file |
| CDH | CDH Jar | A compiled JAR package that is used to run Java programs. The resource name must end with .jar. | Upload an on-premises file |
| Flink | Flink Jar | A compiled JAR package that is used to run Java programs. The resource name must end with .jar. | Upload an on-premises file |
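The size limit and the Archive suffix rule described in this topic can be checked on the client before you upload a resource. The following Python sketch encodes only the constraints stated in this topic; the function names are illustrative, not part of any DataWorks API.

```python
# Constraints stated in this topic: resources are limited to 500 MB, and
# Archive resources must use one of these suffixes (the compression type
# is inferred from the suffix).
MAX_RESOURCE_BYTES = 500 * 1024 * 1024
ARCHIVE_SUFFIXES = (".zip", ".tgz", ".tar.gz", ".tar")

def is_valid_archive_name(filename: str) -> bool:
    """Check that an Archive resource name carries an accepted suffix."""
    return filename.lower().endswith(ARCHIVE_SUFFIXES)

def is_within_size_limit(size_bytes: int) -> bool:
    """Check that a file does not exceed the 500 MB upload limit."""
    return size_bytes <= MAX_RESOURCE_BYTES
```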

Function types

DataWorks allows you to create MaxCompute, EMR, CDH, and Flink functions.

Step 1: Go to the RESOURCE MANAGEMENT pane

  1. Go to the Workspaces page in the DataWorks console. In the top navigation bar, select a desired region. Find the desired workspace and choose Shortcuts > Data Studio in the Actions column.

  2. In the left-side navigation pane of the Data Studio page, click the image icon to go to the RESOURCE MANAGEMENT pane.

  3. In the RESOURCE MANAGEMENT pane, click Create or the image icon. Alternatively, create a directory first, right-click the directory name, and then select a resource or function type.

Step 2: Create a resource or function

Create a resource

  1. Configure information about the resource.

    Select Upload On-premises File or Upload OSS Object for the File Source parameter, and configure the remaining parameters.

  2. In the top toolbar of the configuration tab, click Save and then Deploy to deploy the resource. A resource can be used in Data Studio only after it is deployed.

Create a function

Note

You must create a function based on a resource. Therefore, before you create a function, you must create a resource and upload the resource to DataWorks.

  1. Configure information about the function.

    MaxCompute functions

    Description of key parameters:

    • Function Type: The type of the function. Valid values: MATH, AGGREGATE, STRING, DATE, ANALYTIC, and OTHER.

    • Class Name: The name of the class that implements the UDF, in the Resource name.Class name format. The resource name can be the name of a Java or Python package. When you create a UDF in the DataWorks console, you can reference resources of the MaxCompute Jar and MaxCompute Python types. The value format varies based on the resource type:

      • If the resource type is JAR, set the Class Name parameter to the fully qualified class name in the Java package name.Class name format. You can obtain this name by using the Copy Reference feature in IntelliJ IDEA. For example, if com.aliyun.odps.examples.udf is the Java package name and UDAFExample is the class name, set the Class Name parameter to com.aliyun.odps.examples.udf.UDAFExample.

      • If the resource type is Python, set the Class Name parameter in the Python resource name.Class name format. For example, if LcLognormDist_sh is the Python resource name and LcLognormDist_sh is the class name, set the Class Name parameter to LcLognormDist_sh.LcLognormDist_sh.

      Note: Do not include the .jar or .py suffix in the resource name. A resource can be used only after it is committed and deployed.

    • Type: Valid values: Resource Function and Embedded Function.

      • If you set this parameter to Resource Function, you must configure the Resources parameter.

      • If you set this parameter to Embedded Function, you must configure the Resources, Language, and Code parameters. Valid values of the Language parameter: JAVA, PYTHON2, and PYTHON3. For more information about code-embedded UDFs, see Code-embedded UDFs.

    • Resources: The resources that are used to register the function.

      • Visual mode: You can select only resources that have been uploaded or added to DataWorks.

      • Code editor: You can specify any resource in the data source. If the UDF references multiple resources, separate the resource names with commas (,).

      Note: You do not need to specify the paths of the added resources. In the code editor, you can also specify resources that are not uploaded to DataWorks in a visualized manner, such as table resources and resources that are uploaded to MaxCompute by using other methods.

    • Command Syntax: An example that shows how to use the UDF.
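As an illustration of the Resource name.Class name rule, the following is a minimal sketch of a MaxCompute Python UDF. The file name my_lower.py and the class name MyLower are hypothetical; in a real resource, annotate is imported from odps.udf, so a no-op stand-in is defined here to keep the sketch self-contained.

```python
# In the MaxCompute runtime you would write: from odps.udf import annotate
# A no-op stand-in is used here so that the sketch runs on its own.
def annotate(signature):
    def wrapper(cls):
        cls._signature = signature  # records the UDF signature
        return cls
    return wrapper

@annotate("string->string")
class MyLower:
    """Hypothetical UDF body. If this code is uploaded as a MaxCompute
    Python resource named my_lower.py, the Class Name parameter of the
    function would be my_lower.MyLower (without the .py suffix)."""

    def evaluate(self, s):
        return s.lower() if s is not None else None
```

After the resource is deployed, registering a function whose Class Name is my_lower.MyLower makes the evaluate logic callable from SQL by the function name.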

    EMR functions

    Description of key parameters:

    • Function Type: The type of the function. Valid values: MATH, AGGREGATE, STRING, DATE, ANALYTIC, and OTHER.

    • Class Name: The name of the class that implements the UDF, in the Resource name.Class name format. The resource name can be the name of a Java or Python package. When you create an EMR UDF in the DataWorks console, you can reference resources that are uploaded for the EMR data source, such as EMR Jar resources. The value format varies based on the resource type:

      • If the resource type is JAR, set the Class Name parameter to the fully qualified class name in the Java package name.Class name format. You can obtain this name by using the Copy Reference feature in IntelliJ IDEA. For example, if com.aliyun.odps.examples.udf is the Java package name and UDAFExample is the class name, set the Class Name parameter to com.aliyun.odps.examples.udf.UDAFExample.

      • If the resource type is Python, set the Class Name parameter in the Python resource name.Class name format. For example, if LcLognormDist_sh is the Python resource name and LcLognormDist_sh is the class name, set the Class Name parameter to LcLognormDist_sh.LcLognormDist_sh.

      Note: Do not include the .jar or .py suffix in the resource name. A resource can be used only after it is committed and deployed.

    • Resources: The resources that are used to register the function.

      • Visual mode: You can select only resources that have been uploaded or added to DataWorks.

      • Code editor: You can specify any resource in the data source. If the UDF references multiple resources, separate the resource names with commas (,).

      Note: You do not need to specify the paths of the added resources. In the code editor, you can also specify resources that are not uploaded to DataWorks in a visualized manner, such as table resources and resources that are uploaded to the compute engine by using other methods.

    • Command Syntax: An example that shows how to use the UDF.

    CDH functions

    Description of key parameters:

    • Function Type: The type of the function. Valid values: MATH, AGGREGATE, STRING, DATE, ANALYTIC, and OTHER.

    • Class Name: The name of the class that implements the UDF, in the Resource name.Class name format. The resource name can be the name of a Java or Python package. When you create a CDH UDF in the DataWorks console, you can reference resources that are uploaded for the CDH data source, such as CDH Jar resources. The value format varies based on the resource type:

      • If the resource type is JAR, set the Class Name parameter to the fully qualified class name in the Java package name.Class name format. You can obtain this name by using the Copy Reference feature in IntelliJ IDEA. For example, if com.aliyun.odps.examples.udf is the Java package name and UDAFExample is the class name, set the Class Name parameter to com.aliyun.odps.examples.udf.UDAFExample.

      • If the resource type is Python, set the Class Name parameter in the Python resource name.Class name format. For example, if LcLognormDist_sh is the Python resource name and LcLognormDist_sh is the class name, set the Class Name parameter to LcLognormDist_sh.LcLognormDist_sh.

      Note: Do not include the .jar or .py suffix in the resource name. A resource can be used only after it is committed and deployed.

    • Resources: The resources that are used to register the function.

      • Visual mode: You can select only resources that have been uploaded or added to DataWorks.

      • Code editor: You can specify any resource in the data source. If the UDF references multiple resources, separate the resource names with commas (,).

      Note: You do not need to specify the paths of the added resources. In the code editor, you can also specify resources that are not uploaded to DataWorks in a visualized manner, such as table resources and resources that are uploaded to the compute engine by using other methods.

    • Command Syntax: An example that shows how to use the UDF.

  2. In the top toolbar of the configuration tab, click Save and then Deploy to deploy the function. A function can be used in Data Studio only after it is deployed.

Step 3: Use the resource or function

  1. Create a Data Studio node that is of the same compute engine type as the compute engine type of the resource or function.

  2. When you edit the Data Studio node, click Resource Management in the left-side navigation pane. In the RESOURCE MANAGEMENT pane, find the desired resource or function, right-click the resource or function name, and then select Reference Resources or Insert Function.

    • After the resource is referenced, a line of code in the ##@resource_reference{"Resource name"} format is added, for example, at the top of the code of a PyODPS 3 node. The display format of the code varies based on the type of the node that references the resource. You can view the code that is displayed in the DataWorks console to learn about the format for each node type.

      Note

      If the running of PyODPS code depends on third-party packages, you must use a custom image to install the required packages in the runtime environment, and then run the PyODPS code in the runtime environment. For more information about custom images, see Manage images.

    • After the function is referenced, the name of the function is displayed in the code on the configuration tab of the Data Studio node.
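In a PyODPS 3 node, for example, the ##@resource_reference comment causes the referenced file to be made available in the node's working directory at run time, where the code can read it as an ordinary local file. The following sketch is hypothetical: the resource name my_resource.txt and the helper function are illustrative, not part of the DataWorks API.

```python
##@resource_reference{"my_resource.txt"}
# Hypothetical PyODPS 3 node body. The special comment above is the line
# that DataWorks adds when you select Reference Resources; at run time the
# referenced file is expected to be present in the working directory.
import os

def read_referenced_resource(name="my_resource.txt"):
    """Read the downloaded resource file if it is present; return None
    otherwise (for example, when the code runs outside DataWorks)."""
    if os.path.exists(name):
        with open(name) as f:
            return f.read()
    return None
```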