To use MaxCompute resource files in your code or functions, you must first create the resources in a workspace or upload existing resources to the workspace. You can upload and manage MaxCompute resources by running MaxCompute SQL commands, or you can create the resources in the DataWorks console. This topic describes how to create MaxCompute resources in the DataWorks console, use the resources in nodes, and register functions based on the resources.

Prerequisites

  • A compute engine is associated with a DataWorks workspace.

    After you associate a MaxCompute compute engine with a DataWorks workspace on the Workspace Management page, the MaxCompute folder is displayed in DataStudio. For more information, see Create and manage workspaces.

  • A workflow is created.

    DataWorks uses workflows to store resources. Therefore, you must create a workflow before you create resources. For more information, see Create a workflow.

  • A node is created.

    Created resources must be referenced by nodes. You must create a node based on your business requirements before you reference resources in the node. For information about how to create a node, see Create an ODPS SQL node.

Background information

DataWorks allows you to upload text files, Python code, and packages in formats such as .zip, .tgz, .tar.gz, .tar, and .jar to MaxCompute as different types of resources. User-defined functions (UDFs) and MapReduce jobs can read and use these resources. The resource types are described as follows:
  • Python: the Python code that you write. You can use Python code to register Python UDFs.
  • JAR: a compiled JAR package that is used to run Java programs.
  • Archive: a package. You can determine the compression type of a package based on the file name extension. Packages in the following formats are supported: .zip, .tgz, .tar.gz, .tar, and .jar.
  • File: a file resource. File resources in the following formats are supported: .zip, .so, and .jar.
    Note If you want to upload a resource file whose size is greater than 500 KB from your on-premises machine when you create a file resource, you can select Large File (> 500 KB).
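The extension lists above can be turned into a quick lookup that suggests which resource types a local file could be uploaded as. The following is a minimal sketch, not part of DataWorks: the helper name `candidate_resource_types` is hypothetical, the mapping contains only the extensions listed in this topic, and the `.py` entry assumes that a Python resource is uploaded from a file whose name ends with .py.

```python
# Extensions per resource type, taken from the descriptions in this topic.
# The ".py" entry for Python is an assumption based on the naming rule
# that the name of a Python resource must end with .py.
SUPPORTED_EXTENSIONS = [
    ("Python", (".py",)),
    ("JAR", (".jar",)),
    ("Archive", (".zip", ".tgz", ".tar.gz", ".tar", ".jar")),
    ("File", (".zip", ".so", ".jar")),
]


def candidate_resource_types(file_name):
    """Return the resource types whose listed extensions match file_name."""
    return [
        resource_type
        for resource_type, extensions in SUPPORTED_EXTENSIONS
        if file_name.endswith(extensions)  # str.endswith accepts a tuple
    ]


print(candidate_resource_types("lib.jar"))      # ['JAR', 'Archive', 'File']
print(candidate_resource_types("data.tar.gz"))  # ['Archive']
```

A file can match more than one type, as a .jar package does. In that case, choose the type based on how the resource is used, for example JAR for a compiled Java program.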
Procedure for creating and using resources:
  1. Create resources or upload existing resources
  2. Enable a node to use the resource
  3. Use resources to register functions
  4. Other resource management operations
You can also run commands to view compute engine resources or add compute engine resources to DataStudio for management. For more information, see Appendix 1: View resources used by a compute engine by using commands and Appendix 2: Add compute engine resources to DataWorks for management.

Limits

  • Resource size

    You can directly upload a resource of up to 200 MB to DataWorks. For more information, see Manage MaxCompute resources.

  • Resource deployment
    If you use a workspace in standard mode, you need to deploy resources to the production environment. This way, the resources can be used by projects in the production environment.
    Note The information about a compute engine varies based on the environment of the workspace with which the compute engine is associated. Make sure that you know which compute engine is associated with the workspace in each environment so that you query the correct tables and resources in subsequent operations. For more information about the compute engine that is associated with a workspace in a specific environment, see Associate a MaxCompute compute engine with a workspace.
  • Resource management

    In the DataWorks console, you can view and manage only the resources that are uploaded by using the DataWorks console. If you add resources to a MaxCompute compute engine by using other tools such as MaxCompute Studio, you must use the MaxCompute Resources feature in DataWorks DataStudio to manually load the resources to DataWorks. You can view and manage the resources in DataWorks after the loading is complete. For more information, see Manage MaxCompute resources.

Create resources or upload existing resources

DataWorks allows you to create resources or upload existing resources. The available method depends on the options that the GUI provides for each type of resource.

  1. Go to the DataStudio page.
    1. Log on to the DataWorks console.
    2. In the left-side navigation pane, click Workspaces.
    3. In the top navigation bar, select the region where the workspace resides. On the Workspaces page, find the workspace in which you want to create resources, and click DataStudio in the Actions column.
  2. Create a resource.
    You can create the desired type of resource in the desired workflow based on your business requirements. The following figure shows the entry points for creating resources and the creation procedure.
    Figure: Create a resource
    Note If no workflow is available, create one. For information about how to create a workflow, see Create a workflow.
  3. Configure information about the resource.
    Configure information about the resource based on your business requirements. In this topic, a Python resource is created. Configuration items vary based on the type of the resource that you create.
    Figure: Resource information
    Note
    • If you create a JAR resource and the JAR resource is never uploaded to the MaxCompute client, you must select Upload to MaxCompute. If the JAR resource has been uploaded to the MaxCompute client, clear Upload to MaxCompute. Otherwise, an error is reported when you upload the JAR resource.
    • The resource name can be different from the name of the uploaded file.
    • The name of a JAR resource must end with .jar. The name of a Python resource must end with .py.
  4. Optional: Write code for the resource.
    If you upload an existing resource, skip this step.
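    The following example shows code that is written in the Python resource. You can replace the code based on your business requirements.
    # reduce is built in on Python 2 and must be imported on Python 3.
    from functools import reduce

    from odps.udf import annotate

    @annotate("string->bigint")
    class ipint(object):
        def evaluate(self, ip):
            # Convert a dotted IPv4 string such as "1.2.3.4" to an integer.
            try:
                return reduce(lambda x, y: (x << 8) + y, map(int, ip.split('.')))
            except Exception:
                # Return 0 for NULL or malformed input.
                return 0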
  5. Commit and deploy the resource.
    Click the Commit icon in the top toolbar to commit the resource to the development environment.
    Note If nodes in the production environment need to use this resource, you also need to deploy the resource to the production environment. For more information, see Deploy nodes.
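Before you commit the Python resource from step 4, it can help to check its conversion logic on your on-premises machine. The following is a minimal sketch under the assumption that the `odps` package is not installed locally, so the logic of `ipint.evaluate` is copied into a plain function. The function name `ip_to_int` is hypothetical and is not part of the resource.

```python
from functools import reduce


def ip_to_int(ip):
    """Local copy of the ipint.evaluate logic without the odps dependency."""
    try:
        # Shift the running value left by one byte and add the next octet.
        return reduce(lambda x, y: (x << 8) + y, map(int, ip.split('.')))
    except Exception:
        # None (SQL NULL) and non-numeric input both map to 0.
        return 0


print(ip_to_int("1.2.3.4"))      # 16909060
print(ip_to_int("192.168.0.1"))  # 3232235521
print(ip_to_int("not-an-ip"))    # 0
print(ip_to_int(None))           # 0
```

The logic does not check that the input has exactly four octets or that each octet is in the range 0 to 255. For example, "1.2" is converted to 258 instead of 0. Add validation if your data can contain such values.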

Enable a node to use the resource

After you create a resource in the DataWorks console, a node must reference the resource before the node can use it. After the resource is referenced, a line in the @resource_reference{"Resource name"} format is displayed. The exact format varies based on the type of the node that references the resource. For example, a line in the ##@resource_reference{"Resource name"} format is displayed if a PyODPS 2 node references the resource.
Note If no node is available, create one. For information about how to create a node, see Create an ODPS SQL node.
The following figure shows the reference steps.
Figure: Load a resource
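To make the reference format concrete, the following sketch extracts the resource names from the reference lines in a piece of node code. It is an illustration only: the helper `referenced_resources` and the sample node code are hypothetical, and the pattern assumes that the reference always appears as @resource_reference{"Resource name"}, optionally preceded by a node-specific prefix such as ##.

```python
import re

# Matches @resource_reference{"name"} regardless of the prefix that a
# node type places in front of it (for example "##" in a PyODPS 2 node).
_REFERENCE = re.compile(r'@resource_reference\{"([^"]+)"\}')


def referenced_resources(node_code):
    """Return referenced resource names in order of appearance, without duplicates."""
    names = []
    for name in _REFERENCE.findall(node_code):
        if name not in names:
            names.append(name)
    return names


# Hypothetical code of a PyODPS 2 node that references the ipint.py resource.
sample_code = '##@resource_reference{"ipint.py"}\nprint("node body")\n'
print(referenced_resources(sample_code))  # ['ipint.py']
```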

Use resources to register functions

Before you use resources to register functions, you must create a MaxCompute function by referring to Create and use a MaxCompute function. On the function configuration tab, you must enter the name of the desired resource, as shown in the following figure.
Figure: Use resources to register functions

For information about MaxCompute built-in functions, see Functions.

For information about how to view the functions in a MaxCompute compute engine and the change history of the functions, and perform other operations, see MaxCompute functions.
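When you register a function based on a Python resource, you typically also specify a class name in addition to the resource name. The following sketch derives that class name under the assumption that MaxCompute Python UDFs use the `<script name without .py>.<class name>` convention. The helper `python_udf_class_name` is hypothetical, and the class name field itself is not described in this topic.

```python
def python_udf_class_name(resource_name, class_name):
    """Build the class name of a Python UDF from its resource and class.

    Assumes the convention "<script name without .py>.<class name>".
    """
    suffix = ".py"
    if not resource_name.endswith(suffix):
        raise ValueError("The name of a Python resource must end with .py")
    module_name = resource_name[:-len(suffix)]
    return "{}.{}".format(module_name, class_name)


# For the ipint class that is defined in the ipint.py resource:
print(python_udf_class_name("ipint.py", "ipint"))  # ipint.ipint
```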

Other resource management operations

Right-click the name of the desired resource to perform other operations on the resource.
Figure: Resource management operations
Notes:
  • Delete a resource

    When you delete a resource, it is deleted only from the compute engine that is associated with the workspace in the development environment. To delete the resource from the production environment, you must deploy the deletion operation from the development environment to the production environment. The resource is deleted from the production environment after the operation is deployed. For more information, see Deploy nodes.

  • Compare resource versions and roll back a resource version
    You can click Versions to view the saved or committed resource versions, and compare the changes on the resource between different versions.
    Note When you compare resource versions, you must select at least two versions for comparison.
    Figure: Compare versions

Appendix 1: View resources used by a compute engine by using commands

The following table describes the common commands that are used to view resources.
Command                                                  Description
list resources;                                          Views all resources in a compute engine in the development environment.
use <production_compute_engine_name>; list resources;    Views all resources in a compute engine in the production environment.
desc resource <resource_name>;                           Views the details of a specified resource.
For more commands, see Resource operations.
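The same checks can also be sketched in Python instead of commands. The following example is not taken from this topic. It assumes that you have a PyODPS entry object, such as the one that a PyODPS node in DataWorks provides, and it uses the PyODPS methods `list_resources` and `exist_resource`. The helper names and the `project` argument values are hypothetical.

```python
def list_resource_names(entry, project=None):
    """Return the sorted names of the resources in a MaxCompute project.

    entry is assumed to be a PyODPS entry object. If project is None,
    the default project of the entry object is used.
    """
    resources = entry.list_resources(project=project)
    return sorted(resource.name for resource in resources)


def resource_exists(entry, resource_name, project=None):
    """Return True if the named resource exists in the project."""
    return entry.exist_resource(resource_name, project=project)


# Usage inside a PyODPS node, assuming the entry object is named o and
# "my_prod_project" is a placeholder for your production project name:
#   print(list_resource_names(o))
#   print(list_resource_names(o, project="my_prod_project"))
#   print(resource_exists(o, "ipint.py"))
```

Passing a project name plays the same role as running the use command before list resources; in the table above.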

Appendix 2: Add compute engine resources to DataWorks for management

You can use the MaxCompute Resources feature in DataStudio to load a MaxCompute compute engine resource whose size is no more than 200 MB to DataWorks for visualized management. For more information, see Manage MaxCompute resources.