This topic describes how to create, reference, and download a JAR or Python resource.

Prerequisites

A MaxCompute compute engine instance is associated with the desired workspace. The MaxCompute folder is displayed on the DataStudio page only after you associate a MaxCompute engine instance with the workspace on the Workspace Management page. For more information, see Configure a workspace.

Background information

If your code or function requires resource files such as .jar files, you can upload resources to your workspace and reference them.
If the existing built-in functions do not meet your requirements, you can create user-defined functions (UDFs) and customize processing logic. You can upload the required JAR packages to your workspace so that you can reference them when you create UDFs.
Note
  • You can view built-in functions in the Built-In Functions pane. For more information, see Functions.
  • You can view the UDFs that you have committed or deployed in DataWorks in the MaxCompute Functions pane. For more information, see MaxCompute functions.

You can upload different types of resources, such as text files, Python code, and compressed packages in the .zip, .tgz, .tar.gz, .tar, and .jar formats, to MaxCompute. You can read or use these resources when you run UDFs or MapReduce jobs.

MaxCompute provides API operations for you to read and use resources. The following types of resources are supported:
  • Python: the Python code you have written. You can use Python code to register Python UDFs.
  • JAR: the compiled Java JAR packages.
  • Archive: the compressed files that can be identified by the file name extension. Supported file types include .zip, .tgz, .tar.gz, .tar, and .jar.
  • File: the files in the .zip, .so, or .jar format.
JAR resources and file resources have the following differences:
  • You can write Java code in an offline Java environment, compress the code to a JAR package, and then upload the package as a JAR resource to DataWorks.
  • You can create and edit a small-sized file resource in the DataWorks console.
  • If you want to upload a resource file whose size is greater than 500 KB from your on-premises machine when you create a file resource, you can select Large File (over 500 KB).
    Note You can upload a resource file whose size is no more than 50 MB. If you want to upload a resource file whose size is greater than 50 MB, you can use the MaxCompute client to perform this operation. Then, commit the resource file to DataWorks in the MaxCompute Resources pane. For more information, see MaxCompute resources.
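The 500 KB and 50 MB limits above determine where a resource file can be uploaded from. As a rough sketch of that decision (the function name and return values are illustrative, not part of any DataWorks API):

```python
# Illustrative helper only: mirrors the 500 KB and 50 MB limits described above.
KB = 1024
MB = 1024 * KB

def upload_path(size_bytes: int) -> str:
    """Return how a resource file of the given size can be uploaded."""
    if size_bytes <= 500 * KB:
        return 'console'             # create and edit directly in the DataWorks console
    if size_bytes <= 50 * MB:
        return 'console-large-file'  # select Large File (over 500 KB) when creating it
    return 'maxcompute-client'       # upload with the client, then commit in DataWorks
```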

Create a JAR resource

  1. Go to the DataStudio page.
    1. Log on to the DataWorks console.
    2. In the left-side navigation pane, click Workspaces.
    3. In the top navigation bar, select the region where your workspace resides, find the workspace, and then click Data Analytics in the Actions column.
  2. On the DataStudio page, move the pointer over the Create icon and choose MaxCompute > Resource > JAR.
    Alternatively, you can click the name of the desired workflow in the Business Flow section, right-click MaxCompute, and then choose Create > Resource > JAR.

    For more information about how to create a workflow, see Create a workflow.

  3. In the Create Resource dialog box, configure the Resource Name and Location parameters.
    Note
    • If the selected JAR package has been uploaded from the MaxCompute client, clear the Upload to MaxCompute check box. Otherwise, an error occurs during the upload process.
    • The resource name can be different from the name of the uploaded file.
    • The resource name must be 1 to 128 characters in length and can contain letters, digits, underscores (_), and periods (.). The name is not case-sensitive. A JAR resource name must end with .jar, and a Python resource name must end with .py.
  4. Click Upload and select the file that you want to upload.
  5. Click Create.
  6. Click the Commit icon in the top toolbar to commit the resource to the development environment.
    If the workspace that you use is in standard mode, you must click Deploy in the upper-right corner to deploy the resource after you commit the resource. For more information, see Deploy nodes.
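The naming rule in the note above can be expressed as a quick check. This is a sketch of the documented rule only, not a validator used by DataWorks itself:

```python
import re

# Sketch of the documented rule: 1 to 128 characters; letters, digits,
# underscores (_), and periods (.); case is ignored. A JAR resource name
# must end with .jar and a Python resource name with .py.
_NAME_PATTERN = re.compile(r'^[A-Za-z0-9_.]{1,128}$')

def is_valid_resource_name(name: str, resource_type: str) -> bool:
    if not _NAME_PATTERN.match(name):
        return False
    suffix = {'jar': '.jar', 'python': '.py'}[resource_type]
    return name.lower().endswith(suffix)
```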

Create a Python resource and register a UDF

  1. Create a Python resource.
    1. On the DataStudio page, move the pointer over the Create icon and choose MaxCompute > Resource > Python.
      Alternatively, you can click the name of the desired workflow in the Business Flow section, right-click MaxCompute, and then choose Create > Resource > Python.
    2. In the Create Resource dialog box, configure the Resource Name and Location parameters.
      Notice The resource name can contain letters, digits, periods (.), underscores (_), and hyphens (-). It must end with .py.
    3. Click Create.
    4. Write code for the created Python resource in the code editor. Sample code:
      # This sample targets the Python 2 UDF runtime. For the MaxCompute
      # Python 3 runtime, also add: from functools import reduce
      from odps.udf import annotate
      @annotate("string->bigint")
      class ipint(object):
          def evaluate(self, ip):
              try:
                  return reduce(lambda x, y: (x << 8) + y, map(int, ip.split('.')))
              except Exception:
                  return 0
    5. Click the Commit icon in the top toolbar to commit the resource.
      If the workspace that you use is in standard mode, you must click Deploy in the upper-right corner to deploy the resource after you commit the resource. For more information, see Deploy nodes.
  2. Register a function.
    1. On the DataStudio page, move the pointer over the Create icon and choose MaxCompute > Function.
      Alternatively, you can click the required workflow, right-click MaxCompute, and then choose Create > Function.
    2. In the Create Function dialog box, configure the Function Name and Location parameters.
    3. Click Create.
    4. In the Register Function section of the configuration tab that appears, set the Class Name parameter to the class name of the function and the Resources parameter to the name of the Python resource that you created. Then, click the Commit icon in the top toolbar. In this example, the class name is ipint.ipint, which consists of the Python file name and the class name.
    5. Check whether the ipint function is valid and works as expected. You can create an ODPS SQL node on the DataStudio page and execute an SQL statement to test the ipint function.
    You can also create an ipint.py file on your on-premises machine and upload the file by using the MaxCompute client. For more information about how to use the MaxCompute client, see MaxCompute client.
    odps@ MaxCompute_DOC>add py D:/ipint.py;
    OK: Resource 'ipint.py' have been created.                
    odps@ MaxCompute_DOC>create function ipint as ipint.ipint using ipint.py;
    Success: Function 'ipint' have been created.           

    After the resource file is uploaded, you can register a UDF on the MaxCompute client. For more information, see Functions operations. You can use the UDF after it is registered.
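Before you commit the resource, you can exercise the UDF logic locally with plain Python by calling the evaluate method directly. In MaxCompute, the registered function is instead invoked through SQL, for example SELECT ipint('10.0.0.1');. The sketch below drops the MaxCompute-specific annotate decorator so it runs anywhere:

```python
from functools import reduce  # reduce is not a builtin in Python 3

class ipint(object):
    """Same logic as the ipint UDF above, without the odps.udf.annotate decorator."""
    def evaluate(self, ip):
        try:
            # Fold each dotted-decimal octet into a single integer.
            return reduce(lambda x, y: (x << 8) + y, map(int, ip.split('.')))
        except Exception:
            return 0

udf = ipint()
print(udf.evaluate('10.0.0.1'))   # 167772161
print(udf.evaluate('not-an-ip'))  # 0, malformed input falls back to 0
```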

Reference and download a resource

If you want to download a resource, find the desired resource in the Resource folder, right-click the resource name, and then select View Versions. In the Versions dialog box, click Download. For more information about how to download a resource by using the MaxCompute client, see Resource operations.

Other operations

After you create a resource, you can rename, reference, or delete the resource in the MaxCompute folder of the desired workflow. For more information about how to delete a resource, see DataStudio.