This topic describes how to create, upload, reference, and download resources.

If your code or function requires resource files such as .jar files, you can upload the resources to your workspace and reference them.

If the existing built-in functions do not meet your requirements, DataWorks allows you to create user-defined functions (UDFs) and customize processing logic. You can upload the required JAR packages to your workspace so that you can reference them when creating UDFs.
Note
  • You can view built-in functions on the Built-In Functions page. For more information, see Functions.
  • You can view the UDFs that you have committed or deployed on the MaxCompute Functions page. For more information, see MaxCompute functions.

MaxCompute resources

The resources that you can upload to MaxCompute include text files, MaxCompute tables, Python code, and compressed packages in .zip, .tgz, .tar.gz, .tar, and .jar formats. You can read or use these resources when you run UDFs or MapReduce jobs.

MaxCompute provides API operations for you to read and use resources. Currently, the following types of MaxCompute resources are available:
  • Python: the Python code you have written. You can use Python code to register Python UDFs.
  • JAR: the compiled Java JAR packages.
  • Archive: the compressed files that can be identified by the resource name extension. Supported file types include .zip, .tgz, .tar.gz, .tar, and .jar.
  • File
JAR resources and file resources have the following differences:
  • To create a JAR resource, write Java code in an offline Java environment, package the code into a JAR package, and upload the package to DataWorks as a JAR resource.
  • To create a file resource that is smaller than or equal to 500 KB in size, you can directly create and edit it in the DataWorks console.
  • To create a file resource that is larger than 500 KB in size, you can select Larger than 500 KB and upload a local file.
    Note Each resource file to be uploaded in the DataWorks console cannot exceed 30 MB. You can use the MaxCompute client to upload a resource file that is larger than 30 MB. Then, commit it to DataWorks on the MaxCompute Resources page. For more information, see MaxCompute resources.
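
The size limits above can be summarized as a small decision helper. This is only an illustrative sketch: the function and constant names are hypothetical and not part of any DataWorks or MaxCompute API, and it assumes that 1 KB is 1,024 bytes and 1 MB is 1,024 KB, which this topic does not specify.

```python
# Illustrative sketch only: these names are hypothetical, not a DataWorks API.
KB = 1024        # assumption: 1 KB = 1,024 bytes
MB = 1024 * KB   # assumption: 1 MB = 1,024 KB

EDIT_IN_CONSOLE_LIMIT = 500 * KB  # file resources up to this size can be edited online
CONSOLE_UPLOAD_LIMIT = 30 * MB    # largest file that the console accepts for upload


def choose_file_resource_path(size_bytes):
    """Return the creation path that the documented size limits suggest."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if size_bytes <= EDIT_IN_CONSOLE_LIMIT:
        return "create and edit in the DataWorks console"
    if size_bytes <= CONSOLE_UPLOAD_LIMIT:
        return "select Larger than 500 KB and upload a local file"
    return "upload with the MaxCompute client, then commit to DataWorks"
```

For example, under these assumptions a 2 MB file falls into the second branch, and a 45 MB file must go through the MaxCompute client.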

E-MapReduce resources

The E-MapReduce resources that you can upload to DataWorks include JAR and file resources.

Note The EMR module is available on the DataStudio page only after you bind an E-MapReduce computing engine to the current workspace on the Workspace Management page. For more information, see Configure a workspace.

Each resource file to be uploaded cannot exceed 30 MB.

  1. Log on to the DataWorks console. In the left-side navigation pane, click Workspaces. On the Workspaces page, find the target workspace and click Data Analytics in the Actions column.
  2. Move the pointer over the Create icon, choose EMR > Resource, and then select EMR JAR or EMR File. The procedure for creating an E-MapReduce resource is the same as that for creating a JAR resource.

Create a JAR resource

  1. On the DataStudio page, create a JAR resource.
    You can create a JAR resource in either of the following ways:
    • Move the pointer over the Create icon and choose MaxCompute > Resource > JAR.
    • Find the target workflow, click MaxCompute, right-click Resource, and choose Create > JAR.
  2. In the Create Resource dialog box that appears, enter the resource name and select the target folder. Then, click Upload and select the target file to upload.
    Note
    • If the selected JAR package has been uploaded from the MaxCompute client, clear Upload to MaxCompute. If you do not clear it, an error will occur during the upload process.
    • The resource name can be different from the name of the uploaded file.
    • A resource name can contain letters, digits, underscores (_), and periods (.), and is case-insensitive. It must be 1 to 128 characters in length. A JAR resource name must end with .jar, and a Python resource name must end with .py.
  3. Click OK to create the JAR resource.
  4. Click the Commit icon in the toolbar to commit the resource to the development environment.
  5. Deploy the node.

    For more information, see Deploy a node.
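
The naming rules in the note under step 2 can be checked before you upload a file. The following is a minimal sketch, not a DataWorks API: the helper name is hypothetical, "letters" is assumed to mean ASCII letters, and the suffix check is assumed to ignore case because resource names are case-insensitive.

```python
import re

# Hypothetical helper: it only mirrors the naming rules stated in the note.
# Assumption: "letters" means ASCII letters A-Z and a-z.
_NAME_PATTERN = re.compile(r"[A-Za-z0-9_.]{1,128}")

# Required suffix per resource type, as stated in the note.
_REQUIRED_SUFFIX = {"jar": ".jar", "python": ".py"}


def is_valid_resource_name(name, resource_type=None):
    """Return True if name satisfies the character, length, and suffix rules."""
    if _NAME_PATTERN.fullmatch(name) is None:
        return False
    suffix = _REQUIRED_SUFFIX.get(resource_type)
    if suffix is None:
        # No suffix rule is documented for other resource types.
        return True
    # Assumption: names are case-insensitive, so the suffix check ignores case.
    return name.lower().endswith(suffix)
```

With this sketch, `is_valid_resource_name("my_udf.jar", "jar")` passes, whereas a name that contains a space or exceeds 128 characters is rejected.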

Create a Python resource and register a UDF

  1. Create a Python resource.
    1. On the DataStudio page, move the pointer over the Create icon and choose MaxCompute > Resource > Python.

      You can also find the target workflow, right-click MaxCompute, and choose Create > Resource > Python.

    2. In the Create Resource dialog box that appears, enter the resource name and select the target folder. Then, click Upload and select the target file to upload.
    3. Click OK.
    4. On the page that appears, edit the code of the created resource. Sample code:
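      from functools import reduce  # reduce is not a built-in function in Python 3
      from odps.udf import annotate
      @annotate("string->bigint")
      class ipint(object):
          def evaluate(self, ip):
              # Fold the dotted octets into one integer. Return 0 for malformed input.
              try:
                  return reduce(lambda x, y: (x << 8) + y, map(int, ip.split('.')))
              except Exception:
                  return 0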
    5. Click the Commit icon in the toolbar to commit the Python resource.
  2. Register a UDF.
    1. On the DataStudio page, move the pointer over the Create icon and choose MaxCompute > Function.

      You can also find the target workflow, right-click MaxCompute, and choose Create > Function.

    2. In the Create Function dialog box that appears, enter the function name and select the target folder.
    3. Click Commit.
    4. On the Register Function page that appears, enter the class name and the name of the Python resource that has been created, and then click the Commit icon in the toolbar. In this example, the class name is ipint.ipint.
    5. Check whether the ipint function is valid and works as expected. For example, you can create an ODPS SQL node and test the ipint function by running an SQL statement. (Figure: Verification)
You can also create an ipint.py file on your local device and upload it by using the MaxCompute client. For more information, see Client.
odps@ MaxCompute_DOC>add py D:/ipint.py;
OK: Resource 'ipint.py' have been created.                
odps@ MaxCompute_DOC>create function ipint as ipint.ipint using ipint.py;
Success: Function 'ipint' have been created.           

After the resource file is uploaded, register a UDF on the MaxCompute client. For more information, see Functions operations.

You can use the UDF after it is registered.
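
Before you register the UDF, you may want to confirm on your local device what the sample logic returns. The following sketch copies the body of the evaluate method into a plain function so that it runs without the odps package. The name ip_to_int is hypothetical and is used only for this local check.

```python
from functools import reduce


def ip_to_int(ip):
    """Standalone copy of the evaluate logic from the sample ipint class."""
    try:
        # Shift the running value left by 8 bits and add the next octet.
        return reduce(lambda x, y: (x << 8) + y, map(int, ip.split('.')))
    except Exception:
        # Malformed input such as a non-numeric string yields 0.
        return 0


print(ip_to_int("1.2.3.4"))    # 16909060
print(ip_to_int("not an ip"))  # 0
```

The first call returns 16909060 because 1 × 2^24 + 2 × 2^16 + 3 × 2^8 + 4 = 16909060. Note that the logic does not validate the number of octets or their range, so it maps any dot-separated integers to a number.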

Reference and download resources

  • For more information about how to reference resources for functions, see Function.
  • For more information about how to reference resources for nodes, see ODPS MR node.

To download a resource, double-click Resource under the target workflow. In the resource list that appears, move the pointer over the required resource and click Download.