This topic describes how to create, reference, and download a JAR or Python resource.
Prerequisites
A MaxCompute compute engine instance is associated with the desired workspace. The
MaxCompute folder is displayed on the DataStudio page only after you associate a MaxCompute
engine instance with the workspace on the
Workspace Management page. For more information, see
Configure a workspace.
Background information
If your code or function requires resource files such as .jar files, you can upload resources to your workspace and reference them.
If the existing built-in functions do not meet your requirements, you can create user-defined
functions (UDFs) and customize processing logic. You can upload the required JAR packages
to your workspace so that you can reference them when you create UDFs.
Note
- You can view built-in functions in the Built-In Functions pane. For more information, see Functions.
- You can view the UDFs that you have committed or deployed in DataWorks in the MaxCompute Functions pane. For more information, see MaxCompute functions.
You can upload different types of resources to MaxCompute, such as text files, Python code, and
compressed packages in the .zip, .tgz, .tar.gz, .tar, and .jar formats. You can read or use these resources when you run UDFs or MapReduce jobs.
MaxCompute provides API operations for you to read and use resources. The following
types of resources are supported:
- Python: the Python code you have written. You can use Python code to register Python UDFs.
- JAR: the compiled Java JAR packages.
- Archive: the compressed files that can be identified by the file name extension. Supported
file types include .zip, .tgz, .tar.gz, .tar, and .jar.
- File: the files in the .zip, .so, or .jar format.
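The extension lists above overlap, so a single file can qualify for more than one resource type. The following minimal sketch looks up which types a file name could be uploaded as. The function name `candidate_resource_types` and the case-insensitive matching are assumptions made for illustration. The `.py` extension for Python resources comes from the naming rule stated later in this topic. This sketch is not part of any DataWorks or MaxCompute API.

```python
# Extension lists copied from the resource-type descriptions above.
# The ".py" entry follows the rule that a Python resource name must end with .py.
RESOURCE_TYPE_EXTENSIONS = {
    "Python": (".py",),
    "JAR": (".jar",),
    "Archive": (".zip", ".tgz", ".tar.gz", ".tar", ".jar"),
    "File": (".zip", ".so", ".jar"),
}


def candidate_resource_types(filename):
    """Return the resource types whose listed extensions match filename.

    Hypothetical helper: matching is case-insensitive, which is an assumption.
    """
    lowered = filename.lower()
    return [
        resource_type
        for resource_type, extensions in RESOURCE_TYPE_EXTENSIONS.items()
        if lowered.endswith(extensions)  # str.endswith accepts a tuple
    ]


print(candidate_resource_types("udf.jar"))      # ['JAR', 'Archive', 'File']
print(candidate_resource_types("data.tar.gz"))  # ['Archive']
```

A .jar file matches three types, which is why the differences between JAR resources and file resources matter when you choose how to upload it.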
JAR resources and file resources have the following differences:
- You can write Java code in an offline Java environment, compress the code to a JAR
package, and then upload the package as a JAR resource to DataWorks.
- You can create and edit a small-sized file resource in the DataWorks console.
- If you want to upload a resource file whose size is greater than 500 KB from your
on-premises machine when you create a file resource, you can select Large File (> 500 KB).
Note You can upload a resource file whose size is no more than 50 MB. If you want to upload
a resource file whose size is greater than 50 MB, you can use the MaxCompute client
to perform this operation. Then, commit the resource file to DataWorks in the MaxCompute
Resources pane. For more information, see
MaxCompute resources.
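The size thresholds above can be summarized as a small decision helper. This is a minimal sketch, assuming that 1 KB is 1,024 bytes and 1 MB is 1,024 KB (the source does not say whether binary or decimal units apply). The function name `upload_route` and the returned labels are hypothetical.

```python
KB = 1024            # assumption: binary units
MB = 1024 * KB

LARGE_FILE_THRESHOLD = 500 * KB   # above this, select Large File (> 500 KB)
CONSOLE_UPLOAD_LIMIT = 50 * MB    # above this, use the MaxCompute client


def upload_route(size_bytes):
    """Return a label for how a file of the given size should be uploaded."""
    if size_bytes < 0:
        raise ValueError("size_bytes must not be negative")
    if size_bytes > CONSOLE_UPLOAD_LIMIT:
        # Upload with the MaxCompute client, then commit in the
        # MaxCompute Resources pane.
        return "maxcompute-client"
    if size_bytes > LARGE_FILE_THRESHOLD:
        # Upload in the DataWorks console with Large File (> 500 KB) selected.
        return "console-large-file"
    # Small files can be created and edited in the DataWorks console.
    return "console"


print(upload_route(200 * KB))  # console
print(upload_route(10 * MB))   # console-large-file
print(upload_route(80 * MB))   # maxcompute-client
```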
Create a JAR resource
- Go to the DataStudio page.
- Log on to the DataWorks console.
- In the left-side navigation pane, click Workspaces.
- In the top navigation bar, select the region where the required workspace resides,
find the workspace, and then click Data Analytics.
- On the DataStudio page, move the pointer over the
icon and choose . Alternatively, you can click the name of the desired workflow in the
Business Flow section, right-click
MaxCompute, and then choose .
For more information about how to create a workflow, see Create a workflow.
- In the Create Resource dialog box, configure the Resource Name and Location parameters.
Note
- If the selected JAR package has not been uploaded from the MaxCompute client, select
Upload to MaxCompute. If the selected JAR package has already been uploaded from the
MaxCompute client, clear Upload to MaxCompute. In either case, the wrong setting causes an error during the upload process.
- The resource name can be different from the name of the uploaded file.
- The resource name must be 1 to 128 characters in length and can contain letters, digits,
underscores (_), and periods (.). The name is not case-sensitive. A JAR resource name
must end with .jar, and a Python resource name must end with .py.
- Click Upload and select the file that you want to upload.
- Click Create.
- Click the
icon in the top toolbar to commit the resource to the development environment. If the workspace that you use is in standard mode, you must click Deploy in the upper-right
corner to deploy the resource after you commit the resource. For more information,
see
Deploy nodes.
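The resource naming rules in the note above can be checked before you open the Create Resource dialog box. The following is a minimal sketch of such a check, not a DataWorks API. The function name `is_valid_resource_name` is hypothetical, and the case-insensitive suffix comparison is an assumption based on the statement that names are not case-sensitive. It applies only the character set given in that note. The notice for Python resources later in this topic also allows hyphens (-), which this sketch does not accept.

```python
import re

# 1 to 128 characters: letters, digits, underscores (_), and periods (.).
_NAME_PATTERN = re.compile(r"[A-Za-z0-9_.]{1,128}")

# Required suffix per resource type, as stated in the note.
_REQUIRED_SUFFIX = {"jar": ".jar", "python": ".py"}


def is_valid_resource_name(name, resource_type):
    """Check a resource name against the rules in the note above.

    resource_type is "jar" or "python" (hypothetical labels).
    """
    suffix = _REQUIRED_SUFFIX[resource_type]
    has_valid_chars = _NAME_PATTERN.fullmatch(name) is not None
    # Assumption: the suffix check ignores case.
    return has_valid_chars and name.lower().endswith(suffix)


print(is_valid_resource_name("my_udf.jar", "jar"))    # True
print(is_valid_resource_name("bad-name.jar", "jar"))  # False
print(is_valid_resource_name("ipint.py", "jar"))      # False
```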
Create a Python resource and register a UDF
- Create a Python resource.
- On the DataStudio page, move the pointer over the
icon and choose . Alternatively, you can click the name of the desired workflow in the Business Flow section, right-click MaxCompute, and then choose .
- In the Create Resource dialog box, configure the Resource Name and Location parameters.
Notice The resource name can contain letters, digits, periods (.), underscores (_), and hyphens
(-). It must end with .py.
- Click Create.
- Write code for the created Python resource in the code editor. Sample code:
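from odps.udf import annotate
from functools import reduce  # built in on Python 2; must be imported on Python 3

# Converts a dotted IPv4 string such as '1.2.3.4' to its integer value.
@annotate("string->bigint")
class ipint(object):
    def evaluate(self, ip):
        try:
            # Shift the accumulated value left by 8 bits, then add the next octet.
            return reduce(lambda x, y: (x << 8) + y, map(int, ip.split('.')))
        except Exception:
            # Return 0 for NULL or malformed input.
            return 0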
- Click the
icon in the top toolbar to commit the resource. If the workspace that you use is in standard mode, you must click Deploy in the upper-right
corner to deploy the resource after you commit the resource. For more information,
see
Deploy nodes.
- Register a function.
- On the DataStudio page, move the pointer over the
icon and choose . Alternatively, you can click the required workflow, right-click MaxCompute, and then choose .
- In the Create Function dialog box, configure the Function Name and Location parameters.
- Click Create.
- In the Register Function section of the configuration tab that appears, enter the class name of the function
in the Class Name field and the name of the Python resource that you created in the
Resources field, and then click the
icon in the top toolbar. In this example, the class name is ipint.ipint.
- Check whether the ipint function is valid and returns the expected results. You can create an ODPS SQL node on
the DataStudio page to test the ipint function by executing an SQL statement.
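Before you test the function in an ODPS SQL node, you can sanity-check the conversion logic on your on-premises machine. The following sketch copies the body of the evaluate method into a standalone function so that it runs without the odps package. The name `ip_to_int` is hypothetical. The expected values show what the registered ipint function should return for the same inputs.

```python
from functools import reduce


def ip_to_int(ip):
    """Standalone copy of the ipint.evaluate logic for local testing."""
    try:
        # Shift the accumulated value left by 8 bits, then add the next octet.
        return reduce(lambda x, y: (x << 8) + y, map(int, ip.split('.')))
    except Exception:
        # None or malformed input maps to 0, as in the UDF.
        return 0


print(ip_to_int("1.2.3.4"))      # 16909060
print(ip_to_int("192.168.0.1"))  # 3232235521
print(ip_to_int("not.an.ip"))    # 0
```

Note that the logic does not check that each octet is in the range 0 to 255 or that exactly four octets are present, so inputs such as '1.2' still produce a nonzero result.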
You can also create an ipint.py file on your on-premises machine and upload the file
by using the MaxCompute client. For more information about how to use the MaxCompute
client, see
MaxCompute client.
odps@ MaxCompute_DOC>add py D:/ipint.py;
OK: Resource 'ipint.py' have been created.
odps@ MaxCompute_DOC>create function ipint as ipint.ipint using ipint.py;
Success: Function 'ipint' have been created.
After the resource file is uploaded, you can register a UDF on the MaxCompute client.
For more information, see Functions operations. You can use the UDF after it is registered.
Reference and download a resource
If you want to download a resource, find the desired resource in the Resource folder, right-click the resource name, and then select View Versions. In the Versions
dialog box, click Download. For more information about how to download a resource by using the MaxCompute client,
see Resource operations.
Other operations
After you create a resource, you can rename, reference, or delete the resource in
the MaxCompute folder of the desired workflow. For more information about how to delete a resource,
see DataStudio.