DataStudio lets you manage MaxCompute project resources. You can create resources from local files or Object Storage Service (OSS) files and register them as functions for use in Data Development nodes. This topic describes how to create various types of MaxCompute resources and functions in Resource Management.
Prerequisites
You have attached a MaxCompute computing resource.
You have developed the resource files. You can upload the files from your local machine or retrieve them from OSS. If you create resources by uploading files from OSS, you must meet the following conditions.
You have activated OSS, created a bucket, and stored the resource files that you want to upload in the OSS bucket. For more information, see Create a bucket and Simple upload.
Note: For more information about the supported resource files, see Resource description.
The Alibaba Cloud account that you use to upload the file must be granted permissions to access the target bucket. To avoid permission restrictions, you must authorize the account before you upload the file.
Go to Resource Management
Go to the Workspaces page in the DataWorks console. In the top navigation bar, select a desired region. Find the desired workspace and choose in the Actions column.
In the navigation pane on the left, click the Resource Management icon to open the Resource Management page.
On the Resource Management page, click the icon to create a new resource or function. Alternatively, you can first click New Folder to plan your folder structure. Then, right-click the target folder, select New, and choose the type of resource or function to create.
Create and manage resources
Resource description
Resources are the foundation for implementing user-defined function (UDF) or MapReduce features in MaxCompute. In DataStudio, you can visually upload resources that are stored locally or in OSS. These resources can be read and used during the execution of UDFs and MapReduce. The following MaxCompute resource types are supported.
Uploading resources to MaxCompute using DataWorks incurs MaxCompute storage fees.
Resource type | Description | Supported upload methods (Local, OSS)
Python | Stores Python code used to register Python UDFs. The file name extension is
JAR | A compiled Java JAR package used to run Java programs. The file name extension is
Archive | Only compressed files such as
File | When you create a resource of the
Limits
Note the following limits when you upload resources:
Resource size: You can upload a resource of up to 500 MB.
Resource publishing: If you use a standard mode workspace, you must publish the resource to the production environment. After the resource is published, it becomes available to projects in the production environment.
Note: The data source information might differ between the development and production environments. Before you query tables or resources in an environment, confirm the data source information for that environment.
Resource management: In DataWorks, you can view and manage only the resources that you upload using DataWorks.
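The size limit above can be checked locally before you start an upload. The following is a minimal sketch, not part of DataWorks: the helper names are hypothetical, and it assumes that 1 MB means 1024 * 1024 bytes, which this topic does not specify.

```python
import os

# Limit stated in this topic: a single resource can be at most 500 MB.
# Assumption: 1 MB is treated as 1024 * 1024 bytes.
MAX_RESOURCE_BYTES = 500 * 1024 * 1024


def is_within_size_limit(size_in_bytes: int) -> bool:
    """Return True if a file of this size fits within the 500 MB limit."""
    return 0 <= size_in_bytes <= MAX_RESOURCE_BYTES


def check_local_file(path: str) -> bool:
    """Check the on-disk size of a local file before uploading it."""
    return is_within_size_limit(os.path.getsize(path))
```

Calling check_local_file on the path of a local resource file before you open the upload dialog box avoids waiting on an upload that would be rejected for size.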
Create resources
MaxCompute resources can be uploaded from your local machine or from OSS. You can directly reference the created resources in Data Development or create functions from them.
On the Resource Management page, in the Create Resource And Function dialog box that appears, configure the Type, Path, and Name of the resource.
Upload a local file or an OSS file as the source. The following table describes the key parameters for uploading resources.
Configuration item | Configuration description
File Source | The source of the object file. The supported sources are Local and OSS.
File Content | If you select Local, in Upload File, click Click To Upload to upload a local file. If you select OSS, select the corresponding OSS file from the Select File drop-down list.
Data Source | Select the data source to which the uploaded MaxCompute resource belongs.
In the top toolbar, click Save and then Publish the resource. Only published resources can be used in Data Development.
Use resources
After you create a resource, you can edit a data development node. In the navigation pane on the left, click Resource Management, find the target resource or function, right-click it, and select Reference Resource. After the resource is successfully referenced, code in the ##@resource_reference{"Resource Name"} format is displayed.
For example, a PyODPS 3 node displays the code as ##@resource_reference{"example.py"}. The display format varies by node type. The format displayed on the UI is the one that is used.
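To illustrate the reference format, the following sketch builds and recognizes the marker shown above for a PyODPS 3 node. The helper names are hypothetical and are not a DataWorks API. Because the format varies by node type, the pattern is an assumption that applies only to the form shown in this topic.

```python
import re

# Pattern for the marker that this topic shows for a PyODPS 3 node.
# Assumption: other node types may display a different format.
_REFERENCE_PATTERN = re.compile(r'^##@resource_reference\{"([^"]+)"\}$')


def build_reference(resource_name: str) -> str:
    """Return the marker line for the given resource name."""
    return '##@resource_reference{"' + resource_name + '"}'


def find_references(node_code: str) -> list:
    """Return the resource names referenced in a node's code, in order."""
    names = []
    for line in node_code.splitlines():
        match = _REFERENCE_PATTERN.match(line.strip())
        if match:
            names.append(match.group(1))
    return names
```

A helper like find_references can be useful when you review node code and want to list the resources that must be published before the node runs.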
Alternatively, you can create a function from a resource and then use the function in a data development node.
Manage resources
In DataWorks, you can view and manage only the resources that you upload using the visualization feature. On the Resource Management page, click a resource to perform management operations.
View historical versions: View and compare published resource versions to see the changes between them.
Note: To compare versions, you must select at least two versions.
Delete resource: This operation deletes the resource only from the corresponding project in the development environment. To delete the resource from the production environment, you must publish the deletion operation. After the operation is successfully published, the resource is also deleted from the production environment. For more information, see Publishing tasks.
View other resources.
MaxCompute might contain resources uploaded using methods other than DataWorks. You can view these resources in the following ways.
View all resources in a MaxCompute project using the data catalog.
After you add a MaxCompute project to the data catalog, you can open the corresponding MaxCompute folder in the data catalog and view all resources in the current project under the resource directory.
Use a MaxCompute SQL node to view other resources in a MaxCompute project.
View all resources in the current project. When you create a MaxCompute SQL script in Data Development and execute this command, the system accesses the MaxCompute computing resource that is attached to the development environment by default.
list resources;
View all resources in a specified project.
use MaxCompute_project_name;
list resources;
For more information about command operations, see Resource operations.
Create and manage functions
Before you create a function, ensure that you have created a resource.
When you create a MaxCompute resource, you can refer to UDF Development (Java) and UDF Development (Python 3) to prepare the MaxCompute resource file.
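As a rough illustration of what a Python resource file can contain, the following sketch defines a scalar UDF. It assumes that the odps.udf module and its annotate decorator are provided by the MaxCompute Python runtime. The class name ExampleAdd and its logic are hypothetical, and the fallback decorator exists only so that the sketch can run locally without that module. See UDF Development (Python 3) for the authoritative requirements.

```python
# Assumption: odps.udf is provided by the MaxCompute Python runtime.
# The fallback below only lets this sketch run locally without it.
try:
    from odps.udf import annotate
except ImportError:
    def annotate(signature):
        """Local stand-in that leaves the decorated class unchanged."""
        def decorator(cls):
            return cls
        return decorator


@annotate("bigint,bigint->bigint")
class ExampleAdd(object):
    """Hypothetical scalar UDF that adds two BIGINT values."""

    def evaluate(self, left, right):
        # NULL inputs arrive as None; return NULL if either input is NULL.
        if left is None or right is None:
            return None
        return left + right
```

If this code were uploaded as a Python resource named example_add.py, the Class Name of the function would be example_add.ExampleAdd, following the resource name and class name rule described in this topic.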
Function description
In DataStudio Resource and Function Management, you can register resources as functions. In data development or SQL queries, you can create a function directly from an uploaded and published resource, create embedded functions using JAVA, PYTHON2, or PYTHON3, or directly use MaxCompute built-in functions.
Create a function
On the Resource Management page, in the New Resource And Function dialog box that appears, configure the Type, Path, and Name of the function.
Create a function resource and configure its information.
Before you configure a MaxCompute function, make sure that you have registered the MaxCompute project as a computing resource in DataWorks and have uploaded a MaxCompute resource. The following table describes the key parameters for a MaxCompute function.
Parameter | Description
Function Type | Select the function type. Valid values: MATH (mathematical operation function), AGGREGATE (aggregate function), STRING (string processing function), DATE (date function), ANALYTIC (window function), and OTHER (other function).
Class Name | The class name of the UDF, in resource_name.class_name format. The resource name can be a Java package name or a Python resource name. When you create a user-defined function in DataWorks, you can use MaxCompute resources of the JAR and Python types. The class name is configured differently for each resource type. JAR: The Class Name must be in the Java package name.actual class name format. You can obtain this value from IntelliJ IDEA by using Copy Reference. For example, if com.aliyun.odps.examples.udf is the Java package name and UDAFExample is the actual class name, set Class Name to com.aliyun.odps.examples.udf.UDAFExample. Python: The Class Name must be in the Python resource name.actual class name format. For example, if the Python resource name is LcLognormDist_sh and the actual class name is LcLognormDist_sh, set Class Name to LcLognormDist_sh.LcLognormDist_sh. Note: When you enter the resource name, do not add the .jar or .py suffix. The resource must be submitted and published before it can be used.
Type | Select Resource Function or Embedded Function. If you select Resource Function, you only need to configure the Resource List. If you select Embedded Function, you must select the Resources and also configure the Language (JAVA, PYTHON2, or PYTHON3) and Code.
Resources | Select the resources used to register the function. Visualization mode: You can select only resources that have been uploaded or added to DataWorks. Script mode: You can enter any resource in the corresponding data source. If the UDF calls multiple resources, separate them with commas (,). Note: You do not need to enter the path of the added resources. For resources that DataWorks cannot upload through the visualization feature (such as table resources), or resources that were uploaded to MaxCompute by other methods and are not managed by DataWorks visualization, enter them manually in script mode.
Command Format | An example of how to use this UDF.
In the top toolbar, click Save and then Publish the function. Only published functions can be used in Data Development.
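The Class Name and Resources rules in the parameter table can be sketched as small string helpers. These helpers are hypothetical and are not part of DataWorks. They only illustrate the naming rules stated in this topic: drop the .jar or .py suffix from the resource name, join the name parts with a period, and separate multiple resources with commas in script mode.

```python
def strip_resource_suffix(resource_file: str) -> str:
    """Remove a trailing .jar or .py suffix from a resource file name."""
    for suffix in (".jar", ".py"):
        if resource_file.endswith(suffix):
            return resource_file[: -len(suffix)]
    return resource_file


def python_class_name(resource_file: str, class_name: str) -> str:
    """Build 'Python resource name.actual class name' without the suffix."""
    return strip_resource_suffix(resource_file) + "." + class_name


def java_class_name(package_name: str, class_name: str) -> str:
    """Build 'Java package name.actual class name'."""
    return package_name + "." + class_name


def script_mode_resources(resource_names) -> str:
    """Join multiple resource names with commas for script mode."""
    return ",".join(resource_names)
```

For the examples in this topic, python_class_name("LcLognormDist_sh.py", "LcLognormDist_sh") yields LcLognormDist_sh.LcLognormDist_sh, and java_class_name("com.aliyun.odps.examples.udf", "UDAFExample") yields com.aliyun.odps.examples.udf.UDAFExample.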
Use functions
Use a user-defined function
After a function is created and published, you can directly reference it in Data Development or SQL queries.
When you edit a data development node, click Resource Management in the navigation pane on the left. Then, find the target resource or function, right-click it, and select Reference Function.
After the function is successfully referenced, the name of the custom function is automatically generated on the node editing page. For example: example_function().
When you edit an SQL query, you can directly use the created function.
SELECT example_function(column_name) FROM table;
Use a built-in function
DataWorks supports two types of functions: user-defined functions and MaxCompute built-in functions. You can view the built-in functions by type or view them in alphabetical order.
Note: For information about precautions when you use built-in functions, see Precautions.
Limits: For information about the limits on built-in functions, see Limits of JSON functions and Limits of string functions.
You can view built-in functions in one of the following three ways:
Use the following command in a MaxCompute SQL node to view built-in functions.
show builtin functions [<function_name>]; --<function_name> is the name of a specific built-in function.
Note: <function_name> is a placeholder. Replace it with the name of a built-in function.
If you use the MaxCompute client (odpscmd) to run the show builtin functions; command, the odpscmd version must be 0.43.0 or later.
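The odpscmd version requirement can be checked with a simple comparison. The following sketch is illustrative only: the function names are hypothetical, and it assumes that the version is a plain dotted numeric string such as 0.43.0. Version strings with other suffixes are not handled.

```python
# Minimum odpscmd version stated in this topic for "show builtin functions;".
MIN_ODPSCMD_VERSION = (0, 43, 0)


def parse_version(version: str) -> tuple:
    """Parse a dotted numeric version, padding it to three components."""
    parts = [int(part) for part in version.split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def supports_show_builtin_functions(version: str) -> bool:
    """Return True if the given odpscmd version is 0.43.0 or later."""
    return parse_version(version) >= MIN_ODPSCMD_VERSION
```

Comparing tuples of integers avoids the pitfall of comparing version strings lexically, where "0.9.0" would incorrectly sort after "0.43.0".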
Manage functions
On the Resource Management page, click a function to perform management operations.
View historical versions: Click the Version button on the right side of the function editing page. You can view and compare saved or submitted function versions to see the changes between them.
Note: To compare versions, you must select at least two versions.
Delete a function: Right-click the target function and select Delete.
To delete the function from the production environment, you must publish a task to apply the deletion. After the task is successfully published, the function is also deleted from the production environment. For more information, see Publish a task.
View the list of user-defined functions
-- View the functions in the MaxCompute computing resource project attached to the current DataWorks workspace.
SHOW FUNCTIONS;
View the details of a user-defined function
Use the DESCRIBE command or its abbreviation DESC, followed by the function name, to view the details of a user-defined function.
-- Use the abbreviated form to view the details of a user-defined function.
DESC FUNCTION <function_name>;
In DataWorks, if the required processing logic in your business flow cannot be implemented by existing functions, you can write a MaxCompute user-defined function. You can then upload and associate the corresponding resources, such as JAR packages and Python files, to manage and expand your data processing features.
Appendix: FAQ
Can a UDF created from a resource uploaded through DataWorks be used in DataAnalysis SQL queries as well as in ODPS SQL nodes in Data Development?
Yes. UDFs registered using DataWorks are stored in the MaxCompute project, so they can be used both in MaxCompute SQL nodes and in DataAnalysis SQL Query and Analysis.