Data Studio lets you manage resources in a MaxCompute project. Upload resource files from your local machine or Object Storage Service (OSS), then register them as user-defined functions (UDFs) for use in data development nodes and SQL queries.
Prerequisites
Before you begin, ensure that you have:
Attached a MaxCompute compute resource to your workspace
Prepared the resource files to upload (from your local machine or an OSS bucket)
If you upload from OSS, also ensure that:
OSS is activated, a bucket is created, and your resource files are stored in that bucket. See Create a bucket and Simple upload
The Alibaba Cloud account used for the upload has access to the target bucket. Grant the required permissions before uploading
Go to Resource Management
Go to the Workspaces page in the DataWorks console. In the top navigation bar, select a region. Find the workspace and choose Shortcuts > Data Studio in the Actions column.
In the navigation pane on the left, click the Resource Management icon to open the Resource Management page. Click the create icon to create a resource or function. To organize resources first, click Create Directory, then right-click the target folder and select the resource or function type to create.
Create and manage resources
Resource types
Resources are the foundation for running user-defined functions (UDFs) and MapReduce jobs in MaxCompute. Data Studio provides a visual interface to upload resources from your local machine or from OSS. Uploaded resources can be read during UDF and MapReduce execution.
Uploading resources to MaxCompute through DataWorks incurs MaxCompute storage fees.
The following resource types are supported:
| Resource type | Description | File extension |
|---|---|---|
| Python | Python code for registering Python UDFs | .py |
| JAR | Compiled Java JAR package for running Java programs | .jar |
| Archive | Compressed files only | .zip, .tgz, .tar.gz, .tar, .jar |
| File | Any file type; actual usage depends on the engine | Any |
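The extension rules in the table above can be sketched as a small helper. This is illustrative only: DataWorks asks you to choose the resource type explicitly when you upload, and the function name below is hypothetical.

```python
# Illustrative mapping from file extension to the resource types above.
# Note that .jar files can be registered either as JAR or as Archive
# resources; this sketch defaults them to JAR.
ARCHIVE_EXTS = (".zip", ".tgz", ".tar.gz", ".tar")

def infer_resource_type(filename: str) -> str:
    name = filename.lower()
    if name.endswith(".py"):
        return "Python"
    if name.endswith(".jar"):
        return "JAR"
    if name.endswith(ARCHIVE_EXTS):
        return "Archive"
    return "File"  # catch-all: actual usage depends on the engine
```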
Limitations
| Constraint | Limit |
|---|---|
| Online editing — Python | Up to 10 MB |
| Online editing — File | Up to 500 KB |
| Upload from local machine | Up to 500 MB |
| Upload from OSS | Up to 500 MB |
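A client-side pre-flight check mirroring these limits might look like the following sketch. The channel names are made-up labels for illustration, not DataWorks API values.

```python
# Upload limits from the table above, in bytes. Channel keys are
# hypothetical labels for this sketch only.
LIMITS = {
    "online_python": 10 * 1024 * 1024,   # online editing, Python
    "online_file": 500 * 1024,           # online editing, File
    "local_upload": 500 * 1024 * 1024,   # upload from local machine
    "oss_upload": 500 * 1024 * 1024,     # upload from OSS
}

def within_limit(size_bytes: int, channel: str) -> bool:
    """Return True if a file of size_bytes fits the channel's limit."""
    return size_bytes <= LIMITS[channel]
```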
Publishing: In a standard mode workspace, publish the resource to the production environment before it takes effect.
Data source configuration may differ between the development and production environments. Before querying tables or resources in an environment, confirm the data source for that environment.
Visibility: DataWorks displays and manages only resources uploaded through its interface. Resources uploaded by other means are not visible in the resource list, but can be viewed using the methods described in View all resources in a project.
Create a resource
MaxCompute resources can be uploaded from your local machine or from OSS. After creation, reference them directly in data development nodes or register them as functions.
On the Resource Management page, in the Create Resource or Function dialog box, set the Type, Path, and Name of the resource.
Configure the file source:
| Parameter | Description |
|---|---|
| File Source | Select On-premises to upload from your local machine, or OSS to reference a file in an OSS bucket. |
| File Content | If On-premises, click Upload to select a local file. If OSS, select the file from the drop-down list. |
| Data Source | Select the MaxCompute data source the resource belongs to. |

Click Save in the toolbar, then click Publish. Only published resources can be used in Data Development.
Use a resource
After creating a resource, reference it in a data development node:
In the navigation pane on the left, click Resource Management.
Find the target resource, right-click it, and select Reference Resources.
The following reference code is added to the node:

```
##@resource_reference{"Resource Name"}
```

For example, a PyODPS 3 node displays:

```
##@resource_reference{"example.py"}
```

The exact format varies by node type. Alternatively, register the resource as a function and call it in a development node.
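The ##@resource_reference annotation is a plain comment that DataWorks scans before running the node. As an illustration of the format only (DataWorks' own scanner is internal and may differ), the referenced names could be extracted like this:

```python
import re

# Extract resource names from ##@resource_reference{"..."} comments.
# This mirrors the annotation format shown above; it is a sketch, not
# DataWorks' actual implementation.
def referenced_resources(script_text: str) -> list[str]:
    return re.findall(r'##@resource_reference\{"([^"]+)"\}', script_text)
```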
Manage resources
On the Resource Management page, click a resource to perform the following operations:
View historical versions: Compare published resource versions to track changes. Select at least two versions to enable comparison.
Delete a resource: Deletes the resource from the development environment. To remove it from the production environment, publish the change. After the publish task succeeds, the resource is deleted from both environments. See Publish a task.
View all resources in a project
DataWorks only shows resources uploaded through its interface. To view all resources in a MaxCompute project — including those uploaded by other means — use one of the following methods:
Data catalog: After adding a MaxCompute project to the data catalog, open the corresponding MaxCompute folder and browse the resource directory.
MaxCompute SQL node: Run the following commands in a MaxCompute SQL node. By default, the command accesses the compute resource attached to the development environment.

View all resources in the current project:

```sql
list resources;
```

View all resources in a specific project:

```sql
use MaxCompute_project_name;
list resources;
```

For a full list of resource commands, see Resource operations.
Create and manage functions
Before creating a function, create and publish a resource first.
For guidance on preparing the MaxCompute resource file, see UDF Development (Java) and UDF Development (Python 3).
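As a minimal example of what a Python 3 UDF resource contains, the core is a class with an evaluate method that MaxCompute calls once per row. In a real resource file the class is decorated with @annotate from odps.udf to declare the SQL signature; the sketch below shows the class standalone so the evaluate contract is clear (StrLength is a hypothetical name):

```python
# Minimal sketch of a Python 3 UDF's core logic. In a real MaxCompute
# resource file you would add:
#     from odps.udf import annotate
#     @annotate("string->bigint")
# above the class to declare the SQL signature.
class StrLength:
    def evaluate(self, s):
        # MaxCompute passes NULL as None; return a value for every row.
        return len(s) if s is not None else 0
```

After uploading such a file as a Python resource and registering a function for it, you call the function by its registered name in SQL.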
When to use a UDF vs. a built-in function
Use a UDF when the processing logic your workflow requires cannot be expressed using existing built-in functions. For standard operations like aggregation, date arithmetic, and string manipulation, use MaxCompute built-in functions instead.
Function types
Data Studio supports the following function types. Select the type that matches the behavior of your UDF:
| Function type | Behavior |
|---|---|
| MATH | Mathematical operations on numeric values |
| AGGREGATE | Operates on multiple rows and returns a single value |
| STRING | String processing and transformation |
| DATE | Date and time operations |
| ANALYTIC | Window functions that compute results across a set of rows related to the current row |
| OTHER | Functions that don't fit the above categories |
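To illustrate the AGGREGATE row: a MaxCompute Python UDAF operates on multiple rows through a buffer-based contract (new_buffer, iterate, merge, terminate). The sketch below shows that contract standalone for an average; a real resource file subclasses odps.udf.BaseUDAF and declares its signature with @annotate:

```python
# Sketch of the MaxCompute Python UDAF contract for an average.
# A real resource file subclasses odps.udf.BaseUDAF and is decorated
# with @annotate("double->double"); shown standalone here.
class Avg:
    def new_buffer(self):
        return [0.0, 0]                  # [running sum, row count]

    def iterate(self, buffer, value):
        if value is not None:            # skip NULL rows
            buffer[0] += value
            buffer[1] += 1

    def merge(self, buffer, pbuffer):    # combine partial aggregates
        buffer[0] += pbuffer[0]
        buffer[1] += pbuffer[1]

    def terminate(self, buffer):
        return buffer[0] / buffer[1] if buffer[1] else None
```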
Create a function
On the Resource Management page, in the Create Resource or Function dialog box, set the Type, Path, and Name of the function.
Configure the function:
| Parameter | Description |
|---|---|
| Function Type | Select the function type. See Function types. |
| Class Name | The entry point class for the UDF, in resource_name.class_name format. See Class Name format. |
| Type | Select Resource Function or Embedded Function. A Resource Function references an uploaded resource file. An Embedded Function includes inline code: configure Language (JAVA, PYTHON2, or PYTHON3) and Code in addition to the resource. |
| Resource List | Select the resources to associate with the function. In Visual Mode, choose from resources already uploaded to DataWorks. In Code Editor mode, enter resource names manually and separate multiple resources with commas. For resources not managed by DataWorks (such as table resources or those uploaded directly to MaxCompute), use Code Editor mode. Do not include the resource file path; the name alone is sufficient. |
| Command Syntax | An example showing how to call the UDF. |

Click Save in the toolbar, then click Publish. Only published functions are available in Data Development.
Class Name format
The Class Name identifies the entry point of the UDF. The format depends on the resource type:
JAR resource: use packageName.ActualClassName, for example:

```
com.aliyun.odps.examples.udf.UDAFExample
```

Get this value from IntelliJ IDEA using Copy Reference. After the function is published under a name (set in the Name field), call it by that name in SQL:

```sql
SELECT your_function_name(column_name) FROM table;
```

Python resource: use PythonResourceName.ActualClassName, for example:

```
LcLognormDist_sh.LcLognormDist_sh
```

After the function is published, call it by its registered name in SQL:

```sql
SELECT your_function_name(column_name) FROM table;
```

Do not add the .jar or .py suffix when entering the resource name. The resource must be published before it can be referenced.
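For a Python resource, the rule amounts to: strip the .py suffix from the resource name, then append the class defined in the file. A small sketch of that rule (the helper itself is illustrative and covers only the Python case; JAR resources use packageName.ClassName instead):

```python
# Compose the Class Name for a Python UDF resource: the resource file
# name without its .py suffix, a dot, and the class defined in it.
# Hypothetical helper for illustration; JAR resources are not covered.
def python_class_name(resource_file: str, cls: str) -> str:
    stem = resource_file[:-3] if resource_file.endswith(".py") else resource_file
    return f"{stem}.{cls}"
```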
Use a user-defined function
After a function is published, use it in data development nodes or SQL queries:
In a data development node: In the navigation pane on the left, click Resource Management, find the function, right-click it, and select Insert Function. The function name is inserted into the editor, for example, example_function().

In an SQL query: Call the function directly by name:

```sql
SELECT example_function(column_name) FROM table;
```
Use built-in functions
MaxCompute provides a set of built-in functions covering math, string, date, aggregate, and window operations.
Browse built-in functions in one of the following ways:
Run the following command in a MaxCompute SQL node:
```sql
show builtin functions [<function_name>];
```

Replace <function_name> with the name of a specific built-in function to filter results. If you run this command using the MaxCompute client (odpscmd), version 0.43.0 or later is required.
Manage functions
On the Resource Management page, click a function to perform the following operations:
View historical versions: Click the Version button on the right side of the function editing page. Compare saved or published versions to track changes. Select at least two versions to enable comparison.
Delete a function: Right-click the function and select Delete. To remove it from the production environment, publish the change. See Publish a task.
To inspect user-defined functions via SQL, run the following commands in a MaxCompute SQL node:
View all UDFs registered in the attached MaxCompute project:
```sql
SHOW FUNCTIONS;
```

View the details of a specific UDF:

```sql
DESC FUNCTION <function_name>;
```

FAQ
Q: After uploading a resource and registering it as a UDF in DataWorks, can I use it in both DataAnalysis SQL queries and MaxCompute SQL nodes?
Yes. UDFs registered through DataWorks are stored in the MaxCompute project, so they are available in both MaxCompute SQL nodes and in DataAnalysis SQL Query (Legacy).