DataWorks: Shell node

Last Updated: May 08, 2026

The DataWorks shell node is designed for data engineers and supports running standard shell scripts for scenarios such as file operations and data interaction with OSS and NAS. You can configure scheduling parameters, reference resources, securely access OSS using the pre-installed ossutil tool, and run nodes by associating a RAM role.

Permissions

Add the RAM account used for node development to the target workspace and grant it the developer or workspace administrator role. For details, see Add a member to a workspace.

Usage notes

  1. Syntax limitations

    • Standard shell syntax is supported. Interactive syntax is not supported.

  2. Runtime environment and network access

    • When you run a shell node on a serverless resource group and the target service enforces an IP allowlist, add the IP addresses of the serverless resource group to that allowlist.

    • On a serverless resource group, a single task supports a maximum of 64 CUs. To avoid resource shortages and delayed task startup, keep each task at 16 CUs or fewer.

  3. Extended development environment

    • If your task requires a specific development environment, you can use the custom image feature in DataWorks to build a tailored image for task execution. For more information, see Use custom images.

  4. Resource usage and multi-script calls

    • Avoid launching a large number of child processes within a shell node. Because shell nodes currently have no resource limits, excessive child processes can affect other tasks running on the same scheduling resource group.

    • If a shell node invokes another script, such as a Python script, the shell node waits for the invoked script to complete before the node itself exits.
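The waiting behavior described above can be sketched as follows. This is a minimal illustration, not the platform's own mechanism: run_child is a hypothetical helper name, and the inline sh command stands in for a real child script such as a Python file.

```shell
# Run a child command in the foreground and hand its exit code back to the caller.
# The shell blocks until the child finishes, so the node keeps running for as long
# as the child does.
run_child() {
    "$@"
    local status=$?
    echo "child exited with code ${status}"
    return "${status}"
}

# Stand-in for a real child script, for example: run_child python3 my_job.py
run_child sh -c 'echo "child work done"'
```

Returning the child's status explicitly keeps a failure in the invoked script visible to the node instead of being hidden by later commands.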

Quick start

This section walks you through the process of creating, debugging, configuring, and deploying a shell node by using an example that outputs "Hello DataWorks!".

Develop the node

  1. Log on to the DataWorks console and switch to the target region. In the navigation pane on the left, choose Data Development and O&M > Data Development. Select the target workspace from the drop-down list and click Data Analytics.

  2. On the Data Studio page, create a shell node.

  3. In the script editor, enter standard shell code. Interactive syntax is not supported.

    echo "Hello DataWorks!"

  4. After you write the code, click Debug Configuration in the right-side pane. Select the resource group for debugging and specify any other required execution parameters. Then, click the Run button to start a debug run.

  5. After you successfully debug the script, click Scheduling Settings in the right-side pane to configure its scheduling policies, such as the scheduling cycle, dependencies, and parameter settings.

  6. After you configure the scheduling settings, click Save to save the node before you proceed.

Deploy and maintain the node

  1. After you configure the schedule, submit and deploy the shell node to the production environment. For more information about the deployment process, see Deploy a node.

  2. After the node is deployed, it runs periodically according to the schedule that you configured. To view your scheduled tasks, click the icon in the upper-left corner. On the navigation page that appears, choose All Products > Data Development and O&M > Operation and Maintenance Center to go to the Operation Center. Then, in the navigation pane on the left, choose Node O&M > Auto Triggered Task O&M > Auto Triggered Task. For detailed information, see Get started with Operation Center.

Advanced usage

Reference resources

  1. You can use the resource management feature in DataWorks to upload resources for your shell nodes. For more information, see Manage resources.

    Note

    You must publish resources before nodes can reference them. If a production task requires a resource, you must also publish the resource to the production environment.

  2. Open the shell node to access its script editor.

  3. Click the resource management icon in the navigation pane on the left to open the resource management menu, and then find the resource that you want to reference. Right-click the resource and select Reference Resource to use it in your shell script.

    Note
    • After you reference the resource, the system automatically inserts the declaration comment ##@resource_reference{resource_name} at the top of the script.

    • This comment is a required identifier that allows DataWorks to recognize resource dependencies and automatically mount the corresponding resource to the execution environment at runtime. Do not modify or delete this comment.
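A script that uses a referenced resource might look like the following sketch. The resource name example.sh and the helper run_resource are hypothetical, and the sketch assumes that the mounted resource is reachable by its name from the working directory of the node, which you should confirm in your own environment.

```shell
##@resource_reference{example.sh}
# The line above mirrors the declaration that DataWorks inserts; example.sh is a
# hypothetical resource name.

# Run a referenced script resource if it is present, otherwise report the problem.
run_resource() {
    local name="$1"
    if [ -f "${name}" ]; then
        sh "${name}"
    else
        echo "resource ${name} not found in $(pwd)" >&2
        return 1
    fi
}

run_resource example.sh || echo "resource missing, check that it is published"
```

Checking for the file first makes an unpublished or misnamed resource fail with a clear message rather than an unrelated shell error.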

Use scheduling parameters

Scheduling parameters reach a shell script as positional parameters. Custom variable names are not supported. DataWorks passes the values that you configure in the Scheduling Settings > Scheduling Parameters section to the script in order as $1, $2, $3, and so on. Separate multiple values with spaces, and list them in the same order in which the script references them. For the tenth parameter and beyond, use braces, such as ${10} and ${11}, so that the shell parses the number correctly.

For example:

  • The parameter $1 is assigned the current date: $[yyyymmdd].

  • The parameter $2 is assigned the fixed value: Hello DataWorks.

Note
  • If a parameter value contains spaces, enclose the value in quotation marks. This treats the entire quoted string as a single parameter.

  • To use the output of an upstream node as an input for the current node, go to Scheduling Settings > Node Context Parameters > Input Parameters. Then, add an input parameter and set its value to the output parameter of the upstream node.
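The positional-parameter rules above can be sketched as follows. The function names are illustrative only, and the sample values assume a Scheduling Parameters configuration of $[yyyymmdd] "Hello DataWorks".

```shell
# Report the first two positional parameters and the total parameter count.
describe_params() {
    local bizdate="$1"   # first value, for example the resolved date
    local message="$2"   # second value, quoted in the configuration because it contains a space
    echo "date=${bizdate} message=${message} count=$#"
}

# Return the tenth positional parameter. The braces are required:
# $10 would expand as $1 followed by a literal 0.
tenth_param() {
    echo "${10}"
}

# In a node, forward the real values with: describe_params "$@"
# Sample values are used here so that the sketch runs on its own.
describe_params 20260508 "Hello DataWorks"
tenth_param a b c d e f g h i j
```

Quoting "$@" when forwarding preserves values that contain spaces as single parameters.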

Access OSS using ossutil

DataWorks shell nodes provide native support for ossutil, the command-line tool for Alibaba Cloud OSS. You can use it for tasks such as bucket management, file uploads and downloads, and batch operations. To use ossutil to access OSS, you can provide access credentials using either command-line parameters or a configuration file.

In the latest version of Data Studio, shell nodes provide an enhanced security option. You can associate a RAM role with the node. This allows the node to use Alibaba Cloud Security Token Service (STS) to dynamically obtain temporary security credentials for the role. This method avoids hardcoding a long-lived AccessKey in your script and provides secure access to cloud resources. For more information, see Associate a role to securely access cloud resources.

Note

ossutil is pre-installed in the DataWorks environment and does not require manual installation. The default path is /home/admin/usertools/tools/ossutil64.
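As a rough sketch of the configuration-file approach, the following wrapper calls the pre-installed binary and falls back to printing the command when the binary is absent. The helper oss_upload, the bucket examplebucket, and the file names are hypothetical. The sketch assumes ossutil 1.x, where -c selects the configuration file and the default location is ~/.ossutilconfig. Verify both against the ossutil version in your resource group.

```shell
# Path from the note above; override it for local testing.
OSSUTIL="${OSSUTIL:-/home/admin/usertools/tools/ossutil64}"
# Assumed ossutil 1.x configuration file that holds the endpoint and credentials.
OSS_CONFIG="${OSS_CONFIG:-$HOME/.ossutilconfig}"

# Upload one local file to an OSS path, or print the command if ossutil is not available.
oss_upload() {
    local src="$1" dest="$2"
    if [ -x "${OSSUTIL}" ]; then
        "${OSSUTIL}" cp "${src}" "${dest}" -c "${OSS_CONFIG}"
    else
        echo "dry run: ${OSSUTIL} cp ${src} ${dest} -c ${OSS_CONFIG}"
    fi
}

oss_upload ./report.csv oss://examplebucket/reports/report.csv
```

Keeping credentials in a configuration file, or using an associated RAM role, avoids placing an AccessKey pair directly on the command line.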

Access OSS or NAS using datasets

In DataWorks, you can create datasets for OSS or NAS. You can then use these datasets in a shell node to read from or write to OSS or NAS storage.

Run a node with a RAM role

You can run node tasks under a specific RAM role by associating the role with the node. This method allows for fine-grained permission control and enhanced security. For more information, see Associate a role to securely access cloud resources.

Appendix: Script exit codes

A script's exit code indicates whether it ran successfully.

  • Exit code 0: Indicates success.

  • Exit code -1: The process was terminated.

  • Exit code 2: The platform automatically reruns the task once.

  • Other exit codes: Indicate failure.

The following log shows a successful execution, indicated by an exit code of 0.

INFO  Exit code of the Shell command 0
INFO  --- Invocation of Shell command completed ---
INFO  Shell run successfully!

Note

Due to the underlying shell mechanism, a shell node's exit code is the exit code of the last command executed in the script.
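Because only the last command determines the result, a failed step can go unnoticed when a later command succeeds. The following sketch contrasts the two cases. The function names are illustrative, and false stands in for any failing step.

```shell
# The failing step is followed by a successful command, so the function returns 0.
masked_failure() {
    false                 # stands in for a failing step
    echo "cleanup done"   # last command succeeds and hides the failure
}

# Capture the status right after the critical step and return it explicitly.
reported_failure() {
    false
    local status=$?
    echo "cleanup done"
    return "${status}"
}

masked_failure && echo "masked: reported as success"
reported_failure || echo "reported: failed with code $?"
```

In a node script, ending with an explicit exit that carries the captured status has the same effect at the script level.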