DataWorks: Shell node

Last Updated: Jun 30, 2026

The DataWorks shell node is for data engineers. It runs non-interactive, standard shell scripts and is ideal for automation tasks, such as Object Storage Service (OSS) file processing and tool invocation. The node integrates ossutil out of the box, allowing secure access to OSS through configuration files or command-line parameters. It also supports scheduling parameter injection, resource referencing, and runtime environment extension using custom images to meet production-level scheduling and O&M requirements.

Permissions

Add the RAM account used for node development to the target workspace and grant it the developer or workspace administrator role. For details, see Add a member to a workspace.

Data processing node types

DataWorks provides various types of data processing nodes. You can select a node based on your business scenario to perform large-scale data cleansing tasks. Your options are not limited to shell scripts.

  • Batch synchronization node: Suitable for large-scale data migration and transformation, and supports batch data synchronization between different data sources.

  • MaxCompute SQL node: Suitable for SQL-based ETL on massive datasets and supports distributed computing.

  • Shell node (this topic): Suitable for calling external tools or executing custom script logic.

  • Assignment node with a for-each node: Suitable for batch processing in loops. It can iterate through a dataset and execute operations on each item.

Usage notes

  1. Syntax limitations

    • Standard shell syntax is supported. Interactive syntax is not supported.

  2. Runtime environment and network access

    • A shell node can run on a serverless resource group (recommended) or an exclusive resource group for scheduling (older versions). To purchase and use a serverless resource group, see Use serverless resource groups.

    • When a shell node runs on a serverless resource group, you must add the IP address or CIDR block of the serverless resource group to the destination's allowlist if one is configured.

    • When you use a serverless resource group, a single task supports a maximum configuration of 64 CUs. To prevent resource shortages that can affect task startup, we recommend that you configure no more than 16 CUs.

  3. Extend the development environment

    • If your task requires a specific development environment, use the custom image feature in DataWorks to build one that meets your needs. For more information, see Custom image.

  4. Resources and multiple script calls

    • Avoid starting a large number of subprocesses in a shell node. Because shell nodes currently have no resource limits, a large number of subprocesses may affect other tasks running on the same resource group for scheduling.

    • The task code size cannot exceed 128 KB.
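The notes on non-interactive syntax and subprocess counts can be illustrated with a small sketch. It is not a DataWorks-provided pattern: `MAX_PARALLEL` and `run_batch` are hypothetical names, the per-item command is a placeholder, and the sketch assumes an `xargs` that supports `-P` (the GNU and BSD versions do).

```shell
# Hypothetical cap on concurrent subprocesses; tune it to the resource group.
MAX_PARALLEL=4

# Read work items from stdin (no prompts) and run at most MAX_PARALLEL
# workers at a time instead of starting one background process per item.
run_batch() {
  printf '%s\n' "$@" |
    xargs -n 1 -P "$MAX_PARALLEL" sh -c 'echo "processed $1"' _
}

# Placeholder item names; replace the echo above with the real per-item command.
run_batch part-01 part-02 part-03
```

Because the workers run concurrently, the order of the output lines is not guaranteed.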

Quick start

The following example walks you through the entire process of creating, debugging, configuring, and deploying a shell node that outputs "Hello DataWorks!".

Node development

  1. Log on to the DataWorks console. After you switch to the target region, click Data Development and O&M > Data Development in the left-side navigation pane. Select the target workspace from the drop-down list, and then click Go to DataStudio.

  2. Hover over the Create icon, and click Create Node > General > Shell. In the Create Node dialog, enter the node name and path.

  3. Enter standard shell code in the script editor (interactive syntax is not supported):

    echo "Hello DataWorks!"

  4. After you complete the code, select the target resource group and image, and then run the shell node task.

  5. After the script passes debugging, click Scheduling Settings on the right side to configure production-level scheduling policies such as scheduling time and resource properties. This allows the node to run periodically as planned. For more information about configuring node scheduling properties, see Configure schedule settings.

Deploy and manage nodes

  1. After you configure the scheduling properties, you can submit the node to the development environment, and then deploy it to the production environment for periodic scheduling.

  2. After the node is deployed, the task runs periodically based on the schedule you configured. To view the deployed scheduled tasks, click the icon in the upper-left corner, select All Products > Data Development and O&M > Operation Center, and then choose Task O&M > Auto Triggered Task O&M > Periodic tasks in the left-side menu. For more information, see Manage scheduled tasks.

Advanced usage

Reference resources

  1. Upload the resource to DataWorks through resource management so that shell nodes can use it. For more information, see Manage resources.

    Note

    A resource must be submitted before it can be referenced by a node. If the resource is required by a production task, you must also deploy the resource to the production environment. For more information, see Deploy nodes.

  2. In the left-side directory tree of Data Studio, locate the uploaded resource.

  3. Right-click the resource and select Insert Resource Path. The resource is then referenced in the current node. You can write code on the node editing page to run the resource.

    Note

    After a successful reference, the system automatically inserts a declaration comment in the format ##@resource_reference{resource_name} at the top of the script.

    This comment is a required identifier for DataWorks to recognize resource dependencies and automatically mount the corresponding resource to the execution environment at runtime. Do not manually modify or delete it.
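A node script that runs a referenced resource might look like the following sketch. The resource name `hello.sh` and the helper `run_resource` are hypothetical, and the sketch assumes that the resource is mounted in the working directory under its resource name; verify the actual path in your environment.

```shell
##@resource_reference{hello.sh}
# The declaration above follows the format that DataWorks inserts automatically.
# "hello.sh" is a hypothetical resource name used only for illustration.

# Run a referenced shell resource, failing clearly if it is not mounted.
# Assumption: the resource is available in the working directory.
run_resource() {
  if [ ! -f "$1" ]; then
    echo "resource not found: $1" >&2
    return 1
  fi
  sh "$1"
}

status=0
run_resource "hello.sh" || status=$?
echo "resource exit status: $status"
```

In a real node, ending the script with `exit "$status"` would pass a failure of the resource on to the scheduler instead of hiding it behind the final `echo`.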

Use scheduling parameters

Scheduling parameters are injected into shell nodes as positional parameters; custom variable names are not supported. DataWorks passes the parameter values that you configure in the Scheduling Settings of a node to the shell script in order, as $1, $2, $3, and so on. From the tenth parameter onward, use brace syntax such as ${10} and ${11}; without braces, the shell reads $10 as $1 followed by the literal character 0. Separate multiple parameter values with spaces, and keep their order strictly consistent with the positional references in the script.

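For example, if three parameter values are configured in the following order:

  • $1 receives the built-in parameter $bizdate, which is the business date.

  • $2 receives the custom parameter ${yyyymmdd}, which is the business date in yyyymmdd format.

  • $3 receives the custom parameter $[yyyymmdd], which is the scheduled time in yyyymmdd format.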

Note

  • If a parameter value contains spaces, enclose it in quotation marks. The entire content within the quotation marks is treated as a single parameter.

  • For more information about configuring and using scheduling parameters, see Configure scheduling parameters.
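The following sketch shows how a script might read the positional parameters described above. It assumes that the parameter values in Scheduling Settings are configured as `$bizdate ${yyyymmdd} $[yyyymmdd]`; the function name `show_params` and the output labels are illustrative only.

```shell
# Assumed parameter configuration in Scheduling Settings (space separated):
#   $bizdate ${yyyymmdd} $[yyyymmdd]
# DataWorks replaces these with concrete values before the script runs.

show_params() {
  echo "param1=$1"
  echo "param2=$2"
  echo "param3=$3"
  # Positions above 9 need braces: $10 would be read as ${1} followed by "0".
  echo "param10=${10}"
}

# Quote "$@" so that a value containing spaces stays a single parameter.
show_params "$@"
```

Copying positional parameters into named variables at the top of a script (for example, `bizdate="$1"`) keeps the rest of the code readable even though the node itself only passes values by position.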

Use ossutil to access OSS

The DataWorks shell node natively supports the Alibaba Cloud OSS command-line tool ossutil, which allows bucket management, file upload and download, batch operations, and other tasks. You can configure access credentials to use ossutil to access OSS through a configuration file or command-line parameters.
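The two credential options can be sketched as follows. This is an illustration under stated assumptions, not the node's official template: it assumes the ossutil 1.x option names (`-c`, `-e`, `-i`, `-k`) and the `[Credentials]` configuration file layout, so check both against the ossutil version in your image. The bucket name, configuration file path, environment variable names, and the helper `write_oss_config` are hypothetical.

```shell
# Hypothetical values; supply real credentials from a secure source
# rather than writing them into the script.
OSS_ENDPOINT="${OSS_ENDPOINT:-oss-cn-hangzhou.aliyuncs.com}"
OSS_BUCKET="${OSS_BUCKET:-examplebucket}"
OSS_CONFIG="${OSS_CONFIG:-./ossutilconfig}"

# Write a configuration file in the layout assumed for ossutil 1.x.
write_oss_config() {
  cat > "$1" <<EOF
[Credentials]
endpoint=$OSS_ENDPOINT
accessKeyID=$OSS_ACCESS_KEY_ID
accessKeySecret=$OSS_ACCESS_KEY_SECRET
EOF
  chmod 600 "$1"
}

if command -v ossutil >/dev/null 2>&1 && [ -n "${OSS_ACCESS_KEY_ID:-}" ]; then
  # Option 1: credentials come from the configuration file.
  write_oss_config "$OSS_CONFIG"
  ossutil ls "oss://$OSS_BUCKET" -c "$OSS_CONFIG"
  # Option 2: credentials are passed as command-line parameters.
  ossutil ls "oss://$OSS_BUCKET" -e "$OSS_ENDPOINT" \
    -i "$OSS_ACCESS_KEY_ID" -k "$OSS_ACCESS_KEY_SECRET"
else
  echo "ossutil or credentials not available; skipping OSS access" >&2
fi
```

A configuration file with restricted permissions is generally the safer choice, because command-line parameters can appear in process listings and logs.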

Appendix: Script exit codes

You can use script exit codes to further determine whether a script ran successfully.

  • Exit code 0: Indicates success.

  • Exit code -1: Indicates the process was terminated.

  • Exit code 2: Indicates that the platform needs to automatically rerun the task once.

  • Other exit codes: Indicate failure.

The following is a sample runtime log when the shell node runs successfully (exit code 0).

INFO  Exit code of the Shell command 0
INFO  --- Invocation of Shell command completed ---
INFO  Shell run successfully!

Note

Due to the underlying shell mechanism, the exit code of the entire script in a shell node equals the exit code of the last executed command.
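Because only the last command counts, an earlier failure can go unnoticed. The sketch below compares three variants by running each in a child shell and printing its exit code; the helper `exit_code_of` is a hypothetical name, and the exit code 3 is an arbitrary failure value.

```shell
# Run a snippet in a child shell and print the exit code it finished with.
exit_code_of() {
  sh -c "$1" >/dev/null 2>&1 && echo 0 || echo $?
}

# The failing command is followed by a successful one, so the failure is masked.
echo "masked:    $(exit_code_of 'false; echo "cleanup done"')"

# With "set -e", the script stops at the first failing command.
echo "fail fast: $(exit_code_of 'set -e; false; echo "cleanup done"')"

# An explicit exit makes the reported status unambiguous.
echo "explicit:  $(exit_code_of 'false || exit 3; echo "not reached"')"
```

The three lines report 0, 1, and 3 respectively. Only the last two variants would cause the node to be marked as failed.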

Reference

To learn how to use Python 2 or Python 3 commands in a shell node to run Python scripts, see Run Python scripts in a shell node.