
DataWorks: Shell node

Last Updated: Jun 24, 2026

The DataWorks shell node is designed for data engineers. It runs standard, non-interactive shell scripts and is well suited to automation tasks such as processing Object Storage Service (OSS) files and invoking command-line tools. The node integrates ossutil out of the box, so you can access OSS securely by using a configuration file or command-line parameters. It also supports scheduling parameter injection, resource references, and runtime environment extension with custom images to meet production-level scheduling and O&M requirements.

Permissions

Add the RAM account used for node development to the target workspace and grant it the developer or workspace administrator role. For details, see Add a member to a workspace.

Data processing node types

DataWorks provides several types of data processing nodes, so shell scripts are not your only option for large-scale data cleansing. Select a node type based on your business scenario:

  • Batch synchronization node: Suitable for large-scale data migration and transformation, and supports batch data synchronization between different data sources.

  • MaxCompute SQL node: Suitable for SQL-based ETL on massive datasets and supports distributed computing.

  • Shell node (this topic): Suitable for calling external tools or executing custom script logic.

  • Assignment node with a for-each node: Suitable for batch processing in loops. It can iterate through a dataset and execute operations on each item.

Usage

  1. Syntax limitations

    • Standard shell syntax is supported. Interactive syntax, such as commands that wait for user input, is not supported.

  2. Runtime environment and network access

    • A shell node can run on a serverless resource group (recommended) or on an exclusive resource group for scheduling (legacy). To purchase and use a serverless resource group, see Use serverless resource groups.

    • If the destination that a shell node accesses from a serverless resource group has an IP address allowlist, add the IP address or CIDR block of the serverless resource group to that allowlist.

    • On a serverless resource group, a single task can use at most 64 CUs. To avoid resource shortages that can affect task startup, we recommend that you use no more than 16 CUs.

  3. Extend the development environment

    • If your task requires a specific development environment, use the custom image feature in DataWorks to build one that meets your needs. For more information, see Custom image.

  4. Resources and multiple script calls

    • Avoid starting many child processes in a shell node. Because shell nodes have no resource limits, numerous child processes can affect other tasks running on the same exclusive resource group for scheduling.

    • The task code size cannot exceed 128 KB.
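Because child processes in a shell node are not resource-limited, one way to keep a node's footprint predictable is to cap how many background jobs run at once. The following is a minimal sketch, assuming bash or a POSIX sh; `MAX_JOBS`, `process_item`, and the `part-*` item names are hypothetical placeholders for your own limit, work, and inputs. It starts jobs in batches of at most `MAX_JOBS` and waits for each batch to finish before starting the next.

```shell
# Hypothetical cap on concurrent child processes; tune it for your resource group.
MAX_JOBS=4

# Hypothetical unit of work; replace with the real command for one item.
process_item() {
  echo "processed $1"
}

# Run process_item for every argument, with at most MAX_JOBS running at once.
run_bounded() {
  local running=0
  for item in "$@"; do
    process_item "$item" &          # start one background child process
    running=$((running + 1))
    if [ "$running" -ge "$MAX_JOBS" ]; then
      wait                          # block until the whole batch has finished
      running=0
    fi
  done
  wait                              # wait for the last, possibly partial, batch
}

run_bounded part-01 part-02 part-03 part-04 part-05 part-06
```

Batching is simple but waits for the slowest job in each batch; in bash 4.3 or later, `wait -n` can start a new job as soon as any single job finishes.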

Quick start

This section uses an example that outputs "Hello DataWorks!" to walk you through the process of creating, debugging, configuring, and deploying a shell node.

Node development

  1. Log on to the DataWorks console. After you switch to the target region, click Data Development and O&M > Data Development in the left-side navigation pane, select the target workspace from the drop-down list, and click Go to DataStudio.

  2. Move the pointer over the Create icon and choose Create Node > General > Shell. In the Create Node dialog box, enter a name and a path for the node.

  3. In the script editor, enter the standard shell code. Interactive syntax is not supported.

    echo "Hello DataWorks!"
  4. After you develop the code, click the Run icon, select the target resource group and runtime image, and run the shell node task.

  5. After the script is successfully debugged, click Scheduling Settings on the right side to configure production-level scheduling policies, such as scheduling time and resource properties. This allows the node to run automatically and periodically. For more information about how to configure node scheduling properties, see Configure scheduling properties.
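Because a scheduled run cannot answer prompts, a script that would normally ask for input (for example, with `read`) should take the value from a positional parameter instead. The following sketch extends the Hello DataWorks example under that assumption; `get_run_date` is a hypothetical helper, and falling back to today's date when no parameter is passed is an illustrative choice, not DataWorks behavior.

```shell
# Hypothetical helper: return the run date passed as the first argument,
# or today's date in yyyymmdd format when no argument is given.
get_run_date() {
  local day="${1:-}"
  if [ -z "$day" ]; then
    day="$(date +%Y%m%d)"
  fi
  echo "$day"
}

# Non-interactive: the value comes from $1 (for example, a scheduling parameter),
# not from a prompt such as `read day`.
echo "Hello DataWorks! Run date: $(get_run_date "$1")"
```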

Deployment and maintenance

  1. After configuring the task scheduling properties, you can commit the node to the development environment and deploy it to the production environment for periodic scheduling.

  2. After a node is deployed, it runs periodically as scheduled. To view it, click the menu icon in the upper-left corner and choose All Products > Data Development & O&M > Operation Center in the navigation menu that appears. In the left-side navigation pane of Operation Center, choose Task O&M > Auto Triggered Task O&M > Periodic tasks to view the deployed periodic tasks. For a detailed feature description, see Get started with O&M.

Advanced usage

Resource reference

  1. DataWorks allows you to upload resources for a shell node through resource management. For more information, see Manage resources.

    Note

    You must commit a resource before a node can reference it. If a production task needs to use the resource, you must also deploy the resource to the production environment. For more information, see Deploy tasks.

  2. In the left-side directory tree of DataStudio, find the uploaded resource.

  3. Right-click the resource and select Insert Resource Path to reference the resource in the current node. You can then write code on the node editing page to run the resource.

    Note

    After the resource is successfully referenced, the system automatically inserts a declaration comment, such as ##@resource_reference{resource_name}, at the top of the script.

    This comment is required for DataWorks to identify resource dependencies and automatically mount the resource to the execution environment when the task runs. Do not modify or delete this comment.
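The following sketch shows how a node might run a referenced shell script resource once the declaration comment is in place. It assumes a hypothetical resource named `etl_helper.sh` and assumes that the mounted resource can be opened by its file name from the task's working directory. The declaration line is shown for illustration only; keep the comment exactly as DataWorks inserts it. `run_resource` is a hypothetical helper that fails with a clear message when the file is missing.

```shell
##@resource_reference{"etl_helper.sh"}
# Illustrative declaration for a hypothetical resource; use the line DataWorks inserts.

# Hypothetical helper: run a referenced script resource by file name.
# $1: resource file name; any remaining arguments are passed to the script.
run_resource() {
  local resource="$1"
  shift
  if [ ! -f "$resource" ]; then
    echo "resource not found in $(pwd): $resource" >&2
    return 1
  fi
  sh "$resource" "$@"
}

# Pass the first scheduling parameter through to the resource script.
run_resource etl_helper.sh "$1"
```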

Scheduling parameters

Scheduling parameters are injected as positional parameters; custom variable names are not supported. DataWorks passes the values configured in the node's Scheduling Settings to the shell script in order, as $1, $2, $3, and so on. From the tenth parameter on, enclose the number in braces, such as ${10} and ${11}; otherwise the shell parses $10 as $1 followed by the character 0. Separate multiple parameter values with spaces, in the same order as the positions the script reads them from.

In this example:

  • $1 is assigned the built-in parameter $bizdate, which is the business date.

  • $2 is assigned the custom parameter ${yyyymmdd}, which is the business date in yyyymmdd format.

  • $3 is assigned the custom parameter $[yyyymmdd], which is the date of the scheduled time in yyyymmdd format.

Note
  • If a parameter value contains spaces, enclose it in quotation marks. The entire content within the quotation marks is treated as a single parameter.

  • For more information about how to configure and use scheduling parameters, see Configure and use scheduling parameters.
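The following sketch reads the three values from the example above and shows why braces matter from the tenth parameter on. It assumes the Scheduling Settings value is `$bizdate ${yyyymmdd} $[yyyymmdd]`, separated by spaces; `print_params` and the local variable names are hypothetical.

```shell
# Assumed Scheduling Settings value (order maps to $1, $2, $3):
#   $bizdate ${yyyymmdd} $[yyyymmdd]

# Hypothetical helper: give the positional parameters readable names and print them.
print_params() {
  local bizdate="$1"        # built-in $bizdate
  local biz_day="$2"        # custom ${yyyymmdd}
  local sched_day="$3"      # custom $[yyyymmdd]
  echo "bizdate=${bizdate} biz_day=${biz_day} sched_day=${sched_day}"
  # From the tenth parameter on, braces are required: $10 would mean ${1} followed by "0".
  if [ "$#" -ge 10 ]; then
    echo "tenth=${10}"
  fi
}

print_params "$@"
```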

Access OSS with ossutil

The DataWorks shell node supports the Alibaba Cloud OSS command-line tool ossutil out of the box. This tool supports tasks such as bucket management, file uploads and downloads, and batch operations. You can configure access credentials to use ossutil to access OSS through either a configuration file or command-line parameters.
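The following sketch shows both ways of supplying credentials. It assumes the ossutil 1.x command syntax (`-c` for a configuration file; `-e`, `-i`, and `-k` for the endpoint, AccessKey ID, and AccessKey secret) and the 1.x `[Credentials]` configuration file format; check which ossutil version is available in your environment. The endpoint, the bucket, and the `OSS_AK_ID` and `OSS_AK_SECRET` environment variables are hypothetical placeholders, chosen so that no AccessKey pair is hardcoded in the node.

```shell
# Hypothetical placeholders: replace with your own endpoint and bucket.
OSS_ENDPOINT="oss-cn-hangzhou.aliyuncs.com"
OSS_BUCKET="examplebucket"

# Write an ossutil 1.x style configuration file (format assumed, see above).
# Credentials come from hypothetical environment variables, not literals.
write_ossutil_config() {
  cat > "$1" <<EOF
[Credentials]
endpoint=${OSS_ENDPOINT}
accessKeyID=${OSS_AK_ID:-}
accessKeySecret=${OSS_AK_SECRET:-}
EOF
}

CONFIG_FILE="$(mktemp)"           # mktemp creates the file readable only by its owner
write_ossutil_config "$CONFIG_FILE"

if command -v ossutil >/dev/null 2>&1; then
  # Option 1: read credentials from the configuration file.
  ossutil ls "oss://${OSS_BUCKET}/" -c "$CONFIG_FILE"
  # Option 2: pass the endpoint and credentials as command-line parameters.
  ossutil ls "oss://${OSS_BUCKET}/" -e "$OSS_ENDPOINT" -i "${OSS_AK_ID:-}" -k "${OSS_AK_SECRET:-}"
else
  echo "ossutil was not found on PATH" >&2
fi

rm -f "$CONFIG_FILE"              # do not leave credentials on disk
```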

Appendix: Script exit codes

Use the script's exit code to determine whether the script ran successfully:

  • Exit code 0: Indicates success.

  • Exit code -1: Indicates the process was terminated.

  • Exit code 2: Indicates that the platform needs to automatically rerun the task once.

  • Other exit codes: Indicate failure.
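The following sketch maps a readiness check to these exit codes. The scenario (an upstream file that may not have arrived yet) and the default path are hypothetical; `check_upstream` returns 2 to request one automatic rerun, 1 for a failure, and 0 for success. Because the call is the last command in the script, its return value becomes the script's exit code.

```shell
# Hypothetical readiness check that returns one of the exit codes listed above.
check_upstream() {
  local input_file="$1"
  if [ -s "$input_file" ]; then
    echo "input ready: $input_file"
    return 0                                   # 0: success
  elif [ -e "$input_file" ]; then
    echo "input is empty: $input_file" >&2
    return 1                                   # other non-zero codes: failure
  else
    echo "input not found yet: $input_file" >&2
    return 2                                   # 2: the platform reruns the task once
  fi
}

# Last command: its return value becomes the script's exit code.
check_upstream "${1:-/tmp/upstream_ready.csv}"
```

Because the platform reruns the task only once, exit code 2 is best reserved for conditions that are likely to clear quickly.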

The following is a sample runtime log when the shell node runs successfully (exit code 0).

    INFO  Exit code of the Shell command 0
    INFO  --- Invocation of Shell command completed ---
    INFO  Shell run successfully!

Note

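Because of how the shell works, the exit code of an entire shell node script equals the exit code of the last command it executes. If an earlier command fails but a later command succeeds, the script still exits with code 0 and the run is treated as successful.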
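The sketch below contrasts a failure that is masked by a later successful command with one that is propagated explicitly. The function names are hypothetical, and they use `return` so both behaviors fit in one script; at the top level of a node, use `exit 1` (or enable `set -e`) in the same places.

```shell
# The failure of `false` is masked: echo runs last and succeeds, so this returns 0.
masked_failure() {
  false
  echo "step finished"
}

# The failure is propagated: `|| return 1` stops here and returns 1.
propagated_failure() {
  false || return 1
  echo "step finished"          # not reached
}

masked_failure
echo "masked_failure returned $?"
propagated_failure
echo "propagated_failure returned $?"
```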

Related documents

To learn how to run Python scripts on a shell node using Python 2 or Python 3 commands, see Run Python scripts on a shell node.