The DataWorks shell node is for data engineers. It runs non-interactive, standard shell scripts and is ideal for automation tasks, such as Object Storage Service (OSS) file processing and tool invocation. The node integrates ossutil out of the box, allowing secure access to OSS through configuration files or command-line parameters. It also supports scheduling parameter injection, resource referencing, and runtime environment extension using custom images to meet production-level scheduling and O&M requirements.
Permissions
Add the RAM account used for node development to the target workspace and grant it the developer or workspace administrator role. For details, see Add a member to a workspace.
Data processing node types
DataWorks provides various types of data processing nodes. You can select a node based on your business scenario to perform large-scale data cleansing tasks. Your options are not limited to shell scripts.
- Batch synchronization node: Suitable for large-scale data migration and transformation, and supports batch data synchronization between different data sources.
- MaxCompute SQL node: Suitable for SQL-based ETL on massive datasets and supports distributed computing.
- Shell node (this topic): Suitable for calling external tools or executing custom script logic.
- Assignment node with a for-each node: Suitable for batch processing in loops. It can iterate through a dataset and execute operations on each item.
Usage
- Syntax limitations
  - Standard shell syntax is supported. Interactive syntax is not supported.
- Runtime environment and network access
  - A shell node can run on a serverless resource group (recommended) or an exclusive resource group for scheduling (older versions). To purchase and use a serverless resource group, see Use serverless resource groups.
  - When a shell node runs on a serverless resource group and the destination it accesses has an allowlist configured, you must add the IP address or CIDR block of the serverless resource group to that allowlist.
  - When you use a serverless resource group, a single task supports a maximum configuration of 64 CU. To prevent resource shortages that can affect task startup, we recommend not exceeding 16 CU.
- Extend the development environment
  - If your task requires a specific development environment, use the custom image feature in DataWorks to build one that meets your needs. For more information, see Custom image.
- Resources and multiple script calls
  - Avoid starting many child processes in a shell node. Because shell nodes have no resource limits, numerous child processes can affect other tasks running on the same exclusive resource group for scheduling.
  - The task code size cannot exceed 128 KB.
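One way to follow the advice about child processes is to cap how many background jobs run at the same time. The following is a minimal sketch, not a DataWorks-provided pattern: the item list, the `process_item` function, the cap of 2, and the temporary result file are all hypothetical placeholders.

```shell
#!/bin/bash
# Hypothetical inputs and concurrency cap; adjust for your own task.
items="a b c d e"
max_jobs=2
result_file=$(mktemp)

process_item() {
  # Placeholder for the real work done on one item.
  echo "processed $1" >> "$result_file"
}

running=0
for item in $items; do
  process_item "$item" &          # start one child process
  running=$((running + 1))
  if [ "$running" -ge "$max_jobs" ]; then
    wait                          # let the current batch finish first
    running=0
  fi
done
wait                              # wait for any remaining children

sort "$result_file"               # print results in a stable order
```

Batching with `wait` is the simplest portable approach. It trades some throughput for a predictable upper bound on the number of child processes.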
Quick start
This section uses an example that outputs "Hello DataWorks!" to walk you through the process of creating, debugging, configuring, and deploying a shell node.
Node development
- Log on to the DataWorks console. After you switch to the target region, click in the left-side navigation pane, select the target workspace from the drop-down list, and click Go to DataStudio.
- Move the pointer over the icon and choose . In the Create Node dialog box, enter a name and path for the node.
- In the script editor, enter the standard shell code. Interactive syntax is not supported.
  echo "Hello DataWorks!"
- After developing the code, click the icon, select the target resource group and image, and run the shell node task.
- After the script is successfully debugged, click Scheduling Settings on the right side to configure production-level scheduling policies, such as scheduling time and resource properties. This allows the node to run automatically and periodically. For more information about how to configure node scheduling properties, see Configure scheduling properties.
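Because interactive syntax is not supported, a script that would normally prompt for input has to take its values from arguments or defaults instead. The sketch below illustrates the idea with a hypothetical `greet` function that falls back to a default when no argument is passed. It is an illustration only, not part of the DataWorks tooling.

```shell
greet() {
  # Use the first argument, or a default when none is given.
  # No `read` prompt is used, so the script never waits for input.
  name="${1:-DataWorks}"
  echo "Hello ${name}!"
}

greet            # uses the default
greet "World"    # uses the supplied value
```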
Deployment and maintenance
- After configuring the task scheduling properties, you can commit the node to the development environment and deploy it to the production environment for periodic scheduling.
- After a node is deployed, it runs periodically as scheduled. Click the icon in the upper-left corner and choose in the navigation pop-up window to open O&M. In the left-side navigation pane, choose to view the deployed periodic tasks. For a detailed feature description, see Get started with O&M.
Advanced usage
Resource reference
- DataWorks allows you to upload resources for a shell node through resource management. For more information, see Manage resources.
  Note: You must commit a resource before a node can reference it. If a production task needs to use the resource, you must also deploy the resource to the production environment. For more information, see Deploy tasks.
- In the left-side directory tree of DataStudio, find the uploaded resource.
- Right-click the resource and select Insert Resource Path to reference the resource in the current node. You can then write code on the node editing page to run the resource.
  Note: After the resource is successfully referenced, the system automatically inserts a declaration comment, such as ##@resource_reference{resource_name}, at the top of the script. This comment is required for DataWorks to identify resource dependencies and automatically mount the resource to the execution environment when the task runs. Do not modify or delete this comment.
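Once the declaration comment is in place, the script can call the resource by name. The sketch below assumes that the resource is a shell script and that it is available in the working directory under its resource name. Both points are assumptions to verify in your environment. `resource_name` is the placeholder from the note above, and `run_resource` is a hypothetical helper, not a DataWorks function.

```shell
##@resource_reference{resource_name}

run_resource() {
  # $1: resource file name; remaining arguments are passed to the resource.
  res="$1"
  shift
  if [ ! -f "$res" ]; then
    echo "resource not found: $res" >&2
    return 1
  fi
  sh "$res" "$@"
}

# Example call (commented out because resource_name is a placeholder):
# run_resource resource_name "$1"
```

Checking that the file exists before running it turns a missing or undeployed resource into a clear error message instead of an opaque shell failure.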
Scheduling parameters
Scheduling parameters are injected as positional parameters; custom variable names are not supported. DataWorks passes values from the node's Scheduling Settings to the shell script as sequential positional parameters, such as $1, $2, and $3. When the number of parameters exceeds nine, you must use braces, such as ${10} and ${11}, to ensure correct parsing. Separate multiple parameter values with spaces. The order must match the parameter positions in the script.
For example:
- The built-in parameter $1 is assigned the business date: $bizdate.
- The custom parameter $2 is assigned the business date: ${yyyymmdd}.
- The custom parameter $3 is assigned the scheduled time: $[yyyymmdd].
- If a parameter value contains spaces, enclose it in quotation marks. The entire content within the quotation marks is treated as a single parameter.
- For more information about how to configure and use scheduling parameters, see Configure and use scheduling parameters.
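The positional rules can be sketched as follows. The function `show_params` is hypothetical and stands in for the body of a node script, and the sample values in the call are made up. In a real node, the values come from Scheduling Settings.

```shell
show_params() {
  echo "first=$1"
  echo "second=$2"
  echo "tenth=${10}"     # braces are required from the tenth parameter on
  echo "unbraced=$10"    # parsed as $1 followed by a literal 0
}

# The quoted last value contains a space but is still one parameter.
show_params 20240101 20240102 c d e f g h i "two words"
```

The last line of output shows why the braces matter: without them, the shell expands `$1` and appends `0`, so the script silently reads the wrong value.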
Access OSS with ossutil
The DataWorks shell node supports the Alibaba Cloud OSS command-line tool ossutil out of the box. This tool supports tasks such as bucket management, file uploads and downloads, and batch operations. You can configure access credentials to use ossutil to access OSS through either a configuration file or command-line parameters.
- To access OSS using command-line parameters, see Access OSS by using command-line parameters.
- To access OSS using a configuration file, see Access OSS by using a configuration file.
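As a rough illustration of the command-line approach, the sketch below wraps an `ossutil ls` call in a hypothetical helper named `oss_list`. The environment variable names and the bucket path are placeholders. The `-e`, `-i`, and `-k` options are used here on the assumption that they set the endpoint, AccessKey ID, and AccessKey secret in your ossutil version. Confirm the exact options in the topics linked above before relying on them.

```shell
# Hypothetical environment variable names; supply them however your team manages secrets.
oss_list() {
  # $1: OSS path such as oss://your-bucket/prefix/ (placeholder)
  if [ -z "${OSS_ENDPOINT:-}" ] || [ -z "${OSS_AK_ID:-}" ] || [ -z "${OSS_AK_SECRET:-}" ]; then
    echo "OSS credentials are not fully set" >&2
    return 1
  fi
  # Assumed options: -e endpoint, -i AccessKey ID, -k AccessKey secret.
  ossutil ls "$1" -e "$OSS_ENDPOINT" -i "$OSS_AK_ID" -k "$OSS_AK_SECRET"
}

# Example call (placeholder bucket):
# oss_list oss://your-bucket/prefix/
```

Failing early when a credential is missing keeps the node log readable and avoids sending a request that is certain to be rejected.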
Appendix: Script exit codes
You can use script exit codes to further determine whether a script ran successfully.
- Exit code 0: Indicates success.
- Exit code -1: Indicates the process was terminated.
- Exit code 2: Indicates that the platform needs to automatically rerun the task once.
- Other exit codes: Indicate failure.
The following is a sample runtime log when the shell node runs successfully (exit code 0).
INFO Exit code of the Shell command 0
INFO --- Invocation of Shell command completed ---
INFO Shell run successfully!
Due to the underlying shell mechanism, the exit code of the entire script in a shell node equals the exit code of the last executed command.
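One consequence is that an earlier failure can be masked by a later command that succeeds. The following small illustration uses two hypothetical functions to stand in for whole scripts. In a real script, `exit 1` would take the place of `return 1`.

```shell
masked_failure() {
  false                 # this command fails with exit code 1
  echo "cleanup done"   # but this last command succeeds
}

explicit_failure() {
  false || return 1     # stop immediately and report the failure
  echo "cleanup done"
}

rc=0; masked_failure >/dev/null || rc=$?
echo "masked_failure exit code: $rc"      # prints 0

rc=0; explicit_failure >/dev/null || rc=$?
echo "explicit_failure exit code: $rc"    # prints 1
```

Checking each critical command and exiting with a non-zero code makes the node status reflect what actually happened.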
Related documents
To learn how to run Python scripts on a shell node using Python 2 or Python 3 commands, see Run Python scripts on a shell node.