In DataWorks, Shell nodes allow you to use resources, configure parameters, run scripts, and access ossutil. This topic describes how to use Shell nodes.
Prerequisites
The RAM user that you want to use is added to your workspace.
If you want to use a RAM user to develop tasks, you must add the RAM user to your workspace as a member and assign the Develop or Workspace Administrator role to the RAM user. The Workspace Administrator role has more permissions than necessary. Exercise caution when you assign the Workspace Administrator role. For more information about how to add a member and assign roles to the member, see Add members to a workspace.
A serverless resource group is associated with your workspace. For more information, see the topics in the Use serverless resource groups directory.
Create a Shell node before you start development. For more information, see Create a node for a workflow.
Precautions
Shell nodes support the standard Shell syntax but not the interactive syntax.
If a Shell node runs on a serverless resource group and needs to access a destination that is protected by a whitelist, you must establish network connectivity between the serverless resource group and the destination application, and add the resource group to the destination's whitelist.
Note: When you run a task on a serverless resource group, a single task can be configured with up to 64 compute units (CUs). However, we recommend that you configure no more than 16 CUs for a single task to prevent a resource shortage caused by excessive CU usage, which can affect task startup.
If you want to use a specific development environment to develop a task, you can use the custom image feature in DataWorks to build a custom image that contains the components required for task execution. For more information, see Custom image.
Do not start too many child processes in a Shell node. DataWorks does not limit the resource usage for running Shell nodes. If you start too many child processes, other tasks that run on the same scheduling resource group may be affected.
If other scripts, such as Python scripts, are referenced in the Shell node, the Shell script ends only after the referenced script is complete, as shown in the following sketch.
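A minimal sketch of this behavior, assuming a hypothetical Python script named my_script.py that is available to the node:
#!/bin/bash
# The Shell process blocks here until the Python script exits (hypothetical script name).
python ./my_script.py
# This line runs only after my_script.py is complete.
echo "my_script.py finished, Shell script ends"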
Develop a Shell node
Develop the Shell node.
The following code provides an example. For more examples, see Shell node development examples.
echo "Hello DataWorks!"
After you develop the code, click Test Configuration in the right-side pane and select the resource group and scheduling parameters for the test run. Then, click the Run button to test the code.
After you develop and test the Shell node script, configure scheduling properties for the Shell node to allow the node to run periodically.
After you configure the scheduling properties for the task, click Save before you proceed to the next step.
Node deployment and O&M
After you configure the scheduling properties, you can commit and deploy the Shell node to the production environment. For more information, see Node or workflow deployment.
The deployed task runs periodically based on the scheduling properties that you configured. You can go to Operation Center to view the deployed auto triggered task and perform O&M operations. For more information, see Getting started with Operation Center.
Shell node development examples
Use resources in a Shell node
In DataWorks, you can use Resource Management to upload resources that are required by a Shell node. For more information, see Resource Management.
Note: A resource must be committed before it can be referenced by a node. If a production task needs to use the resource, you must also publish the resource to the production environment.
Open the Shell node to go to the script editor page.
In the left-side navigation pane, click the Resource Management icon to open the Resource Management pane. Right-click the resource that you want to reference and select Reference Resource to add a reference to the resource in the Shell script.
If a resource is referenced, a comment in the ##@resource_reference{"Resource name"} format is automatically added to the script. You can then use the referenced resource by its name.
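For example, the following is a minimal sketch, assuming an uploaded Python resource with the hypothetical name my_script.py:
##@resource_reference{"my_script.py"}
# The reference comment above makes the resource available by its name in the working directory.
python ./my_script.py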
Use scheduling parameters in a Shell node
You cannot customize variable names in common Shell nodes. The variables must be named $1, $2, $3, and so on in ascending order. You can configure parameter values in Scheduling Parameters. Separate multiple parameter values with spaces.

For example:
Parameter $1 is assigned the current time: $[yyyymmdd].
Parameter $2 is manually assigned a static field: Hello DataWorks.
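A minimal sketch of a script that reads these two values from the positional variables:
#!/bin/bash
# $1 and $2 are populated from the values configured in Scheduling Parameters.
echo "business date: $1"
echo "static field: $2"
# With the example assignments above, this prints the resolved date and "Hello DataWorks".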
To obtain the output parameters of an ancestor node, you must add parameters in the Scheduling Parameters section and set the parameter values to the output parameters of the ancestor node.
Access ossutil from a Shell node
ossutil is installed by default. The path to ossutil is as follows:
/home/admin/usertools/tools/ossutil64
For more information about common ossutil commands, see ossutil 1.0.
You can create a configuration file that contains your AccessKey credentials. For example, you can use O&M Assistant to upload the configuration file to the following directory: /home/admin/usertools/tools/myconfig. The configuration file contains content in the following format:
[Credentials]
language = CH
endpoint = oss.aliyuncs.com
accessKeyID = your_accesskey_id
accessKeySecret = your_accesskey_secret
stsToken = your_sts_token
outputDir = your_output_dir
ramRoleArn = your_ram_role_arn
The command format is as follows:
#!/bin/bash
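# Copy an object from OSS by using the configuration file that was uploaded to the resource group.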
/home/admin/usertools/tools/ossutil64 --config-file /home/admin/usertools/tools/myconfig cp oss://bucket/object object
if [[ $? -eq 0 ]]; then
    echo "access oss success"
else
    echo "failed"
    exit 1
fi
echo "finished"Access data on OSS or NAS using a Shell node
In DataWorks, you can create datasets of the OSS or NAS type and use them during Shell node development. This allows the Shell node to read data from and write data to OSS or NAS storage at runtime.
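The following is a minimal sketch, assuming the dataset is mounted at the hypothetical path /mnt/my_dataset; the actual path depends on the mount path that you configure for the dataset:
#!/bin/bash
# Read an input file from the mounted dataset (hypothetical path).
cat /mnt/my_dataset/input/data.txt
# Write a result file back to the mounted dataset.
echo "processed at $(date)" > /mnt/my_dataset/output/result.txt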
Run a node by associating a role
You can associate a RAM role with a node to run node tasks. This provides fine-grained permission control and security management.
Appendix: How to determine whether a custom Shell script task is successfully executed
The success of a script execution is determined by the following process exit codes:
Exit code 0: Indicates success.
Exit code -1: Indicates that the process is terminated.
Exit code 2: Indicates that the platform needs to automatically rerun the task.
Other exit codes: Indicate failure.
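A minimal sketch that maps business logic onto these exit codes; check_data and RETRY_NEEDED are hypothetical placeholders for your own command and condition:
#!/bin/bash
# check_data is a hypothetical stand-in for your own validation logic.
check_data() {
    test -s output.txt    # example: succeed if output.txt exists and is non-empty
}
if check_data; then
    exit 0    # success
elif [[ "$RETRY_NEEDED" == "true" ]]; then
    exit 2    # ask the platform to automatically rerun the task
else
    exit 1    # any other exit code marks the task as failed
fi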