Shell nodes run standard shell scripts as scheduled tasks in DataWorks DataStudio.
Limitations
| Limitation | Details |
|---|---|
| Syntax | Supports standard shell syntax only; interactive syntax is not supported |
| Code size | Node code cannot exceed 128 KB |
| Subprocesses | Avoid starting a large number of subprocesses. On exclusive resource groups for scheduling, DataWorks does not limit resource usage for Shell nodes, so excessive subprocesses can affect other nodes on the same group |
| Resource groups | Can run on serverless resource groups or old-version exclusive resource groups for scheduling. Run tasks on serverless resource groups for better resource management |
| CU limit on serverless | A single task supports up to 64 compute units (CUs). Keep the setting within 16 CUs to avoid resource shortages that delay task startup |
| Whitelist access | If a Shell node on a serverless resource group needs to access a data source with a whitelist, add the required elastic IP address (EIP) or CIDR block to that data source's whitelist. See Use serverless resource groups |
Prerequisites
Before you begin, ensure that you have:
- A workflow. DataStudio development operations are based on workflows. See Create a workflow.
Create a common Shell node
1. Go to the DataStudio page. Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Development and O&M > Data Development. Select the desired workspace from the drop-down list and click Go to Data Development.
2. Click Confirm.
Use resources in a Shell node
Shell nodes can reference uploaded resource files and run them as part of the task.
Upload a resource
Upload the resource to DataWorks before referencing it. DataWorks supports creating and uploading MaxCompute and E-MapReduce (EMR) resources.
Reference the resource in the node
After uploading, reference the resource in the Shell node so the task can use it. After referencing, the @resource_reference{"Resource name"} comment appears at the top of the node code.
1. Open the Shell node to go to the node editing page.
2. In the Scheduled Workflow pane, find the resource you uploaded.
3. Right-click the resource and select Insert Resource Path.
4. Write code in the editor to run the referenced resource.
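As a sketch, node code that runs a referenced resource might look like the following. Here "data_clean.py" is a hypothetical resource name; the comment line stands in for the reference that DataWorks inserts, and the existence check only keeps the snippet runnable outside DataWorks:

```shell
##@resource_reference{"data_clean.py"}
# "data_clean.py" is a hypothetical uploaded resource; the comment above is
# the reference inserted when you choose Insert Resource Path.
if [ -f ./data_clean.py ]; then
    python ./data_clean.py "$1"   # run the resource with the first parameter
else
    echo "resource data_clean.py not found"
fi
```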
Configure scheduling parameters
Shell nodes use positional variables: $1, $2, $3, and so on. For the tenth parameter or higher, use the ${Number} format — for example, ${10}.
| Variable format | Usage |
|---|---|
| $1, $2, $3 … | Positional variables for the first through ninth parameters |
| ${10}, ${11} … | Positional variables for the tenth parameter and higher |
In the Parameters section, assign values to variables in order using expressions, separated by spaces. The first value goes to $1, the second to $2, and so on.
For example, assign three parameters to $1, $2, and $3, all used to obtain a data timestamp:

- $1: assigned $bizdate, a built-in parameter that returns the business date.
- $2: assigned ${yyyymmdd}, a date format expression.
- $3: assigned $[yyyymmdd], another date format expression.
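A minimal sketch of node code that reads these values. The default values are placeholders so the snippet runs standalone; in DataWorks, the scheduler substitutes the assigned expressions at run time:

```shell
#!/bin/bash
# Positional parameters arrive in the order declared in the Parameters section.
# The defaults below are placeholders for local testing only.
bizdate=${1:-20240101}    # value assigned via $bizdate
fmt_date=${2:-20240101}   # value assigned via ${yyyymmdd}
alt_date=${3:-20240102}   # value assigned via $[yyyymmdd]
echo "business date: ${bizdate}, formats: ${fmt_date} / ${alt_date}"
```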
For details on configuring and using scheduling parameters, see Configure and use scheduling parameters and Supported formats for scheduling parameters.
Determine whether a Shell script succeeded
DataWorks uses the exit code of the shell script to determine the task outcome.
| Exit code | Outcome |
|---|---|
| 0 | Script ran successfully |
| -1 | Script was terminated |
| 2 | Script is automatically rerun |
| Any other code | Script failed |
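One way to use the rerun code is to map a transient failure to exit code 2 so the scheduler retries the task. This is a sketch; fetch_data is a hypothetical placeholder for a flaky operation:

```shell
#!/bin/bash
# Exit with code 2 on a transient failure to request an automatic rerun.
fetch_data() {
    # hypothetical placeholder for an operation that can fail transiently
    return 0
}
if ! fetch_data; then
    exit 2   # ask the scheduler to rerun the task
fi
echo "fetch ok"
```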
The exit code of a Shell script is the exit code of the last command that runs. If an invalid command fails early in the script but a valid command runs after it, the script still exits with code 0 and the task succeeds.
Example: script succeeds despite an invalid first command
```shell
#!/bin/bash
curl http://xxxxx/asdasd   # invalid URL: curl fails, but the script continues
echo "nihao"               # last command: exits with code 0
```
This script succeeds because the overall exit code is that of the last command, echo, which is 0.
Example: script fails due to explicit exit
```shell
#!/bin/bash
curl http://xxxxx/asdasd
if [[ $? == 0 ]]; then
    echo "curl success"
else
    echo "failed"
    exit 1   # explicit exit with code 1: the task fails
fi
echo "nihao"
```
This script fails because exit 1 runs before the final echo.
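If you want any failing command to fail the task immediately, without checking $? after each step, `set -e` is a common option. This is a general shell technique, not DataWorks-specific; the snippet runs the strict-mode script in a subshell so its exit code can be inspected:

```shell
#!/bin/bash
# Under `set -e`, the first failing command aborts the script,
# so the script's exit code reflects that failure.
bash -c 'set -e; false; echo "never reached"'
echo "subshell exit code: $?"   # prints: subshell exit code: 1
```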
Access OSS with ossutil
Use this pattern to read from and write to Object Storage Service (OSS) buckets directly from a Shell node task.
Default installation path: /home/admin/usertools/tools/ossutil64
To authenticate with OSS, create a configuration file and upload it using O&M Assistant to /home/admin/usertools/tools/myconfig. The configuration file format:
```
[Credentials]
language = CH
endpoint = oss.aliyuncs.com
accessKeyID = <your-accesskey-id>
accessKeySecret = <your-accesskey-secret>
stsToken = <your-sts-token>
outputDir = <your-output-dir>
ramRoleArn = <your-ram-role-arn>
```
Replace the placeholders with your actual values:
| Placeholder | Description |
|---|---|
| `<your-accesskey-id>` | Your AccessKey ID |
| `<your-accesskey-secret>` | Your AccessKey secret |
| `<your-sts-token>` | Security Token Service (STS) token (required only when using temporary credentials) |
| `<your-output-dir>` | Local directory for ossutil output |
| `<your-ram-role-arn>` | Resource Access Management (RAM) role ARN (required only when using role-based access) |
Example: copy an object from OSS
```shell
#!/bin/bash
/home/admin/usertools/tools/ossutil64 --config-file /home/admin/usertools/tools/myconfig \
    cp oss://bucket/object object
if [[ $? == 0 ]]; then
    echo "access oss success"
else
    echo "failed"
    exit 1
fi
echo "finished"
```
For a full list of ossutil commands, see Common commands.
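Writing to OSS follows the same pattern with the source and destination reversed. In this sketch, my-bucket, output/result.csv, and result.csv are placeholder names, and the install-path check only keeps the snippet runnable outside DataWorks:

```shell
#!/bin/bash
# Sketch: upload a local result file to an OSS bucket (names are placeholders).
OSSUTIL=/home/admin/usertools/tools/ossutil64
CONFIG=/home/admin/usertools/tools/myconfig
if [ -x "$OSSUTIL" ]; then
    "$OSSUTIL" --config-file "$CONFIG" cp ./result.csv oss://my-bucket/output/result.csv
    if [[ $? == 0 ]]; then
        echo "upload success"
    else
        echo "upload failed"
        exit 1
    fi
else
    echo "ossutil not found; skipping upload"
fi
```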
What's next
To run the Shell node on a schedule, configure its scheduling properties and deploy it to the production environment. See Step 6: Configure scheduling properties and Publish tasks.