Run Shell Nodes via Serverless Resource Groups - DataWorks

Limitations

Limitation	Details
Syntax	Supports standard shell syntax only; interactive syntax is not supported
Code size	Node code cannot exceed 128 KB
Subprocesses	Avoid starting a large number of subprocesses. On exclusive resource groups for scheduling, DataWorks does not limit resource usage for Shell nodes, so excessive subprocesses can affect other nodes on the same group
Resource groups	Can run on serverless resource groups or old-version exclusive resource groups for scheduling. Run tasks on serverless resource groups for better resource management
CU limit on serverless	A single task supports up to 64 compute units (CUs). Keep the setting within 16 CUs to avoid resource shortages that delay task startup
Whitelist access	If a Shell node on a serverless resource group needs to access a data source with a whitelist, add the required elastic IP address (EIP) or CIDR block to that data source's whitelist. See Use serverless resource groups

Note: To use a specific development environment for Shell node tasks, create a custom image in the DataWorks console. See Custom images.

Prerequisites

Before you begin, ensure that you have:

A workflow. DataStudio development operations are based on workflows. See Create a workflow

Create a common Shell node

Go to the DataStudio page. Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Development and O&M > Data Development. Select the desired workspace from the drop-down list and click Go to Data Development.
Move the pointer over the icon and choose Create Node > General > Shell. In the Create Node dialog box, configure the Name and Path parameters.
Click Confirm.

Use resources in a Shell node

Shell nodes can reference uploaded resource files and run them as part of the task.

Upload a resource

Upload the resource to DataWorks before referencing it. DataWorks supports creating and uploading MaxCompute and E-MapReduce (EMR) resources.

Note: Commit the resource before referencing it in a node. If the production environment also needs the resource, deploy it to production. See Publish tasks.

Reference the resource in the node

After uploading, reference the resource in the Shell node so the task can use it. After referencing, the @resource_reference{"Resource name"} comment appears at the top of the node code.

Open the Shell node to go to the node editing page.
In the Scheduled Workflow pane, find the resource you uploaded.
Right-click the resource and select Insert Resource Path.

Write code in the editor to run the referenced resource.
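As a rough illustration of that last step, the sketch below runs a referenced resource after checking that it is present. It assumes the resource is a shell script and that DataWorks makes it available to the task under its resource name in the working directory. The name hello.sh and the helper run_resource are hypothetical. The reference comment that DataWorks inserts at the top of the node code is not reproduced here.

```shell
#!/bin/bash
# Hypothetical helper: run a referenced shell resource if it is present.
run_resource() {
    local resource="$1"
    if [[ ! -f "$resource" ]]; then
        echo "resource not found: $resource" >&2
        return 1
    fi
    bash "$resource"
}

# Usage inside a node (hello.sh is a placeholder resource name):
#   run_resource "hello.sh" || exit 1
```

Checking for the file first gives a clear log message when the resource was not committed or deployed, instead of a generic shell error.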

Configure scheduling parameters

Shell nodes use positional variables: $1, $2, $3, and so on. For the tenth parameter or higher, use the ${Number} format — for example, ${10}.

Variable format	Usage
`$1`, `$2`, `$3` …	Positional variables for the first through ninth parameters
`${10}`, `${11}` …	Positional variables for the tenth parameter and higher
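The braces matter because bash reads $10 as $1 followed by a literal 0. The following minimal sketch makes the difference visible. The function name demo and the letters passed to it are placeholders, not scheduling values.

```shell
#!/bin/bash
# Compare ${10} with $10 when ten arguments are passed.
demo() {
    echo "with braces: ${10}"      # tenth argument
    echo "without braces: $10"     # first argument followed by the character 0
}

demo a b c d e f g h i j
# prints:
#   with braces: j
#   without braces: a0
```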

In the Parameters section, assign values to variables in order using expressions, separated by spaces. The first value goes to $1, the second to $2, and so on.

For example, if the Parameters section contains $bizdate ${yyyymmdd} $[yyyymmdd], the three values are assigned to $1, $2, and $3, each of which is used to obtain a data timestamp:

$1: assigned $bizdate, a built-in parameter that returns the business date.
$2: assigned ${yyyymmdd}, a date format expression.
$3: assigned $[yyyymmdd], another date format expression.
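On the script side, the values arrive as ordinary positional arguments. The sketch below shows one way to read them. The function name show_params and the output labels are illustrative, and the actual values depend on how the scheduler resolves each expression at run time.

```shell
#!/bin/bash
# Print the first three positional parameters that the scheduler passes in.
show_params() {
    echo "bizdate=$1"          # value resolved from $bizdate
    echo "brace_date=$2"       # value resolved from ${yyyymmdd}
    echo "bracket_date=$3"     # value resolved from $[yyyymmdd]
}

show_params "$@"
```

Copying the positional values into named variables or labeled output early in the script keeps later commands readable and makes the resolved dates easy to find in the run log.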

For details on configuring and using scheduling parameters, see Configure and use scheduling parameters and Supported formats for scheduling parameters.

Determine whether a Shell script succeeded

DataWorks uses the exit code of the shell script to determine the task outcome.

Exit code	Outcome
`0`	Script ran successfully
`-1`	Script was terminated
`2`	Script needs to be automatically rerun
Any other code	Script failed
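As one possible use of the table above, a script can return 2 when a condition is not yet met, so that the task is treated as needing a rerun rather than as failed. The sketch below assumes a hypothetical input file that an upstream process creates. The helper name wait_for_input is illustrative, and how the rerun itself is carried out depends on the node's scheduling settings, which this sketch does not cover.

```shell
#!/bin/bash
# Hypothetical helper: return 0 when the input file exists, 2 otherwise.
wait_for_input() {
    local input_file="$1"
    if [[ -f "$input_file" ]]; then
        echo "input found: $input_file"
        return 0
    fi
    echo "input not ready: $input_file" >&2
    return 2   # per the table above: the script needs to be rerun
}

# Usage inside a node (the path is a placeholder):
#   wait_for_input "input.csv" || exit $?
```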

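DataWorks evaluates only the exit code of the script as a whole, which by default is the exit code of the last command that runs. If an early command fails, the error is written to the log, but the script continues. If a later command then succeeds, the script exits with 0 and the task is reported as successful.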

Example: script succeeds despite an invalid first command

#!/bin/bash
curl http://xxxxx/asdasd   # Fails (unreachable URL), but the script continues
echo "nihao"               # Succeeds, so the script exits with code 0

This script is reported as successful because its exit code is that of the last command, echo, which returns 0. The earlier curl failure is not propagated.
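If the task should fail as soon as any command fails, bash's errexit option (set -e) is one way to get that behavior. The sketch below is a stand-in for the first example: false plays the role of the failing curl call, and the script runs in a child bash so that its exit code can be printed. The variable names are illustrative.

```shell
#!/bin/bash
# Stand-in for the first example, with errexit enabled.
strict_script='set -e
false
echo "nihao"'

# Run the stand-in in a child bash and record its exit code.
if bash -c "$strict_script"; then
    status=0
else
    status=$?
fi
echo "exit code with set -e: $status"
# prints: exit code with set -e: 1
```

In a real node, placing set -e near the top of the script has the same effect: the first failing command ends the script with a non-zero exit code, and the task is marked as failed.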

Example: script fails due to explicit exit

#!/bin/bash
curl http://xxxxx/asdasd
if [[ $? == 0 ]]; then
    echo "curl success"
else
    echo "failed"
    exit 1                 # Explicit exit with code 1 — task fails
fi
echo "nihao"

This script fails because the curl command returns a non-zero exit code, so the else branch runs exit 1 and the script stops before the final echo.

Access OSS with ossutil

Use this pattern to read from and write to Object Storage Service (OSS) buckets directly from a Shell node task.

Default installation path: /home/admin/usertools/tools/ossutil64

To authenticate with OSS, create a configuration file and upload it using O&M Assistant to /home/admin/usertools/tools/myconfig. The configuration file format:

[Credentials]
        language = CH
        endpoint = oss.aliyuncs.com
        accessKeyID = <your-accesskey-id>
        accessKeySecret = <your-accesskey-secret>
        stsToken = <your-sts-token>
        outputDir = <your-output-dir>
        ramRoleArn = <your-ram-role-arn>

Replace the placeholders with your actual values:

Placeholder	Description
`<your-accesskey-id>`	Your AccessKey ID
`<your-accesskey-secret>`	Your AccessKey secret
`<your-sts-token>`	Security Token Service (STS) token (required only when using temporary credentials)
`<your-output-dir>`	Local directory for ossutil output
`<your-ram-role-arn>`	Resource Access Management (RAM) role ARN (required only when using role-based access)
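One way to prepare the file before uploading it is to generate it from environment variables, so that the AccessKey pair is never typed into a script. The sketch below is an assumption-laden illustration: the function write_ossutil_config and the variable names OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET are hypothetical, and it writes only the AccessKey fields, leaving out the two fields that the table marks as required only in specific cases.

```shell
#!/bin/bash
# Hypothetical helper: write an AccessKey-based ossutil config file.
# Runs in a subshell so the umask change does not leak into the caller.
write_ossutil_config() (
    target="$1"        # path of the config file to create
    output_dir="$2"    # local directory for ossutil output

    # Stop with an error if either credential variable is unset or empty.
    : "${OSS_ACCESS_KEY_ID:?set OSS_ACCESS_KEY_ID first}"
    : "${OSS_ACCESS_KEY_SECRET:?set OSS_ACCESS_KEY_SECRET first}"

    umask 077          # the file holds a secret: owner-only permissions
    cat > "$target" <<EOF
[Credentials]
language = CH
endpoint = oss.aliyuncs.com
accessKeyID = ${OSS_ACCESS_KEY_ID}
accessKeySecret = ${OSS_ACCESS_KEY_SECRET}
outputDir = ${output_dir}
EOF
)

# Usage (paths are placeholders):
#   write_ossutil_config ./myconfig ./ossutil_output
```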

Example: copy an object from OSS

#!/bin/bash
/home/admin/usertools/tools/ossutil64 --config-file /home/admin/usertools/tools/myconfig \
    cp oss://bucket/object object
if [[ $? == 0 ]]; then
    echo "access oss success"
else
    echo "failed"
    exit 1
fi
echo "finished"
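Network calls such as the copy above can fail transiently. The sketch below is a generic retry wrapper that could be placed around the ossutil command. It is not part of ossutil or DataWorks, and the function name retry and its argument order are assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical helper: retry a command up to a given number of attempts.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
retry() {
    local attempts="$1" delay="$2"
    shift 2
    local n=1
    until "$@"; do
        if [[ "$n" -ge "$attempts" ]]; then
            echo "giving up after $n attempts: $*" >&2
            return 1
        fi
        echo "attempt $n failed; retrying in ${delay}s" >&2
        sleep "$delay"
        n=$((n + 1))
    done
}

# Usage with the copy command from the example above:
#   retry 3 5 /home/admin/usertools/tools/ossutil64 \
#       --config-file /home/admin/usertools/tools/myconfig \
#       cp oss://bucket/object object || exit 1
```

Because the wrapper returns 1 once all attempts are used up, following it with exit 1 still marks the task as failed when the copy never succeeds.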

For a full list of ossutil commands, see Common commands.

What's next

To run the Shell node on a schedule, configure its scheduling properties and deploy it to the production environment. See Step 6: Configure scheduling properties and Publish tasks.
