DataWorks: Create a Shell node

Last Updated: Aug 11, 2023

Shell nodes support the standard shell syntax but not the interactive syntax.

Limits

  • Shell nodes support the standard shell syntax but not the interactive syntax.
  • Shell nodes can be run only by using exclusive resource groups for scheduling.
  • A Shell node that is run on an exclusive resource group for scheduling may need to access a data source for which a whitelist is configured. In this case, you must add the required elastic IP address (EIP) or CIDR block to the whitelist of the data source.
  • Do not start a large number of subprocesses in a Shell node. DataWorks does not limit the resources that Shell nodes consume, so a Shell node that starts many subprocesses on an exclusive resource group for scheduling may affect other nodes that run on the same resource group.
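
One way to keep the number of subprocesses bounded is to start background jobs in small batches and wait for each batch to finish before starting the next. The following is a minimal sketch of that idea. The names process_item, run_batch, and max_parallel are hypothetical, and the batch size of 2 is an arbitrary assumption, not a DataWorks setting or recommendation.

```shell
#!/bin/bash
# Hypothetical unit of work; replace the body with the real command.
process_item() {
  echo "processed $1"
}

# Assumed cap on concurrent subprocesses (not a DataWorks setting).
max_parallel=2

# Start at most $max_parallel background jobs, wait for the whole batch,
# then start the next batch.
run_batch() {
  local running=0
  local item
  for item in "$@"; do
    process_item "$item" &
    running=$((running + 1))
    if [ "$running" -ge "$max_parallel" ]; then
      wait
      running=0
    fi
  done
  wait  # wait for the last, possibly partial, batch
}

run_batch a b c
```

Batching is simple but not optimal: a slow job holds up its whole batch. It is shown here only because it keeps the subprocess count predictable.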

Create a common Shell node

  1. Go to the DataStudio page.

    1. Log on to the DataWorks console.

    2. In the left-side navigation pane, click Workspaces.

    3. In the top navigation bar, select the region where the workspace resides. On the Workspaces page, find the workspace in which you want to create the Shell node, and click Shortcuts > Data Development in the Actions column.

  2. Move the pointer over the Create icon and choose Create Node > General > Shell.
    Alternatively, you can click the name of the desired workflow in the Business Flow section, right-click General, and then choose Create Node > Shell.

Enable a Shell node to use resources

Before a node can use a resource in DataWorks, you must upload the resource to DataWorks and reference the resource in the runtime environment of the node. This section describes the procedure.

Upload a resource

DataWorks allows you to create a resource or upload an existing resource. The available method depends on the resource type. Follow the options that the GUI provides for that type of resource.

  1. Go to the DataStudio page and create the desired type of resource for the Shell node in the desired workflow based on your business requirements.
    Note If no workflow is available, create one. For information about how to create a workflow, see Create a workflow.
  2. Commit and deploy the resource.
    Click the Submit icon in the top toolbar to commit the resource to the development environment.
    Note If nodes in the production environment need to use this resource, you also need to deploy the resource to the production environment. For more information, see Deploy nodes.

Reference the resource in the node

To enable the node to use the resource, you must reference the resource in the node. After the resource is referenced, the @resource_reference{"Resource name"} comment is displayed in the upper part of the node code. Procedure:

  1. Open the created node.
  2. In the Scheduled Workflow pane of the DataStudio page, find the resource that you uploaded.
  3. Right-click the resource and select Insert Resource Path to reference the resource in the current node.
Note After the resource is referenced, you can find the resource reference comment that is automatically added in the upper part of the node code.
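
The following sketch illustrates what node code that uses a referenced resource might look like. It rests on several assumptions: the resource is a hypothetical script named hello.sh, the exact form of the inserted comment (shown here with a ## prefix) may differ in your environment, and the resource is assumed to be available by name in the working directory of the node. To keep the example runnable outside DataWorks, it creates a stand-in resource file in a temporary directory.

```shell
#!/bin/bash
##@resource_reference{"hello.sh"}
# The line above mimics the comment that is added when the resource is
# referenced (assumed form). Outside DataWorks it is an ordinary comment.

# Simulate the resource file that would otherwise be provided to the node.
workdir=$(mktemp -d)
printf '#!/bin/bash\necho "hello from resource, arg=$1"\n' > "$workdir/hello.sh"

# Hypothetical helper: run the referenced resource with one argument.
run_resource() {
  (cd "$workdir" && bash hello.sh "$1")
}

run_resource 20230810
```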

Scheduling parameters used by Shell nodes

Common Shell nodes do not support custom variable names. Variables must be named by their position, such as $1, $2, and $3. For the tenth and subsequent parameters, use the ${Number} format. For example, use ${10} for the tenth variable, because the shell reads $10 as $1 followed by the character 0. For information about how to configure and use scheduling parameters, see Configure and use scheduling parameters. For information about the methods to assign values to scheduling parameters, see Supported formats of scheduling parameters.

In the preceding figure, custom parameters are assigned to the custom variables $1, $2, and $3 in the Parameters section, and the custom variables are referenced in the code editor. Examples:
  • $1: Specify $bizdate as $1. This variable is used to obtain the data timestamp. $bizdate is a built-in parameter.
  • $2: Specify ${yyyymmdd} as $2. This variable is used to obtain the data timestamp.
  • $3: Specify $[yyyymmdd] as $3. This variable is used to obtain the scheduled time of the node in the yyyymmdd format.
Note For common Shell nodes, you can assign custom parameters to custom variables only by using expressions. The parameters must be separated by a space, and the parameter values must match the order in which the parameters are defined. For example, the first parameter $bizdate that you enter in the Parameters section is assigned to the first variable $1.
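
The positional behavior described above can be tried outside DataWorks with an ordinary shell function. The following is a minimal sketch. show_params is a hypothetical function that stands in for the node body, and the argument values are made-up samples of what the Parameters section might resolve to.

```shell
#!/bin/bash
# Hypothetical function that stands in for the node body. Inside a real
# Shell node, $1, $2, ... refer to the values from the Parameters section.
show_params() {
  echo "data timestamp: $1"
  echo "tenth (correct): ${10}"
  echo "tenth (wrong): $10"   # parsed as $1 followed by a literal 0
}

# Simulate ten space-separated parameter values.
show_params 20230810 b c d e f g h i j
```

The last line of output shows why braces are needed from the tenth parameter onward: without them, the shell expands $1 and appends 0.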

How do I determine whether a custom Shell script is successfully run?

The exit code of the custom Shell script determines whether the script is successfully run. Exit codes:

  • 0: indicates that the custom Shell script is successfully run.
  • -1: indicates that the custom Shell script is terminated.
  • 2: indicates that the custom Shell script needs to be automatically rerun.
  • Other exit codes: indicate that the custom Shell script fails to run.
The exit code of a Shell script is the exit code of the last command that the script runs. If an invalid command returns an error but a valid command is run after it, the Shell script is still considered successful. Example:
#! /bin/bash
curl http://xxxxx/asdasd
echo "nihao"

The Shell script is considered successful because the last command, echo, returns the exit code 0, even though the curl command fails.

If you change the previous script to the following script, a different result is returned. Example:
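#!/bin/bash
# Run the request, then check the exit code of curl.
curl http://xxxxx/asdasd
if [[ $? == 0 ]]; then
    echo "curl success"
else
    # Exit with a non-zero code so that the script is reported as failed.
    echo "failed"
    exit 1
fi
echo "nihao"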

In this case, the curl command fails, the script exits with the exit code 1, and the script is considered failed.
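
The difference between the two scripts can be reproduced with a small helper that runs a script body in a separate bash process and prints its exit code. This is only a sketch: exit_status_of is a hypothetical helper, and false stands in for a failing command such as the curl call. The second call uses set -e, a standard bash option that ends the script at the first failing command. It is an alternative to checking $? by hand and is not something the text above prescribes.

```shell
#!/bin/bash
# Hypothetical helper: run a script body in a separate bash process and
# print the exit code of that process.
exit_status_of() {
  local status=0
  bash -c "$1" >/dev/null 2>&1 || status=$?
  echo "$status"
}

# A failing command followed by a successful one: the last command wins.
exit_status_of 'false; echo "nihao"'

# With set -e, the script stops at the first failing command.
exit_status_of 'set -e; false; echo "nihao"'

# An explicit check that exits with 1 when the command fails.
exit_status_of 'false || exit 1; echo "nihao"'
```

The three calls print 0, 1, and 1.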

Use a Shell script to access OSSUtils

You can use the following default installation path if you want to install OSSUtils:
  • /home/admin/usertools/tools/ossutil64
For information about the common commands in OSSUtils, see Common commands.
You can configure the credentials that are used to access Object Storage Service (OSS), such as the AccessKey ID and AccessKey secret, in a configuration file based on your business requirements. Then, you can use O&M Assistant to upload the configuration file as /home/admin/usertools/tools/myconfig. Sample configuration file:
[Credentials]
        language = CH
        endpoint = oss.aliyuncs.com
        accessKeyID = your_accesskey_id
        accessKeySecret = your_accesskey_secret
        stsToken = your_sts_token
        outputDir = your_output_dir
        ramRoleArn = your_ram_role_arn
Command syntax:
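#!/bin/bash
# Copy an object from OSS by using the uploaded configuration file.
/home/admin/usertools/tools/ossutil64 --config-file /home/admin/usertools/tools/myconfig cp oss://bucket/object object
if [[ $? == 0 ]]; then
    echo "access oss success"
else
    # Exit with a non-zero code so that the script is reported as failed.
    echo "failed"
    exit 1
fi
echo "finished"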
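
Before the copy command is run, it can help to confirm that the tool and the configuration file are actually present on the resource group, so that a missing file produces a clear message. The following is a minimal sketch of such a check. check_prereqs is a hypothetical helper and is not part of OSSUtils or DataWorks. The paths are the ones used in this section.

```shell
#!/bin/bash
# Hypothetical pre-flight check: confirm that the tool is executable and
# the configuration file is readable before the copy command is run.
check_prereqs() {
  local bin="$1"
  local cfg="$2"
  if [ ! -x "$bin" ]; then
    echo "missing executable: $bin"
    return 1
  fi
  if [ ! -r "$cfg" ]; then
    echo "missing config: $cfg"
    return 1
  fi
  echo "ok"
}

# In a node, exit with a non-zero code when the check fails, for example:
#   check_prereqs "$bin" "$cfg" || exit 1
# Here the failure is tolerated so that the sketch runs anywhere.
check_prereqs /home/admin/usertools/tools/ossutil64 /home/admin/usertools/tools/myconfig || true
```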

Subsequent operations

If the Shell node needs to be periodically scheduled, you need to define the scheduling properties for the Shell node and deploy the node to the production environment. For information about how to configure scheduling properties for nodes, see Configure scheduling properties for nodes. For information about how to deploy nodes to the production environment, see Deploy nodes.