A script node runs a script periodically or at a specified point in time. This topic describes how to configure a script node by using the task orchestration feature of Data Management (DMS) together with Database Gateway.

Background information

Scripts for tasks such as the following are stored on servers. You can use the task orchestration feature of DMS together with Database Gateway to schedule these scripts in a unified manner. Examples:
  • Process data by using advanced tools, including the NumPy and scikit-learn libraries for Python, and the MLlib library of Apache Spark. The models generated after data processing can be applied to the fine sort and recommendation features of a search system.
  • Consume data. For example, you can generate an Excel file from the data that you read, or run a script that automatically sends emails containing that data.

Limits

Only shell scripts are supported.

Note
  • You can use a shell script to invoke programs or scripts that are written in other programming languages.
  • A script name can contain only letters, digits, underscores (_), and periods (.).
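
The two notes above can be sketched in shell. This is a minimal illustration, not part of DMS or Database Gateway: the function names is_valid_script_name and run_with_interpreter are hypothetical, and the interpreter and program path are whatever your server provides. The first function checks a name against the naming rule, and the second shows how a shell script can hand off to a program written in another language.

```shell
#!/bin/sh
# Naming rule from the note above: letters, digits, underscores, and periods only.
is_valid_script_name() {
  case "$1" in
    '' | *[!A-Za-z0-9_.]*) return 1 ;;  # empty, or contains a disallowed character
    *) return 0 ;;
  esac
}

# Hypothetical wrapper: run a program written in another language.
# $1 = interpreter (for example python3), $2 = program path, rest = arguments.
run_with_interpreter() {
  interpreter=$1
  program=$2
  shift 2
  if [ ! -f "$program" ]; then
    echo "program not found: $program" >&2
    return 1
  fi
  "$interpreter" "$program" "$@"
}

# Check a few candidate names against the rule.
for name in demo.sh train_model.v2.sh my-script.sh; do
  if is_valid_script_name "$name"; then
    echo "$name: valid"
  else
    echo "$name: invalid"
  fi
done
```

The loop reports my-script.sh as invalid because a hyphen is not in the allowed character set. A wrapper call might look like run_with_interpreter python3 /path/to/program.py, where the path is a placeholder.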

Step 1: Create a database gateway

Create a database gateway on the server where the shell script resides and move the shell script you want to run to the dg_scripts directory.
Note One database gateway corresponds to one server.

For example, if you need to run scripts on three Elastic Compute Service (ECS) instances, you must create three database gateways instead of creating three nodes with one database gateway.

  1. Log on to the Database Gateway console.
  2. Click Create Gateway.
  3. In the Create Gateway step, enter the name and description of the database gateway, and then click Next step.
  4. Install the database gateway on the server where the script you want to run resides.
    1. Select Linux/MacOS (x86_64). Only the Linux operating system is supported. You are not allowed to install and start the database gateway as a root user.
    2. If you install a database gateway on an ECS instance, we recommend that you select Access through Alibaba Cloud VPC internal address (ECS self-built library/leased line/CEN/VPN gateway).
    3. Click Copying the command line, paste the command on the server on which you want to install the database gateway, and then press the ENTER key to run the command.
      After the database gateway is enabled on your server, perform the next step.
  5. Make sure that a directory named dg_scripts exists in the home directory of the user on the server on which the database gateway is installed. By default, the system creates the dg_scripts directory automatically. If the directory does not exist, create it.
    For example, if the current user is xiaoming, run the mkdir dg_scripts command in the /home/xiaoming directory to create the dg_scripts directory.
  6. Move the shell script that you want to run to the dg_scripts directory.
    For example, if the script is named demo.sh, run the mv demo.sh /home/xiaoming/dg_scripts command in the directory where the script resides to move the script to the dg_scripts directory.
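
The directory and move steps above can be combined into one small helper. This is a sketch under stated assumptions: install_script is a hypothetical function name, not a Database Gateway command, and the home directory is passed in explicitly so that the same helper works for any user.

```shell
#!/bin/sh
# install_script is a hypothetical helper name, not part of Database Gateway.
# $1 = path of the script to move, $2 = home directory of the gateway user.
install_script() {
  script=$1
  target_dir="$2/dg_scripts"
  mkdir -p "$target_dir" || return 1       # -p: no error if the directory already exists
  mv "$script" "$target_dir/" || return 1  # move the script into dg_scripts
  echo "$target_dir/$(basename "$script")" # print the new location of the script
}

# Example call, using the user and script names from the steps above:
# install_script ./demo.sh /home/xiaoming
```

Using mkdir -p keeps the helper safe to run whether or not the system has already created the dg_scripts directory.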

Step 2: Configure a script node

  1. Log on to the DMS console V5.0.
    Note To switch to the previous version of the DMS console, click the tenant avatar icon in the lower-right corner of the page. For more information, see Switch to the previous version of the DMS console.
  2. In the top navigation bar, click DTS. In the left-side navigation pane, choose Data Development > Task Orchestration.
    Note If you are using the previous version of the DMS console, move the pointer over the More icon in the top navigation bar and choose Data Factory > Task Orchestration (New).
  3. Click the name of the task flow that you want to manage to go to the details page of the task flow.
    Note For more information about how to create a task flow, see Overview.
  4. Optional: In the Task Type list on the left side of the canvas, drag the Script node to the blank area on the canvas.
  5. Double-click the script node.
  6. Click the Variable Setting tab in the right-side navigation pane. In the panel that appears, set the parameters that are required for the script node in the Script Settings section as needed. You can click the Info icon in the upper-right corner of the Variable Setting panel to view tips about variable configurations.
    • Click the Node Variable tab and configure node variables. For more information, see Configure time variables.
    • Click the Task Flow Variable tab and configure task flow variables. For more information, see Configure time variables.
    • Click the Input Variables tab to view the input variables. You can view the upstream variables, states, and system variables on the Input Variables tab.
    • Click the Output Variables tab. On the tab that appears, click Increase Variable to add output variables that are to be used in the downstream node.

      In the Variable Name field, enter the name of the variable. For more information about output variables, see Overview.

  7. In the Script Settings section, set the parameters that are described in the following table as needed.
    Parameter | Description
    Region | The region in which the database gateway resides.
    Gateway ID | The name of the database gateway. Note: You can view the name of a database gateway on the Gateway List page in the Database Gateway console.
    Gateway ID | The ID of the gateway node. Note: You can view the ID of a gateway node on the Gateway details page of the corresponding gateway.
    File name | The name of the script in the dg_scripts directory of the server on which the database gateway is installed. For example, if the storage path of the script is /home/xiaoming/dg_scripts/demo.sh, enter demo.sh.
    Runtime Parameter | Select a variable from the drop-down list or enter a variable name in the field to search for the required variable. The selected variables are runtime parameters for the script node.

    The following three types of variables are supported by the script node:

    • System variable: You can reference a system variable in the ${Variable name} format. Example: ${sys.flow.start.year}. For more information about system variables, see System variables.
    • Time variable: You can reference a time variable in the ${Variable name} format. For more information about custom time variables, see Configure time variables.
    • Output variable: Output variables are automatically obtained by the script node. For more information about output variables, see Overview.
  8. Click Try Run.
    • If status SUCCEEDED appears in the last line of the logs, the test run is successful.
    • If status FAILED appears in the last line of the logs, the test run fails.
      Note If the test run fails, view the node on which the failure occurs and the reason for the failure in the logs. Then, modify the configuration of the node and try again.
    After the test run is successful, you can view the standard output in the logs.
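
As a rough illustration of how a script such as demo.sh might use a runtime parameter and produce output for the logs, consider the sketch below. It rests on two assumptions that this topic does not confirm: that runtime parameters reach the script as positional arguments, and that a non-zero exit status is what causes a run to be reported as failed. The function name report_for_year is hypothetical.

```shell
#!/bin/sh
# Sketch of a script body such as demo.sh.
# ASSUMPTION 1: runtime parameters reach the script as positional arguments.
# ASSUMPTION 2: a non-zero exit status is what marks a run as failed.
report_for_year() {
  year=${1:-}
  case "$year" in
    '' | *[!0-9]*)
      echo "usage: demo.sh <year>" >&2   # diagnostics go to standard error
      return 2
      ;;
  esac
  echo "processing data for year $year"  # this line goes to standard output
}

# Stand-in call: a literal year replaces the value that a variable such as
# ${sys.flow.start.year} would supply at run time.
report_for_year 2024
```

In a real script, the last line would more likely be report_for_year "$@", so that the function receives whatever arguments the script was started with.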