If you use an external scheduling system and want DataWorks nodes to run after nodes in that system finish, you can use HTTP trigger nodes of DataWorks to achieve this goal. This topic describes the process of and precautions for using an HTTP trigger node of DataWorks when an external scheduling system is used to trigger nodes.

Prerequisites

  • DataWorks Enterprise Edition or a more advanced edition is activated.
  • A workflow is created. The compute nodes to be triggered by an HTTP trigger node are created. In the following example, ODPS SQL nodes are used as the compute nodes. For more information about how to create an ODPS SQL node, see Create an ODPS SQL node.

Background information

An external scheduling system is used to trigger nodes in the following typical scenarios:
  • An HTTP trigger node has no other ancestor nodes.
    In this scenario, you must configure a trigger in the external scheduling system after you create the HTTP trigger node, and configure scheduling properties and dependencies for each node in DataWorks. For more information, see Create an HTTP trigger node and Configure triggers in other scheduling systems.
  • An HTTP trigger node has an ancestor node.
    In this scenario, take note of the following items:
    • You must configure a trigger in the external scheduling system after you create the HTTP trigger node, and configure scheduling properties and dependencies for each node in DataWorks. For more information, see Create an HTTP trigger node and Configure triggers in other scheduling systems.
    • The default ancestor node of an HTTP trigger node is the root node of the workflow. You must manually change the ancestor node of the HTTP trigger node to the required node.
    • The HTTP trigger node can trigger its descendant nodes only after the ancestor node is run and a scheduling instruction is received from the external scheduling system.
      If a scheduling instruction is received from the external scheduling system before the ancestor node is completed, the HTTP trigger node does not trigger its descendant nodes. The system retains the scheduling instruction from the external scheduling system and schedules the HTTP trigger node to trigger its descendant nodes after the ancestor node is run.
      Notice The scheduling instruction from the external scheduling system can be retained for only 24 hours. If the ancestor node is not completed within 24 hours, the scheduling instruction from the external scheduling system becomes invalid and is discarded.
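The trigger condition described above can be sketched in code. The following is a hypothetical helper, not a DataWorks API; it only illustrates the documented behavior that descendant nodes run after both the ancestor node finishes and a scheduling instruction is received, and that a retained instruction expires after 24 hours:

```python
from datetime import datetime, timedelta

def should_trigger(ancestor_done_at, instruction_at):
    """Illustrative only: return True if the HTTP trigger node fires
    its descendant nodes, based on the documented retention rule."""
    # The scheduling instruction is retained for at most 24 hours
    # while waiting for the ancestor node to finish. If the ancestor
    # node finishes more than 24 hours later, the instruction is
    # discarded and the descendants are not triggered.
    if ancestor_done_at - instruction_at > timedelta(hours=24):
        return False
    return True

# Instruction arrives first; ancestor finishes 2 hours later: fires.
print(should_trigger(datetime(2020, 11, 25, 10), datetime(2020, 11, 25, 8)))  # True
# Ancestor finishes 25 hours after the instruction: discarded.
print(should_trigger(datetime(2020, 11, 26, 9), datetime(2020, 11, 25, 8)))   # False
```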

Limits

  • You can use the HTTP trigger node feature only in DataWorks Enterprise Edition or a more advanced edition. The HTTP trigger node feature is available in canary release mode only in the China (Chengdu) region.
  • HTTP trigger nodes serve only as nodes that trigger other compute nodes. HTTP trigger nodes themselves cannot complete compute tasks. You can configure the nodes to be triggered as the descendant nodes of an HTTP trigger node.
  • After the workflow is created and run, the external scheduling system must resend a scheduling instruction to rerun a trigger node.
  • After the workflow is created and run, you must generate retroactive data to obtain the historical running results of the descendant nodes of a trigger node. For more information, see Retroactive instances. When retroactive data is generated, the HTTP trigger node does not wait for scheduling instructions from the external scheduling system. Instead, it directly triggers its descendant nodes.

Create an HTTP trigger node

  1. Go to the DataStudio page.
    1. Log on to the DataWorks console.
    2. In the left-side navigation pane, click Workspaces.
    3. In the top navigation bar, select the region where your workspace resides, find the workspace, and then click Data Analytics in the Actions column.
  2. On the DataStudio page, move the pointer over the Create icon and choose General > HTTP Trigger.
    Alternatively, find the required workflow, right-click General and choose Create > HTTP Trigger.
  3. In the Create Node dialog box, set the Node Name and Location parameters.
    Note The node name must be 1 to 128 characters in length and can contain letters, digits, underscores (_), and periods (.).
  4. Click Commit.
  5. On the node configuration tab, click the Properties tab in the right-side navigation pane and configure the scheduling properties for the node. For more information, see Basic properties.
    Note The default ancestor node of an HTTP trigger node is the root node of the workflow. You must manually change the ancestor node of the HTTP trigger node to the required node.
  6. Save and commit the node.
    Notice You must set the Rerun and Parent Nodes parameters before you can commit the node.
    1. Click the Save icon in the toolbar to save the node.
    2. Click the Commit icon in the toolbar.
    3. In the Commit Node dialog box, enter your comments in the Change description field.
    4. Click OK.
    In a workspace in standard mode, you must click Deploy in the upper-right corner after you commit the node. For more information, see Deploy nodes.
  7. Test the node. For more information, see View auto triggered nodes.
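The node-name rule in the preceding steps (1 to 128 characters; letters, digits, underscores, and periods) can be checked before you create nodes in bulk. This is a minimal sketch; the function name and regular expression are our own, not part of DataWorks:

```python
import re

# Pattern derived from the documented rule: 1 to 128 characters,
# limited to letters, digits, underscores (_), and periods (.).
NODE_NAME_PATTERN = re.compile(r"^[A-Za-z0-9_.]{1,128}$")

def is_valid_node_name(name):
    """Return True if the name satisfies the documented naming rule."""
    return bool(NODE_NAME_PATTERN.match(name))

print(is_valid_node_name("http_trigger_node.v1"))  # True
print(is_valid_node_name("bad name!"))             # False: space and "!"
print(is_valid_node_name(""))                      # False: empty
```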

Configure triggers in other scheduling systems

You can use Java or Python to configure a trigger in an external scheduling system or call an API operation to run a trigger node.
  • Java
    1. Install the SDK for Java. For more information, see Quick start.
      Add the following dependency to your POM file to use the DataWorks SDK:
      <dependency>
       <groupId>com.aliyun</groupId>
       <artifactId>aliyun-java-sdk-dataworks-public</artifactId>
       <version>3.1.1</version>
      </dependency>
    2. Sample code
      import com.aliyuncs.DefaultAcsClient;
      import com.aliyuncs.IAcsClient;
      import com.aliyuncs.exceptions.ClientException;
      import com.aliyuncs.exceptions.ServerException;
      import com.aliyuncs.profile.DefaultProfile;
      import com.google.gson.Gson;
      import com.aliyuncs.dataworks_public.model.v20200518.*;
      
      public class RunTriggerNode {
      
          public static void main(String[] args) {
              // Specify the region, AccessKey ID, and AccessKey secret.
              // cn-hangzhou is the region where the node resides.
              // <accessKeyId> is the AccessKey ID.
              // <accessSecret> is the AccessKey secret.
              DefaultProfile profile = DefaultProfile.getProfile("cn-hangzhou", "<accessKeyId>", "<accessSecret>");
              IAcsClient client = new DefaultAcsClient(profile);
      
              RunTriggerNodeRequest request = new RunTriggerNodeRequest();
      
              // Specify the ID of the trigger node. You can call the ListNodes operation to query node IDs.
              request.setNodeId(700003742092L);
      
              // Specify the timestamp for running the trigger node. Convert the scheduled time to run the HTTP trigger node to a timestamp in milliseconds.
              request.setCycleTime(1605629820000L);
      
              // Specify the data timestamp of the trigger node instance, in milliseconds.
              // The data timestamp is one day before the scheduled time and is accurate to the day. The hour, minute, and second are set to 00:00:00. For example, if the node is scheduled to run on November 25, 2020, the data timestamp corresponds to 00:00:00 on November 24, 2020.
              request.setBizDate(1605542400000L);
      
              // Specify the ID of the DataWorks workspace to which the trigger node belongs. You can call the ListProjects operation to query workspace IDs.
              request.setAppId(123L);
      
              try {
                  RunTriggerNodeResponse response = client.getAcsResponse(request);
                  System.out.println(new Gson().toJson(response));
              } catch (ServerException e) {
                  e.printStackTrace();
              } catch (ClientException e) {
                  System.out.println("ErrCode:" + e.getErrCode());
                  System.out.println("ErrMsg:" + e.getErrMsg());
                  System.out.println("RequestId:" + e.getRequestId());
              }
          }
      }
  • Python
    1. Install the SDK for Python. For more information, see Quick start.
      Run the following command to install the DataWorks SDK:
      pip install aliyun-python-sdk-dataworks-public==2.1.2
    2. Sample code
      #! /usr/bin/env python
      #coding=utf-8
      
      from aliyunsdkcore.client import AcsClient
      from aliyunsdkcore.acs_exception.exceptions import ClientException
      from aliyunsdkcore.acs_exception.exceptions import ServerException
      from aliyunsdkdataworks_public.request.v20200518.RunTriggerNodeRequest import RunTriggerNodeRequest
      
      # Specify the region, AccessKey ID, and AccessKey secret.
      # cn-hangzhou is the region where the node resides.
      # <accessKeyId> is the AccessKey ID.
      # <accessSecret> is the AccessKey secret.
      client = AcsClient('<accessKeyId>', '<accessSecret>', 'cn-hangzhou')
      
      request = RunTriggerNodeRequest()
      request.set_accept_format('json')
      # Specify the ID of the trigger node. You can call the ListNodes operation to query node IDs.
      request.set_NodeId(123)
      
      # Specify the timestamp for running the trigger node. Convert the scheduled time to run the HTTP trigger node to a timestamp in milliseconds.
      request.set_CycleTime(1606321620000)
      
      # Specify the data timestamp of the trigger node instance, in milliseconds.
      # The data timestamp is one day before the scheduled time and is accurate to the day. The hour, minute, and second are set to 00:00:00. For example, if the node is scheduled to run on November 25, 2020, the data timestamp corresponds to 00:00:00 on November 24, 2020.
      request.set_BizDate(1606233600000)
      
      # Specify the ID of the DataWorks workspace to which the trigger node belongs. You can call the ListProjects operation to query workspace IDs.
      request.set_AppId(11456)
      
      
      response = client.do_action_with_exception(request)
      # python2: print(response)
      print(str(response, encoding='utf-8'))
  • API operation

    For more information about the API operation, see RunTriggerNode.
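The CycleTime and BizDate values in the preceding samples are millisecond timestamps. The following sketch shows one way to derive them from a scheduled run time; the UTC+8 time zone and the November 25, 2020 10:30 run time are assumptions for illustration only, so adjust both to your own schedule and region:

```python
from datetime import datetime, timedelta, timezone

# Assumed scheduled run time of the HTTP trigger node (UTC+8 is used
# here for illustration; replace with your region's time zone).
CST = timezone(timedelta(hours=8))
scheduled = datetime(2020, 11, 25, 10, 30, tzinfo=CST)

# CycleTime: the scheduled time as a millisecond timestamp.
cycle_time = int(scheduled.timestamp() * 1000)

# BizDate: one day before the scheduled time, truncated to 00:00:00,
# as a millisecond timestamp.
biz_date_dt = (scheduled - timedelta(days=1)).replace(
    hour=0, minute=0, second=0, microsecond=0)
biz_date = int(biz_date_dt.timestamp() * 1000)

print(cycle_time)  # 1606271400000
print(biz_date)    # 1606147200000
```

Because the datetime objects carry an explicit time zone, the computed timestamps do not depend on the time zone of the machine that runs the script.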