If you use an external scheduling system and want DataWorks nodes to run after nodes in that system finish running, you can use an HTTP Trigger node in DataWorks to trigger the DataWorks nodes. This topic describes how to use an HTTP Trigger node with an external scheduling system and the precautions to take when you do so.

Prerequisites

  • DataWorks of the Enterprise Edition or a more advanced edition is activated.
  • A workflow is created. The compute nodes that need to be triggered by an HTTP Trigger node are created. In the example of this topic, ODPS SQL nodes are used as the compute nodes. For more information about how to create an ODPS SQL node, see Create an ODPS SQL node.

Background information

An external scheduling system is used to trigger nodes in the following typical scenarios:
  • An HTTP Trigger node has no ancestor nodes other than the root node of the workflow.
    In this scenario, you must configure a trigger in the external scheduling system after you create the HTTP Trigger node. Then, you must configure properties for each node in DataWorks. For more information, see Create an HTTP Trigger node and Configure triggers in an external scheduling system.
  • An HTTP Trigger node has an ancestor node.
    In this scenario, take note of the following items:
    • You must configure a trigger in the external scheduling system after you create the HTTP Trigger node. Then, you must configure properties for each node in DataWorks. For more information, see Create an HTTP Trigger node and Configure triggers in an external scheduling system.
    • By default, the HTTP Trigger node uses the root node of the workflow as its ancestor node. You must manually change the ancestor node of the HTTP Trigger node to the desired node.
    • The HTTP Trigger node can trigger its descendant nodes only after the ancestor node of the HTTP Trigger node is successfully run and the HTTP Trigger node receives a scheduling instruction from the external scheduling system.
      If the HTTP Trigger node receives a scheduling instruction from the external scheduling system before the ancestor node of the HTTP Trigger node is successfully run, the HTTP Trigger node does not trigger its descendant nodes. The DataWorks scheduling system retains the scheduling instruction from the external scheduling system and schedules the HTTP Trigger node to trigger the descendant nodes after the ancestor node is successfully run.
      Notice The scheduling instruction from the external scheduling system is retained for only 24 hours. If the ancestor node does not finish running within 24 hours, the scheduling instruction becomes invalid and is discarded.
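The triggering rule described above can be summarized in a short sketch. This is only an illustration of the documented behavior, not DataWorks code: the function name `should_trigger_descendants` and its two parameters are hypothetical, and treating exactly 24 hours as still valid is an assumption.

```python
from datetime import datetime, timedelta
from typing import Optional

# Retention period stated in the notice above.
INSTRUCTION_RETENTION = timedelta(hours=24)


def should_trigger_descendants(
    ancestor_succeeded_at: Optional[datetime],
    instruction_received_at: Optional[datetime],
) -> bool:
    """Return True if the HTTP Trigger node would trigger its descendants.

    Both arguments are hypothetical inputs: the time at which the ancestor
    node finished successfully and the time at which the scheduling
    instruction arrived. None means the event has not happened yet.
    """
    if ancestor_succeeded_at is None or instruction_received_at is None:
        return False  # both conditions are required
    if ancestor_succeeded_at <= instruction_received_at:
        return True  # the ancestor had already succeeded when the instruction arrived
    # The instruction arrived first: it is retained for at most 24 hours.
    waited = ancestor_succeeded_at - instruction_received_at
    return waited <= INSTRUCTION_RETENTION
```

For example, an instruction that arrives 23 hours before the ancestor node succeeds still triggers the descendant nodes, whereas one that arrives 25 hours before is discarded.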

Limits

  • Only DataWorks of the Enterprise Edition or a more advanced edition supports HTTP Trigger nodes. HTTP Trigger nodes are available in the regions within China and in the Singapore (Singapore) and Germany (Frankfurt) regions. The regions within China do not include the regions where Alibaba Finance Cloud and Alibaba Gov Cloud are deployed.
  • HTTP Trigger nodes serve only as nodes that trigger other compute nodes. HTTP Trigger nodes cannot run compute tasks. You must configure the nodes that need to be triggered as the descendant nodes of an HTTP Trigger node.
  • If you want to rerun an HTTP Trigger node after a workflow is created and run, the external scheduling system must send the scheduling instruction again.
  • If you want to obtain the historical results of the descendant nodes of an HTTP Trigger node after a workflow is created and run, you must backfill data for the descendant nodes. For more information, see Retroactive instances. The HTTP Trigger node does not wait for scheduling instructions on data backfill from the external scheduling system. Instead, the HTTP Trigger node directly triggers its descendant nodes to backfill data.

Create an HTTP Trigger node

  1. Go to the DataStudio page.
    1. Log on to the DataWorks console.
    2. In the left-side navigation pane, click Workspaces.
    3. In the top navigation bar, select the region where your workspace resides, find the workspace, and then click Data Analytics in the Actions column.
  2. On the DataStudio page, move the pointer over the Create icon and choose General > HTTP Trigger.
    Alternatively, find the desired workflow, click the workflow name, right-click General, and then choose Create > HTTP Trigger.
  3. In the Create Node dialog box, set the Node Name and Location parameters.
    Note The node name must be 1 to 128 characters in length and can contain letters, digits, underscores (_), and periods (.).
  4. Click Commit.
  5. On the configuration tab of the node, click Properties in the right-side navigation pane. On the Properties tab, configure properties for the node. For more information, see Basic properties.
    Note By default, the HTTP Trigger node uses the root node of the workflow as its ancestor node. You must manually change the ancestor node of the HTTP Trigger node to the desired node.
  6. Save and commit the node.
    Notice You must set the Rerun and Parent Nodes parameters before you can commit the node.
    1. Click the Save icon in the toolbar to save the node.
    2. Click the Commit icon in the toolbar.
    3. In the Commit Node dialog box, enter your comments in the Change description field.
    4. Click OK.
    In a workspace in standard mode, you must click Deploy in the upper-right corner after you commit the node. For more information, see Deploy nodes.
  7. Test the node. For more information, see View auto triggered nodes.
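If node names are generated by a script, the naming rule from step 3 can be checked before the node is created. The following is a minimal sketch; the helper name `is_valid_node_name` is hypothetical, and it assumes that "letters" means ASCII letters, which may be stricter than what the console accepts.

```python
import re

# Assumption: "letters" means ASCII letters; the console may accept more.
_NODE_NAME_PATTERN = re.compile(r"[A-Za-z0-9_.]{1,128}")


def is_valid_node_name(name: str) -> bool:
    """Check a candidate node name against the documented naming rule:
    1 to 128 characters, consisting of letters, digits, underscores (_),
    and periods (.)."""
    return _NODE_NAME_PATTERN.fullmatch(name) is not None
```

A name such as `http_trigger.v1` passes this check, whereas an empty string, a name that contains a hyphen, or a name longer than 128 characters does not.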

Configure triggers in an external scheduling system

To configure a trigger in an external scheduling system, you can use Java or Python code, or call the API operation directly, to run an HTTP Trigger node.
  • Java
    1. Install Alibaba Cloud SDK for Java. For more information, see Quick start.
      Specify the following Project Object Model (POM) configurations to use DataWorks SDK for Java:
      <dependency>
       <groupId>com.aliyun</groupId>
       <artifactId>aliyun-java-sdk-dataworks-public</artifactId>
       <version>3.1.1</version>
      </dependency>
    2. Use the following sample code and specify the parameters in the code.
      import com.aliyuncs.DefaultAcsClient;
      import com.aliyuncs.IAcsClient;
      import com.aliyuncs.exceptions.ClientException;
      import com.aliyuncs.exceptions.ServerException;
      import com.aliyuncs.profile.DefaultProfile;
      import com.google.gson.Gson;
      import java.util.*;
      import com.aliyuncs.dataworks_public.model.v20200518.*;
      
      public class RunTriggerNode {

          public static void main(String[] args) {

              // Specify the region ID, AccessKey ID, and AccessKey secret.
              // cn-hangzhou indicates the ID of the region where the node that needs to be triggered by the HTTP Trigger node resides.
              // <accessKeyId> indicates the AccessKey ID.
              // <accessSecret> indicates the AccessKey secret.
              DefaultProfile profile = DefaultProfile.getProfile("cn-hangzhou", "<accessKeyId>", "<accessSecret>");

              IAcsClient client = new DefaultAcsClient(profile);

              RunTriggerNodeRequest request = new RunTriggerNodeRequest();

              // Specify the ID of the HTTP Trigger node. You can call the ListNodes operation to query the ID.
              request.setNodeId(700003742092L);

              // Specify the timestamp for running the HTTP Trigger node. Convert the scheduled time to run the HTTP Trigger node to a timestamp.
              // If the region where the HTTP Trigger node resides and the region where the scheduling system resides are in different time zones, specify the timestamp of the time zone where the HTTP Trigger node resides.
              // For example, the HTTP Trigger node resides in the China (Beijing) region, the node is scheduled to run at 18:00:00 (UTC+8), and the scheduling system resides in the US (Silicon Valley) region. In this case, specify the timestamp that corresponds to 18:00:00 (UTC+8).
              request.setCycleTime(1605629820000L);

              // Specify the data timestamp of the HTTP Trigger node instance.
              // The data timestamp is one day before the date on which the HTTP Trigger node is scheduled to run and is accurate to the day. The time part is 00:00:00.
              // For example, if the HTTP Trigger node is scheduled to run on November 25, 2020, the data timestamp is 2020112400000000, that is, 00:00:00 on November 24, 2020. The sample value below is the corresponding kind of value expressed as a Unix timestamp in milliseconds.
              // If the region where the HTTP Trigger node resides and the region where the scheduling system resides are in different time zones, specify the timestamp of the time zone where the HTTP Trigger node resides.
              request.setBizDate(1605542400000L);

              // Specify the ID of the DataWorks workspace to which the HTTP Trigger node belongs. You can call the ListProjects operation to query the ID.
              request.setAppId(123L);

              try {
                  RunTriggerNodeResponse response = client.getAcsResponse(request);
                  System.out.println(new Gson().toJson(response));
              } catch (ServerException e) {
                  e.printStackTrace();
              } catch (ClientException e) {
                  System.out.println("ErrCode:" + e.getErrCode());
                  System.out.println("ErrMsg:" + e.getErrMsg());
                  System.out.println("RequestId:" + e.getRequestId());
              }
          }
      }
  • Python
    1. Install Alibaba Cloud SDK for Python. For more information, see Quick start.
      Run the following command to install DataWorks SDK for Python:
      pip install aliyun-python-sdk-dataworks-public==2.1.2
    2. Use the following sample code and specify the parameters in the code.
      #!/usr/bin/env python
      #coding=utf-8
      
      from aliyunsdkcore.client import AcsClient
      from aliyunsdkcore.acs_exception.exceptions import ClientException
      from aliyunsdkcore.acs_exception.exceptions import ServerException
      from aliyunsdkdataworks_public.request.v20200518.RunTriggerNodeRequest import RunTriggerNodeRequest
      
      # Specify the region ID, AccessKey ID, and AccessKey secret.
      # cn-hangzhou indicates the ID of the region where the node that needs to be triggered by the HTTP Trigger node resides.
      # <accessKeyId> indicates the AccessKey ID.
      # <accessSecret> indicates the AccessKey secret.
      client = AcsClient('<accessKeyId>', '<accessSecret>', 'cn-hangzhou')
      
      request = RunTriggerNodeRequest()
      request.set_accept_format('json')
      # Specify the ID of the HTTP Trigger node. You can call the ListNodes operation to query the ID.
      request.set_NodeId(123)
      
      # Specify the timestamp for running the HTTP Trigger node. Convert the scheduled time to run the HTTP Trigger node to a timestamp.
      # If the region where the HTTP Trigger node resides and the region where the scheduling system resides are in different time zones, specify the timestamp of the time zone where the HTTP Trigger node resides. 
      # For example, the HTTP Trigger node resides in the China (Beijing) region, the node is scheduled to run at 18:00:00 (UTC+8), and the scheduling system resides in the US (Silicon Valley) region. In this case, specify the timestamp that corresponds to 18:00:00 (UTC+8). 
      request.set_CycleTime(1606321620000)
      
      # Specify the data timestamp of the HTTP Trigger node instance.
      # The data timestamp is one day before the date on which the HTTP Trigger node is scheduled to run and is accurate to the day. The time part is 00:00:00.
      # For example, if the HTTP Trigger node is scheduled to run on November 25, 2020, the data timestamp is 2020112400000000, that is, 00:00:00 on November 24, 2020. The sample value below is the corresponding kind of value expressed as a Unix timestamp in milliseconds.
      # If the region where the HTTP Trigger node resides and the region where the scheduling system resides are in different time zones, specify the timestamp of the time zone where the HTTP Trigger node resides.
      request.set_BizDate(1606233600000)
      
      # Specify the ID of the DataWorks workspace to which the HTTP Trigger node belongs. You can call the ListProjects operation to query the ID.
      request.set_AppId(11456)
      
      
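      try:
          response = client.do_action_with_exception(request)
          # python2: print(response)
          print(str(response, encoding='utf-8'))
      except ServerException as e:
          # The request reached the service but was rejected.
          print("ErrCode: " + str(e.get_error_code()))
          print("ErrMsg: " + str(e.get_error_msg()))
          print("RequestId: " + str(e.get_request_id()))
      except ClientException as e:
          # The request failed on the client side.
          print("ErrCode: " + str(e.get_error_code()))
          print("ErrMsg: " + str(e.get_error_msg()))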
  • API operation

    For more information about the API operation, see RunTriggerNode.
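The CycleTime and BizDate values in the samples above are Unix timestamps in milliseconds. The following sketch shows one way to derive both values from a scheduled run time by using only the Python standard library. The names `NODE_TZ`, `to_epoch_millis`, and `trigger_timestamps` are hypothetical, and the sketch assumes that the HTTP Trigger node resides in a UTC+8 region such as China (Beijing); adjust the offset for other regions.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

# Assumption: the HTTP Trigger node resides in a UTC+8 region.
NODE_TZ = timezone(timedelta(hours=8))
_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(moment: datetime) -> int:
    """Convert a timezone-aware datetime to a Unix timestamp in milliseconds."""
    return (moment - _EPOCH) // timedelta(milliseconds=1)


def trigger_timestamps(year: int, month: int, day: int,
                       hour: int = 0, minute: int = 0,
                       second: int = 0) -> Tuple[int, int]:
    """Return (cycle_time, biz_date) for a run scheduled at the given
    wall-clock time in the time zone of the HTTP Trigger node.

    cycle_time is the scheduled run time. biz_date is 00:00:00 on the
    day before the scheduled run date, in the same time zone.
    """
    scheduled = datetime(year, month, day, hour, minute, second, tzinfo=NODE_TZ)
    biz_day = (scheduled - timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0)
    return to_epoch_millis(scheduled), to_epoch_millis(biz_day)
```

For a run scheduled at 00:17:00 on November 18, 2020 (UTC+8), `trigger_timestamps(2020, 11, 18, 0, 17, 0)` returns `(1605629820000, 1605542400000)`, which matches the values used in the Java sample. Because the time zone is fixed to that of the node, the result does not depend on where the external scheduling system runs.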