If you use an external scheduling system and want to trigger DataWorks nodes after nodes in that system are run, you can use HTTP trigger nodes of DataWorks to achieve this goal. This topic describes the process of using an HTTP trigger node of DataWorks when an external scheduling system triggers nodes, and the related precautions.
Prerequisites
- DataWorks Enterprise Edition or a more advanced edition is activated.
- A workflow is created. The compute nodes to be triggered by an HTTP trigger node are created. In the following example, ODPS SQL nodes are used as the compute nodes. For more information about how to create an ODPS SQL node, see Create an ODPS SQL node.
Background information
An external scheduling system is used to trigger nodes in the following typical scenarios:
- An HTTP trigger node has no other ancestor nodes.
In this scenario, you must configure a trigger in the external scheduling system after you create the HTTP trigger node, and configure scheduling properties and dependencies for each node in DataWorks. For more information, see Create an HTTP trigger node and Configure triggers in other scheduling systems.
- An HTTP trigger node has an ancestor node.
In this scenario, take note of the following items:
- You must configure a trigger in the external scheduling system after you create the HTTP trigger node, and configure scheduling properties and dependencies for each node in DataWorks. For more information, see Create an HTTP trigger node and Configure triggers in other scheduling systems.
- The default ancestor node of an HTTP trigger node is the root node of the workflow. You must manually change the ancestor node of the HTTP trigger node to the required node.
- The HTTP trigger node can trigger its descendant nodes only after the ancestor node is run and a scheduling instruction is received from the external scheduling system.
If a scheduling instruction is received from the external scheduling system before the ancestor node finishes running, the HTTP trigger node does not immediately trigger its descendant nodes. The system retains the scheduling instruction and schedules the HTTP trigger node to trigger its descendant nodes after the ancestor node is run.
Notice: The scheduling instruction from the external scheduling system can be retained for only 24 hours. If the ancestor node does not finish running within 24 hours, the scheduling instruction becomes invalid and is discarded.
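The triggering rule above can be sketched as a small decision function. This is only an illustration of the described semantics, not DataWorks internals; the names `should_trigger`, `ancestor_done_at`, and `instruction_received_at` are hypothetical.

```python
from datetime import datetime, timedelta

# Illustrative sketch of the triggering rule described above; this is not
# DataWorks internals. The trigger fires its descendants only when the
# ancestor node has finished AND a scheduling instruction has been received,
# and a retained instruction expires after 24 hours.
INSTRUCTION_TTL = timedelta(hours=24)

def should_trigger(ancestor_done_at, instruction_received_at, now):
    """Return True if the HTTP trigger node may trigger its descendants."""
    if ancestor_done_at is None or instruction_received_at is None:
        # Still waiting for the ancestor run or for the instruction.
        return False
    if now - instruction_received_at > INSTRUCTION_TTL:
        # The retained instruction expired and was discarded.
        return False
    return True

# The instruction arrives first; the ancestor finishes five hours later.
print(should_trigger(datetime(2024, 1, 1, 5), datetime(2024, 1, 1, 0),
                     datetime(2024, 1, 1, 5)))  # True
```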
Limits
- You can use the HTTP trigger node feature only in DataWorks Enterprise Edition or a more advanced edition. The feature is in canary release and is available only in the China (Chengdu) region.
- HTTP trigger nodes serve only as nodes that trigger other compute nodes. HTTP trigger nodes themselves cannot complete compute tasks. You can configure the nodes to be triggered as the descendant nodes of an HTTP trigger node.
- After the workflow is created and run, the external scheduling system must resend a scheduling instruction to rerun a trigger node.
- After the workflow is created and run, you must generate retroactive data to obtain the historical running results of the descendant nodes of a trigger node. For more information, see Retroactive instances. When retroactive data is generated, the HTTP trigger node does not wait for scheduling instructions from the external scheduling system. Instead, it directly triggers its descendant nodes.
Create an HTTP trigger node
Configure triggers in other scheduling systems
You can use Java or Python to configure a trigger in an external scheduling system, or call an API operation to run a trigger node.
- Java
- Install the SDK for Java. For more information, see Quick start.
Specify the following POM configuration to use DataWorks SDK:
```xml
<dependency>
    <groupId>com.aliyun</groupId>
    <artifactId>aliyun-java-sdk-dataworks-public</artifactId>
    <version>3.1.1</version>
</dependency>
```
- Sample code
```java
import com.aliyuncs.DefaultAcsClient;
import com.aliyuncs.IAcsClient;
import com.aliyuncs.exceptions.ClientException;
import com.aliyuncs.exceptions.ServerException;
import com.aliyuncs.profile.DefaultProfile;
import com.google.gson.Gson;
import com.aliyuncs.dataworks_public.model.v20200518.*;

public class RunTriggerNode {
    public static void main(String[] args) {
        // Specify the region, AccessKey ID, and AccessKey secret.
        // cn-hangzhou is the region where the node resides.
        // <accessKeyId> is the AccessKey ID.
        // <accessSecret> is the AccessKey secret.
        DefaultProfile profile = DefaultProfile.getProfile("cn-hangzhou", "<accessKeyId>", "<accessSecret>");
        IAcsClient client = new DefaultAcsClient(profile);
        RunTriggerNodeRequest request = new RunTriggerNodeRequest();
        // Specify the ID of the trigger node. You can call the ListNodes operation to query node IDs.
        request.setNodeId(700003742092L);
        // Specify the timestamp for running the trigger node. Convert the scheduled time to run the
        // HTTP trigger node to a timestamp.
        request.setCycleTime(1605629820000L);
        // Specify the data timestamp of the trigger node instance. The data timestamp is one day before
        // the scheduled time and is accurate to the day. The hour, minute, and second are presented in
        // the format of 00000000. For example, if the node is scheduled to run on November 25, 2020,
        // convert the date-based time to the data timestamp 2020112400000000.
        request.setBizDate(1605542400000L);
        // Specify the ID of the DataWorks workspace to which the trigger node belongs.
        // You can call the ListProjects operation to query workspace IDs.
        request.setAppId(123L);
        try {
            RunTriggerNodeResponse response = client.getAcsResponse(request);
            System.out.println(new Gson().toJson(response));
        } catch (ServerException e) {
            e.printStackTrace();
        } catch (ClientException e) {
            System.out.println("ErrCode:" + e.getErrCode());
            System.out.println("ErrMsg:" + e.getErrMsg());
            System.out.println("RequestId:" + e.getRequestId());
        }
    }
}
```
- Python
- Install the SDK for Python. For more information, see Quick start.
Run the following command to install DataWorks SDK:
pip install aliyun-python-sdk-dataworks-public==2.1.2
- Sample code
```python
#!/usr/bin/env python
# coding=utf-8
from aliyunsdkcore.client import AcsClient
from aliyunsdkcore.acs_exception.exceptions import ClientException
from aliyunsdkcore.acs_exception.exceptions import ServerException
from aliyunsdkdataworks_public.request.v20200518.RunTriggerNodeRequest import RunTriggerNodeRequest

# Specify the region, AccessKey ID, and AccessKey secret.
# cn-hangzhou is the region where the node resides.
# <accessKeyId> is the AccessKey ID.
# <accessSecret> is the AccessKey secret.
client = AcsClient('<accessKeyId>', '<accessSecret>', 'cn-hangzhou')

request = RunTriggerNodeRequest()
request.set_accept_format('json')
# Specify the ID of the trigger node. You can call the ListNodes operation to query node IDs.
request.set_NodeId(123)
# Specify the timestamp for running the trigger node. Convert the scheduled time to run the
# HTTP trigger node to a timestamp.
request.set_CycleTime(1606321620000)
# Specify the data timestamp of the trigger node instance. The data timestamp is one day before
# the scheduled time and is accurate to the day. The hour, minute, and second are presented in
# the format of 00000000. For example, if the node is scheduled to run on November 25, 2020,
# convert the date-based time to the data timestamp 2020112400000000.
request.set_BizDate(1606233600000)
# Specify the ID of the DataWorks workspace to which the trigger node belongs.
# You can call the ListProjects operation to query workspace IDs.
request.set_AppId(11456)

response = client.do_action_with_exception(request)
# python2: print(response)
print(str(response, encoding='utf-8'))
```
- API operation
For more information about the API operation, see RunTriggerNode.
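The Java and Python samples above hard-code the CycleTime and BizDate values. As a sketch of how those two values can be derived from the scheduled run time, assuming the workspace uses Beijing time (UTC+8) and the input is a naive datetime in that zone, the conversion can be written as follows. The helper `trigger_timestamps` is hypothetical and not part of the SDK:

```python
from datetime import datetime, timedelta, timezone

# Assumption: DataWorks scheduling timestamps in this topic are based on
# Beijing time (UTC+8). Adjust the offset if your workspace differs.
CST = timezone(timedelta(hours=8))

def trigger_timestamps(scheduled):
    """Return (cycle_time_ms, biz_date_ms) for a naive scheduled run time in Beijing time."""
    scheduled = scheduled.replace(tzinfo=CST)
    # CycleTime: the scheduled run time as a millisecond timestamp.
    cycle_time_ms = int(scheduled.timestamp() * 1000)
    # BizDate: one day before the scheduled time, truncated to midnight.
    biz_date = (scheduled - timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0)
    biz_date_ms = int(biz_date.timestamp() * 1000)
    return cycle_time_ms, biz_date_ms

# Reproduces the values in the Java sample: scheduled run at 00:17 on
# November 18, 2020 (Beijing time).
cycle, biz = trigger_timestamps(datetime(2020, 11, 18, 0, 17))
print(cycle, biz)  # 1605629820000 1605542400000
```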