If you use an external scheduling system and want to trigger DataWorks nodes after tasks in that system finish running, you can use an HTTP Trigger node of DataWorks. This topic describes how to use an HTTP Trigger node so that an external scheduling system can trigger DataWorks nodes. This topic also provides usage notes for HTTP Trigger nodes.
Introduction
An HTTP Trigger node is a special virtual node. You can call the RunTriggerNode operation of DataWorks to schedule an HTTP Trigger node and its descendant nodes.
HTTP Trigger nodes are commonly used for communication between external systems and the DataWorks scheduling system.
An HTTP Trigger node can be used to trigger a DataWorks task after an external scheduling system task is complete. An external scheduling system is used to trigger nodes in the following typical scenarios:
An HTTP Trigger node has no ancestor nodes other than the root node of the workflow.
You must configure a trigger in the external scheduling system after you create the HTTP Trigger node. Then, you must configure the scheduling properties for each node in DataWorks. For more information, see HTTP Trigger node.
An HTTP Trigger node has an ancestor node.
You must configure a trigger in the external scheduling system after you create the HTTP Trigger node. Then, you must configure the scheduling properties for each node in DataWorks.
By default, the HTTP Trigger node uses the root node of the workflow as its ancestor node. You must manually change the ancestor node of the HTTP Trigger node to the required node.
The HTTP Trigger node can trigger its descendant nodes only after the ancestor node of the HTTP Trigger node is run as expected and the HTTP Trigger node receives a scheduling instruction from the external scheduling system.
If the HTTP Trigger node receives a scheduling instruction from the external scheduling system before its ancestor node finishes running, the HTTP Trigger node does not trigger its descendant nodes immediately. The DataWorks scheduling system retains the scheduling instruction and schedules the HTTP Trigger node to trigger the descendant nodes after the ancestor node finishes running.
Note: The scheduling instruction from the external scheduling system can be retained only for 24 hours. If the execution of the ancestor node is not complete within 24 hours, the scheduling instruction from the external scheduling system becomes invalid and is discarded.
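The timing rule for an HTTP Trigger node that has an ancestor node can be sketched as a small model. This is an illustration only, not DataWorks code: the function name and the two timestamps are hypothetical, and treating exactly 24 hours as still valid is an assumption, because the text says only "within 24 hours".

```python
from datetime import datetime, timedelta

# Retention window for a scheduling instruction, as stated in the note above.
RETENTION = timedelta(hours=24)


def descendants_triggered(instruction_at: datetime, ancestor_done_at: datetime) -> bool:
    """Illustrative model: return True if the descendant nodes end up triggered.

    instruction_at:   when the external system sent the scheduling instruction
    ancestor_done_at: when the ancestor node finished running successfully
    """
    if ancestor_done_at <= instruction_at:
        # The ancestor finished first, so the instruction takes effect on arrival.
        return True
    # The instruction arrived early and is retained; it stays valid only if
    # the ancestor finishes within the retention window.
    return ancestor_done_at - instruction_at <= RETENTION
```

For example, an instruction sent at 10:00 still takes effect if the ancestor node finishes at 09:00 the next day, but it is discarded if the ancestor node finishes two days later.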
Prerequisites
The RAM user that you want to use is added to your workspace.
If you want to use a RAM user to develop tasks, you must add the RAM user to your workspace as a member and assign the Develop or Workspace Administrator role to the RAM user. The Workspace Administrator role has more permissions than necessary. Exercise caution when you assign the Workspace Administrator role. For more information about how to add a member and assign roles to the member, see Add workspace members and assign roles to them.
A serverless resource group is associated with your workspace. For more information, see the topics in the Use serverless resource groups directory.
An HTTP Trigger node is created before you develop a task on the HTTP Trigger node. For more information, see Create a task node.
Limits
Only DataWorks Enterprise Edition and more advanced editions support HTTP Trigger nodes. For information about DataWorks editions, see Differences among DataWorks editions.
The Instance Generation Mode parameter can be set only to Next Day for HTTP Trigger nodes, and data backfill instances cannot be triggered. Therefore, the external scheduling system can trigger an HTTP Trigger node no earlier than the day after the node is deployed to the production environment.
HTTP Trigger nodes serve only as nodes that trigger other compute nodes. HTTP Trigger nodes cannot be used as compute nodes. You must configure the nodes that need to be triggered as the descendant nodes of an HTTP Trigger node.
If you want to rerun an HTTP Trigger node after a workflow is created and run, you must enable the external scheduling system to resend a scheduling instruction. Rerunning an HTTP Trigger node does not trigger the running of its descendant nodes that are in the Succeeded state.
If you want to obtain the execution results of the descendant nodes of an HTTP Trigger node within a historical period of time after a workflow is created and run, you must backfill data for the descendant nodes. For more information, see Backfill data and view data backfill instances (new version). The external scheduling system does not need to send a scheduling instruction to backfill data. Instead, the HTTP Trigger node directly triggers the data backfill operation for its descendant nodes.
Develop a task on the HTTP Trigger node
Description
The HTTP Trigger node can be run only if the following requirements are met:
Auto triggered instances are generated for the HTTP Trigger node. You can find the instances on the Auto Triggered Instances page in Operation Center. The instances are in the waiting state before the RunTriggerNode operation is successfully called to run the instances. The descendant nodes of the HTTP Trigger node are blocked until the RunTriggerNode operation is called as expected to run the instances generated for the HTTP Trigger node.
All ancestor nodes on which the HTTP Trigger node depends are run as expected. The status of the ancestor nodes is Succeeded.
The scheduling time of the auto triggered instances generated for the HTTP Trigger node arrives.
Sufficient scheduling resources are available for use when the HTTP Trigger node is run.
The status of the HTTP Trigger node is not Freeze.
The HTTP Trigger node can be triggered only if it is in the Pending state. After the HTTP Trigger node is triggered, it cannot be triggered again.
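When a trigger call does not take effect, the requirements above can be checked one by one. The following sketch expresses them as a checklist. It is a hypothetical troubleshooting aid: the class, its field names, and the message strings are assumptions made for illustration, and the values would have to be filled in by hand from what you see in Operation Center.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TriggerInstanceState:
    """Hypothetical snapshot of one HTTP Trigger node instance."""
    instance_generated: bool      # an auto triggered instance exists
    ancestors_succeeded: bool     # all ancestor nodes are in the Succeeded state
    scheduled_time_reached: bool  # the scheduling time of the instance has arrived
    resources_available: bool     # sufficient scheduling resources are available
    frozen: bool                  # the node is in the Freeze state
    already_triggered: bool       # the instance has been triggered before


def unmet_requirements(state: TriggerInstanceState) -> List[str]:
    """Return the requirements that currently prevent the node from running."""
    unmet = []
    if not state.instance_generated:
        unmet.append("no auto triggered instance")
    if not state.ancestors_succeeded:
        unmet.append("ancestor nodes not Succeeded")
    if not state.scheduled_time_reached:
        unmet.append("scheduling time not reached")
    if not state.resources_available:
        unmet.append("insufficient scheduling resources")
    if state.frozen:
        unmet.append("node is frozen")
    if state.already_triggered:
        unmet.append("instance already triggered")
    return unmet
```

An empty list means that every listed requirement is met and the node can run once the RunTriggerNode operation is called.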
Configure the scheduling properties for an HTTP Trigger node
After you develop an HTTP Trigger node, configure scheduling properties for the HTTP Trigger node. For more information, see Scheduling configurations.
By default, the HTTP Trigger node uses the root node of the workflow as its ancestor node. You must manually change the ancestor node of the HTTP Trigger node to the required node.
Commit and deploy the HTTP Trigger node
After the node code and scheduling settings are configured, deploy the HTTP Trigger node to the production environment. For more information, see Deploy nodes.
After the deployment is complete, go to the Auto Triggered Nodes page in Operation Center to view the node that is deployed and perform O&M operations on the node. The system periodically runs the node based on the scheduling properties that you configure. For more information, see Getting started with Operation Center.
Configure triggers in an external scheduling system
You can use Alibaba Cloud SDK for Java or Python to configure a trigger in an external scheduling system or call an API operation to run an HTTP Trigger node.
Alibaba Cloud SDK for Java
Install Alibaba Cloud SDK for Java. For more information, see Get started with Alibaba Cloud SDK V1.0 for Java.
Specify the following Project Object Model (POM) configurations to use DataWorks SDK for Java:
<dependency>
    <groupId>com.aliyun</groupId>
    <artifactId>aliyun-java-sdk-dataworks-public</artifactId>
    <version>3.4.2</version>
</dependency>
Use the sample code and configure the parameters in the code.
You can go to the debugging page of the RunTriggerNode operation and view complete sample code on the SDK Sample Code tab.
// This file is auto-generated, don't edit it. Thanks.
package com.aliyun.sample;

import com.aliyun.tea.*;

public class Sample {

    /**
     * <b>description</b> :
     * <p>Use your AccessKey pair to initialize the client.</p>
     * @return Client
     *
     * @throws Exception
     */
    public static com.aliyun.dataworks_public20200518.Client createClient() throws Exception {
        // If the project code is leaked, the AccessKey pair may be leaked and the security of all resources within your account may be compromised. The following sample code is provided only for reference.
        // For enhanced security, we recommend that you use temporary access credentials issued by Security Token Service (STS). For more information, see https://www.alibabacloud.com/help/en/sdk/developer-reference/v2-manage-access-credentials.
        com.aliyun.teaopenapi.models.Config config = new com.aliyun.teaopenapi.models.Config()
                // Required. Make sure that the ALIBABA_CLOUD_ACCESS_KEY_ID environment variable is configured.
                .setAccessKeyId(System.getenv("ALIBABA_CLOUD_ACCESS_KEY_ID"))
                // Required. Make sure that the ALIBABA_CLOUD_ACCESS_KEY_SECRET environment variable is configured.
                .setAccessKeySecret(System.getenv("ALIBABA_CLOUD_ACCESS_KEY_SECRET"));
        // For more information about endpoints, see https://api.aliyun.com/product/dataworks-public.
        config.endpoint = "dataworks.cn-shanghai.aliyuncs.com";
        return new com.aliyun.dataworks_public20200518.Client(config);
    }

    public static void main(String[] args_) throws Exception {
        java.util.List<String> args = java.util.Arrays.asList(args_);
        com.aliyun.dataworks_public20200518.Client client = Sample.createClient();
        com.aliyun.dataworks_public20200518.models.RunTriggerNodeRequest runTriggerNodeRequest = new com.aliyun.dataworks_public20200518.models.RunTriggerNodeRequest();
        com.aliyun.teautil.models.RuntimeOptions runtime = new com.aliyun.teautil.models.RuntimeOptions();
        try {
            // Write your own code to display the response of the API operation if necessary.
            client.runTriggerNodeWithOptions(runTriggerNodeRequest, runtime);
        } catch (TeaException error) {
            // Handle exceptions with caution in actual business scenarios and do not ignore the exceptions in your project. In this example, exceptions are provided for reference only.
            // The error message.
            System.out.println(error.getMessage());
            // The URL for troubleshooting.
            System.out.println(error.getData().get("Recommend"));
            com.aliyun.teautil.Common.assertAsString(error.message);
        } catch (Exception _error) {
            TeaException error = new TeaException(_error.getMessage(), _error);
            // Handle exceptions with caution in actual business scenarios and do not ignore the exceptions in your project. In this example, exceptions are provided for reference only.
            // The error message.
            System.out.println(error.getMessage());
            // The URL for troubleshooting.
            System.out.println(error.getData().get("Recommend"));
            com.aliyun.teautil.Common.assertAsString(error.message);
        }
    }
}
Alibaba Cloud SDK for Python
Install Alibaba Cloud SDK for Python. For more information, see Integrate SDK dependencies.
Run the following command to install DataWorks SDK for Python:
pip install aliyun-python-sdk-dataworks-public==2.1.2
Use the sample code and configure the parameters in the code.
You can go to the debugging page of the RunTriggerNode operation and view complete sample code on the SDK Sample Code tab.
# -*- coding: utf-8 -*-
# This file is auto-generated, don't edit it. Thanks.
import os
import sys

from typing import List

from alibabacloud_dataworks_public20200518.client import Client as dataworks_public20200518Client
from alibabacloud_tea_openapi import models as open_api_models
from alibabacloud_dataworks_public20200518 import models as dataworks_public_20200518_models
from alibabacloud_tea_util import models as util_models
from alibabacloud_tea_util.client import Client as UtilClient


class Sample:
    def __init__(self):
        pass

    @staticmethod
    def create_client() -> dataworks_public20200518Client:
        """
        Use your AccessKey ID and AccessKey secret to initialize the client.
        @return: Client
        @throws Exception
        """
        # If the project code is leaked, the AccessKey pair may be leaked and the security of all resources within your account may be compromised. The following sample code is provided only for reference.
        # For security reasons, we recommend that you use temporary access credentials that are provided by Security Token Service (STS). For more information, visit https://www.alibabacloud.com/help/en/sdk/developer-reference/v2-manage-php-access-credentials.
        config = open_api_models.Config(
            # Required. Make sure that the ALIBABA_CLOUD_ACCESS_KEY_ID environment variable is configured.
            access_key_id=os.environ['ALIBABA_CLOUD_ACCESS_KEY_ID'],
            # Required. Make sure that the ALIBABA_CLOUD_ACCESS_KEY_SECRET environment variable is configured.
            access_key_secret=os.environ['ALIBABA_CLOUD_ACCESS_KEY_SECRET']
        )
        # For more information about endpoints, see https://api.aliyun.com/product/dataworks-public.
        config.endpoint = 'dataworks.cn-shanghai.aliyuncs.com'
        return dataworks_public20200518Client(config)

    @staticmethod
    def main(
        args: List[str],
    ) -> None:
        client = Sample.create_client()
        run_trigger_node_request = dataworks_public_20200518_models.RunTriggerNodeRequest()
        runtime = util_models.RuntimeOptions()
        try:
            # Print the response of the API operation if necessary.
            client.run_trigger_node_with_options(run_trigger_node_request, runtime)
        except Exception as error:
            # Handle exceptions with caution in your actual business scenario and never ignore exceptions in your project. In this example, error messages are printed.
            # Display error messages.
            print(error.message)
            # Show the URL for troubleshooting.
            print(error.data.get("Recommend"))
            UtilClient.assert_as_string(error.message)

    @staticmethod
    async def main_async(
        args: List[str],
    ) -> None:
        client = Sample.create_client()
        run_trigger_node_request = dataworks_public_20200518_models.RunTriggerNodeRequest()
        runtime = util_models.RuntimeOptions()
        try:
            # Print the response of the API operation if necessary.
            await client.run_trigger_node_with_options_async(run_trigger_node_request, runtime)
        except Exception as error:
            # Handle exceptions with caution in your actual business scenario and never ignore exceptions in your project. In this example, error messages are printed.
            # Display error messages.
            print(error.message)
            # Show the URL for troubleshooting.
            print(error.data.get("Recommend"))
            UtilClient.assert_as_string(error.message)


if __name__ == '__main__':
    Sample.main(sys.argv[1:])
For more information about the API operation, see RunTriggerNode.
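The sample code above creates an empty request, so the parameters still need to be filled in. The following sketch shows one way to prepare them with the Python standard library only. It rests on assumptions that are not stated in this topic and should be checked against the RunTriggerNode reference: that the operation takes NodeId, AppId, CycleTime, and BizDate, that the two time values are 13-digit millisecond timestamps, and that the data timestamp is midnight of the day before the scheduled run. The function name and the sample IDs are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def to_millis(dt: datetime) -> int:
    """Convert a timezone-aware datetime to a millisecond UNIX timestamp."""
    if dt.tzinfo is None:
        raise ValueError("use a timezone-aware datetime")
    return int(dt.timestamp() * 1000)


def build_run_trigger_node_params(node_id: int, app_id: int, cycle_time: datetime) -> dict:
    """Assemble the assumed RunTriggerNode parameters for one instance.

    cycle_time is the scheduled run time of the HTTP Trigger node instance,
    expressed in the time zone that the scheduling system uses.
    """
    # Assumption: the data timestamp is 00:00:00 of the previous day.
    biz_date = (cycle_time - timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return {
        "NodeId": node_id,                    # ID of the HTTP Trigger node
        "AppId": app_id,                      # ID of the DataWorks workspace
        "CycleTime": to_millis(cycle_time),   # scheduled run time, in ms
        "BizDate": to_millis(biz_date),       # data timestamp, in ms
    }


if __name__ == "__main__":
    utc8 = timezone(timedelta(hours=8))
    # Hypothetical IDs and a run scheduled for 00:30 on January 2, 2024 (UTC+8).
    print(build_run_trigger_node_params(1234, 5678, datetime(2024, 1, 2, 0, 30, tzinfo=utc8)))
```

If the request model exposes matching fields, the resulting values can be assigned to the RunTriggerNodeRequest object before the operation is called. Keeping the conversion in a separate function makes it easier to confirm that the external scheduling system and DataWorks refer to the same instance.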