The DataWorks Function Compute node lets you run custom code for various business requirements. It supports periodic scheduling, making it ideal for scheduled tasks. You can also combine this node with other node types to build complete data processing workflows. This topic describes how to create and use a Function Compute node.
Prerequisites
-
Function Compute is activated.
You must activate Function Compute before you can use Function Compute nodes in DataWorks. After you activate Function Compute, familiarize yourself with its product introduction and features to ensure a smooth development process. For more information, see Activate Function Compute and What is Function Compute?
-
A service for invoking functions has been created.
A service is a basic resource unit in Function Compute. You can perform operations such as authorization, log configuration, and function creation at the service level. Therefore, you must create a service before you develop a function. For more information, see Create a service.
-
A function has been created.
A function is the basic unit for scheduling and execution. It contains the processing logic of your code. You must write code against the Function Compute interface and deploy it as a function. For more information about how to create a function, see Create a function.
Background
Function Compute is an event-driven, fully managed compute service. You do not need to purchase or manage infrastructure such as servers. You only need to write and upload code or images. Function Compute supports the following two types of functions:
-
Event function: Suitable for event-driven models where a function is invoked when an event occurs.
-
HTTP function: Suitable for scenarios such as quickly building web applications.
You can configure the service and function in a Function Compute node and deploy the node to the production environment for periodic execution.
Limitations
-
Feature limitations
DataWorks supports invoking only event functions, not HTTP functions. Therefore, you must use an event function for tasks that you want to schedule periodically. For more information about function types, see Function types.
-
Region limitations
The Function Compute feature is available only in workspaces that reside in the following regions: China (Hangzhou), China (Shanghai), China (Beijing), China (Shenzhen), China (Chengdu), China (Hong Kong), Singapore, Malaysia (Kuala Lumpur), Indonesia (Jakarta), Germany (Frankfurt), UK (London), US (Silicon Valley), and US (Virginia).
Usage notes
-
If the service list is empty when you try to select a service, it may be for one of the following reasons:
-
Your account has overdue payments. Top up your account and refresh the node configuration page.
-
The currently logged-in user does not have permissions to retrieve the service list. Contact the owner of your Alibaba Cloud account to grant your RAM user the fc:ListServices permission, or attach the AliyunFCFullAccess policy. After the permissions are granted, refresh the node configuration page and try again. For more information about how to grant permissions, see Grant permissions to a RAM user.
-
If a function invoked by a DataWorks Function Compute node runs for more than one hour, you must set the invocation method to asynchronous invocation. For more information about asynchronous invocation in Function Compute, see Asynchronous invocation.
-
If you use a RAM user to develop a Function Compute node, you must grant the user the permissions of the following system policies or custom policies.
Policy type
Description
System policy
If you use a system policy, grant the user the AliyunFCFullAccess, AliyunFCReadOnlyAccess, and AliyunFCInvocationAccess policies.
Custom policy
If you use a Function Compute custom policy, the following permissions are commonly granted:
-
fc:GetAsyncTask
-
fc:StopAsyncTask
-
fc:GetService
-
fc:ListServices
-
fc:GetFunction
-
fc:InvokeFunction
-
fc:ListFunctions
-
fc:GetFunctionAsyncInvokeConfig
-
fc:ListServiceVersions
-
fc:ListAliases
-
fc:GetAlias
-
fc:ListFunctionAsyncInvokeConfigs
-
fc:GetStatefulAsyncInvocation
-
fc:StopStatefulAsyncInvocation
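The permissions listed above can be collected into a custom policy document. The following is a minimal sketch, not an official template: it assumes the usual RAM policy layout (a Version field and a list of Statement entries), and the Resource value of "*" is an assumption made for illustration. Narrow it to specific services and functions where possible.

```python
import json

# Actions copied from the custom policy list in this topic.
FC_ACTIONS = [
    "fc:GetAsyncTask",
    "fc:StopAsyncTask",
    "fc:GetService",
    "fc:ListServices",
    "fc:GetFunction",
    "fc:InvokeFunction",
    "fc:ListFunctions",
    "fc:GetFunctionAsyncInvokeConfig",
    "fc:ListServiceVersions",
    "fc:ListAliases",
    "fc:GetAlias",
    "fc:ListFunctionAsyncInvokeConfigs",
    "fc:GetStatefulAsyncInvocation",
    "fc:StopStatefulAsyncInvocation",
]


def build_policy(actions, resource="*"):
    """Return a policy document that allows the given actions.

    resource="*" is an illustrative assumption, not a recommendation.
    """
    return {
        "Version": "1",
        "Statement": [
            {"Effect": "Allow", "Action": list(actions), "Resource": resource}
        ],
    }


policy_json = json.dumps(build_policy(FC_ACTIONS), indent=2)
print(policy_json)
```

The printed JSON can be used as a starting point when you create the custom policy and attach it to the RAM user.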
Note: For more information about Function Compute permission policies, see the Function Compute documentation.
Step 1: Go to the node creation page
-
Log on to the DataWorks console. In the target region, click in the left-side navigation pane. Select a workspace from the drop-down list and click Go to Data Development.
-
Go to the node creation page.
On the DataStudio page, you can create a Function Compute node in one of the following three ways:
-
Method 1: In the top menu bar, click + New > Node > Function Compute.
-
Method 2: In the business process tree on the left, right-click the target business process and choose New Node > Function Compute.
-
Method 3: Expand the target business process on the left, click General > New Node, and select Function Compute.
Step 2: Create and configure the node
-
Create a Function Compute node.
On the node creation page, configure the basic information for the new node, such as the path and name, and then create the node.
-
Configure the parameters for the Function Compute node.
On the node configuration page, select the function to invoke and configure its invocation method and variables. The following table describes the parameters.
Parameter
Description
Select Function
Select the function to invoke for this task. If no function is available, you must create one. For more information, see Create a function.
Note: DataWorks supports invoking only event functions, not HTTP functions. Therefore, you must use an event function for tasks that you want to schedule periodically. For more information about function types, see Function types.
In this example, the para_service_01_by_time_triggers function is selected. When you create this function, select the sample code for a timer-triggered function that is provided by the platform. The code logic is as follows.

    import json
    import logging

    logger = logging.getLogger()


    def handler(event, context):
        logger.info('event: %s', event)

        # Parse the JSON event passed in by the caller
        evt = json.loads(event)
        triggerName = evt["triggerName"]
        triggerTime = evt["triggerTime"]
        payload = evt["payload"]

        logger.info('triggerName: %s', triggerName)
        logger.info("triggerTime: %s", triggerTime)
        logger.info("payload: %s", payload)

        return 'Timer Payload: ' + payload

For more sample code for functions, see Sample code.
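Before you select the function in the node, it can help to check the handler logic on your own machine. The following is a minimal local sketch, not part of the platform sample: it repeats the handler logic, builds a hypothetical event with the same three fields, and calls the handler directly. Passing the event as UTF-8 bytes and passing None as the context are assumptions made for this local run.

```python
import json
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger()


def handler(event, context):
    """Same logic as the sample timer-trigger handler in this topic."""
    evt = json.loads(event)
    logger.info("triggerName: %s", evt["triggerName"])
    logger.info("triggerTime: %s", evt["triggerTime"])
    logger.info("payload: %s", evt["payload"])
    return "Timer Payload: " + evt["payload"]


# Hypothetical event for a local run; the values are illustrative only.
sample_event = json.dumps(
    {"payload": "payload1", "triggerTime": "20240101", "triggerName": "triggerName1"}
).encode("utf-8")

# The context object is not used by this handler, so None is enough locally.
result = handler(sample_event, context=None)
print(result)  # Timer Payload: payload1
```

If a required field is missing from the event, the handler raises a KeyError, which is a quick way to catch a mismatch between the function code and the variables configured in the node.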
Select Version or Alias
Select the service version or alias. The default version is LATEST.
-
Service version
Function Compute provides a service-level versioning feature that allows you to publish one or more versions of your service. When you publish a version, Function Compute creates a snapshot of the service, including its configuration, function code, and function configurations. Triggers are not included. Function Compute then automatically assigns a version number to the snapshot for future use. For more information, see Publish a version.
-
Version alias
Function Compute allows you to create an alias for a service version. An alias points to a specific version, which facilitates operations such as publishing, rollbacks, and canary releases. An alias cannot exist independently of a service or version. When you use an alias to access a service or function, Function Compute resolves the alias to the version it points to. The caller does not need to know the specific version. For more information, see Create an alias.
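The relationship between versions and aliases can be pictured with a small lookup. The following is a conceptual sketch only: the version numbers, the alias names, and the resolve_qualifier helper are hypothetical and do not call Function Compute.

```python
# Hypothetical service state: published versions and the aliases that point to them.
published_versions = {"1", "2", "3"}
aliases = {"prod": "2", "canary": "3"}


def resolve_qualifier(qualifier, aliases, published_versions):
    """Return the version that a version number or an alias refers to."""
    if qualifier == "LATEST" or qualifier in published_versions:
        return qualifier
    target = aliases.get(qualifier)
    if target is None:
        raise KeyError(f"unknown version or alias: {qualifier!r}")
    if target not in published_versions:
        raise ValueError(f"alias {qualifier!r} points to unpublished version {target!r}")
    return target


print(resolve_qualifier("prod", aliases, published_versions))  # 2

# A rollback only repoints the alias; callers keep using the name "prod".
aliases["prod"] = "1"
print(resolve_qualifier("prod", aliases, published_versions))  # 1
```

Because the caller refers only to the alias, the node configuration does not need to change when the alias is moved to another version.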
Invocation Method
The following invocation methods are supported:
-
Synchronous invocation: The event directly triggers the function. Function Compute runs the function and waits for a response. After the function is invoked, Function Compute returns the execution result.
-
Asynchronous invocation: Function Compute queues the event request and returns a response immediately, instead of waiting for the request to complete.
-
If a function is time-consuming, resource-intensive, or contains error-prone logic, you can use asynchronous invocation to improve response speed and reliably handle traffic spikes.
-
For Function Compute tasks that run for more than one hour, use asynchronous invocation.
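The difference between the two invocation methods can be illustrated with a local analogy. The following sketch does not call Function Compute: slow_function, invoke_sync, and AsyncInvoker are hypothetical names, and a background thread with an in-memory queue stands in for the service's request queue.

```python
import queue
import threading
import uuid


def slow_function(event):
    """Stand-in for the function body."""
    return "processed " + event


def invoke_sync(event):
    """Synchronous style: run the function and return its result to the caller."""
    return slow_function(event)


class AsyncInvoker:
    """Asynchronous style: queue the request and return an ID immediately."""

    def __init__(self):
        self._queue = queue.Queue()
        self.results = {}
        threading.Thread(target=self._drain, daemon=True).start()

    def invoke(self, event):
        request_id = str(uuid.uuid4())
        self._queue.put((request_id, event))
        return request_id  # returned before the function has run

    def _drain(self):
        # Worker loop: take queued requests one by one and run the function.
        while True:
            request_id, event = self._queue.get()
            self.results[request_id] = slow_function(event)
            self._queue.task_done()

    def wait(self):
        """Block until every queued request has been processed."""
        self._queue.join()


print(invoke_sync("event-1"))

invoker = AsyncInvoker()
request_id = invoker.invoke("event-2")
invoker.wait()
print(invoker.results[request_id])
```

In the asynchronous case the caller receives only a request ID at invocation time, which is why long-running functions do not hold the caller open.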
Variable
Assign values to the function's variables. These variables correspond to the event payload defined in the section of the Function Compute console.
This example passes the following parameters to the variables in the para_service_01_by_time_triggers function. In the parameters, the ${} format is used to define a variable named bizdate. You must assign a value to this variable when you configure the scheduling properties of the node.

    {
        "payload": "payload1",
        "triggerTime": "${bizdate}",
        "triggerName": "triggerName1"
    }
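To see what the function receives once the variable has a value, the substitution can be simulated locally. The following is a minimal sketch: DataWorks performs the replacement itself at run time, so the use of string.Template here, the render_event helper, and the sample value 20240101 are assumptions made for illustration.

```python
import json
from string import Template

# The Variable content from this example, with the ${bizdate} placeholder.
variable_template = (
    '{ "payload": "payload1", "triggerTime": "${bizdate}", '
    '"triggerName": "triggerName1" }'
)


def render_event(template_text, params):
    """Replace ${name} placeholders; unknown placeholders are left untouched."""
    return Template(template_text).safe_substitute(params)


event_text = render_event(variable_template, {"bizdate": "20240101"})
event = json.loads(event_text)
print(event["triggerTime"])  # 20240101
```

The rendered text is valid JSON, so a handler that calls json.loads on the event can read triggerTime as a plain string.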
-
Optional. Debug and run the Function Compute node.
After you configure the node, click the icon. You can then specify a resource group for the task and assign constant values to the code variables to run a debugging test and verify the node's logic. When you assign values, use the key=value format and separate multiple parameters with commas (,).
Note: For more information about how to debug a task, see Debug a task.
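The key=value format can be checked before you paste it into the debugging dialog box. The following is a minimal sketch of a parser for that format: parse_debug_params is a hypothetical helper, and it assumes that values do not themselves contain commas.

```python
def parse_debug_params(text):
    """Parse 'key=value,key2=value2' into a dict.

    Assumes values contain no commas; surrounding whitespace is ignored.
    """
    params = {}
    for pair in text.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate a trailing comma
        key, sep, value = pair.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"expected key=value, got {pair!r}")
        params[key.strip()] = value.strip()
    return params


print(parse_debug_params("bizdate=20240101, env=dev"))
# {'bizdate': '20240101', 'env': 'dev'}
```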
-
Configure the scheduling properties for the node.
DataWorks uses scheduling parameters to pass dynamic values to scheduled tasks. After you define a variable such as ${bizdate} in the node configuration, you must assign a value to it in the scheduling properties. This example assigns the date of the previous day to the bizdate variable: each time the node runs, DataWorks replaces ${bizdate} with the date one day before the scheduled run time. For more information about how to configure scheduling parameters, see Configure scheduling parameters.
Click the scheduling properties tab in the right-side pane. In the scheduling parameters section, manually add a parameter. Set the parameter name to bizdate and the value to $[yyyymmdd-1].
In the Variable section of the code editor on the left, reference this scheduling parameter by using ${bizdate}. For example, set the value of the triggerTime field to ${bizdate}. Select LATEST for the function version and Synchronous for the invocation method.
For more information about the scheduling properties of a node, see Configure the basic properties of a task.
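The value that $[yyyymmdd-1] produces can be reproduced with ordinary date arithmetic. The following is a minimal sketch, assuming the expression means the scheduled run time minus one day, formatted as an eight-digit date. The resolve_yyyymmdd_minus_one helper is hypothetical, and DataWorks computes the value itself when the node is scheduled.

```python
from datetime import datetime, timedelta


def resolve_yyyymmdd_minus_one(scheduled_time):
    """Return the date one day before the scheduled run time as yyyymmdd."""
    return (scheduled_time - timedelta(days=1)).strftime("%Y%m%d")


# A node scheduled for 00:30 on 1 March 2024 receives the previous day's date.
print(resolve_yyyymmdd_minus_one(datetime(2024, 3, 1, 0, 30)))  # 20240229
```

With this value, the triggerTime field in the event passed to the function holds the previous day's date rather than the run date.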
Step 3: Commit and deploy the node
To run a Function Compute node on an automatic schedule, you must first commit and deploy it to the production environment.
-
Save and commit the node.
Click the save and commit icons in the toolbar to save and commit the node. In the commit dialog box, enter a change description and, if required, select the options for code review and smoke testing.
Note:
-
You must configure the Rerun and Parent Nodes properties in the scheduling properties before you can commit the node.
-
If code review is enabled, the code of a submitted node must be approved by a reviewer before the node can be deployed. For more information, see Code review.
-
To ensure that the scheduled node runs as expected, we recommend that you perform smoke testing on the task before you deploy it. For more information, see Smoke testing.
-
Optional. Deploy the node.
If you are using a workspace in standard mode, you must click Deploy in the upper-right corner to deploy the node after you commit it. For more information, see Workspaces in standard mode and Deploy tasks.
Next steps
-
After a task is committed and deployed, it is scheduled by Operation Center. You can then manage and monitor the task in the DataWorks Operation Center. For more information, see Operation Center.
-
After you master the basic steps to create and use a Function Compute node, you can explore best practices to gain a deeper understanding of the node. For more information, see Dynamically add watermarks to PDFs by using a Function Compute node in DataWorks.