Python nodes support only Python 3 syntax. You can use Python nodes to run Python code and schedule jobs to run periodically. This topic describes how to configure and schedule Python tasks in DataWorks.
Prerequisites
The RAM user that you want to use is added to your workspace.
If you want to use a RAM user to develop tasks, you must add the RAM user to your workspace as a member and assign the Develop or Workspace Administrator role to the RAM user. The Workspace Administrator role has more permissions than necessary. Exercise caution when you assign the Workspace Administrator role. For more information about how to add a member and assign roles to the member, see Add members to a workspace.
A serverless resource group is associated with your workspace. For more information, see the topics in the Use serverless resource groups directory.
A Python node is created. For more information, see Create a node for a scheduling workflow.
Usage notes
A task that runs on a serverless resource group supports a maximum configuration of 64 CU. To prevent resource shortages that can affect task startup, limit the configuration to 16 CU.
A Python node provides only a basic runtime environment for Python code. To reference a third-party package in your Python code, create a custom image, install the required dependencies in the image, and then use the image to run the Python node.
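Because the basic runtime environment may not include a third-party package, it can help to fail fast when a dependency is missing. The following is a minimal sketch and is not part of DataWorks: the function name find_missing_packages and the listed module names are illustrative assumptions. It uses only the Python standard library to print the interpreter version and report modules that cannot be imported.

```python
import importlib.util
import sys


def find_missing_packages(module_names):
    """Return the top-level module names that cannot be imported here."""
    missing = []
    for name in module_names:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Python nodes support only Python 3, so stop early on anything older.
    if sys.version_info[0] < 3:
        raise SystemExit("Python 3 is required.")
    print("Python version:", sys.version.split()[0])

    # Hypothetical dependency list; replace it with the modules your code imports.
    required = ["json", "requests"]
    missing = find_missing_packages(required)
    if missing:
        print("Missing packages (install them in a custom image):", missing)
    else:
        print("All required packages are available.")
```

If the check reports a missing package, install it in a custom image as described above rather than in the node code itself.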
Step 1: Develop a Python node
Write the code for the Python node.
Edit the Python code.
The following code shows a simple bubble sort example.
def bubble_sort(arr):
    n = len(arr)
    # The outer loop controls each traversal.
    for i in range(n):
        # The inner loop compares and swaps adjacent elements.
        for j in range(0, n-i-1):
            # If the current element is greater than the next element, swap them.
            if arr[j] > arr[j+1]:
                arr[j], arr[j+1] = arr[j+1], arr[j]
    return arr

# Test code.
if __name__ == "__main__":
    example_list = [64, 34, 25, 12, 22, 11, 90]
    sorted_list = bubble_sort(example_list)
    print("Sorted list:", sorted_list)
After you develop the code, test it: click the debug configuration on the right, select the required test settings, such as the resource group, and then click the Run button.
Note: You can debug and schedule Python nodes only by using serverless resource groups. Make sure that the current workspace is attached to a serverless resource group. For more information, see Use serverless resource groups.
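A test run is more informative if the script verifies its own output. The following is a minimal sketch of one way to do that, assuming a bubble sort function like the one in the example. The self_check helper, its parameters, and the value ranges are illustrative assumptions. It compares the result against the built-in sorted() function on randomly generated lists.

```python
import random


def bubble_sort(arr):
    """Sort arr in place in ascending order and return it."""
    n = len(arr)
    for i in range(n):
        for j in range(0, n - i - 1):
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
    return arr


def self_check(trials=100, seed=0):
    """Compare bubble_sort with sorted() on random lists; return True on success."""
    rng = random.Random(seed)  # A fixed seed keeps the check reproducible.
    for _ in range(trials):
        data = [rng.randint(-1000, 1000) for _ in range(rng.randint(0, 20))]
        expected = sorted(data)
        # Sort a copy so that the original input can be reported on failure.
        assert bubble_sort(list(data)) == expected, data
    return True


if __name__ == "__main__":
    self_check()
    print("Self-check passed.")
```

A failed assertion raises an exception, so the test run ends with an error instead of silently producing an incorrect result.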
After you develop and test the Python node script, configure the scheduling properties for the node to run periodically.
After you configure scheduling for the task, save the node.
Step 2: Publish and maintain the node
After you configure scheduling, commit and publish the Python node to the production environment.
The published task runs periodically based on the configured schedule. You can view the published auto triggered tasks and perform O&M operations in Operation Center. For more information, see Get started with Operation Center.
Develop tasks in the personal development environment
The personal development environment supports Python programming. For more information about how to use the personal development environment to edit a Python node task, see Personal development environment.
Run a node using an associated role
You can configure a role to be associated with a node and use a specific RAM role to run the node task. This allows for fine-grained permission control and security management.