Python nodes run Python 3 code as scheduled jobs in DataWorks. Use them to automate data processing tasks that need to run on a fixed schedule—from simple transformations to complex batch workflows.
Prerequisites
Before you begin, make sure you have:
- A RAM user added to your workspace with the Develop or Workspace Administrator role. The Workspace Administrator role grants broader permissions than most development tasks require, so assign it with caution. For setup instructions, see Add members to a workspace.
- A serverless resource group associated with your workspace. See Use serverless resource groups.
- A Python node created in a scheduling workflow. See Create a node for a scheduling workflow.
Limitations
- Python version: Python nodes support Python 3 only. Python 2 is not supported.
- Third-party packages: Python nodes provide a basic runtime environment. To use third-party packages, create a custom image with the required dependencies installed, then configure the node to use that image.
- Resource group: Debugging and scheduling Python nodes requires a serverless resource group. Make sure your workspace has one attached before running or scheduling the node.
- Compute units: Tasks on a serverless resource group support a maximum of 64 compute units (CU). To avoid resource shortages at startup, keep your configuration at or below 16 CU.
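Because the runtime is Python 3 only and ships without third-party packages, it can help to fail fast with a clear error when a dependency is missing rather than crash mid-run. A minimal sketch of such a startup check (the package names passed in at the bottom are placeholders for whatever you bake into your custom image):

```python
import importlib.util
import sys

def check_runtime(required_packages):
    """Verify the interpreter is Python 3 and the given packages are importable.

    Returns a list of missing package names (empty if everything is available).
    """
    if sys.version_info.major < 3:
        raise RuntimeError("This node requires Python 3.")
    missing = []
    for name in required_packages:
        # find_spec returns None when a top-level package is not installed
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing

if __name__ == "__main__":
    missing = check_runtime(["json", "csv"])  # stdlib names used as examples
    if missing:
        raise RuntimeError("Missing packages: " + ", ".join(missing))
```

Running this at the top of a node surfaces a readable error in the run log instead of an ImportError deep inside the job.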
Step 1: Develop the Python node
- Write your Python code in the node editor. The following example shows a bubble sort implementation:

```python
def bubble_sort(arr):
    n = len(arr)
    # Outer loop: controls each pass through the list
    for i in range(n):
        # Inner loop: compares and swaps adjacent elements
        for j in range(0, n - i - 1):
            # Swap if the current element exceeds the next
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
    return arr

if __name__ == "__main__":
    example_list = [64, 34, 25, 12, 22, 11, 90]
    sorted_list = bubble_sort(example_list)
    print("Sorted list:", sorted_list)
```

- Test the code: in the debug configuration panel on the right, select your resource group and other test settings, then click Run.
- Configure the scheduling properties for the node to define how often and when the job runs.
- Save the node.
Step 2: Publish the node and monitor runs
- Commit and publish the node to the production environment.
- Once published, the job runs automatically on the configured schedule. To view run status and perform operations, go to Operation Center > Task O&M > Auto Triggered Task O&M > Auto Triggered Tasks. For details, see Get started with Operation Center.
What's next
- Personal development environment: DataWorks also supports Python development in the personal development environment. See Personal development environment.
- Role-based execution: Associate a RAM role with the node to run the job with a specific set of permissions, enabling fine-grained access control and security management.