DataWorks Data Studio provides nodes for different data processing tasks: Data Integration nodes for data synchronization; compute engine nodes such as MaxCompute SQL, Hologres SQL, and EMR Hive for data cleansing; and general-purpose nodes such as virtual nodes and do-while nodes for complex logic. You can combine these nodes in workflows to handle a wide range of data processing scenarios.
Supported node types
The following table lists the node types supported by periodic scheduling. Supported node types for manual tasks or manually triggered workflows may differ. For the most up-to-date list, refer to the UI.
-
Node availability varies by edition and region. For the most accurate information, see the UI.
-
Some nodes cannot be run in a workflow. See the node details for specifics.
|
Node type |
Node name |
Description |
Node code |
TaskType |
|
Data Integration |
Synchronizes data between data sources in recurring batches, including across heterogeneous data sources in complex scenarios. For more information about the data sources supported by batch synchronization, see Supported data sources and synchronization solutions. |
23 |
DI |
|
|
Synchronizes data changes from a source to a destination database in real time. You can synchronize a single table or an entire database to maintain data consistency. For more information about the data sources supported by real-time synchronization, see Supported data sources and synchronization solutions. |
900 |
RI |
||
|
Notebook |
Notebook provides an interactive environment for data processing and analysis. Its modular, interactive design makes data processing, exploration, visualization, and model building more efficient. |
1323 |
NOTEBOOK |
|
|
MaxCompute |
Supports periodic scheduling of MaxCompute SQL tasks. MaxCompute SQL uses SQL-like syntax and is suitable for distributed processing scenarios that involve large-scale data (TB-level) but do not require high real-time performance. |
10 |
ODPS_SQL |
|
|
An SQL component is a reusable SQL code template with multiple input and output parameters. It can process data by filtering, joining, and aggregating data source tables to generate result tables. During data development, you can create SQL component nodes and use these predefined components to quickly build data processing pipelines, significantly improving development efficiency. |
1010 |
COMPONENT_SQL |
||
|
Allows you to combine multiple SQL statements into a single script for unified compilation and execution. This is ideal for complex query scenarios such as nested subqueries or multi-step operations. By submitting the entire script at once and generating a unified execution plan, the job only needs to be queued and executed once, making resource utilization more efficient. |
24 |
ODPS_SQL_SCRIPT |
||
|
PyODPS 2 nodes integrate the MaxCompute Python SDK, so you can write and edit Python code directly in the node to process and analyze data in MaxCompute. |
221 |
PY_ODPS |
||
|
With PyODPS 3 nodes, you can write MaxCompute jobs directly in Python code and configure these jobs for periodic scheduling. |
1221 |
PYODPS3 |
||
|
Supports running MaxCompute-based Spark batch jobs (cluster mode) on the DataWorks platform. |
225 |
ODPS_SPARK |
||
|
By creating a MaxCompute MR node and submitting it for task scheduling, you can use the MapReduce Java API to write MapReduce programs for processing large-scale datasets in MaxCompute. |
11 |
ODPS_MR |
||
|
When you need to accelerate queries on MaxCompute data in Hologres, you can use the MaxCompute metadata mapping feature of Data Catalog to map MaxCompute table metadata to Hologres, enabling accelerated queries on MaxCompute data through Hologres external tables. |
- |
- |
||
|
Supports synchronizing single-table data from MaxCompute to Hologres for efficient big data analysis and real-time queries. |
- |
- |
||
|
Hologres |
Hologres SQL nodes support querying data in Hologres instances. Hologres and MaxCompute are also connected at the underlying level, so you can use standard PostgreSQL statements in Hologres SQL nodes to query and analyze large-scale MaxCompute data quickly without migrating it. |
1093 |
HOLOGRES_SQL |
|
|
Supports migrating single-table data from Hologres to MaxCompute. |
1070 |
HOLOGRES_SYNC_DATA_TO_MC |
||
|
Provides a one-click table schema import feature to quickly create Hologres external tables in batches that are consistent with MaxCompute table schemas. |
1094 |
HOLOGRES_SYNC_DDL |
||
|
Provides a one-click MaxCompute data synchronization node to quickly synchronize data from MaxCompute to a Hologres database. |
1095 |
HOLOGRES_SYNC_DATA |
||
|
Serverless Spark |
A Spark node based on Serverless Spark, suitable for large-scale data processing. |
2100 |
SERVERLESS_SPARK_BATCH |
|
|
An SQL query node based on Serverless Spark that supports standard SQL syntax and provides high-performance data analysis capabilities. |
2101 |
SERVERLESS_SPARK_SQL |
||
|
Connects to Serverless Spark through the Kyuubi JDBC/ODBC interface, providing multi-tenant Spark SQL services. |
2103 |
SERVERLESS_KYUUBI |
||
|
Serverless StarRocks |
An SQL node based on EMR Serverless StarRocks that is compatible with open-source StarRocks SQL syntax, providing ultra-fast OLAP query analysis and lakehouse query analysis. |
2104 |
SERVERLESS_STARROCKS |
|
|
LLM |
Uses a built-in data processing and analysis engine to perform data cleansing and mining based on your natural language instructions. |
2200 |
LLM_NODE |
|
|
Flink |
Defines real-time processing logic with standard SQL statements. It offers ease of use, rich SQL support, robust state management and fault tolerance, support for both event time and processing time, and flexible scaling. The node integrates with systems such as Kafka and HDFS and provides logging and performance monitoring tools. |
2012 |
FLINK_SQL_STREAM |
|
|
Allows you to define and execute data processing tasks using standard SQL statements. It is suitable for analysis and transformation of large datasets, including data cleansing and aggregation. This node supports visual configuration and provides an efficient and flexible large-scale batch data processing solution. |
2011 |
FLINK_SQL_BATCH |
||
|
Supports running Flink real-time tasks by submitting JAR packages. You can select an uploaded Flink JAR resource as the job entry point and configure the entry class and runtime parameters. |
2016 |
FLINK_JAR_STREAM |
||
|
Supports running Flink batch processing tasks by submitting JAR packages. You can select an uploaded Flink JAR resource as the job entry point and configure the entry class and scheduling parameters. |
2015 |
FLINK_JAR_BATCH |
||
|
Supports running Flink real-time tasks by submitting Python files. You can select an uploaded Flink Python resource as the file address and configure the entry module and runtime parameters. |
2018 |
FLINK_PYTHON_STREAM |
||
|
Supports running Flink batch processing tasks by submitting Python files. You can select an uploaded Flink Python resource as the file address and configure the entry module and scheduling parameters. |
2017 |
FLINK_PYTHON_BATCH |
||
|
EMR |
Allows you to use SQL-like statements to read, write, and manage large datasets, enabling efficient analysis and development of massive log data. |
227 |
EMR_HIVE |
|
|
A fast, real-time interactive SQL query engine for PB-scale big data. |
260 |
EMR_IMPALA |
||
|
Breaks down large-scale datasets into multiple parallel Map tasks to significantly improve data processing efficiency. |
230 |
EMR_MR |
||
|
A flexible, scalable distributed SQL query engine that supports interactive analysis and querying of big data using standard SQL query syntax. |
259 |
EMR_PRESTO |
||
|
Allows you to write and execute custom Shell scripts for advanced features such as data processing, invoking Hadoop components, and file operations. |
257 |
EMR_SHELL |
||
|
A general-purpose big data analytics engine known for its high performance, ease of use, and broad applicability. It supports complex in-memory computing and is ideal for building large-scale, low-latency data analytics applications. |
228 |
EMR_SPARK |
||
|
Processes structured data using a distributed SQL query engine to improve job execution efficiency. |
229 |
EMR_SPARK_SQL |
||
|
Processes high-throughput real-time streaming data with fault tolerance mechanisms that can quickly recover from data stream errors. |
264 |
EMR_SPARK_STREAMING |
||
|
A distributed SQL query engine suitable for interactive analysis and querying across multiple data sources. |
267 |
EMR_TRINO |
||
|
A distributed and multi-tenant gateway that provides SQL query services for data lake query engines such as Spark, Flink, and Trino. |
268 |
EMR_KYUUBI |
||
|
ADB |
Supports the development and periodic scheduling of AnalyticDB for PostgreSQL tasks. |
1000090 |
- |
|
|
Supports the development and periodic scheduling of AnalyticDB for MySQL tasks. |
1000126 |
- |
||
|
Supports the development and periodic scheduling of AnalyticDB Spark tasks. |
1990 |
ADB_SPARK |
||
|
Supports the development and periodic scheduling of AnalyticDB Spark SQL tasks. |
1991 |
ADB_SPARK_SQL |
||
|
CDH |
For users who have deployed a CDH cluster and want to run Hive tasks through DataWorks. |
270 |
CDH_HIVE |
|
|
A general-purpose big data analytics engine known for its high performance, ease of use, and broad applicability. It can be used for complex in-memory analysis and building large-scale, low-latency data analytics applications. |
271 |
CDH_SPARK |
||
|
Processes structured data using a distributed SQL query engine to improve job execution efficiency. |
272 |
CDH_SPARK_SQL |
||
|
Processes ultra-large-scale datasets by running MapReduce jobs. |
273 |
CDH_MR |
||
|
Provides a distributed SQL query engine that further enhances the data analysis capabilities of the CDH environment. |
278 |
CDH_PRESTO |
||
|
CDH Impala nodes let you write and execute Impala SQL scripts for fast, interactive queries. |
279 |
CDH_IMPALA |
||
|
Lindorm |
Supports the development and periodic scheduling of Lindorm Spark tasks. |
1800 |
LINDORM_SPARK |
|
|
Supports the development and periodic scheduling of Lindorm Spark SQL tasks. |
1801 |
LINDORM_SPARK_SQL |
||
|
ClickHouse |
Supports distributed SQL queries and structured data processing to improve job execution efficiency. |
1301 |
CLICK_SQL |
|
|
Data Quality |
Allows you to configure data quality monitoring rules to monitor the data quality of related data source tables (for example, checking for dirty data). You can also customize scheduling policies to periodically run monitoring tasks for data validation. |
1333 |
DATA_QUALITY_MONITOR |
|
|
The comparison node supports multiple methods for comparing data across different tables. |
1331 |
DATA_SYNCHRONIZATION_QUALITY_CHECK |
||
|
General |
A virtual node is a control-type node that performs a dry run without generating any data. It is typically used as the root node for workflow orchestration, making it easier to manage nodes and workflows. |
99 |
VIRTUAL |
|
|
Passes parameters between nodes. The assignment node passes the result of its last query or output statement to downstream nodes through its built-in output parameter and the node context feature. |
1100 |
CONTROLLER_ASSIGNMENT |
||
|
Shell nodes support standard Shell syntax but do not support interactive syntax. |
6 |
DIDE_SHELL |
||
|
Used for aggregating parameters from upstream nodes and distributing them downstream. |
1115 |
PARAM_HUB |
||
|
Triggers downstream node execution by monitoring OSS objects. |
239 |
OSS_INSPECT |
||
|
Supports the Python 3.0 language. It can obtain upstream parameters through scheduling parameters in schedule settings and apply custom parameters, as well as pass its own output as parameters to downstream nodes. |
1322 |
PYTHON |
||
|
Merges the running status of upstream nodes so that nodes downstream of branch nodes can attach their dependencies and be triggered correctly. |
1102 |
CONTROLLER_JOIN |
||
|
Evaluates upstream results and routes each outcome to a different branch. It is typically used together with assignment nodes. |
1101 |
CONTROLLER_BRANCH |
||
|
Used for iterating over the result set passed by an assignment node. |
1106 |
CONTROLLER_TRAVERSE |
||
|
Used for executing a subset of node logic in a loop. You can also use it together with assignment nodes to loop through the results passed by an assignment node. |
1103 |
CONTROLLER_CYCLE |
||
|
Checks whether a target object, such as a MaxCompute partitioned table, an FTP file, or an OSS file, is available. When the check policy is met, the node returns a success status and, if downstream dependencies exist, triggers the downstream tasks. |
241 |
CHECK_NODE |
||
|
Used for periodically scheduling and processing event functions. |
1330 |
FUNCTION_COMPUTE |
||
|
Lets tasks in other scheduling systems trigger DataWorks tasks when they finish. Note: DataWorks no longer supports creating cross-tenant nodes. If you use cross-tenant nodes, we recommend that you switch to HTTP trigger nodes, which provide the same capabilities. |
1114 |
SCHEDULER_TRIGGER |
||
|
Allows DataWorks to remotely access a host connected through a specified SSH data source and trigger script execution on the remote host. |
1321 |
SSH |
||
|
A Data Push node can push data query results generated by other nodes in a Data Studio workflow to DingTalk groups, Lark groups, WeCom groups, Teams, and email by creating data push targets. |
1332 |
DATA_PUSH |
||
|
MySQL node |
MySQL nodes support the development and periodic scheduling of MySQL tasks. |
1000125 |
- |
|
|
SQL Server |
SQL Server nodes support the development and periodic scheduling of SQL Server tasks. |
10001 |
- |
|
|
Oracle node |
Oracle nodes support the development and periodic scheduling of Oracle tasks. |
10002 |
- |
|
|
PostgreSQL node |
PostgreSQL nodes support the development and periodic scheduling of PostgreSQL tasks. |
10003 |
- |
|
|
StarRocks node |
Supports the development and periodic scheduling of StarRocks tasks. |
10004 |
- |
|
|
DRDS node |
Supports the development and periodic scheduling of DRDS tasks. |
10005 |
- |
|
|
PolarDB MySQL node |
Supports the development and periodic scheduling of PolarDB MySQL tasks. |
10006 |
- |
|
|
PolarDB PostgreSQL node |
PolarDB PostgreSQL nodes support the development and periodic scheduling of PolarDB PostgreSQL tasks. |
10007 |
- |
|
|
Doris node |
Doris nodes support the development and periodic scheduling of Doris tasks. |
10008 |
- |
|
|
MariaDB node |
MariaDB nodes support the development and periodic scheduling of MariaDB tasks. |
10009 |
- |
|
|
SelectDB node |
SelectDB nodes support the development and periodic scheduling of SelectDB tasks. |
10010 |
- |
|
|
Redshift node |
Redshift nodes support the development and periodic scheduling of Redshift tasks. |
10011 |
- |
|
|
SAP HANA node |
SAP HANA nodes support the development and periodic scheduling of SAP HANA tasks. |
10012 |
- |
|
|
Vertica node |
Vertica nodes support the development and periodic scheduling of Vertica tasks. |
10013 |
- |
|
|
DM (Dameng) node |
DM nodes support the development and periodic scheduling of DM tasks. |
10014 |
- |
|
|
KingbaseES node |
KingbaseES nodes support the development and periodic scheduling of KingbaseES tasks. |
10015 |
- |
|
|
OceanBase node |
OceanBase nodes support the development and periodic scheduling of OceanBase tasks. |
10016 |
- |
|
|
DB2 node |
DB2 nodes support the development and periodic scheduling of DB2 tasks. |
10017 |
- |
|
|
GBase 8a node |
GBase 8a nodes support the development and periodic scheduling of GBase 8a tasks. |
10018 |
- |
|
|
Algorithm |
Integrates PAI Designer, the visual modeling tool of PAI, for building end-to-end machine learning workflows through visual modeling. |
1117 |
PAI_STUDIO |
|
|
Integrates PAI DLC, the container-based training service of PAI, for running distributed training tasks. |
1119 |
PAI_DLC |
||
|
Generates PAIFlow nodes in DataWorks for PAI knowledge base index workflows. |
1250 |
PAI_FLOW |
||
|
Logic node |
The SUB_PROCESS node consolidates multiple workflows into a unified whole for management and scheduling. |
1122 |
SUB_PROCESS |
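The General control nodes in the table (assignment, for-each, branch, and merge) are configured on the canvas rather than in code. To make the data flow between them concrete, the following plain-Python sketch models how an assignment node's output might drive a for-each node, how a branch node picks one path, and how a merge node combines branch statuses. It is only a conceptual model under stated assumptions: every function name and status string (including "SKIPPED" for an unselected branch) is hypothetical and is not a DataWorks API or built-in variable.

```python
from typing import Callable, List

# Plain-Python model of DataWorks control nodes. All names and status strings
# are hypothetical; they are not DataWorks APIs or built-in variables.


def assignment_node() -> List[str]:
    """Models an assignment node: the result of its last query becomes its output."""
    # Hypothetical query result, e.g. a list of partition dates.
    return ["2024-01-01", "2024-01-02", "2024-01-03"]


def for_each_node(items: List[str], body: Callable[[str], str]) -> List[str]:
    """Models a for-each node: runs the inner logic once per upstream item."""
    return [body(item) for item in items]


def branch_node(row_count: int) -> str:
    """Models a branch node: picks one downstream branch based on a condition."""
    return "has_data_branch" if row_count > 0 else "empty_branch"


def merge_node(upstream_statuses: List[str]) -> str:
    """Models a merge node: succeeds when every upstream branch either ran
    successfully or was skipped because the branch node did not select it."""
    ok = {"SUCCESS", "SKIPPED"}
    return "SUCCESS" if all(s in ok for s in upstream_statuses) else "FAILED"


def process_partition(ds: str) -> str:
    """Hypothetical per-item work, standing in for SQL run inside the loop."""
    return f"processed {ds}"


results = for_each_node(assignment_node(), process_partition)
route = branch_node(len(results))
# The selected branch succeeds; the unselected branch is modeled as skipped.
final = merge_node(["SUCCESS", "SKIPPED"])
print(results, route, final)
```

In DataWorks itself, the assignment output reaches downstream nodes through node context parameters rather than function return values; the sketch only mirrors the order in which the nodes consume each other's results.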
Create nodes
Create nodes for scheduled workflows
If your tasks need to run automatically at specified intervals (such as hourly, daily, or weekly), you can create scheduled task nodes in the following ways: create a scheduled task node, add internal nodes to a scheduled workflow, or clone an existing node to create a new one.
Go to the Workspaces page in the DataWorks console. In the top navigation bar, select the desired region. Find the desired workspace and open Data Studio from the Actions column.
-
In the left-side navigation pane, click the Data Studio icon to go to the Data Studio page.
Create a scheduled task node
-
Click the create icon on the right side of the project directory, select New Node, and then select the desired node type.
Important: The system provides a Common Nodes list and an All Nodes list. Select All Nodes at the bottom to view all available node types. Use the search box to find a node quickly, or use the category filters (such as MaxCompute, Data Integration, and General) to locate and create the desired node.
You can create directories in advance to organize and manage nodes.
-
Set the node name and save it. The node editing page then appears.
Create internal nodes in a scheduled workflow
-
Create a scheduled workflow.
-
On the workflow canvas, click New Node in the toolbar at the top, select the desired node type based on the task you need to develop, and drag it onto the canvas.
-
Set the node name and save it.
Create a node by cloning
Use the clone feature to create a new node from an existing one. The clone includes the node's Scheduling Settings (Scheduling Parameters, Scheduling time, and Scheduling Dependency).
-
In the left-side Project Directory, right-click the node you want to clone and select Cloning from the context menu.
-
In the dialog, modify the node Name and Path (or keep the default values), and click Confirm to start cloning.
-
After cloning is complete, view the newly created node in the Project Directory.
Create nodes for manually triggered workflows
If your tasks do not need to run periodically but need to be deployed to the production environment and run manually when needed, you can create internal nodes in a manually triggered workflow.
Go to the Workspaces page in the DataWorks console. In the top navigation bar, select the desired region. Find the desired workspace and open Data Studio from the Actions column.
-
In the left-side navigation pane, click the manually triggered workflow icon to go to the manually triggered workflow page.
-
Create a manually triggered workflow.
-
On the toolbar at the top of the manually triggered workflow editing page, click New Internal Node, and select the desired node type based on the task you need to develop.
-
Set the node name and save it.
-
Create manual task nodes
Go to the Workspaces page in the DataWorks console. In the top navigation bar, select the desired region. Find the desired workspace and open Data Studio from the Actions column.
-
In the left-side navigation pane, click the manual task icon to go to the manual task page.
-
In the lower section, click the create icon on the right side of Manually Triggered Task, select New Node, and then select the desired node type.
Note: Manual tasks support only the following node types: batch synchronization, Notebook, MaxCompute SQL, MaxCompute Script, PyODPS 2, MaxCompute MR, Hologres SQL, Python, and Shell.
-
Set the node name and save it. The node editing page then appears.
Batch editing of nodes
When a workflow contains a large number of nodes, opening them one by one for editing is inefficient. DataWorks provides an Internal Node List feature that displays all nodes in a list on the right side of the canvas for quick preview, search, and batch editing.
Usage
-
On the toolbar at the top of the workflow canvas, click the Show Internal Node List button to open the feature panel on the right side of the canvas.

-
After the panel opens, all nodes in the current workflow are displayed in a list.
-
Code preview and sorting:
-
Nodes that support code editing (such as MaxCompute SQL) display an expanded code editor by default.
-
Nodes that do not support code editing (such as virtual nodes) are displayed as cards and are automatically arranged at the bottom of the list.
-
-
Quick search and navigation:
-
Search: Enter keywords in the search box at the top to perform a fuzzy search on node names.
-
Linkage: Bidirectional linkage is available between the canvas and the sidebar. Selecting a node on the canvas highlights the corresponding node in the sidebar, and vice versa.
-
-
Online editing:
-
Actions: The upper-right corner of each node card provides quick actions such as Load Latest Code, Open Node, and Edit.
-
Auto-save: After you enter the editing state, changes are saved automatically when focus leaves the code block.
-
Conflict detection: If another user updates the code while you are editing it, your save fails and a notification appears, which prevents accidental overwrites.
-
-
Focus mode:
-
Select a node and click the Focus Mode icon in the upper-right corner of the floating window to enable Focus Mode. The sidebar then displays only the selected node, which leaves more space for code editing.
-
-
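The conflict detection described above behaves like optimistic concurrency control. This page does not document how DataWorks implements it, so the following is only a minimal Python sketch of the general idea, assuming that each save carries the version number the editor loaded. StoredNode, save_code, and SaveConflictError are hypothetical names, not DataWorks APIs.

```python
from dataclasses import dataclass


class SaveConflictError(Exception):
    """Raised when the stored code changed after the editor loaded it."""


@dataclass
class StoredNode:
    code: str
    version: int = 1


def save_code(node: StoredNode, new_code: str, loaded_version: int) -> int:
    """Save new_code only if nobody saved since loaded_version was read."""
    if node.version != loaded_version:
        # Another user saved in the meantime: reject instead of overwriting.
        raise SaveConflictError(
            f"code changed from version {loaded_version} to {node.version}; "
            "load the latest code before saving"
        )
    node.code = new_code
    node.version += 1
    return node.version


# Two editors load version 1; the first save wins, the second is rejected.
node = StoredNode(code="SELECT 1;")
first_save = save_code(node, "SELECT 2;", loaded_version=1)
try:
    save_code(node, "SELECT 3;", loaded_version=1)
    second_rejected = False
except SaveConflictError:
    second_rejected = True
```

In this model, the rejected editor recovers by reloading the current version, which corresponds to the Load Latest Code action on the node card.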
Version management
The system supports restoring nodes to a specified historical version through version management. It also provides version viewing and comparison features to help you analyze differences and make adjustments.
-
In the left-side Project Directory, double-click the target node name to go to the node editing page.
-
Click Version on the right side of the node editing page. On the Version page, view and manage Developer Record and Publish Record information.
-
View a version:
-
On the Developer Record or Publish Record tab, find the node version you want to view.
-
Click View in the Operation column to go to the details page where you can view the node code content and Scheduling Settings information.
Note: You can view Scheduling Settings in Script Mode or Visual Mode, and switch between the two modes in the upper-right corner of the Scheduling Settings tab.
-
-
Compare versions:
On the Developer Record or Publish Record tab, you can compare different versions of a node. The following example uses the developer record to demonstrate the comparison operation.
-
Compare within development or deployment records: On the Developer Record tab, select two versions and click the Select Comparison button at the top to compare the node code content and schedule settings between versions.
-
Compare across development and deployment or build records:
-
On the Developer Record tab, locate the desired version of the node.
-
Click Compare in the Operation column, and on the details page, select a version from Publish Record or Build Records to compare.
-
-
-
Restore a version:
You can restore a node only to a historical version in the Developer Record. On the Developer Record tab, find the target version and click Restore in the Operation column to restore the node's code and Scheduling Settings to that version.
-
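Version comparison shows line-level differences between two saved versions of a node's code. The exact diff algorithm DataWorks uses is not documented here; as an assumption, the following sketch uses the standard library's difflib.unified_diff to illustrate the same kind of comparison. The compare_versions function, the version labels, and the SQL snippets are hypothetical.

```python
import difflib


def compare_versions(old_code: str, new_code: str,
                     old_label: str, new_label: str) -> str:
    """Return a unified diff between two saved versions of a node's code."""
    diff_lines = difflib.unified_diff(
        old_code.splitlines(keepends=True),
        new_code.splitlines(keepends=True),
        fromfile=old_label,
        tofile=new_label,
    )
    return "".join(diff_lines)


v1 = "SELECT id, name\nFROM users\nWHERE ds = '20240101';\n"
v2 = "SELECT id, name, email\nFROM users\nWHERE ds = '20240101';\n"
report = compare_versions(v1, v2, "developer-record-v1", "developer-record-v2")
print(report)
```

Lines prefixed with `-` exist only in the older version and lines prefixed with `+` only in the newer one; identical versions produce an empty report.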
References
-
For more information about developing nodes in scheduled workflows and manually triggered workflows, see Scheduled workflows and Manually triggered workflows.
-
After nodes are created and developed, you can deploy them to the production environment. For more information, see Submit nodes and Deploy nodes.
FAQ
Can I download node code (such as SQL or Python) to my local machine?
-
Answer: Node code cannot be downloaded directly. You can copy the code to your local machine during development. Alternatively, develop in the personal directory in Data Studio and then submit the code to the project directory; code in the personal directory is saved locally.