DataWorks DataStudio provides a variety of nodes to meet different data processing needs. You can use data integration nodes for synchronization, engine compute nodes such as MaxCompute SQL, Hologres SQL, and EMR Hive for data cleaning, and general-purpose nodes such as zero load nodes and do-while loop nodes for complex logic processing. These nodes work together to effectively handle various data processing challenges.
Supported node types
The following table lists the node types supported by recurring schedules. The node types supported by one-time tasks and manually triggered workflows may differ, and node availability varies by DataWorks edition and region. For the most accurate information, refer to the UI.
Node type | Description | Node code | TaskType |
Notebook | Notebook offers a flexible and interactive platform for data processing and analysis. It makes data processing, exploration, visualization, and model building more efficient by improving intuitiveness, modularity, and interaction. | 1323 | NOTEBOOK | |
Data Integration | Used for recurring batch data synchronization. It supports data synchronization between various disparate data sources in complex scenarios. For more information about the data sources supported by batch synchronization, see Supported data sources and synchronization solutions. | 23 | DI | |
The real-time data synchronization feature in DataWorks lets you sync data changes from a source database to a destination database in real time. This ensures data consistency. You can sync a single table or an entire database. For more information about the data sources supported by real-time synchronization, see Supported data sources and synchronization solutions. | 900 | RI | ||
MaxCompute | Supports recurring scheduling of MaxCompute SQL tasks. MaxCompute SQL tasks use an SQL-like syntax and are suitable for distributed processing of massive datasets (terabyte-scale) where real-time performance is not critical. | 10 | ODPS_SQL | |
An SQL script template is an SQL code template with multiple input and output parameters. It processes data and generates a sink table by filtering, joining, and aggregating data from source tables. During data development, you can create SQL script template nodes and use these predefined components to quickly build data processing flows. This significantly improves development efficiency. | 1010 | COMPONENT_SQL | ||
Combines multiple SQL statements into a single script for compilation and execution. This is ideal for complex query scenarios, such as nested subqueries or multi-step operations. By submitting the entire script at once, a unified execution plan is generated. The job only needs to be queued and executed once, leading to more efficient resource use. | 24 | ODPS_SQL_SCRIPT | ||
Integrates the Python SDK for MaxCompute. This lets you write and edit Python code directly in a PyODPS 2 node to perform data processing and analysis tasks in MaxCompute. | 221 | PY_ODPS | ||
Use a PyODPS 3 node to write MaxCompute jobs directly in Python and configure them for recurring scheduling (see the example after this table). | 1221 | PYODPS3 ||
DataWorks supports running offline Spark jobs (in Cluster mode) based on MaxCompute. | 225 | ODPS_SPARK | ||
Create a MaxCompute MR node and submit it for scheduling to write MapReduce programs using the MapReduce Java API. This lets you process large datasets in MaxCompute. | 11 | ODPS_MR | ||
When you need to accelerate queries on MaxCompute data in Hologres, you can use the MaxCompute metadata mapping feature of the data catalog. This maps MaxCompute table metadata to Hologres, allowing you to use Hologres foreign tables to accelerate queries on MaxCompute data. | - | - | ||
Supports syncing data from a single MaxCompute table to Hologres. This facilitates efficient big data analysis and real-time queries. | - | - | ||
Hologres | A Hologres SQL node supports queries on data in Hologres instances. Hologres and MaxCompute are seamlessly connected at the underlying layer. This lets you use a Hologres SQL node to directly query and analyze large-scale data in MaxCompute using standard PostgreSQL statements without data migration. This provides rapid query results. | 1093 | HOLOGRES_SQL | |
Supports migrating data from a single Hologres table to MaxCompute. | 1070 | HOLOGRES_SYNC_DATA_TO_MC | ||
Provides a one-click feature to import table schemas. This lets you quickly create Hologres foreign tables in batches that have the same schema as MaxCompute tables. | 1094 | HOLOGRES_SYNC_DDL | ||
Provides a one-click data synchronization feature. This lets you quickly sync data from MaxCompute to a Hologres database. | 1095 | HOLOGRES_SYNC_DATA | ||
Serverless Spark | A Spark node based on Serverless Spark, suitable for large-scale data processing. | 2100 | SERVERLESS_SPARK_BATCH | |
An SQL query node based on Serverless Spark. It supports standard SQL syntax and provides high-performance data analysis capabilities. | 2101 | SERVERLESS_SPARK_SQL | ||
Connects to Serverless Spark through the Kyuubi JDBC/ODBC interface to provide a multitenant Spark SQL service. | 2103 | SERVERLESS_KYUUBI | ||
Serverless StarRocks | An SQL node based on EMR Serverless StarRocks. It is compatible with the SQL syntax of open source StarRocks and provides extremely fast online analytical processing (OLAP) query analysis and Lakehouse query analysis. | 2104 | SERVERLESS_STARROCKS |
Large model | Features a built-in engine for data processing, analysis, and mining. It intelligently performs data cleaning and mining based on your natural language instructions. | 2200 | LLM_NODE | |
Flink | Supports using standard SQL statements to define real-time task processing logic. It is easy to use, supports rich SQL, and has powerful state management and fault tolerance. It is compatible with event time and processing time and can be flexibly extended. The node is easy to integrate with systems such as Kafka and HDFS and provides detailed logs and performance monitoring tools. | 2012 | FLINK_SQL_STREAM | |
Lets you use standard SQL statements to define and execute data processing tasks. It is suitable for analyzing and transforming large datasets, including data cleaning and aggregation. The node supports visual configuration and provides an efficient and flexible solution for large-scale batch processing. | 2011 | FLINK_SQL_BATCH | ||
EMR | Use SQL-like statements to read, write, and manage large datasets. This allows for efficient analysis and development of massive log data. | 227 | EMR_HIVE | |
An interactive SQL query engine for fast, real-time queries on petabyte-scale big data. | 260 | EMR_IMPALA | ||
Breaks down large datasets into multiple parallel Map tasks, which significantly improves data processing efficiency. | 230 | EMR_MR | ||
A flexible and scalable distributed SQL query engine that supports interactive analysis of big data using the standard SQL query language. | 259 | EMR_PRESTO | ||
Lets you edit custom Shell scripts to use advanced features such as data processing, calling Hadoop components, and file operations. | 257 | EMR_SHELL | ||
A general-purpose big data analysis engine known for its high performance, ease of use, and wide applicability. It supports complex in-memory computing and is ideal for building large-scale, low-latency data analysis applications. | 228 | EMR_SPARK | ||
Implements a distributed SQL query engine to process structured data and improve job execution efficiency. | 229 | EMR_SPARK_SQL | ||
Used to process high-throughput real-time streaming data. It has a fault tolerance mechanism that can quickly recover failed data streams. | 264 | EMR_SPARK_STREAMING | ||
A distributed SQL query engine suitable for interactive analysis across multiple data sources. | 267 | EMR_TRINO | ||
A distributed and multitenant gateway that provides SQL and other query services for data lake query engines such as Spark, Flink, or Trino. | 268 | EMR_KYUUBI | ||
ADB | Develop and schedule recurring AnalyticDB for PostgreSQL tasks. | 1000090 | - | |
Develop and schedule recurring AnalyticDB for MySQL tasks. | 1000126 | - | ||
Develop and schedule recurring AnalyticDB Spark tasks. | 1990 | ADB_SPARK | ||
Develop and schedule recurring AnalyticDB Spark SQL tasks. | 1991 | ADB_SPARK_SQL | ||
CDH | Use this node if you have deployed a CDH cluster and want to use DataWorks to execute Hive tasks. | 270 | CDH_HIVE | |
A general-purpose big data analysis engine with high performance, ease of use, and wide applicability. Use it for complex in-memory analysis and to build large-scale, low-latency data analysis applications. | 271 | CDH_SPARK | ||
Implements a distributed SQL query engine to process structured data and improve job execution efficiency. | 272 | CDH_SPARK_SQL | ||
Processes ultra-large datasets. | 273 | CDH_MR | ||
This node provides a distributed SQL query engine, which enhances the data analysis capabilities of the CDH environment. | 278 | CDH_PRESTO | ||
The CDH Impala node lets you write and execute Impala SQL scripts, which provides faster query performance. | 279 | CDH_IMPALA | ||
Lindorm | Develop and schedule recurring Lindorm Spark tasks. | 1800 | LINDORM_SPARK | |
Develop and schedule recurring Lindorm Spark SQL tasks. | 1801 | LINDORM_SPARK_SQL | ||
ClickHouse | Performs distributed SQL queries and processes structured data to improve job execution efficiency. | 1301 | CLICK_SQL | |
Data Quality | Configure data quality monitoring rules to monitor the data quality of tables in a data source, for example, to check for dirty data. You can also customize scheduling policies to periodically run monitoring jobs for data validation. | 1333 | DATA_QUALITY_MONITOR | |
The comparison node lets you compare data from different tables in various ways. | 1331 | DATA_SYNCHRONIZATION_QUALITY_CHECK | ||
General | A zero load node is a control node. It is a dry-run node that does not generate data. It is typically used as the root node of a workflow to help you manage nodes and workflows. | 99 | VIRTUAL | |
Used for parameter passing. An assignment node passes the result of its last query or output to downstream nodes through the node context, which enables parameter passing across nodes. | 1100 | CONTROLLER_ASSIGNMENT ||
The Shell node supports standard Shell syntax but does not support interactive syntax. | 6 | DIDE_SHELL | ||
Aggregates parameters from ancestor nodes and passes them to descendant nodes. | 1115 | PARAM_HUB | ||
Triggers the execution of descendant nodes by monitoring an OSS object. | 239 | OSS_INSPECT | ||
Supports Python 3. It lets you retrieve upstream parameters and configure custom parameters through the scheduling parameters in the scheduling configuration. It also lets you pass its own output as parameters to downstream nodes. | 1322 | Python | ||
Merges the running statuses of ancestor nodes to resolve dependency attachment and execution trigger issues for descendant nodes of a branch node. | 1102 | CONTROLLER_JOIN | ||
Evaluates the result of an ancestor node to determine which branch logic to follow. You can use it together with an assignment node. | 1101 | CONTROLLER_BRANCH | ||
Traverses the result set passed by an assignment node. | 1106 | CONTROLLER_TRAVERSE | ||
Loops through a part of the node logic. You can also use it with an assignment node to loop through the results passed by the assignment node. | 1103 | CONTROLLER_CYCLE | ||
Checks whether a target object (a MaxCompute partitioned table, FTP file, or OSS file) is available. If the check policy is met, the Check node returns a successful running status and, if downstream dependencies exist, triggers the downstream tasks. | 241 | CHECK_NODE ||
Used for recurring scheduling of event processing functions. | 1330 | FUNCTION_COMPUTE | ||
Use this node if you want to trigger a task in DataWorks after a task in another scheduling system is complete. Note: DataWorks no longer supports creating cross-tenant collaboration nodes. If you are using a cross-tenant collaboration node, replace it with an HTTP trigger node, which provides the same capabilities. | 1114 | SCHEDULER_TRIGGER ||
Lets you specify an SSH data source to remotely access the host connected to that data source from DataWorks and trigger a script to run on the remote host. | 1321 | SSH | ||
A Data Push node can push the query results generated by other nodes in a DataStudio workflow to DingTalk groups, Lark groups, WeCom groups, Teams, and mailboxes by creating a data push destination. | 1332 | DATA_PUSH | ||
MySQL node | The MySQL node lets you develop and schedule recurring MySQL tasks. | 1000125 | - | |
SQL Server | The SQL Server node lets you develop and schedule recurring SQL Server tasks. | 10001 | - | |
Oracle node | The Oracle node lets you develop and schedule recurring Oracle tasks. | 10002 | - | |
PostgreSQL node | The PostgreSQL node lets you develop and schedule recurring PostgreSQL tasks. | 10003 | - | |
StarRocks node | Develop and schedule recurring StarRocks tasks. | 10004 | - | |
DRDS node | Develop and schedule recurring DRDS tasks. | 10005 | - | |
PolarDB MySQL node | Develop and schedule recurring PolarDB for MySQL tasks. | 10006 | - | |
PolarDB PostgreSQL node | The PolarDB PostgreSQL node lets you develop and schedule recurring PolarDB for PostgreSQL tasks. | 10007 | - | |
Doris node | The Doris node lets you develop and schedule recurring Doris tasks. | 10008 | - | |
MariaDB node | The MariaDB node lets you develop and schedule recurring MariaDB tasks. | 10009 | - | |
SelectDB node | The SelectDB node lets you develop and schedule recurring SelectDB tasks. | 10010 | - | |
Redshift node | The Redshift node lets you develop and schedule recurring Redshift tasks. | 10011 | - | |
SAP HANA node | The SAP HANA node lets you develop and schedule recurring SAP HANA tasks. | 10012 | - ||
Vertica node | The Vertica node lets you develop and schedule recurring Vertica tasks. | 10013 | - | |
DM (Dameng) node | The DM node lets you develop and schedule recurring DM tasks. | 10014 | - | |
KingbaseES node | The KingbaseES node lets you develop and schedule recurring KingbaseES tasks. | 10015 | - | |
OceanBase node | The OceanBase node lets you develop and schedule recurring OceanBase tasks. | 10016 | - | |
DB2 node | The DB2 node lets you develop and schedule recurring DB2 tasks. | 10017 | - | |
GBase 8a node | The GBase 8a node lets you develop and schedule recurring GBase 8a tasks. | 10018 | - | |
Algorithm | PAI Designer is a visual modeling tool for building end-to-end machine learning development workflows. | 1117 | PAI_STUDIO | |
PAI DLC is a container-based training service used to run distributed training tasks. | 1119 | PAI_DLC | ||
Generates a PAIFlow node in DataWorks for a PAI knowledge base index workflow. | 1250 | PAI_FLOW | ||
Logic node | The SUB_PROCESS node integrates multiple workflows into a unified whole for management and scheduling. | 1122 | SUB_PROCESS |
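To make the node catalog more concrete, the following is a minimal sketch of the code that a PyODPS 3 node from the table above might contain. It assumes the `o` entry object and the `args` dictionary of scheduling parameters that DataWorks makes available in PyODPS nodes; the table name `ods_orders`, its partition column `ds`, and the `bizdate` parameter are hypothetical placeholders, not part of any real project.

```python
# Minimal PyODPS 3 node sketch (illustrative only; table and parameter names are placeholders).
# DataWorks makes two objects available in PyODPS nodes:
#   o    - the MaxCompute entry object
#   args - a dict of the scheduling parameters configured on the node

bizdate = args['bizdate']  # for example, the value of the ${bizdate} scheduling parameter

# Run a query against a hypothetical partitioned table and read the result back.
sql = "SELECT COUNT(*) AS cnt FROM ods_orders WHERE ds = '{0}'".format(bizdate)
with o.execute_sql(sql).open_reader() as reader:
    for record in reader:
        print('row count for partition', bizdate, ':', record['cnt'])
```

At run time, DataWorks resolves the scheduling parameters before the node code executes, so the same script can process a different business date on each scheduled run.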
Create a node
Create a node for a recurring workflow
If your task needs to run automatically at a specified time, such as hourly, daily, or weekly, you can create an auto triggered task node. You can create a new auto triggered task node, add an inner node to a recurring workflow, or clone an existing node.
Go to the Workspaces page in the DataWorks console. In the top navigation bar, select a desired region. Find the desired workspace and open it from the Actions column.
In the navigation pane on the left, click the icon to go to the Data Studio page.
Create an auto triggered task node
Click the icon to the right of Workspace Directories and select Create Node. Then select a node type.
Important: The system provides a list of Common Nodes and All Nodes. You can select All Nodes at the bottom to view the complete list. You can also use the search box to quickly find a node, or filter by category, such as MaxCompute, Data Integration, or General, to locate and create the node you need.
You can create folders in advance to organize and manage your nodes.
Set the node name, save it, and go to the node editor page.
Create an inner node in a recurring workflow
Create a recurring workflow.
On the workflow canvas, click Create Node in the toolbar. Select a node type for your task and drag it onto the canvas.
Set the node name and save it.
Create a node by cloning
You can use the clone feature to quickly create a new node from an existing one. The cloned content includes the node's Scheduling information, such as Scheduling Parameters, Scheduling Time, and Scheduling Dependencies.
In the Workspace Directories in the navigation pane on the left, right-click the node to clone and select Clone from the pop-up menu.
In the dialog box that appears, you can change the node Name and Path, or keep the default values. Then, click OK to start cloning.
After the cloning is complete, you can view the new node in the Workspace Directories.
Create a node for a manually triggered workflow
If your task does not require a recurring schedule but needs to be published to the production environment for manual execution, you can create an inner node in a manually triggered workflow.
Go to the Workspaces page in the DataWorks console. In the top navigation bar, select a desired region. Find the desired workspace and open it from the Actions column.
In the navigation pane on the left, click the icon to go to the Manually Triggered Workflow page.
Create a manually triggered workflow.
In the toolbar at the top of the manually triggered workflow editor page, click Create Internal Node. Select a node type for your task.
Set the node name and save it.
Create a one-time task node
Go to the Workspaces page in the DataWorks console. In the top navigation bar, select a desired region. Find the desired workspace and open it from the Actions column.
In the navigation pane on the left, click the icon to go to the One-time Task page.
In the Manually Triggered Tasks section, click the icon to the right of Manually Triggered Tasks, select Create Node, and then select the required node type.
Note: One-time tasks support creating only Batch Synchronization, Notebook, MaxCompute SQL, MaxCompute Script, PyODPS 2, MaxCompute MR, Hologres SQL, Python, and Shell nodes.
Set the node name, save it, and go to the node editor page.
Batch edit nodes
When a workflow contains many nodes, opening and editing them individually is inefficient. DataWorks provides the Inner Node List feature, which lets you quickly preview, search, and batch edit all nodes in a list on the right side of the canvas.
How to use
In the toolbar at the top of the workflow canvas, click the Show Internal Node List button to open the panel on the right.

After the panel opens, it displays all nodes in the current workflow as a list.
Code preview and sorting:
For nodes that support code editing, such as MaxCompute SQL, the code editor is expanded by default.
For nodes that do not support code editing, such as zero load nodes, a card is displayed. These nodes are automatically placed at the bottom of the list.
Quick search and location:
Search: In the search box at the top, you can enter a keyword to perform a fuzzy search for a node name.
Synchronized focus: The canvas and sidebar are synchronized. When you select a node on the canvas, the corresponding node is highlighted in the sidebar. Similarly, when you click a node in the sidebar, the canvas automatically focuses on that node.
Online editing:
Operations: The upper-right corner of each node card provides shortcuts such as Load Latest Code, Open Node, and Edit.
Auto-save: After you enter edit mode, your changes are automatically saved when the mouse focus leaves the code block area.
Conflict detection: If another user updates the code while you are editing it, a failure notification appears when you save. This prevents your changes from being accidentally overwritten.
Focus mode:
Select a node and click the icon in the upper-right corner of the floating window to enable Focus Mode. The sidebar then shows only the selected node, providing more space for code editing.
Version management
You can use the version management feature to restore a node to a specific historical version. This feature also provides tools for viewing and comparing versions, which helps you analyze differences and make adjustments.
In the Workspace Directories in the navigation pane on the left, double-click the target node name to go to the node editor page.
On the right side of the node editor page, click Version. On the Version page, you can view and manage Development History and Deployment History.
View a version:
On the Development History or Deployment History tab, find the node version you want to view.
Click View in the Actions column to open the details page. On this page, you can view the node code and Scheduling information.
Note: You can view the Scheduling information in Code Editor or visualization mode. You can switch the view mode in the upper-right corner of the Scheduling tab.
Compare versions:
You can compare different versions of a node on the Development History or Deployment History tab.
Compare versions within the same environment: On the Development History or Deployment History tab, select two versions and click Select Versions to Compare at the top. You can then compare the node code and scheduling configuration of the two versions.
Compare versions between the development and deployment or build environments:
On the Development History tab, locate a version of the node.
Click Compare in the Actions column. On the details page, you can select a version from the Deployment History or Build History to compare.
Restore to a version:
You can restore a node to a specific historical version only from the Development History tab. On the Development History tab, find the target version and click Restore in the Actions column. The node's code and Scheduling information are then restored to the selected version.
References
For more information about node development in recurring and manual workflows, see Recurring workflows and Manually triggered workflows.
After a node is created and developed, you can publish it to the production environment. For more information, see Node scheduling and Publish nodes and workflows.
FAQ
Can I download node code, such as SQL or Python, to a local machine?
Answer: A direct download feature is not available. As a workaround, you can copy the code to your local machine during development. Alternatively, in the new DataStudio, you can develop in a local file in your personal folder and then submit the code to the workspace directories. With this approach, a copy of your code is kept on your local machine.