Automatic parsing reads the SQL statements in a node's code to infer scheduling dependencies, adding outputs or ancestor nodes automatically without manual configuration. This topic describes how automatic parsing works for each node type and lists the scenarios where it is not supported.
How automatic parsing works
DataWorks scans the SQL statements in a node's code. Based on the SQL keywords it finds, DataWorks determines what data the node reads and writes:
- Write statements (CREATE, INSERT, and similar): DataWorks adds an output to the node. The output name identifies the table the node writes to.
- Read statements (SELECT): DataWorks adds an ancestor node to the node. The ancestor node name identifies the upstream table being read.
For example, if Node A writes to orders_clean and Node B reads from orders_clean, DataWorks automatically sets Node A as an ancestor of Node B, with no manual dependency configuration needed.
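The Node A / Node B example can be sketched in a few lines of Python. This is only an illustration of the idea, not how DataWorks parses SQL internally: the regular expressions, the function names `infer_io` and `infer_ancestors`, and the sample table names `orders_raw` and `orders_daily` are all assumptions made for this sketch.

```python
import re

# Hypothetical, simplified patterns. A real SQL parser handles far more syntax
# (subqueries, CTEs, comments, quoting) than these two expressions do.
WRITE_RE = re.compile(
    r"\b(?:CREATE\s+TABLE(?:\s+IF\s+NOT\s+EXISTS)?"
    r"|INSERT\s+(?:INTO|OVERWRITE)(?:\s+TABLE)?)\s+([\w.]+)",
    re.IGNORECASE,
)
READ_RE = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)


def infer_io(sql):
    """Return (tables_written, tables_read) for a block of SQL."""
    writes = {name.lower() for name in WRITE_RE.findall(sql)}
    reads = {name.lower() for name in READ_RE.findall(sql)}
    return writes, reads


def infer_ancestors(nodes):
    """Map each node name to the set of nodes whose output tables it reads."""
    # First pass: record which node produces each table.
    producers = {}
    for node, sql in nodes.items():
        writes, _ = infer_io(sql)
        for table in writes:
            producers[table] = node

    # Second pass: a node's ancestors are the producers of the tables it reads.
    ancestors = {}
    for node, sql in nodes.items():
        _, reads = infer_io(sql)
        found = {producers[t] for t in reads if t in producers}
        ancestors[node] = found - {node}  # a node is never its own ancestor
    return ancestors


nodes = {
    "node_a": "INSERT OVERWRITE TABLE orders_clean SELECT * FROM orders_raw;",
    "node_b": "INSERT INTO orders_daily SELECT order_id FROM orders_clean;",
}
print(infer_ancestors(nodes))
```

In this sketch `node_b` ends up with `node_a` as its only ancestor, because `node_a` is the node that writes `orders_clean`. `node_a` has no ancestors, since no node in the sample produces `orders_raw`.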
Configuration by node type
The following table shows which SQL statements trigger automatic parsing and the format of the output or ancestor node name that DataWorks generates.
| Node type | SQL statement | What DataWorks adds | Name format |
|---|---|---|---|
| ODPS node | CREATE, INSERT | Output | odps_project_name.table_name |
| ODPS node | SELECT | Ancestor node | project_name.table_name |
| SQL node (non-ODPS) | CREATE, INSERT, ALTER, UPDATE | Output | See the table below |
| SQL node (non-ODPS) | SELECT | Ancestor node | project_name.table_name |
| Batch synchronization node | All | Not supported | Manually configure scheduling dependencies |
Output name formats for non-ODPS SQL nodes:
| Engine | Format |
|---|---|
| E-MapReduce (EMR) | workspace_name.db_name.table_name |
| AnalyticDB for PostgreSQL | workspace_name.db_name.schema_name.table_name |
| AnalyticDB for MySQL | workspace_name.db_name.schema_name.table_name |
| Hologres | workspace_name.db_name.schema_name.table_name |
Name format components:
| Placeholder | Description |
|---|---|
| odps_project_name | The DataWorks workspace to which the ODPS node belongs |
| project_name | The workspace that owns the node generating the table |
| workspace_name | The DataWorks workspace to which the node belongs |
| db_name | The database to which the data is written |
| schema_name | The schema of the node |
| table_name | The name of the generated table |
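To make the name formats concrete, the following sketch assembles an output name from its components. The templates are transcribed from the tables above. The engine keys (`"odps"`, `"emr"`, and so on), the `output_name` helper, and the sample values are hypothetical and are not part of any DataWorks API.

```python
# Templates transcribed from the name format tables; the dictionary keys are
# labels invented for this sketch.
OUTPUT_NAME_FORMATS = {
    "odps": "{odps_project_name}.{table_name}",
    "emr": "{workspace_name}.{db_name}.{table_name}",
    "adb_postgresql": "{workspace_name}.{db_name}.{schema_name}.{table_name}",
    "adb_mysql": "{workspace_name}.{db_name}.{schema_name}.{table_name}",
    "hologres": "{workspace_name}.{db_name}.{schema_name}.{table_name}",
}


def output_name(engine, **parts):
    """Build the output name for a table written by a node of the given engine."""
    try:
        template = OUTPUT_NAME_FORMATS[engine]
    except KeyError:
        raise ValueError(f"no output name format known for engine {engine!r}") from None
    try:
        return template.format(**parts)
    except KeyError as missing:
        # str.format raises KeyError for a placeholder with no matching argument.
        raise ValueError(f"missing name component: {missing.args[0]}") from None


print(output_name("odps", odps_project_name="sales_ws", table_name="orders_clean"))
print(output_name(
    "hologres",
    workspace_name="sales_ws",
    db_name="dw",
    schema_name="public",
    table_name="orders_clean",
))
```

With these sample values, the ODPS name has two parts (`sales_ws.orders_clean`) and the Hologres name has four (`sales_ws.dw.public.orders_clean`).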
Limitations
Automatic parsing does not apply in the following scenarios. In these cases, manually add the table to the node's Output.
Unsupported node types
Some node types do not support automatic parsing. For example, batch synchronization nodes and AnalyticDB for PostgreSQL nodes cannot use automatic parsing to configure scheduling dependencies. Tables generated by these nodes must be manually added to the node outputs.
To check whether a specific node type supports automatic parsing, open the node in the DataWorks console and inspect its configuration.
Temporary tables
Temporary tables created by SQL statements are not parsed automatically. For example, if your workspace treats tables with a t_ prefix as temporary tables, those tables cannot be automatically added to Output or Parent Nodes. Manually add them where required.
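As an illustration of the prefix rule, the sketch below separates table names into those that would be parsed and those treated as temporary. The `split_temporary` function and its `temp_prefix` parameter are assumptions made for this example and do not correspond to a DataWorks setting or API.

```python
def split_temporary(tables, temp_prefix="t_"):
    """Separate table names into (parsed, skipped) by temporary-table prefix."""
    parsed, skipped = [], []
    for name in tables:
        # Compare only the last name component, so "proj.t_stage" is also
        # treated as temporary.
        table = name.rsplit(".", 1)[-1]
        if table.startswith(temp_prefix):
            skipped.append(name)
        else:
            parsed.append(name)
    return parsed, skipped


parsed, skipped = split_temporary(["proj.orders_clean", "proj.t_stage", "t_tmp"])
print("parsed automatically:", parsed)
print("add manually if needed:", skipped)
```

Any table that lands in the skipped list is one you would add to the node's output or parent nodes manually where a dependency is required.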
Tables generated by synchronization nodes
After a synchronization node generates a table, manually add the table as the node's output in the project_name.table_name format. Once the output is added, descendant nodes that read from the table can use automatic parsing to configure their own scheduling dependencies.