Before configuring scheduling dependencies for a node, confirm the lineage of the table that the node generates. Table lineage tells you which partition an ancestor node writes to on a given day, so you can verify that the partition your descendant node expects is the partition that will be available at runtime.
This topic covers:
- How to confirm table lineage when both nodes are in the same workspace or in different workspaces
- What goes wrong when scheduling dependencies are misconfigured
How scheduling dependencies work
DataWorks scheduling dependencies operate on two distinct axes. Keeping them separate prevents a common source of confusion:
| Axis | Relationship | Example |
|---|---|---|
| DAG structure | Upstream vs. downstream within a single scheduling cycle | Node A depends on Node B |
| Time cycle | Current cycle vs. previous cycle | Today's instance depends on yesterday's output |
A scheduling dependency encodes the DAG-structure relationship. A cross-cycle scheduling dependency encodes the time-cycle relationship. These are separate configurations and have different effects on which partition data a node reads.
Usage notes
In DataWorks, the partitions a node reads from or writes to are determined by the scheduling parameters configured for that node. If the partition written by an ancestor node does not match the partition expected by a descendant node, you have two options based on your business requirements:
- Modify the scheduling parameter configuration of the descendant node to align with the ancestor node's output partition.
- Configure cross-cycle scheduling dependencies for the descendant node.
Cross-cycle scheduling dependencies: If the instance for Node A in the current cycle must depend on partition data generated by Node B in the previous cycle, configure cross-cycle scheduling dependencies for Node A. This makes the instance for Node A in the current cycle depend on the instance for Node B in the previous cycle, rather than the current-cycle instance of Node B.
In partitioned table scenarios, make sure the partitions that the ancestor node writes match the partitions that the current node expects to read.
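The two options above can be illustrated with a minimal Python sketch. This is not DataWorks code: the `partition_for` helper and the day offsets are hypothetical stand-ins for whatever your scheduling parameters resolve to, and the sketch assumes daily scheduling with `yyyymmdd` partition values.

```python
from datetime import date, timedelta

def partition_for(run_date: date, offset_days: int) -> str:
    """Hypothetical stand-in for scheduling parameter replacement:
    returns the yyyymmdd partition value for run_date + offset_days."""
    return (run_date + timedelta(days=offset_days)).strftime("%Y%m%d")

run_date = date(2024, 5, 10)

# Assumed setup: ancestor Node B writes the previous day's partition,
# while descendant Node A is configured to read two days back.
written_by_b = partition_for(run_date, -1)
read_by_a = partition_for(run_date, -2)
mismatch = written_by_b != read_by_a

# Option 1: change Node A's scheduling parameter so that it reads the
# partition written by Node B's same-cycle instance.
read_by_a_option1 = partition_for(run_date, -1)

# Option 2: keep Node A's parameter and depend on Node B's previous-cycle
# instance, which wrote the partition Node A already expects.
previous_cycle = run_date - timedelta(days=1)
written_by_b_previous_cycle = partition_for(previous_cycle, -1)

print(mismatch, read_by_a_option1 == written_by_b,
      written_by_b_previous_cycle == read_by_a)
```

Which option fits depends on whether the descendant node should process newer data (option 1) or intentionally lag one cycle behind the ancestor node (option 2).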
Confirm table lineage
Confirm lineage when both nodes are in the same workspace
Nodes periodically write data to specific partitions based on their scheduling parameters. For details on how scheduling parameters are dynamically replaced, see Scheduling parameters.
If Node A depends on Node B in the same workspace, check the scheduling parameter configuration of Node B to determine which partition it writes to each day:
- Development environment: Go to the configuration tab of the ancestor node (Node B). View the scheduling parameter configuration and the node's code to determine the partition output.
- Production environment: Check the data actually written to the ancestor and descendant tables in the production environment.
Compare the replacement results of the scheduling parameters of Nodes A and B to verify that the partition Node B writes to each day is the partition Node A expects to read.
Confirm lineage when nodes are in different workspaces
If Node A depends on Node B in a different workspace, use DataWorks Data Map (DataMap) to confirm the partition data that Node B generates each day.
In DataMap, view the output information for Node B's table to confirm the data timestamp of the daily partition — specifically, whether the partition written each day corresponds to the previous day's date or the current day's date.
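Once you have noted a few run dates and the partitions that appeared on them, the "previous day or current day" question can be answered mechanically. The following sketch assumes `yyyymmdd` partition names and uses made-up observations; it does not call any DataMap API, and `classify` is a hypothetical helper name.

```python
from datetime import date, datetime

def partition_offset_days(run_date: date, partition: str) -> int:
    """Days between the partition's date (yyyymmdd) and the run date."""
    partition_date = datetime.strptime(partition, "%Y%m%d").date()
    return (partition_date - run_date).days

def classify(observations) -> str:
    """Classify (run_date, partition) pairs by their common day offset."""
    offsets = {partition_offset_days(d, p) for d, p in observations}
    if offsets == {-1}:
        return "previous day"
    if offsets == {0}:
        return "current day"
    return "inconsistent"

# Hypothetical observations copied by hand from the table's output information.
observed = [
    (date(2024, 5, 9), "20240508"),
    (date(2024, 5, 10), "20240509"),
]
result = classify(observed)
print(result)
```

An "inconsistent" result suggests that the ancestor node's output does not follow a fixed daily offset and should be checked before you rely on it.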
Failure scenarios
Misconfigured scheduling dependencies produce two distinct failure patterns.
Scenario 1: Lineage exists but the dependency is not configured
If Job_B reads from Table A in its SELECT statement, but Job_A — which generates Table A — is not configured as an ancestor node of Job_B, then Job_B may start before Table A is available. When this happens, Job_B fails to run or produce output.
Scheduling Job_A earlier than Job_B is not enough on its own. For example, if Job_A is scheduled for 01:00 and Job_B for 02:00, but Job_A has not generated its data by 02:00, an error occurs when Job_B obtains the data. Job_A may fail to generate data at 01:00 as expected for any of the following reasons:
- An ancestor node of Job_A fails or runs slowly.
- Job_A or one of its ancestor nodes is waiting for resources.
- An ancestor node of Job_A is frozen on that day.
Fix: Configure Job_A as an ancestor node of Job_B so that Job_B only starts after Job_A completes successfully.
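The difference between relying on start times and configuring the dependency can be sketched as a small simulation. It is a simplified model under stated assumptions, not scheduler code: `job_b_start` is a hypothetical function, and the 02:30 finish time for Job_A is an invented delay.

```python
from datetime import datetime
from typing import Optional

def job_b_start(scheduled_b: datetime,
                job_a_finished: Optional[datetime],
                depends_on_a: bool) -> Optional[datetime]:
    """Return when Job_B starts, or None if it never starts.

    Without a dependency, Job_B starts at its scheduled time no matter
    what Job_A did. With a dependency, it also waits for Job_A to succeed.
    """
    if not depends_on_a:
        return scheduled_b
    if job_a_finished is None:
        return None  # Job_A never succeeded, so Job_B keeps waiting
    return max(scheduled_b, job_a_finished)

scheduled_b = datetime(2024, 5, 10, 2, 0)
job_a_finished = datetime(2024, 5, 10, 2, 30)  # assumed: Job_A ran late

start_without_dep = job_b_start(scheduled_b, job_a_finished, depends_on_a=False)
start_with_dep = job_b_start(scheduled_b, job_a_finished, depends_on_a=True)

# Table A is usable only if Job_A finished before Job_B started.
ready_without_dep = job_a_finished <= start_without_dep
ready_with_dep = job_a_finished <= start_with_dep
print(ready_without_dep, ready_with_dep)
```

In this model the dependency delays Job_B until Table A exists, whereas the time-only setup starts Job_B at 02:00 against a table that is not yet available.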
Scenario 2: Dependency is configured but the partition does not match
If same-cycle scheduling dependencies are configured but the partition written by the ancestor node does not match the partition the descendant node expects, data quality issues may occur or the descendant node may return an error when it reads from the ancestor node's table.
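When a MaxCompute node uses the max_pt function to select a partition, make sure the ancestor node generates valid partition data every day. Otherwise, the descendant node may read a partition other than the one you intend.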
Fix: Verify partition alignment between ancestor and descendant nodes as described in Confirm table lineage. If the partitions do not align, adjust the scheduling parameter configuration of the descendant node or configure cross-cycle scheduling dependencies.
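The max_pt note above can be made concrete with a simplified model. The sketch assumes that max_pt selects the largest partition name among partitions that contain data; check the MaxCompute function reference for the exact semantics. The Python `max_pt` function and the row counts below are illustrative only.

```python
from typing import Dict

def max_pt(partitions: Dict[str, int]) -> str:
    """Simplified model (assumption): return the largest partition name
    among partitions that contain at least one row."""
    non_empty = [name for name, rows in partitions.items() if rows > 0]
    if not non_empty:
        raise ValueError("no partition contains data")
    return max(non_empty)

# Hypothetical state of the ancestor table: the ancestor node created
# today's expected partition but wrote no rows into it.
table_partitions = {"20240508": 120, "20240509": 0}
expected = "20240509"

selected = max_pt(table_partitions)
reads_stale_data = selected != expected
print(selected, reads_stale_data)
```

Under this assumption the descendant node does not fail. It quietly reads an older partition, which is why such a mismatch tends to surface as a data quality issue rather than as an error.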