
DataWorks: Check node

Last Updated: Mar 27, 2026

ETL workflows that depend on external data face a core problem: the data arrives on a variable schedule. A Check node solves this by polling an external object—a file, a table partition, or a real-time sync task—and blocking downstream nodes until the condition is met. When the condition is satisfied, the Check node succeeds and releases its downstream nodes. If the condition is never met before the stop time, the node fails and keeps downstream nodes blocked, preventing them from reading incomplete or missing data.

Typical use case: An external database exports a data file to OSS every morning around 02:00. Your DataWorks ETL workflow needs to process this file immediately after it is generated. Since the export time varies, you configure a Check node to monitor the OSS path. The ETL node starts only after the Check node confirms the file exists.
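Conceptually, a Check node is a polling loop with a budget: it tests a condition on a schedule, succeeds as soon as the condition holds, and fails once the budget is exhausted. The following minimal Python sketch (illustrative only, not a DataWorks API) captures these semantics:

```python
import time

def check(condition, interval_s, max_checks):
    """Poll `condition` until it returns True or the check budget runs out.

    Mirrors Check node semantics: succeeding releases downstream nodes;
    exhausting the budget fails and keeps them blocked.
    """
    for _ in range(max_checks):
        if condition():
            return True    # condition met: release downstream nodes
        time.sleep(interval_s)
    return False           # budget spent: downstream stays blocked
```

For example, `check(lambda: object_exists(path), interval_s=600, max_checks=144)` would test every 10 minutes for up to 24 hours; `object_exists` is a hypothetical helper standing in for whatever condition you configure.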

Supported objects

  • MaxCompute tables: Partitioned tables only.

  • OSS files: The data source must use an AccessKey ID and secret; RAM role mode is not supported.

  • FTP files: No additional constraint.

  • Hadoop Distributed File System (HDFS) files: No additional constraint.

  • OSS-HDFS files: No additional constraint.

  • DLF (Paimon) tables: Partitioned tables only.

  • Real-time synchronization tasks: Kafka-to-MaxCompute tasks only.

Limits

  • Edition: DataWorks Professional Edition and higher.

  • Maximum wait time: 24 hours. If the condition is not met within 24 hours, the node fails.

Prerequisites

Before you begin, make sure that you have:

  • Permissions: The RAM user developing the node has the Development or Workspace Administrator role in the workspace

  • Resource group: A serverless resource group associated with the workspace

  • Data sources: The data source for the object you intend to check, configured as follows:

    • MaxCompute: Associated with Data Studio

    • OSS: Created using AccessKey ID and secret (RAM role mode is not supported)

    • FTP, HDFS, OSS-HDFS, and DLF: Create the corresponding data sources in DataWorks. For FTP, see FTP data source

  • Real-time tasks (if applicable): A Kafka-to-MaxCompute sync task

Step 1: Configure the Check node

Double-click the Check node to open the configuration interface, then choose the scenario that matches what you want to monitor.

Check a data source

Use this to wait for a file to be generated or a table partition to be created.

  • Check Object: Select Data Source.

  • Data Source Type/Name: Select the data source type (for example, OSS or MaxCompute) and the specific data source name.

  • Table Name/File Path: For MaxCompute and DLF, select the target table (only partitioned tables are supported). For OSS, FTP, HDFS, and OSS-HDFS, enter the absolute file or directory path.

  • Condition for Check Passing: For tables, whether a partition exists or whether the last modified time has remained unchanged for a specified duration. For files, whether the file exists.

  • Policy for Stopping Check: See Choose a stop policy.
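For an OSS path, the condition the node evaluates amounts to an object-existence test. The sketch below uses the official `oss2` Python SDK to show what that test looks like; the helper names and path convention are illustrative, and once the node is configured, DataWorks performs this check for you:

```python
def split_oss_path(path: str):
    """Split an 'oss://bucket/key' path into (bucket, key). Hypothetical helper."""
    if not path.startswith("oss://"):
        raise ValueError("expected an oss:// path")
    bucket, _, key = path[len("oss://"):].partition("/")
    return bucket, key

def oss_object_exists(path: str, access_key_id: str,
                      access_key_secret: str, endpoint: str) -> bool:
    """Return True if the OSS object exists.

    Uses AccessKey authentication, matching the data source requirement
    (RAM role mode is not supported for Check nodes).
    """
    import oss2  # official Alibaba Cloud OSS SDK; imported lazily
    bucket_name, key = split_oss_path(path)
    bucket = oss2.Bucket(oss2.Auth(access_key_id, access_key_secret),
                         endpoint, bucket_name)
    return bucket.object_exists(key)
```

A call such as `oss_object_exists("oss://my-bucket/exports/data.csv", ak_id, ak_secret, "https://oss-cn-hangzhou.aliyuncs.com")` (bucket name and endpoint are placeholders) corresponds to the "file exists" passing condition.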

Choose a stop policy

The Policy for Stopping Check field controls when the Check node stops polling if the condition is never met.

  • Time for Stopping Check (absolute time): Use when downstream nodes must start by a fixed deadline—for example, before the morning ETL window closes.

  • Checks Allowed Before Check Node Stops (interval × count): Use when you want to control the total wait time without tying it to a wall-clock time. Maximum wait time = interval × total checks.
Important

If the Check node starts late due to an upstream delay and the current time is already past the configured stop time, the node runs once. If the condition is met, it succeeds; otherwise, it fails immediately.
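Assuming a fixed polling interval and an absolute stop time, the remaining check budget, including the run-once behavior described above, could be computed as in this hypothetical sketch:

```python
from datetime import datetime, timedelta

def plan_checks(now: datetime, stop_time: datetime,
                interval: timedelta) -> int:
    """Number of checks the node can still run (illustrative helper).

    If the node starts at or after the stop time (for example, because an
    upstream node was delayed), it still runs exactly once.
    """
    if now >= stop_time:
        return 1
    # The first check runs immediately; later checks follow every `interval`.
    return 1 + (stop_time - now) // interval
```

For example, starting at 02:00 with a 03:00 stop time and a 10-minute interval leaves 7 checks (02:00, 02:10, ..., 03:00), while starting at 04:00 leaves exactly one.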

Check a real-time sync task

Use this to verify that a Kafka-to-MaxCompute real-time stream has no significant delay before triggering a batch node.

  • Check Object: Select Real-time Synchronization Task.

  • Real-time Synchronization Task: Select an existing Kafka-to-MaxCompute task from the dropdown list.

  • Policy for Stopping Check: See Choose a stop policy.

Step 2: Schedule and deploy

  1. Configure dependencies:

    • Set the upstream of the Check node to the root node or a logical start node.

    • Set the downstream of the Check node to the node that requires the data.

    • For details, see Configure node scheduling.

  2. Save and deploy the node to the production environment. Once deployed, the task runs on its configured schedule. Monitor its status in Operation Center.