Dataphin: Instance running diagnosis

Last Updated: Jan 21, 2025

In addition to its schedule, whether a recurring instance or data backfill instance runs depends on factors such as the status of upstream instances, resource availability, and rate limiting rules. Dataphin provides an instance running diagnosis feature that analyzes the instance's running flow end to end, so that you can quickly identify the problem when an instance does not run as expected.

Limits

  • This feature supports only offline recurring instances and data backfill instances, including script instances, detail and aggregate table instances, and extraction instances. Real-time instances, such as real-time computing and real-time integration, as well as one-time instances, are not supported.

  • Field-level analysis is not supported for detail and aggregate table instances; analysis is based solely on materialization nodes.

Feature overview

In the Operation Center, the running status of instances is visually represented by various colors and icons, indicating the stage of the running flow. The stage or reason for an instance's non-running status can be deduced from these visual cues. The running status and flow of instances are detailed as follows:

Each running status has its own icon and a position in the running flowchart (icon and flowchart images omitted):

  • Not running

  • Waiting for scheduling time

  • Rate limiting

  • Waiting for scheduling resources

  • Running

  • Success

  • Failed

An instance's successful run can be influenced by various factors, including upstream dependency, scheduling time, resources, and the instance's own conditions. The running diagnosis feature allows for diagnosis and analysis based on the following flow or dimensions when an instance fails to run or remains in a running status for an extended period:

Check items and their descriptions:

  • Upstream dependency: Examine the running status of upstream instances. A failure in an upstream instance can prevent the current instance from running. The upstream dependency diagnosis results offer further insights into the failure.

  • Timed scheduling: Verify whether the task has reached its scheduled running time.

  • Rate limiting rules: Review the rate limiting rules that the current instance has triggered and the list of instances already in the current queue.

  • Scheduling resources: Assess how long the instance has been waiting for scheduling resources, review the list of instances using resources in the current resource group, and take action based on the diagnostic suggestions.

  • Instance execution: Access the instance running results and execution logs.
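The check items above are applied in order, and an instance only moves on once the earlier checks are cleared. A minimal Python sketch of that ordering follows. It is illustrative only and not a Dataphin API: the function name `first_blocking_check` and the result dictionary are hypothetical, and treating both Pass and Skip as "cleared" is an assumption drawn from the result names used on this page.

```python
from typing import Optional

# Gating checks in the order this page describes them. Instance execution
# follows once all four are cleared. Names below are hypothetical.
CHECK_ORDER = [
    "Upstream dependency",
    "Timed scheduling",
    "Rate limiting rules",
    "Scheduling resources",
]

def first_blocking_check(results: dict[str, str]) -> Optional[str]:
    """Return the first check that is neither Pass nor Skip, else None."""
    for check in CHECK_ORDER:
        # A check with no recorded result is treated as not yet cleared.
        if results.get(check) not in ("Pass", "Skip"):
            return check
    return None  # every gating check cleared; the instance can execute

example = {
    "Upstream dependency": "Pass",
    "Timed scheduling": "Pass",
    "Rate limiting rules": "Rate Limiting",
}
print(first_blocking_check(example))
```

In this example the instance is held at the rate limiting check, so that is the diagnosis page to look at first.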

Running diagnosis entry

  1. Go to the Operation Center page. For more information, see Operation Center Entry.

  2. On the Operation Center page, open the node details page of the target instance.

    For example, you can open it from the Recurring Instance page.

  3. On the node details page, click the Diagnosis button.

Upstream dependency

The upstream dependency diagnosis displays the latest running diagnosis results of the instance and the current status of upstream instances. All upstream instances must run successfully before the next check is conducted. To refresh the latest running results or the status of upstream instances, click the refresh icon.

  • If the most recent run of the instance is Success and was not a forced rerun, the diagnosis result is Pass. The diagnosis page shows the following:

    Latest Run: Displays the running status and the time of the successful run.

    Note: The current instance starts scheduling only after all upstream instances have run successfully.

    Current Diagnosis Result: Presents the diagnosis outcome, including:

    • Schedule Type: One of Dry-run, Normal Run, or Paused Run. If the schedule type is Paused Run, you must use Resume Scheduling before the instance can run.

    • Starting Blocking Node: Identifies the highest-level node preventing the current node from starting. Instances that pass the upstream diagnosis have no starting blocking node.

    • Direct Upstream List: Lists direct upstream dependencies. Supports search by node name, node ID, and instance ID, and filtering by Running Status and Owner.

  • If the instance has not started running and scheduling is not paused, the schedule type in the diagnosis result is Normal Run. Check the starting blocking node and resolve its run so that the current node can proceed. The current instance starts scheduling only after all upstream instances have run successfully.

  • If scheduling of the instance is paused, no further diagnosis is performed and the diagnosis result is Paused Run.

  • A forced rerun does not require all upstream instances to have run successfully. If the latest run of the instance is a forced rerun, the diagnosis result is Skip.
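To illustrate the idea of a starting blocking node, the sketch below walks upstream from an instance and returns the furthest-upstream node that has not succeeded. This is one plausible reading of "highest-level node", not Dataphin's documented algorithm. The function name, the `upstream` mapping, and the sample node names are all hypothetical, and the dependency graph is assumed to be acyclic.

```python
from typing import Optional

def starting_blocking_node(
    instance: str,
    upstream: dict[str, list[str]],
    status: dict[str, str],
) -> Optional[str]:
    """Return the furthest-upstream non-successful ancestor, or None.

    Assumes an acyclic graph. A node with no recorded status is treated
    as not yet successful.
    """
    for parent in upstream.get(instance, []):
        if status.get(parent) != "Success":
            # The parent may itself be held up by one of its own upstreams.
            deeper = starting_blocking_node(parent, upstream, status)
            return deeper if deeper is not None else parent
    return None

# Hypothetical chain: extract -> clean -> report
upstream = {"report": ["clean"], "clean": ["extract"], "extract": []}
status = {"extract": "Failed", "clean": "Not running"}
print(starting_blocking_node("report", upstream, status))
```

Here `clean` has not run only because `extract` failed, so `extract` is the node to fix first.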

Timed scheduling

The timed scheduling diagnosis shows the result for the latest run of the instance. An instance must reach its scheduled running time before it starts scheduling; until then, it stays in the Waiting for scheduling time status. To refresh the diagnosis result, click the refresh icon.

  • If the instance has not yet reached its scheduled time and scheduling is not paused, the diagnosis result is Waiting For Scheduling Time. If early scheduling is required, you can initiate a Forced Rerun; check first that this does not affect downstream data quality.

  • If scheduling of the instance is paused, the diagnosis result is Paused. To start the run, click Resume Scheduling.

  • If the latest run of the instance reached its scheduled time and was not a forced rerun, the diagnosis result is Pass.

  • A forced rerun starts immediately without checking whether the instance has reached its scheduled time, so this diagnosis is bypassed. If the latest run of the instance is a forced rerun, the diagnosis result is Skip.
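The four outcomes above can be summarized as a small decision function. This is a sketch rather than Dataphin's implementation: the function and parameter names are hypothetical, and the order in which the conditions are tested (forced rerun first, then paused, then the time comparison) is an assumption, since this page does not state the precedence.

```python
from datetime import datetime

def timed_scheduling_result(
    now: datetime,
    scheduled_time: datetime,
    paused: bool,
    forced_rerun: bool,
) -> str:
    """Map an instance's state to one of the result names used on this page."""
    if forced_rerun:
        return "Skip"  # a forced rerun bypasses the time check
    if paused:
        return "Paused"  # needs Resume Scheduling before it can run
    if now < scheduled_time:
        return "Waiting For Scheduling Time"
    return "Pass"

scheduled = datetime(2025, 1, 21, 8, 0)
print(timed_scheduling_result(datetime(2025, 1, 21, 7, 30), scheduled, False, False))
```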

Rate limiting rules

With the intelligent operations and maintenance add-on, you can set up rate limiting rules. For more information, see Rate limiting configuration.

All instances undergo rate limiting rule diagnosis. After passing the upstream dependency and timed scheduling diagnoses, an instance must satisfy every rate limiting rule it hits before it is issued to the resource scheduling system. To refresh the diagnosis result, click the refresh icon.

  • If the latest run of the instance passes the upstream dependency and timed scheduling checks and satisfies all rate limiting rules, the diagnosis result is Pass.

  • If the instance is currently rate limited and scheduling is not paused, the diagnosis result is Rate Limiting, shown with the current wait time. The diagnosis page shows the following:

    Blocking Rules: Displays the names of the rate limiting rules triggered by the instance. Click a rule name to view its details.

    Issued Instance List: Lists the instances in the queue of the rate limiting rule that the current instance has triggered. You can search or filter these instances by name or ID.

  • If scheduling of the instance is paused, the diagnosis result is Paused. Resume the instance to continue with resource scheduling.
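The relationship between hit rules, the issued instance list, and the diagnosis result can be sketched as follows. This page does not describe how a rate limiting rule is defined, so the model here is an assumption: each rule is given a hypothetical cap (`max_issued`) on the number of instances it allows to be issued at once. The class and function names are also hypothetical.

```python
from dataclasses import dataclass

@dataclass
class RateLimitRule:
    name: str
    max_issued: int    # hypothetical cap on concurrently issued instances
    issued: list[str]  # IDs of instances already issued under this rule

def blocking_rules(hit_rules: list[RateLimitRule]) -> list[str]:
    """Names of the hit rules that currently have no free slot."""
    return [r.name for r in hit_rules if len(r.issued) >= r.max_issued]

def rate_limit_result(hit_rules: list[RateLimitRule]) -> str:
    # The instance is issued only when every rule it hits has room.
    return "Rate Limiting" if blocking_rules(hit_rules) else "Pass"

rules = [
    RateLimitRule("rule_a", max_issued=2, issued=["i1", "i2"]),
    RateLimitRule("rule_b", max_issued=5, issued=["i3"]),
]
print(rate_limit_result(rules), blocking_rules(rules))
```

Under this model, one full rule is enough to hold the instance, which matches the requirement that all hit rules be satisfied.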

Scheduling resources

Instances that use shared running resources are generally less affected by scheduling resources. Instances that use exclusive running resources need sufficient allocatable idle resources in their resource group to start scheduling; otherwise, they stay in the Waiting for scheduling resources status. To refresh the diagnosis result, click the refresh icon.

  • If the resource group for the latest run of the instance has enough idle resources and scheduling is not paused, the diagnosis result is Pass.

  • If the current instance does not have sufficient allocatable idle scheduling resources, the diagnosis result is Waiting for scheduling resources. The scheduling resources diagnosis page shows the Waiting Resource Duration, diagnostic Suggestions, and the Resource-occupying Instances list. Use the suggestions and the resource-occupying instances list to free up enough resources for the current instance to run.
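As a worked illustration of the idle-resource check for an exclusive resource group, the sketch below compares the idle capacity with what the instance needs. The resource unit, the function name, and the parameters are hypothetical; this page does not specify how Dataphin measures or allocates scheduling resources.

```python
def scheduling_resource_result(
    capacity: int,
    occupied: dict[str, int],
    required: int,
) -> str:
    """Compare the idle capacity of a resource group with the requirement.

    capacity: total units in the resource group (hypothetical unit)
    occupied: units held by each resource-occupying instance
    required: units the current instance needs
    """
    idle = capacity - sum(occupied.values())
    if idle >= required:
        return "Pass"
    return "Waiting for scheduling resources"

# 10 units in total, 9 in use, 2 needed: the instance has to wait.
print(scheduling_resource_result(10, {"inst_a": 6, "inst_b": 3}, 2))
```

If `inst_b` finished or were stopped, 4 units would be idle and the same call would return Pass.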

Instance execution

When an instance reaches the execution phase, the Instance Execution diagnosis page displays the Running Results and Runtime Log. If the Running Results status is Failed, use the Runtime Log to troubleshoot and resolve the issue. To refresh the diagnosis results, click the refresh icon.