In addition to timed scheduling, the running of a recurring instance or data backfill instance is influenced by factors such as the status of upstream instance tasks, resource availability, and adherence to rate limiting rules. Dataphin offers an instance running diagnosis feature to analyze the running flow and overall link of the instance, enabling quick problem identification when an instance does not perform as expected.
Limits
This feature supports only offline recurring instances and data backfill instances, including script instances, detail and aggregate table instances, and extraction instances. Real-time instances, such as real-time computing and real-time integration, as well as one-time instances, are not supported.
Field-level analysis is not supported for detail and aggregate table instances; analysis is based solely on materialization nodes.
Feature overview
In the Operation Center, the running status of instances is visually represented by various colors and icons, indicating the stage of the running flow. The stage or reason for an instance's non-running status can be deduced from these visual cues. The running status and flow of instances are detailed as follows:
Running status icon | Icon description | Running flowchart |
| Not running | |
| Waiting for scheduling time | |
| Rate limiting | |
| Waiting for scheduling resources | |
| Running | |
| Success | |
| Failed | |
An instance's successful run can be influenced by various factors, including upstream dependency, scheduling time, resources, and the instance's own conditions. The running diagnosis feature allows for diagnosis and analysis based on the following flow or dimensions when an instance fails to run or remains in a running status for an extended period:
Check items | Description |
Upstream dependency | Examine the running status of upstream instances. A failure in an upstream instance can prevent the current instance from running. The upstream dependency diagnosis results offer further insights into the failure. |
Timed scheduling | Verify whether the task has reached its scheduled running time. |
Rate limiting rules | Review the rate limiting rules that the current instance has triggered and the list of instances already in the current queue. |
Scheduling resources | Check how long the instance has been waiting for scheduling resources and which instances are using resources in the current resource group, then take action based on the diagnostic suggestions. |
Instance execution | Access the instance running results and execution logs. |
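The order of the checks before execution can be illustrated with a short Python sketch. It is not Dataphin code: the identifiers in `CHECK_ORDER`, the function name `first_blocking_check`, and the shape of the `results` dictionary are hypothetical. It only shows the idea that the first check that has not passed (or been skipped by a forced rerun) is the stage where the instance is held.

```python
from typing import Optional

# Pre-execution checks in the order described in this topic.
# The identifiers are illustrative labels, not Dataphin names.
CHECK_ORDER = [
    "upstream_dependency",
    "timed_scheduling",
    "rate_limiting_rules",
    "scheduling_resources",
]

# Results that let the instance move on to the next check.
NON_BLOCKING = {"Pass", "Skip"}


def first_blocking_check(results: dict[str, str]) -> Optional[str]:
    """Return the first check, in order, whose result is not Pass or Skip.

    A check that is missing from `results` is treated as not yet passed.
    Returns None when every pre-execution check has passed or been skipped,
    meaning the instance can reach the instance execution stage.
    """
    for check in CHECK_ORDER:
        if results.get(check) not in NON_BLOCKING:
            return check
    return None


if __name__ == "__main__":
    example = {
        "upstream_dependency": "Pass",
        "timed_scheduling": "Pass",
        "rate_limiting_rules": "Rate Limiting",
    }
    print(first_blocking_check(example))  # rate_limiting_rules
```

In this sketch, a result of `None` corresponds to an instance whose Running Results and Runtime Log are the next place to look.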
Running diagnosis entry
Navigate to the Operation Center page. For more information, see Operation Center Entry.
On the Operation Center page, follow the operation guide shown below to open the node details page of the target instance.
Below is an example using the Recurring Instance page.
On the node details page, click the Diagnosis button.
Upstream dependency
The upstream dependency diagnosis displays the latest running diagnosis results of the instance and the current status of upstream instances. All upstream instances must run successfully before the next check is conducted. To refresh the latest running results or the status of upstream instances, click the refresh icon.
If the latest run of the instance is Success and was not a forced rerun, the diagnosis result is Pass.
Feature | Description |
Latest Run | Displays the running status and the time of the successful run. Note: The current instance starts scheduling only after all upstream instances have run successfully. |
Current Diagnosis Result | Presents the diagnosis outcome. Schedule Type: Includes Dry-run, Normal Run, and Paused Run. If the instance is in a Paused Run state, you must Resume Scheduling to initiate running. Starting Blocking Node: Identifies the highest-level node preventing the current node from initiating. Instances that clear the upstream diagnosis have no starting blocking node. Direct Upstream List: Lists direct upstream dependencies. Supports search by node name, node ID, and instance ID, and filtering by Running Status and Owner. |
If the instance has not started running and its scheduling is not paused, the schedule type in the diagnosis result is Normal Run. Focus on the starting blocking node and resolve the run of that node so that the current node can resume. The current instance starts scheduling only after all upstream instances have run successfully.
If scheduling for the instance is paused, no further diagnosis is performed and the diagnosis result is Paused Run.
Instances undergoing a forced rerun do not require a successful run of all upstream instances. If the latest run of the instance is a forced rerun, the diagnosis result is Skip.
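The upstream dependency outcomes described above can be summarized in a small Python sketch. It is only an illustration under stated assumptions: the function name, its parameters, and the status strings are hypothetical, and the order in which paused and forced rerun are tested is an assumption, because this topic does not say which takes precedence. The sketch also simplifies Pass to "every direct upstream instance has succeeded", and it returns the direct upstream instances that have not succeeded rather than computing the starting blocking node.

```python
def upstream_diagnosis(
    paused: bool,
    forced_rerun: bool,
    upstream_statuses: dict[str, str],
) -> tuple[str, list[str]]:
    """Return (diagnosis result, direct upstream instances that have not succeeded).

    `upstream_statuses` maps a hypothetical upstream instance name to its
    running status, for example {"ods_orders": "Success"}.
    """
    # Assumption: a paused instance is not diagnosed any further.
    if paused:
        return ("Paused Run", [])
    # A forced rerun does not wait for upstream instances.
    if forced_rerun:
        return ("Skip", [])
    blockers = sorted(
        name for name, status in upstream_statuses.items() if status != "Success"
    )
    if blockers:
        # Schedule type is Normal Run, but upstream instances still block the run.
        return ("Normal Run", blockers)
    return ("Pass", [])


if __name__ == "__main__":
    print(upstream_diagnosis(False, False, {"ods_orders": "Success", "ods_users": "Failed"}))
```

The second element of the returned tuple plays the role of a filtered Direct Upstream List: the entries a user would inspect first when the current instance has not started.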
Timed scheduling
The timed scheduling diagnosis shows the latest running diagnosis result of each instance. An instance must reach its scheduled running time before it starts scheduling; until then, it stays in the Waiting for scheduling time status. To refresh the diagnosis result, click the refresh icon.
If an instance has not yet reached its scheduled time and its scheduling is not paused, the diagnosis result is Waiting For Scheduling Time. If early scheduling is required, you can initiate a Forced Rerun, taking care to prevent any impact on downstream data quality.
If scheduling for the instance is paused, the diagnosis result is Paused. To start the run, click Resume Scheduling.
If the latest run of an instance has reached its scheduled time and was not a forced rerun, the diagnosis result is Pass.
A forced rerun starts immediately without checking whether the instance has reached its scheduled time, so it bypasses this diagnosis. If the latest run of the instance is a forced rerun, the diagnosis result is Skip.
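As a rough illustration of the four timed scheduling outcomes, the following Python sketch maps an instance's state to a diagnosis result. The function name and parameters are hypothetical, and the precedence between paused and forced rerun is an assumption that this topic does not specify.

```python
from datetime import datetime


def timed_scheduling_diagnosis(
    now: datetime,
    scheduled_time: datetime,
    paused: bool = False,
    forced_rerun: bool = False,
) -> str:
    """Return the timed scheduling diagnosis result for one instance."""
    # Assumption: the paused state is reported before anything else.
    if paused:
        return "Paused"
    # A forced rerun starts immediately and bypasses the time check.
    if forced_rerun:
        return "Skip"
    if now < scheduled_time:
        return "Waiting For Scheduling Time"
    return "Pass"


if __name__ == "__main__":
    scheduled = datetime(2024, 1, 1, 2, 0)
    print(timed_scheduling_diagnosis(datetime(2024, 1, 1, 1, 30), scheduled))
```

For example, calling the function at 01:30 for an instance scheduled at 02:00 yields the waiting result, while the same call with `forced_rerun=True` yields Skip.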
Rate limiting rules
With the intelligent operations and maintenance add-on, you can set up rate limiting rules. For more information, see Rate limiting configuration.
All instances undergo rate limiting rule diagnosis. After passing the upstream dependency and timed scheduling diagnosis, an instance must satisfy every rate limiting rule that it triggers before it is issued to the resource scheduling system. To refresh the diagnosis result, click the refresh icon.
If an instance's latest run satisfies the upstream dependency and timed scheduling criteria, and adheres to all rate limiting rules, the diagnosis result is Pass.
If an instance is currently rate limited and its scheduling is not paused, the diagnosis result is Rate Limiting, shown together with the current wait time.
Scenario | Description |
Blocking Rules | Displays the names of rate limiting rules triggered by the instance. Click a rule name to view its details. |
Issued Instance List | Lists instances in the queue of the rate limiting rule that the current instance has triggered. You can search or filter these instances by name or ID. |
If scheduling for the instance is paused, the diagnosis result is Paused. Resume the instance to continue with resource scheduling.
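The rate limiting outcomes can be sketched in Python as follows. This topic does not describe how a rate limiting rule is defined, so the model here is purely an assumption for illustration: each rule is given a hypothetical `max_issued` limit on the number of instances it allows in its queue, and a rule blocks the current instance when its issued instance list is full. The class and function names are also hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RateLimitRule:
    """Hypothetical model of a rate limiting rule triggered by an instance."""

    name: str
    max_issued: int  # assumed limit on instances issued under this rule
    issued_instances: list[str] = field(default_factory=list)


def rate_limiting_diagnosis(
    hit_rules: list[RateLimitRule],
    paused: bool = False,
) -> tuple[str, list[str]]:
    """Return (diagnosis result, names of the rules that currently block)."""
    if paused:
        return ("Paused", [])
    # A rule blocks when its queue of issued instances is already full.
    blocking = [r.name for r in hit_rules if len(r.issued_instances) >= r.max_issued]
    if blocking:
        return ("Rate Limiting", blocking)
    return ("Pass", [])


if __name__ == "__main__":
    rules = [
        RateLimitRule("night_batch", max_issued=2, issued_instances=["i1", "i2"]),
        RateLimitRule("project_quota", max_issued=5, issued_instances=["i1"]),
    ]
    print(rate_limiting_diagnosis(rules))
```

In this model, the returned rule names correspond to Blocking Rules, and each rule's `issued_instances` corresponds to its Issued Instance List.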
Scheduling resources
Instances that share running resources are generally less affected by scheduling resources. Instances with exclusive running resources need sufficient allocatable idle resources in their resource group before they can start scheduling; until then, they stay in the Waiting for scheduling resources status. To refresh the diagnosis result, click the refresh icon.
If the resource group for the instance's latest run has enough available idle resources and scheduling is not paused, the diagnosis result is Pass.
If the current instance does not have sufficient allocatable idle scheduling resources, the diagnosis result is Waiting for scheduling resources. The scheduling resources diagnosis page shows the Waiting Resource Duration, diagnostic Suggestions, and the Resource-occupying Instances list. Use the diagnostic suggestions and the resource-occupying instances list to free up enough resources for the current instance to run normally.
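A minimal Python sketch of the scheduling resources check is shown below. It assumes that resources can be expressed as a single integer number of units, which this topic does not state, and it does not model the paused state. The function name, parameters, and the choice to list resource-occupying instances from largest to smallest usage are all hypothetical.

```python
def scheduling_resources_diagnosis(
    required: int,
    idle: int,
    occupying: dict[str, int],
) -> tuple[str, list[str]]:
    """Return (diagnosis result, resource-occupying instances to review).

    `required` and `idle` are resource amounts in an assumed common unit.
    `occupying` maps a hypothetical instance name to the units it holds.
    """
    if idle >= required:
        return ("Pass", [])
    # List the largest consumers first, as candidates for follow-up.
    by_usage = sorted(occupying, key=lambda name: occupying[name], reverse=True)
    return ("Waiting for scheduling resources", by_usage)


if __name__ == "__main__":
    print(scheduling_resources_diagnosis(4, 1, {"etl_a": 2, "etl_b": 6}))
```

The ordering is only a convenience for reading the list; how to act on the occupying instances should follow the diagnostic suggestions shown on the page.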
Instance execution
When an instance reaches the execution phase, the Instance Execution diagnosis page displays its Running Results and Runtime Log. If Running Results shows Failed, use the Runtime Log to troubleshoot and resolve the issue. To refresh the diagnosis result, click the refresh icon.