Operation Center for Centralized Task Monitoring & Control - Dataphin

The Operation Center is where tasks from Data Integration and Data Development modules are managed post-submission or publication, in both development and production environments. It offers five key functional modules: operation overview, task operation, instance operation, monitoring management, and system configuration, enabling comprehensive management of tasks and their instances.

Scenarios

Global Perspective Control: The Dataphin Operation Center provides a global view of statistics for both offline and real-time instances. It reports running trends, rankings of failed and alerting instances, and statistics on running status, running duration, failure counts, delay durations, and alert rankings. This overview helps you surface abnormal information promptly and improves operational efficiency.
Resource Cost Savings: The resource dashboard in the Dataphin Operation Center compares allocated versus actual CPU and memory usage for both overall and individual tasks. This analysis aids in optimizing global resource configuration and individual task resource allocation, allowing for cost savings and improved resource utilization without compromising task stability.
Task Operation Management: The center supports managing code tasks from Data Integration, Modeling R&D, Coding R&D, and Data Distilling modules. This includes monitoring the status of single nodes and managing their upstream and downstream dependencies.
Running Resource Control: To address compute engine performance bottlenecks, insufficient resource allocation, or task issuance timing and sequencing, you can set throttling rules. This ensures system stability and prioritizes resource allocation for critical data outputs.
Abnormal Alerts: Baseline operations allow for the configuration of alert rules for physical tasks and logical table fields. If an abnormality is detected, the system sends alerts via phone, text message, DingTalk, or email.

Overview of Features

Upon completing node development in Dataphin and submitting or publishing it to the production environment, you can manage tasks in the Operation Center. This includes data backfill for recurring tasks, executing one-time tasks, monitoring task status, configuring alerts, viewing instance and resource statistics, and setting policies for task timeouts or failures. Below is a description of the Operation Center's functional modules:

The following table outlines the use of the modules in the Operation Center:

Functional Module	Description
Operation Overview	Instance Statistics: Provides statistics on the running details, trends, and rankings of failed and alert instances for both offline and real-time instances. This helps you control the instance status from a project or global perspective. Abnormal Statistics: Offers statistics on abnormal nodes under global or selected projects, including errors and excessive running times, to support timely decision-making on budgeting, resource scale-out, or upgrades. Schedule Resource Dashboard: Provides insights into global node resource allocation, consumption, and optimization recommendations, aiding in resource scheduling and decision-making for budgeting, scale-out, or upgrades.
Task Operation	Task operation categorizes nodes as auto triggered, real-time, or one-time based on their scheduling. Auto triggered nodes include script nodes, detail and aggregate table nodes, and extraction nodes. This module lets you manage these nodes, view DAGs and instances, backfill data, and change owners in batches.
Instance Operation	Instance operation covers baseline, recurring, data backfill, one-time, and real-time instances. It provides management functions such as viewing DAGs, nodes, and logs, and rerunning instances in batches.
Monitoring Management	Monitoring management includes baseline monitoring and offline and real-time task monitoring. Baseline Monitoring: Manages baseline monitoring, alerts, instances, and high-priority node assurance, including viewing DAGs, managing monitoring switches, transferring owners, and setting priority for resource allocation. Offline and Real-time Task Monitoring: Allows for configuring multiple alert rules for nodes, including field-level monitoring for offline logical table tasks, to monitor node dynamics and abnormalities, and baseline monitoring for key nodes. Note: Only Basic and Prod projects support alert configuration. Throttling and baseline operations require separate purchase and activation.
System Configuration	System configuration includes throttling and runtime settings. Throttling Configuration: Configures rules to manage node issuance and resource allocation during performance bottlenecks, ensuring system stability and prioritized node execution. Runtime Configuration: Dataphin supports tenant-level runtime settings to manage instance timeouts and rerun policies based on tenant type and business scenarios, optimizing resource use and instance reliability.

Task Instance Generation Logic

The Operation Center manages various task types, including recurring, one-time, and real-time tasks, triggered by schedules or manually. Scheduling intervals range from minutes to years, and tasks can be triggered for data backfill, manual execution, or real-time operation.

Important

By default, tasks in the development environment do not run automatically; manual triggering is required.
Recurring tasks begin normal scheduling upon publication to the production environment.

Recurring Instance Generation Logic

Upon submitting or publishing a node with a schedule type of recurring task to the Operation Center, the task node appears in the Operation Center's recurring task list. Recurring tasks can produce two types of instances:

Each instance type is described below by its generation time, running logic, and running conditions.

Recurring Instance

Instance Generation Time: Every night at 23:00, auto triggered nodes automatically generate the instances that will run on the following day (T+1 generation).

Nodes submitted and published before 23:00 generate instances for the next day.
Nodes submitted and published after 23:00 generate instances for the third calendar day, counting the publication day as the first (that is, the day after next).
Note
Modifying the node's schedule resource group only affects new instances and not those already generated. To change an instance's schedule resource group, modify the node's resource configuration and publish before 23:00. Alternatively, modify the schedule resource for an instance that hasn't started running.
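
The 23:00 cutoff described above can be illustrated with a minimal Python sketch. The function name `first_instance_date` is hypothetical and not part of any Dataphin API, and treating a publication at exactly 23:00 as "after" the cutoff is an assumption, since the text does not cover that edge case.

```python
from datetime import date, datetime, time, timedelta

# Nightly generation time stated in the text above.
GENERATION_CUTOFF = time(23, 0)


def first_instance_date(published_at: datetime) -> date:
    """Return the first calendar day for which a recurring instance exists.

    Hypothetical helper for illustration only. A publication time of exactly
    23:00 is treated as "after" the cutoff (an assumption).
    """
    if published_at.time() < GENERATION_CUTOFF:
        # Tonight's 23:00 run picks up the node and creates tomorrow's instances.
        return published_at.date() + timedelta(days=1)
    # Tonight's run has already started, so tomorrow night's run creates
    # instances for the day after tomorrow (the third calendar day).
    return published_at.date() + timedelta(days=2)


print(first_instance_date(datetime(2024, 5, 1, 22, 30)))  # 2024-05-02
print(first_instance_date(datetime(2024, 5, 1, 23, 15)))  # 2024-05-03
```

The same reasoning explains the note on schedule resource groups: a change published after the cutoff cannot affect instances that were already generated that night.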

Instance Running Logic: After the instance snapshot is generated, recurring instances are scheduled and run automatically based on the node's scheduling properties.

Instance Running Conditions: A recurring instance starts only when all of the following conditions are met:

All dependent parent node instances have a successful running status.
The instance node has reached its scheduled running time.
The scheduled resources accommodate the operation of the instance.
The instance and its associated auto triggered nodes are not in a paused state.

The operational status diagram for recurring instances is depicted below:

For more information on running status, please refer to Instance Running Diagnosis.
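
As a rough illustration of the four start conditions listed above, the following Python sketch checks them for a single instance and reports which ones are unmet. The `InstanceState` fields, the `"SUCCESS"` status string, and the `blocking_reasons` function are all assumptions made for this example; they do not reflect Dataphin's internal data model or API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class InstanceState:
    """Hypothetical snapshot of what a scheduler knows about one instance."""
    parent_statuses: List[str]   # running status of each parent node instance
    scheduled_time: datetime     # scheduled running time of this instance
    resources_available: bool    # schedule resources can accommodate the instance
    instance_paused: bool
    node_paused: bool


def blocking_reasons(state: InstanceState, now: datetime) -> List[str]:
    """Return the unmet start conditions; an empty list means the instance can start."""
    reasons = []
    if any(status != "SUCCESS" for status in state.parent_statuses):
        reasons.append("parent instances not all successful")
    if now < state.scheduled_time:
        reasons.append("scheduled time not reached")
    if not state.resources_available:
        reasons.append("insufficient schedule resources")
    if state.instance_paused or state.node_paused:
        reasons.append("instance or node paused")
    return reasons


state = InstanceState(
    parent_statuses=["SUCCESS", "RUNNING"],
    scheduled_time=datetime(2024, 5, 2, 1, 0),
    resources_available=True,
    instance_paused=False,
    node_paused=False,
)
print(blocking_reasons(state, now=datetime(2024, 5, 2, 0, 30)))
# ['parent instances not all successful', 'scheduled time not reached']
```

Returning the list of unmet conditions, rather than a single boolean, mirrors the kind of question an operator asks when an instance is waiting: which condition is holding it back.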

Data Backfill Instance

Instance Generation Time: Data backfill instances are manually generated through data backfill operations on current auto triggered nodes.

Instance Running Logic: Once generated, data backfill instances execute backfills based on the configured data timestamp.

Note

In the production environment, data backfill nodes can validate the normal operation of auto triggered nodes from the development environment and ensure data production.

One-time Instance Generation Logic

Upon submitting or publishing a node with a schedule type of one-time task to the Operation Center, the node appears in the Operation Center's one-time task list. To execute the task, select Run from this list. This manual trigger creates a one-time instance, whose execution details are available on the One-time Instance page.

Real-time Instance Generation Logic

After real-time tasks are submitted or published to the Operation Center, you can start them and adjust their resource configurations directly from the real-time task list. In both Basic and Dev-Prod modes within the Prod environment, submitted real-time tasks automatically create real-time instances, which initially have a Stopped status. Real-time task operations are divided into real-time computing tasks and real-time integration tasks.

Operation Center Entry

Quick Entry (Recommended)

From the Dataphin home page, select Operation Scheduling to quickly access the Operation Center.

Regular Entry

On the Dataphin home page, click R&D in the top menu bar.
On the Data Development page, select O&M from the top menu bar to enter the Operation Center.