Operation Center provides the following modules: Overview, RealTime Task Maintenance, Cycle Task Maintenance, Manual Task Maintenance, Alarm, Intelligent Diagnosis, Resource, and Engine Maintenance. You can use these modules to perform O&M operations on nodes, engines, and resources.

Modules of Operation Center

In general, after you commit and deploy a node that is configured in DataStudio, you can perform O&M operations on the node in Operation Center. Operation Center provides multiple modules such as Overview and RealTime Task Maintenance. The following list describes the modules of Operation Center.
  • Overview: This module displays the statuses of nodes in charts. For more information, see View the dashboard. This module is available only for the production environment of a DataWorks workspace in standard mode.
  • RealTime Task Maintenance: This module allows you to run, stop, and disable real-time sync nodes, and configure alert rules for monitoring real-time sync nodes. For more information, see Manage real-time synchronization nodes.
  • Cycle Task Maintenance: This module displays the auto triggered nodes that are committed to the scheduling system and the auto triggered node instances that are generated after the nodes are run by the scheduling system. On the Cycle Task page, you can view all existing auto triggered nodes and perform operations such as changing the resource groups and owners of auto triggered nodes.
    DataWorks generates instances for auto triggered nodes each evening based on the point in time at which the auto triggered nodes are committed. You can also perform operations on auto triggered nodes to generate retroactive instances and test instances for the nodes. For more information, see View auto triggered nodes.
  • Manual Task Maintenance: This module displays the manually triggered nodes or workflows that are committed to the scheduling system and the manually triggered instances that are generated after the nodes or workflows are manually triggered. On the Manual Task page, you can view all existing manually triggered nodes or workflows and perform operations such as changing the resource groups and owners of manually triggered nodes or workflows.
    In the upper part of the Manual Task page, you can set the Type parameter to Manually Triggered Workflow to view manually triggered workflows. You can trigger the manually triggered workflows to run and generate instances for the workflows. Then, you can view the execution details of the workflows on the Manual Instance page. For more information, see Manually triggered nodes.
  • Alarm: This module allows you to configure alert rules for monitoring auto triggered nodes. This module monitors the statuses of auto triggered node instances and the usage of exclusive resource groups in automatic scheduling scenarios.
    This module also allows you to configure alert rules for monitoring a specific object, such as a node, workflow, workspace, baseline, real-time compute node, exclusive resource group for scheduling, or exclusive resource group for Data Integration. In addition, this module allows you to configure alert rules for monitoring the baselines of the global line-of-business and sends alert notifications based on the notification method that you specify. The notification methods include SMS messages, emails, phone calls, and DingTalk chatbots. For more information, see Overview.
    This module is available only for the production environment of a DataWorks workspace in standard mode.
  • Resource: This module monitors the usage of exclusive resource groups for scheduling and exclusive resource groups for Data Integration and automatically performs O&M operations. You can view the usage of a resource group and the list of nodes that are using or waiting to use the resource group. For more information, see Use the resource O&M feature.
  • Engine Maintenance: This module is available only for E-MapReduce (EMR) compute engines. A DataWorks node instance that runs on an EMR compute engine contains multiple EMR jobs, which are run in a specific sequence. You can use the Engine Maintenance module to view the details of each EMR job and to identify and remove jobs that fail to run. This prevents failed jobs from affecting the execution of the node instance to which the jobs belong and the descendant nodes of the node that generates the instance. For more information, see Use the engine O&M feature. This module is available only for the production environment of a DataWorks workspace in standard mode.
  • Intelligent Diagnosis: This module helps you track the execution of nodes and identify problems. For more information, see Instance diagnosis. This module is available only for the production environment of a DataWorks workspace in standard mode.
When you use Operation Center of DataWorks, take note of the following items:
  • In a DataWorks workspace in standard mode, you can click the switch icon next to Operation Center in the top navigation bar to switch between the development and production environments.
  • Only Operation Center of the production environment supports automatic scheduling. You can view auto triggered node instances only on the Cycle Instance page in Operation Center of the production environment.
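For example, when you query Operation Center data programmatically, you select the environment explicitly. The following is a minimal sketch, assuming the dataworks-public OpenAPI (version 2020-05-18) and its ListInstances operation; the endpoint and parameter names are assumptions to verify against the API reference for your region.

```python
# Minimal sketch: list auto triggered node instances in the production
# environment. Assumes the dataworks-public OpenAPI (2020-05-18); the
# endpoint and parameters are assumptions -- verify against the API reference.
from aliyunsdkcore.client import AcsClient
from aliyunsdkcore.request import CommonRequest

client = AcsClient("<access_key_id>", "<access_key_secret>", "cn-shanghai")

request = CommonRequest(
    domain="dataworks.cn-shanghai.aliyuncs.com",
    version="2020-05-18",
    action_name="ListInstances",
)
request.set_method("POST")
request.add_query_param("ProjectId", "<workspace_id>")
request.add_query_param("ProjectEnv", "PROD")  # PROD: production, DEV: development
request.add_query_param("PageNumber", "1")
request.add_query_param("PageSize", "10")

print(client.do_action_with_exception(request))
```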

Logic for running nodes

A node that is committed and deployed to Operation Center cannot run until an instance is generated for it. DataWorks then runs the instance based on the trigger mechanism that is configured for the node.

For example, an auto triggered node is run based on the following logic:
  • After the auto triggered node is committed to Operation Center, DataWorks generates an instance each night for running the node the following day. Then, the generated instance is automatically triggered to run based on your scheduling configuration when the trigger conditions are met. The scheduling configuration includes node dependencies, the point in time when the node is automatically triggered to run, and the resources for running the node.
  • You can manually generate retroactive data for the auto triggered node to generate a retroactive instance for the node. The retroactive instance can be run to obtain the retroactive data of the auto triggered node in the specified time range in the past. Alternatively, you can manually test an auto triggered node to generate a test instance for the node.
    Note: Auto triggered node instances are automatically generated for auto triggered nodes and are automatically run based on the scheduling configuration when the trigger conditions are met. In contrast, retroactive instances and test instances are generated only after you manually trigger DataWorks to generate retroactive data for an auto triggered node or to test the node. Therefore, retroactive instances and test instances are generated based on the latest node configurations.
  • On the Cycle Instance, Patch Data, or Test Instance page under Cycle Task Maintenance, you can view the statuses of auto triggered node instances, retroactive instances, or test instances. You can determine whether the data output of an instance is normal based on the status of the instance. For more information about instance statuses, see Related information: Instance status.
The following list describes the scenario in which each type of instance is used, how the instance is generated based on an auto triggered node, how the instance is triggered to run, and the prerequisites for running the corresponding node.
  • Auto triggered node instance
    Scenario: You want to perform periodic extract, transform, load (ETL) operations.
    How the instance is generated: DataWorks automatically generates an auto triggered node instance based on the snapshot information of the auto triggered node at a specific point in time. Note: If a DataWorks workspace in standard mode is used, an auto triggered node instance can be automatically generated and run only in the production environment.
    How the instance is triggered to run: DataWorks automatically triggers the instance to run. The following prerequisites must be met before the node starts to run:
      • The parent node of this node is successfully run.
      • The scheduled point in time for running the node is reached.
      • Sufficient scheduling resources are provided for running the node.
      • This node is not frozen.
  • Retroactive instance
    Scenarios:
      • You want to generate retroactive data for the current node and its descendant nodes for a specific time range in the past. In other words, you want to perform ETL operations on historical data.
      • You want to generate retroactive data for the current node and its descendant nodes for a specific time range in the future. In other words, you want to perform ETL operations on future data in advance.
    How the instance is generated and triggered to run: You must manually trigger DataWorks to generate retroactive data for the auto triggered node. After you do so, a retroactive instance is generated and triggered to run.
  • Test instance
    Scenario: You want to test the current auto triggered node to check whether the node can be run as expected. Note: When you run a test instance, the code of the node is actually executed.
    How the instance is generated and triggered to run: You must manually trigger DataWorks to test the auto triggered node. After you do so, a test instance is generated and triggered to run.
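Retroactive instances and test instances can also be generated programmatically. The following sketch assumes the dataworks-public OpenAPI (version 2020-05-18) and its RunCycleDagNodes and RunSmokeTest operations; the operation and parameter names are assumptions to verify against the API reference.

```python
# Minimal sketch: generate a retroactive instance (patch data) and a test
# instance for an auto triggered node. Assumes the dataworks-public OpenAPI
# (2020-05-18); operation and parameter names are assumptions to verify
# against the API reference.
from aliyunsdkcore.client import AcsClient
from aliyunsdkcore.request import CommonRequest

client = AcsClient("<access_key_id>", "<access_key_secret>", "cn-shanghai")

def call(action, **params):
    request = CommonRequest(
        domain="dataworks.cn-shanghai.aliyuncs.com",
        version="2020-05-18",
        action_name=action,
    )
    request.set_method("POST")
    for key, value in params.items():
        request.add_query_param(key, value)
    return client.do_action_with_exception(request)

# Retroactive instance: rerun the node for a past data timestamp range.
call("RunCycleDagNodes",
     ProjectEnv="PROD", RootNodeId="<node_id>", Name="patch-2024-01-01",
     StartBizDate="2024-01-01 00:00:00", EndBizDate="2024-01-01 00:00:00")

# Test instance: run the node once based on its latest configuration.
call("RunSmokeTest",
     ProjectEnv="PROD", NodeId="<node_id>", Name="smoke-test",
     Bizdate="2024-01-01 00:00:00")
```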

After an auto triggered node is deployed in the production environment, you can view the node on the Cycle Task page. However, the time at which an instance is generated for the node depends on the instance generation mode that you use. For more information, see Configure time properties.

Node O&M: Intelligent monitoring

You can configure alert rules for monitoring auto triggered node instances and exclusive resource groups.

  • Custom alert rules

    You can configure a custom alert rule for monitoring the specified object.

    For example, you can specify that an alert notification is sent if a node in the monitored object, such as a baseline, workspace, or workflow, enters one of the following states: Completed, Uncompleted, Error, Uncompleted in Cycle, Overtime, or The error persists after the node automatically reruns. You can also specify that an alert notification is sent if an error occurs when a real-time compute node runs.

    You can also configure a custom alert rule for monitoring an exclusive resource group. For example, you can specify that an alert notification is sent if the resource usage of the resource group or the number of node instances that are waiting for resources in the resource group exceeds a specific threshold.

    In addition, you can use the automated O&M feature of intelligent monitoring to send notifications to the alert contacts that you specify, such as the node owner, a specified responsible person, or the on-duty engineer in a shift schedule. Notifications can be sent by using SMS messages, emails, phone calls, or DingTalk chatbots. For more information, see Manage custom alert rules.
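    As a sketch of how such a rule might be created programmatically, the following example assumes the dataworks-public OpenAPI (version 2020-05-18) and its CreateRemind operation; the parameter names and values are assumptions to verify against the API reference.

```python
# Minimal sketch: create a custom alert rule that fires when a monitored
# node enters the Error state. Assumes the dataworks-public OpenAPI
# (2020-05-18) CreateRemind operation; parameter names and values are
# assumptions to verify against the API reference.
from aliyunsdkcore.client import AcsClient
from aliyunsdkcore.request import CommonRequest

client = AcsClient("<access_key_id>", "<access_key_secret>", "cn-shanghai")

request = CommonRequest(
    domain="dataworks.cn-shanghai.aliyuncs.com",
    version="2020-05-18",
    action_name="CreateRemind",
)
request.set_method("POST")
request.add_query_param("RemindName", "node-error-alert")
request.add_query_param("RemindUnit", "NODE")        # type of monitored object
request.add_query_param("NodeIds", "<node_id>")      # nodes to monitor
request.add_query_param("RemindType", "ERROR")       # fire on the Error state
request.add_query_param("AlertUnit", "OWNER")        # notify the node owner
request.add_query_param("AlertMethods", "SMS,MAIL")  # SMS messages and emails

print(client.do_action_with_exception(request))
```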

  • Built-in global alert rules
    You can monitor special events or implement global monitoring by using built-in global alert rules, including alert rules for node isolation, node loops, global events, and global baselines.
    • Alert rule for node isolation

      Isolated nodes are nodes that do not have upstream dependencies. You can view the dependencies of an auto triggered node on the Cycle Task or Cycle Instance page. When DataWorks schedules auto triggered nodes to run, isolated nodes are excluded from the scheduling. Therefore, isolated nodes cannot be automatically triggered to run. If an isolated node has a large number of descendant nodes, the descendant nodes may fail to be run. After an isolated node is generated, an alert notification is automatically sent. Handle the issue at the earliest opportunity if you receive an alert notification that indicates an isolated node.

    • Alert rule for node loops

      If a node serves as both the ancestor node and descendant node of another node, a node loop is formed. As a result, a dependency loop exists, and the nodes in the loop cannot be automatically scheduled to run. After a node loop is formed, an alert notification is automatically sent. Handle the issue at the earliest opportunity if you receive an alert notification that indicates a node loop. The loop condition is illustrated in the sketch after this list.

    • Alert rule for global events

      You can configure alert rules for monitoring events that may affect the execution of nodes in the baselines of critical concern. You can specify the following information for a specific baseline: the maximum number of alerts to be handled, minimum intervals at which alert notifications are to be sent, notification methods, and alert contacts.

    • Alert rule for global baselines

      You can specify the following information for a baseline of critical concern: the maximum number of alerts to be handled, minimum intervals at which alert notifications are to be sent, notification methods, and alert contacts.
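    The node-loop condition can be illustrated with a small dependency-graph check. The following sketch is illustrative only and is not DataWorks code; the node IDs are hypothetical.

```python
# Illustrative only: detect a dependency loop among nodes with a
# depth-first search. "edges" maps each node to its descendant nodes.
def find_loop(edges):
    visiting, done = set(), set()

    def dfs(node):
        if node in visiting:   # the node is already on the current path:
            return True        # a dependency loop exists
        if node in done:
            return False
        visiting.add(node)
        if any(dfs(child) for child in edges.get(node, [])):
            return True
        visiting.remove(node)
        done.add(node)
        return False

    return any(dfs(node) for node in edges)

# Node A depends on B, and B depends on A: neither can ever be scheduled.
print(find_loop({"A": ["B"], "B": ["A"]}))  # True
```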

  • Baseline management

    You can use the baseline management feature to dynamically monitor lines-of-business and baselines.

    You can add the nodes of critical concern to a baseline. After the baseline is enabled, the nodes in the baseline are monitored. If the data output of the nodes in the baseline is affected, for example, because an error is reported or the nodes run more slowly than before, an alert is generated. DataWorks also uses the historical run durations of these nodes to estimate whether the data output of the current day can be generated on time, and generates alerts in advance if a risk is detected. For more information, see Manage baselines.
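    As a hedged sketch, the status of a baseline for a business date might be queried programmatically. The following example assumes the dataworks-public OpenAPI (version 2020-05-18) and its GetBaselineStatus operation; the parameter names and the date format are assumptions to verify against the API reference.

```python
# Minimal sketch: query the status of a baseline for a business date.
# Assumes the dataworks-public OpenAPI (2020-05-18) GetBaselineStatus
# operation; parameter names and the date format are assumptions to verify.
from aliyunsdkcore.client import AcsClient
from aliyunsdkcore.request import CommonRequest

client = AcsClient("<access_key_id>", "<access_key_secret>", "cn-shanghai")

request = CommonRequest(
    domain="dataworks.cn-shanghai.aliyuncs.com",
    version="2020-05-18",
    action_name="GetBaselineStatus",
)
request.set_method("POST")
request.add_query_param("BaselineId", "<baseline_id>")
request.add_query_param("Bizdate", "2024-01-01T00:00:00+0800")  # business date

print(client.do_action_with_exception(request))
```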

  • Automated O&M

    You can use the automated O&M feature to manage exclusive resource groups based on the configured custom alert rules for the resource groups when the specified conditions are met. For example, you can terminate the execution of node instances. For more information, see Automated O&M.
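    For example, terminating a node instance, which is one of the actions that automated O&M can take, might map to a call such as the following. This sketch assumes the dataworks-public OpenAPI (version 2020-05-18) and its StopInstance operation; names are assumptions to verify against the API reference.

```python
# Minimal sketch: terminate a node instance, one of the actions that
# automated O&M can perform. Assumes the dataworks-public OpenAPI
# (2020-05-18) StopInstance operation; names are assumptions to verify.
from aliyunsdkcore.client import AcsClient
from aliyunsdkcore.request import CommonRequest

client = AcsClient("<access_key_id>", "<access_key_secret>", "cn-shanghai")

request = CommonRequest(
    domain="dataworks.cn-shanghai.aliyuncs.com",
    version="2020-05-18",
    action_name="StopInstance",
)
request.set_method("POST")
request.add_query_param("InstanceId", "<instance_id>")
request.add_query_param("ProjectEnv", "PROD")

print(client.do_action_with_exception(request))
```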

Data O&M: Data quality

Data Quality of DataWorks can monitor the table data generated by auto triggered node instances, retroactive instances, and test instances.

You can configure a data quality rule for a table that is generated by a node. Data Quality then matches the partitions in the table by using the partition expression that is configured for the table. When the node that is associated with the data quality rule is run, the rule is triggered. You can specify the strength of the rule to determine whether the node exits when the rule is violated. This prevents dirty data from spreading to descendant nodes. You can also subscribe to rules to receive alert notifications at the earliest opportunity. For more information, see Overview.
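The effect of rule strength can be sketched as simple control flow. The following example is illustrative only and is not DataWorks code; the function and value names are hypothetical.

```python
# Illustrative only: how rule strength decides what happens after a
# data quality check. A strong rule blocks the node on failure; a weak
# rule raises an alert and lets descendant nodes continue.
def apply_quality_check(passed, strength):
    if passed:
        return "continue"           # data meets the rule; the node proceeds
    if strength == "strong":
        return "block"              # fail the node to stop dirty data
    return "alert-and-continue"     # weak rule: notify subscribers only

print(apply_quality_check(passed=False, strength="strong"))  # block
```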

Related information: Instance status

As mentioned in the Logic for running nodes section in this topic, the following conditions must be met before a node starts to run:
  • The parent node of this node is successfully run.
  • The scheduled point in time for running the node is reached.
  • Sufficient scheduling resources are provided for running the node.
  • This node is not frozen.

Two of the preceding conditions are related to node statuses. The node instance status model defines the six states that an instance can be in during its lifecycle:
  • Succeeded
  • Not running
  • Failed
  • Running
  • Waiting
  • Suspended or frozen

In a typical lifecycle, an instance is in the Not running state after it is generated and waits until its ancestor instances are successfully run, the scheduled point in time is reached, and sufficient scheduling resources are available. The instance then enters the Running state and ends in the Succeeded or Failed state. A frozen instance is in the Suspended or frozen state and is not scheduled to run.
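When you query an instance programmatically, its state surfaces as a status string. The following sketch assumes the dataworks-public OpenAPI (version 2020-05-18), its GetInstance operation, and commonly documented status values; all of these are assumptions to verify against the API reference.

```python
# Minimal sketch: query an instance and map its status string to the
# states above. Assumes the dataworks-public OpenAPI (2020-05-18)
# GetInstance operation; the status values and the response shape are
# assumptions to verify against the API reference.
import json

from aliyunsdkcore.client import AcsClient
from aliyunsdkcore.request import CommonRequest

STATES = {
    "SUCCESS": "Succeeded",
    "NOT_RUN": "Not running",
    "FAILURE": "Failed",
    "RUNNING": "Running",
    "WAIT_TIME": "Waiting",      # waiting for the scheduled point in time
    "WAIT_RESOURCE": "Waiting",  # waiting for scheduling resources
}

client = AcsClient("<access_key_id>", "<access_key_secret>", "cn-shanghai")

request = CommonRequest(
    domain="dataworks.cn-shanghai.aliyuncs.com",
    version="2020-05-18",
    action_name="GetInstance",
)
request.set_method("POST")
request.add_query_param("InstanceId", "<instance_id>")
request.add_query_param("ProjectEnv", "PROD")

data = json.loads(client.do_action_with_exception(request))
status = data["Data"]["Status"]
print(status, "->", STATES.get(status, "Suspended or frozen"))
```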