DataWorks: View the statistics on the Overview page

Last Updated: Dec 25, 2023

The Overview page in Operation Center displays the overall O&M information, including the results of O&M stability assessment, key O&M metrics, usage of scheduling resources, and status information about auto triggered nodes. This page also displays information about synchronization nodes in Data Integration. This helps you quickly understand the overall information about nodes in your workspace, identify and handle exceptions at the earliest opportunity, and improve O&M efficiency.

Usage notes

The Overview page allows you to view the overall O&M information about workspaces and the O&M information about synchronization nodes in Data Integration of workspaces from the following perspectives:

  • Specified workspace: You can view O&M information about a specified workspace, including the overall O&M information about the workspace and the O&M information about synchronization nodes in Data Integration of the workspace.

  • All workspaces: You can view the overall O&M information about all workspaces within your current account. From this perspective, you cannot view the O&M information about synchronization nodes in Data Integration.

Limits

  • In a workspace in standard mode, the Overview module in Operation Center is not available in the development environment. You can switch between the production environment and the development environment in the top navigation bar of Operation Center.

  • The Workbench Overview tab displays only the statistics on O&M information about auto triggered nodes and auto triggered node instances.

Go to the Overview page

Log on to the DataWorks console. In the left-side navigation pane, choose Data Modeling and Development > Operation Center. On the page that appears, select the desired workspace from the drop-down list and click Go to Operation Center.

View the statistics on the Workbench Overview tab

The Workbench Overview tab displays the O&M information about your workspace in different dimensions, including overall O&M stability, O&M issues, running details of auto triggered nodes and auto triggered node instances, usage of resources in resource groups, ranking of instances in different states, and ranking of instances with errors.

View information in the O&M Stability Assessment section

In the O&M Stability Assessment section, the O&M stability of your workspace is assessed based on the overall running details of nodes in your workspace. The health status for O&M stability can be excellent, good, medium, or poor. If high-risk or low-risk items are displayed, the health status of the workspace is poor. You must handle the risky items and optimize the performance of the workspace at the earliest opportunity. You can select All Project in the top navigation bar to view the following information about all workspaces: stability assessment result, number of auto triggered node instances, and completion rate of auto triggered node instances.

[Figure: Overall O&M status assessment]

View information in the Focus On section

The Focus On section displays the O&M exceptions from the workspace and individual perspectives based on exception statistics of intelligent baselines and auto triggered nodes. You can view the overall information in your workspace or view only the information about nodes of which you are the owner to identify and handle exceptions at the earliest opportunity and ensure that your business is not affected.

[Figure: Exceptions]

The following table describes common exception types.

Exception type

Description

References

Baseline in Overtime

Counts the number of baseline instances that are in the overtime state on the current day. If a node in a baseline is still running when the committed completion time of the baseline arrives, an instance that is generated for the node enters the overtime state.

Manage baseline instances

Baseline in Alert

Counts the number of baseline instances that are in the alert state on the current day. You can specify an alert margin threshold to ensure that important data is generated as expected in scenarios in which dependencies between nodes in the baseline are complex. If the alert margin threshold is exceeded, nodes may fail to finish running as expected and exceptions may occur.

Configure an appropriate committed point in time and an appropriate alert margin threshold for a baseline

Error-related Events

Counts the number of error-related events that are generated on the current day. An error-related event is generated if a node in a baseline fails. In this case, the running of descendant nodes of the node may be blocked. You must handle the error at the earliest opportunity to prevent the node from affecting the running of its descendant nodes.

Manage events

Slowdown Events

Counts the number of slowdown events that are generated on the current day. A slowdown event is generated if the running duration of a node in a baseline is significantly longer than the average running duration of the node in the historical periods of time.

Isolated Nodes

Counts the number of isolated nodes on the current day. If an auto triggered node does not have an ancestor node, the auto triggered node becomes an isolated node. In this case, the node cannot be automatically scheduled to run.

Scenario: Isolated node

Frozen Nodes

Counts the number of auto triggered nodes that are frozen on the current day. If an auto triggered node is frozen, instances that are generated for the node are also frozen. Frozen instances are not automatically scheduled, and the descendant instances of the frozen instances are blocked from running.

Node freezing and unfreezing

Expired Nodes

Counts the number of auto triggered nodes for which the effective period of scheduling expires. The system generates instances for an auto triggered node and runs the instances within the effective period of scheduling of the node. If the effective period of scheduling expires, the system does not generate or schedule auto triggered node instances of the node.

None

Modified Nodes

Counts the number of auto triggered nodes whose configurations are modified on the current day.

  • The modifications include code change, scheduling configuration modification, node status change, and node ownership change.

  • The statistics on the following nodes are collected: nodes whose configurations are modified on the DataStudio page and that are deployed to the production environment after configuration modification and nodes whose configurations are modified in the production environment.

Note

If you select Mine in the upper-right corner of the Focus On section, only the number of modified nodes of which you are the owner is counted.

None
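The Isolated Nodes definition in the table above (an auto triggered node without an ancestor node) can be illustrated with a minimal sketch. The function name and the input format, a list of node names plus a list of (ancestor, descendant) pairs, are assumptions made for illustration and are not part of DataWorks.

```python
def find_isolated_nodes(nodes, dependencies):
    """Return the nodes that have no ancestor node.

    nodes: iterable of node names (hypothetical input format).
    dependencies: iterable of (ancestor, descendant) name pairs.
    """
    # Every node that appears as a descendant has at least one ancestor.
    has_ancestor = {descendant for _, descendant in dependencies}
    return sorted(node for node in nodes if node not in has_ancestor)


nodes = ["load_orders", "clean_orders", "report", "orphan_job"]
dependencies = [("load_orders", "clean_orders"), ("clean_orders", "report")]
isolated = find_isolated_nodes(nodes, dependencies)
```

In this sketch, a deliberate starting node such as load_orders is also reported, because the check looks only at whether an ancestor exists.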

View O&M information about auto triggered nodes and auto triggered node instances

The following table lists the sections in which you can view O&M information about auto triggered nodes and auto triggered node instances.

Section

Description

Illustration

Distribution of Instances by Status

This section displays the statistics on the distribution of auto triggered node instances by status based on a specific data timestamp. You can view the distribution of auto triggered node instances in the current workspace or the distribution of auto triggered node instances of which you are the owner. The statistics in this section are updated when you load the page. You can click a sector in the donut chart to view the number and proportion of auto triggered node instances in a specific state.

Take note of the auto triggered node instances in the following states, which may affect your business:

  • Failed: An auto triggered node instance in this state fails to run. As a result, the running of its descendant instances may be blocked.

  • Frozen: A frozen auto triggered node instance is not automatically scheduled, and the running of its descendant instances is blocked.

  • Slow running: An auto triggered node instance is considered slow running if it has been running at least 15 minutes longer than the average running duration of its historical instances during the last 10 days. If fewer than four historical instances exist, the instance is considered slow running when its running duration exceeds 30 minutes.

Note

Only statistics on normal nodes are collected. Statistics on dry-run nodes and frozen nodes are not collected.

[Figure: Distribution of instances by running status]

Completion Status of Instances

This section displays the completion status of auto triggered node instances between 00:00 and 23:00 of the current day. You can view the number of auto triggered node instances that are successfully run or not run on the current day and previous day. You can also view the historical average number of auto triggered node instances that are successfully run or not run. The line chart displays the number of auto triggered node instances that are successfully run on the current day, the number of auto triggered node instances that were successfully run on the previous day, and the historical average number of auto triggered node instances that are successfully run. If the deviations among the three lines are large, an exception occurred during a specific period of time. You must perform a further check and analysis. You can select a node type to view the completion status of specific auto triggered node instances.

Note

The Historical Average metric presents the completion status of auto triggered node instances that are successfully run in the previous 10 days.

[Figure: Completion status of auto triggered node instances]

Trend of Nodes and Instances

This section displays the trends in the numbers of auto triggered nodes and auto triggered node instances in the production environment within a specific period of time. You can specify a period of time within the previous 12 months in the upper-right corner.

Note

The time is selected based on the data timestamp. If you want to view the completion status of auto triggered nodes or auto triggered node instances on the current day, you must set the time to the previous day.

[Figure: Trend of auto triggered node instances and auto triggered nodes]

Distribution of Auto Triggered Nodes

This section displays the number and proportion of auto triggered nodes counted by node type, priority, owner, and scheduling cycle. The statistics in this section are updated when you load the page. The number of legend items of the donut chart is limited. If the number of legend items exceeds the upper limit, the excess legend items are merged into one.

Note

If you select All Project in the top navigation bar of the Operation Center page, you can view the distribution of auto triggered nodes by workspace in this section.

[Figure: Distribution of auto triggered nodes]
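The slow running rule described for the Distribution of Instances by Status section can be sketched as follows. This is a minimal illustration, not the DataWorks implementation. The function name and the use of timedelta values are assumptions, and the sketch assumes that the 30-minute rule replaces the 15-minute rule when fewer than four historical instances exist.

```python
from datetime import timedelta
from statistics import mean

SLOW_MARGIN = timedelta(minutes=15)     # from the text: 15 minutes over the historical average
FALLBACK_LIMIT = timedelta(minutes=30)  # from the text: limit used when history is short
MIN_HISTORY = 4                         # from the text: fewer than four historical instances


def is_slow_running(current, history):
    """Return True if the running duration `current` counts as slow running.

    current: timedelta, running duration of the instance so far.
    history: list of timedelta, durations of historical instances from the
             last 10 days (collecting that window is left to the caller).
    """
    if len(history) < MIN_HISTORY:
        # Assumption: with too little history, only the 30-minute rule applies.
        return current > FALLBACK_LIMIT
    average_seconds = mean(d.total_seconds() for d in history)
    return current.total_seconds() - average_seconds >= SLOW_MARGIN.total_seconds()


history = [timedelta(minutes=10)] * 5
slow = is_slow_running(timedelta(minutes=25), history)
not_slow = is_slow_running(timedelta(minutes=24), history)
```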

View information in the Resource Usage in Resource Group for Scheduling section

This section displays the resource usage of a resource group for scheduling and the trend in the number of instances that are run on the resource group over a specific period of time. The Resource Usage line shows the percentage of the resources used by the node instances that are run on the specified resource group. If the resource usage of a resource group exceeds 80%, we recommend that you scale out the resource group to prevent insufficient resources from affecting the running of nodes.

Note

  • This section displays statistics for a maximum of seven days.

  • The Resource Usage and Instances metrics apply to resource groups. For example, if multiple workspaces share the exclusive resource group for scheduling that you use, this section displays the resource usage and the trend in the number of instances that are run on the resource group in all the workspaces.

[Figure: Resource usage of the resource group for scheduling]
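The 80% scale-out recommendation above can be expressed as a simple check. The sketch below assumes that usage samples have already been exported as percentages for each resource group. The function name, the input format, and the choice to compare the peak sample against the threshold are illustrative assumptions.

```python
SCALE_OUT_THRESHOLD = 80.0  # percent, from the recommendation in the text


def groups_to_scale_out(usage_by_group):
    """Return the names of resource groups whose peak usage exceeds 80%.

    usage_by_group: dict that maps a resource group name to a list of
    usage percentages (hypothetical input format).
    """
    return sorted(
        name
        for name, samples in usage_by_group.items()
        if samples and max(samples) > SCALE_OUT_THRESHOLD
    )


usage = {
    "group_a": [42.0, 85.5, 61.0],  # peaks above the threshold
    "group_b": [30.0, 80.0],        # reaches but does not exceed 80%
    "group_c": [],                  # no samples collected
}
candidates = groups_to_scale_out(usage)
```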

View the ranking of auto triggered node instances on the previous day and the ranking of auto triggered node instances with the highest error rate in the recent month

[Figure: Rankings of instance running and errors]

  • Ranking of Instances on Previous Day

    This section ranks auto triggered node instances based on their running duration, time spent waiting for resources, and slow running duration on the previous day. Only the top 30 auto triggered node instances are displayed. You can identify a time-consuming node based on the ranking and click the ID of the instance that is generated for the node to go to the instance details page. You can also go to the Intelligent Diagnosis page to view more information about the instance.

    Note

    Slow Running: The system calculates the difference between the running duration of an instance on the previous day and the average running duration of historical instances. Instances are sorted by the difference in descending order.

  • Ranking of Auto Triggered Node Instances with Highest Error Rate in Recent Month

    This section ranks nodes on which errors occurred within the recent month and displays the top 30 nodes. You can identify a node with a high error rate in the recent month, view the running details of the node, and then identify the cause of the error.
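The Slow Running ranking described above (difference from the historical average, sorted in descending order, top 30) can be sketched as follows. The record fields instance_id, duration, and historical_average are hypothetical names chosen for illustration, with durations in seconds.

```python
def rank_slow_running(instances, limit=30):
    """Rank instances by how far they ran over their historical average.

    instances: list of dicts with the hypothetical keys 'instance_id',
    'duration', and 'historical_average' (both durations in seconds).
    Returns at most `limit` records, largest difference first.
    """
    ranked = sorted(
        instances,
        key=lambda item: item["duration"] - item["historical_average"],
        reverse=True,
    )
    return ranked[:limit]


instances = [
    {"instance_id": "a", "duration": 100, "historical_average": 90},
    {"instance_id": "b", "duration": 300, "historical_average": 100},
    {"instance_id": "c", "duration": 50, "historical_average": 60},
]
top_ids = [item["instance_id"] for item in rank_slow_running(instances)]
```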

View the statistics on nodes on the Data Integration tab

On the Data Integration tab, you can view the information about synchronization nodes and resource usage of resource groups on the previous day or on the current day.

View resource usage of exclusive resource groups for Data Integration

The Status of Exclusive Resource Group for Data Integration section displays the details of resource usage of all exclusive resource groups for Data Integration in the current workspace. The details include the number of nodes that are run on the resource groups, resource usage, and expiration time. You can determine whether to perform operations such as scaling a resource group based on the resource usage of the resource group and the number of nodes that are run on the resource group.

[Figure: Usage of exclusive resource groups for Data Integration]

Note

  • For information about the operations that you can perform on an exclusive resource group for Data Integration, see Exclusive resource groups for Data Integration.

  • The Data Integration tab on the Overview page collects O&M statistics only on exclusive resource groups for Data Integration. For information about O&M operations that you can perform on the shared resource group for Data Integration, see Use a shared resource group.

View the distribution of synchronization nodes by status

The Running Status Distribution section displays the distribution of synchronization nodes by status in the current workspace in a donut chart. You can click a sector to go to the details page of the nodes in a specific state. On the details page, you can view details of the nodes and handle exceptions that occur on the nodes. You must take note of the nodes that are in the Exception and Abnormal states. The nodes in these states block the running of their descendant nodes.

[Figure: Running status distribution]

View the statistics on batch synchronization nodes

The following table lists the sections in which you can view the statistics on batch synchronization nodes.

Section

Description

Illustration

Data Synchronization Progress

This section displays information about the data that is involved in batch synchronization within a specified period of time. The metrics include Total data volume, Total public network traffic, and Total records.

[Figure: Data synchronization speed]

Statistics on Amount of Synchronized Data

This section displays curves of the data that is read from or written to different data sources within a specified period of time. In this section, you can identify the nodes of a specific compute engine type that synchronize a large amount of data. You can then allocate additional scheduling resources to these nodes.

[Figure: Statistics on the amount of data synchronized by batch synchronization nodes]

Latest Top 10 Tasks

This section displays the latest 10 node instances that failed to run and the latest 10 node instances that were successfully run. The statistics provide you with an overview of the latest node instance status. You can quickly identify the cause of an instance failure and fix the error based on the error message.

[Figure: Ranking of batch synchronization nodes]

Running Details of Synchronization Task

This section allows you to specify filter conditions to search for nodes. The filter conditions include Submission time, Task Status, and Node Name. You can click the ID of a node to view the details of the node.

[Figure: Details of batch synchronization nodes]

View the statistics on real-time synchronization nodes

The following table lists the sections in which you can view the statistics on real-time synchronization nodes.

Section

Description

Illustration

Overview

This section displays the total data transmission speed and total recording speed of all real-time synchronization nodes in the current workspace.

[Figure: Synchronization speed]

Top 10 Tasks with Highest Latency

This section displays the top 10 nodes that have the highest latency. In this section, you can quickly identify nodes that have high latency and optimize the performance of the nodes at the earliest opportunity.

[Figure: Node latency]

Alert Information

This section displays information about the latest alerts. This section allows you to quickly identify exceptions and handle the exceptions at the earliest opportunity.

[Figure: Alert information]

Failover Information

This section displays information about failovers within a specified period of time. This section provides you with an overview of failovers. For more information, see Manage real-time synchronization nodes.

[Figure: Failover information]