DataWorks: AI O&M

Last Updated: Mar 26, 2026

AI-powered O&M, built on DataWorks Copilot, diagnoses task instance issues across your entire project — from dependency failures and resource contention to log anomalies and data quality problems — and delivers a structured report with root cause analysis and one-click remediation actions.

How it works

When a task instance fails, runs slowly, or gets stuck waiting, AI-powered O&M analyzes the full task lifecycle to locate the root cause. It correlates data across dependency chains, resource usage, historical performance trends, change history, log content, and data quality — then generates a diagnostic report that tells you what went wrong, why, and what to do next.

Core capabilities:

  • Comprehensive diagnosis: Covers all task states — not running, waiting, running, and completed (success or failure). Analyzes individual instances, workflows, and entire projects. Supports contextual follow-up questions within the same session.

  • Root cause analysis: Correlates signals across multiple dimensions to pinpoint the actual root cause, going beyond surface-level error logs.

  • Interactive O&M: Execute remediation commands — such as Rerun, Set to Success, or Modify resource group — directly in the chat interface. Complex operations are simplified into one-click buttons.

Quick start

This walkthrough covers the full diagnostic cycle for a failed task instance: identifying the cause, reading the report, and executing a fix.

Step 1: Start a diagnosis

  1. Go to Operation Center > Cycle Task and locate the failed instance.

  2. Click the instance name to expand its DAG. Hover over the instance and click AI Diagnosis in the quick action bar.

Step 2: Review the analysis

DataWorks Copilot opens on the right and displays "DataWorks Copilot is processing...". As it works, Copilot surfaces its analysis steps so you can follow the reasoning. Expand any step to see details.

Step 3: Read the diagnostic report

After a few seconds, Copilot returns a structured report with three sections:

  • Abnormal Findings: Identifies anomalies and deduces the root cause from available context.

  • Analysis Process: Details the evidence chain behind the AI's conclusions.

  • Solution and Prevention Suggestions: Provides specific steps to resolve the issue and long-term recommendations to prevent recurrence.
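If you copy a report out of the chat to archive it or attach it to a ticket, splitting it by these three sections can help. The following is a minimal sketch, assuming the report has been pasted as plain text with each section title on its own line. That layout is an assumption, and `split_report` is a hypothetical helper, not part of DataWorks.

```python
# Hypothetical helper: splits pasted report text into the three sections
# named in this document. Assumes each section title sits on its own line
# (optionally followed by a colon); the real report layout may differ.
SECTION_TITLES = (
    "Abnormal Findings",
    "Analysis Process",
    "Solution and Prevention Suggestions",
)


def split_report(text: str) -> dict[str, str]:
    """Return a mapping of section title to the text found under it."""
    sections: dict[str, list[str]] = {title: [] for title in SECTION_TITLES}
    current = None
    for line in text.splitlines():
        heading = line.strip().rstrip(":")
        if heading in sections:
            # A section title starts a new section; the title itself is not kept.
            current = heading
            continue
        if current is not None:
            sections[current].append(line)
    # Sections that never appear in the text come back as empty strings.
    return {title: "\n".join(lines).strip() for title, lines in sections.items()}
```

Text that appears before the first recognized title is ignored, so any greeting or progress message copied along with the report does not end up in a section.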

Step 4: Apply the fix

Follow the suggestions in the report to resolve the issue.

  • One-click actions: For common problems, the report surfaces direct actions. For a resource group issue, for example, it offers a shortcut to modify the task's resource group. Reply Yes to let Copilot guide you through the change.

  • Natural language commands: If the report does not provide a direct action, enter a command in the chat — for example, "Modify the resource group for task xxx". Copilot handles the operation from there.

The diagnostic report content and suggested solutions vary by failure cause. The information provided is for reference only. For a complete list of available remediation actions, see O&M actions.

Access AI diagnosis

AI-powered O&M is available from multiple locations in DataWorks.

Global entry point

On any page in DataWorks, open the Copilot chat in the upper-right corner, switch to Agent mode, and select /Data O&M.

Start a diagnosis with a command such as "Diagnose instance <Instance ID>", reference an instance as context with @<Instance ID>, or trigger a project-level analysis with a natural language prompt.

At the global entry point, you must explicitly select the /Data O&M agent. Contextual entry points use the O&M agent by default.

Contextual entry points

| Location | How to access | Best for |
| --- | --- | --- |
| Operation Center > AI-powered O&M | Click AI-powered O&M in the left navigation pane. | Starting a fresh O&M session |
| Operation Center > Instance List | In the Actions column, click More > AI Diagnosis. | Scheduled, test, and data backfill instances |
| Operation Center > DAG | Hover over a node instance and click AI Diagnosis. | Diagnosing a specific node in context |
| Instance Running Logs tab | On the Log Diagnosis page, click AI Diagnosis at the top. | Diagnosing from a live log view |
| Log Diagnosis page | Enable AI Diagnosis in the dialog, then enter an instance ID or project ID. | Instance or project-level diagnosis by ID |

The original Intelligent Diagnosis button has been renamed to Log Diagnosis and now focuses on analyzing the content of the current log.

Use cases

Instance-level diagnosis

| Scenario | Example command |
| --- | --- |
| Task failure | Diagnose instance: <Instance ID> or @<Instance ID> |
| Slow runtime | Why did instance <Instance ID> run slow today? |
| Long wait time | Check why instance <Instance ID> is still waiting |
| Dependency blocking | Show the failed parent nodes for instance <Instance ID> |
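
If you diagnose many instances, it can be convenient to generate these commands from an instance ID and paste them into the Copilot chat. The following is a minimal sketch that only builds the text. The scenario keys and the `build_prompt` helper are hypothetical names chosen for illustration, and the code does not call any DataWorks API.

```python
# Hypothetical helper: fills the example chat commands from this section
# with a concrete instance ID. It only builds text to paste into the chat.
PROMPT_TEMPLATES = {
    "task_failure": "Diagnose instance: {instance_id}",
    "slow_runtime": "Why did instance {instance_id} run slow today?",
    "long_wait": "Check why instance {instance_id} is still waiting",
    "dependency_blocking": "Show the failed parent nodes for instance {instance_id}",
}


def build_prompt(scenario: str, instance_id: str) -> str:
    """Return the chat command for a scenario with the instance ID filled in."""
    if scenario not in PROMPT_TEMPLATES:
        raise ValueError(f"unknown scenario: {scenario!r}")
    cleaned = instance_id.strip()
    if not cleaned:
        raise ValueError("instance_id must not be empty")
    return PROMPT_TEMPLATES[scenario].format(instance_id=cleaned)


if __name__ == "__main__":
    # Prints one command per scenario for a placeholder ID.
    for name in PROMPT_TEMPLATES:
        print(build_prompt(name, "<Instance ID>"))
```

Keeping the wording in one dictionary makes it easy to adjust a command if Copilot responds better to different phrasing.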

O&M actions

From the diagnostic report or within the Copilot chat, you can perform the following actions on individual or multiple instances in your workspace.

Important

All actions triggered through the AI chat require your manual review and confirmation before they are executed.

You must have the Project Owner or O&M role in the target workspace to perform these actions.

| Action | Description |
| --- | --- |
| Rerun instance | Reruns the current instance. |
| Set to Success | Sets the instance's status to "Successful". |
| Suspend/Resume instance | Pauses or resumes the scheduling state. |
| Modify resource group | Switches the resource group assigned to the instance. |
| Modify priority | Adjusts the scheduling priority (affects baseline scheduling). |
| Refresh instance | Updates the instance with the latest task configuration. |
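
The two preconditions above (a qualifying role and manual confirmation) can be pictured as a simple gate in front of each action. The following is a minimal sketch of that rule as a local model only. `OmAction`, `ActionRequest`, and `may_execute` are hypothetical names, the role strings are taken from this page, and none of this reflects how DataWorks implements the check internally.

```python
from dataclasses import dataclass
from enum import Enum


class OmAction(Enum):
    """The O&M actions listed in this section."""
    RERUN = "Rerun instance"
    SET_TO_SUCCESS = "Set to Success"
    SUSPEND_RESUME = "Suspend/Resume instance"
    MODIFY_RESOURCE_GROUP = "Modify resource group"
    MODIFY_PRIORITY = "Modify priority"
    REFRESH = "Refresh instance"


# Role names as written in this document; their string form is an assumption.
ALLOWED_ROLES = {"Project Owner", "O&M"}


@dataclass
class ActionRequest:
    action: OmAction
    instance_ids: list[str]
    requester_roles: set[str]
    confirmed: bool = False  # set only after the user has reviewed the action


def may_execute(request: ActionRequest) -> tuple[bool, str]:
    """Return (allowed, reason) for a requested action."""
    if not request.instance_ids:
        return False, "no instances selected"
    if not (request.requester_roles & ALLOWED_ROLES):
        return False, "requires the Project Owner or O&M role"
    if not request.confirmed:
        return False, "awaiting manual confirmation"
    return True, "ok"
```

For example, a rerun requested by a user with the O&M role stays blocked with the reason "awaiting manual confirmation" until `confirmed` is set to `True`.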

Limits

  • Project-level diagnosis or analyses involving a large number of instances may take 1–5 minutes to complete.

  • Cross-workspace dependency analysis is supported, but viewing detailed results requires membership in the target workspace.