AI-powered O&M, built on DataWorks Copilot, diagnoses task instance issues across your entire project — from dependency failures and resource contention to log anomalies and data quality problems — and delivers a structured report with root cause analysis and one-click remediation actions.
How it works
When a task instance fails, runs slowly, or gets stuck waiting, AI-powered O&M analyzes the full task lifecycle to locate the root cause. It correlates data across dependency chains, resource usage, historical performance trends, change history, log content, and data quality — then generates a diagnostic report that tells you what went wrong, why, and what to do next.
Core capabilities:
- Comprehensive diagnosis: Covers all task states: not running, waiting, running, and completed (success or failure). Analyzes individual instances, workflows, and entire projects. Supports contextual follow-up questions within the same session.
- Root cause analysis: Correlates signals across multiple dimensions to pinpoint the actual root cause, going beyond surface-level error logs.
- Interactive O&M: Execute remediation commands, such as Rerun, Set to Success, or Modify resource group, directly in the chat interface. Complex operations are simplified into one-click buttons.
Quick start
This walkthrough covers the full diagnostic cycle for a failed task instance: identifying the cause, reading the report, and executing a fix.
Step 1: Start a diagnosis
- Go to Operation Center > Cycle Task and locate the failed instance.
- Click the instance name to expand its DAG. Hover over the instance and click AI Diagnosis in the quick action bar.

Step 2: Review the analysis
DataWorks Copilot opens on the right and displays "DataWorks Copilot is processing...". As it works, Copilot surfaces its analysis steps so you can follow the reasoning. Expand any step to see details.
Step 3: Read the diagnostic report
After a few seconds, Copilot returns a structured report with three sections:
- Abnormal Findings: Identifies anomalies and deduces the root cause from available context.
- Analysis Process: Details the evidence chain behind the AI's conclusions.
- Solution and Prevention Suggestions: Provides specific steps to resolve the issue and long-term recommendations to prevent recurrence.
Step 4: Apply the fix
Follow the suggestions in the report to resolve the issue.
- One-click actions: For common problems, the report surfaces direct actions. For a resource group issue, for example, it offers a shortcut to modify the task's resource group. Reply Yes to let Copilot guide you through the change.
- Natural language commands: If the report does not provide a direct action, enter a command in the chat, for example, "Modify the resource group for task xxx". Copilot handles the operation from there.

The diagnostic report content and suggested solutions vary by failure cause. The information provided is for reference only. For a complete list of available remediation actions, see O&M actions.
Access AI diagnosis
AI-powered O&M is available from multiple locations in DataWorks.
Global entry point
On any page in DataWorks, open the Copilot chat in the upper-right corner, switch to Agent mode, and select /Data O&M.
Start a diagnosis with commands like Diagnose instance <Instance ID>, reference context with @<Instance ID>, or trigger a project-level analysis with a prompt.
At the global entry point, you must explicitly select the /Data O&M agent. Contextual entry points use the O&M agent by default.
Contextual entry points
| Location | How to access | Best for |
|---|---|---|
| Operation Center > AI-powered O&M | Click AI-powered O&M in the left navigation pane. | Starting a fresh O&M session |
| Operation Center > Instance List | In the Actions column, click More > AI Diagnosis. | Scheduled, test, and data backfill instances |
| Operation Center > DAG | Hover over a node instance and click AI Diagnosis. | Diagnosing a specific node in context |
| Instance Running Logs tab | On the Log Diagnosis page, click AI Diagnosis at the top. | Diagnosing from a live log view |
| Log Diagnosis page | Enable AI Diagnosis in the dialog, then enter an instance ID or project ID. | Instance or project-level diagnosis by ID |
The original Intelligent Diagnosis button has been renamed to Log Diagnosis and now focuses on analyzing the content of the current log.
Use cases
Instance-level diagnosis
| Scenario | Example command |
|---|---|
| Task failure | Diagnose instance: <Instance ID> or @<Instance ID> |
| Slow runtime | Why did instance <Instance ID> run slow today? |
| Long wait time | Check why instance <Instance ID> is still waiting |
| Dependency blocking | Show the failed parent nodes for instance <Instance ID> |
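If you keep these prompts in a runbook or paste them into the chat often, a small helper can fill in the instance ID for you. The following is a minimal sketch, not part of DataWorks: the prompt strings are copied from the table above, while the scenario keys, the `build_prompt` function, and the sample instance ID are hypothetical names chosen for illustration.

```python
# Prompt templates copied from the table above. The scenario keys are
# hypothetical labels for this sketch, not DataWorks identifiers.
PROMPT_TEMPLATES = {
    "task_failure": "Diagnose instance: {instance_id}",
    "slow_runtime": "Why did instance {instance_id} run slow today?",
    "long_wait": "Check why instance {instance_id} is still waiting",
    "dependency_blocking": "Show the failed parent nodes for instance {instance_id}",
}


def build_prompt(scenario: str, instance_id: str) -> str:
    """Return the chat prompt for a scenario with the instance ID filled in."""
    if scenario not in PROMPT_TEMPLATES:
        raise ValueError(f"unknown scenario: {scenario!r}")
    cleaned = instance_id.strip()
    if not cleaned:
        raise ValueError("instance_id must not be empty")
    return PROMPT_TEMPLATES[scenario].format(instance_id=cleaned)


if __name__ == "__main__":
    # "1234567890" is a made-up instance ID used only for illustration.
    print(build_prompt("long_wait", "1234567890"))
```

The helper only produces text to paste into the Copilot chat; it does not call any DataWorks service.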
O&M actions
From the diagnostic report or within the Copilot chat, you can perform the following actions on individual or multiple instances in your workspace.
All actions triggered through the AI chat require your manual review and confirmation before they are executed.
You must have the Project Owner or O&M role in the target workspace to perform these actions.
| Action | Description |
|---|---|
| Rerun instance | Reruns the current instance. |
| Set to Success | Sets the instance's status to "Successful". |
| Suspend/Resume instance | Pauses or resumes scheduling of the instance. |
| Modify resource group | Switches the resource group assigned to the instance. |
| Modify priority | Adjusts the scheduling priority (affects baseline scheduling). |
| Refresh instance | Updates the instance with the latest task configuration. |
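The two preconditions stated above (a qualifying role and manual confirmation) can be pictured as a simple gate in front of every action. The sketch below is a conceptual model only, assuming nothing about how DataWorks implements the check: the action names and role names come from this section, while `ActionRequest`, `can_execute`, and the `confirmed` flag are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

# Role and action names are taken from this section. The classes and
# functions below are a hypothetical model, not a DataWorks API.
ALLOWED_ROLES = frozenset({"Project Owner", "O&M"})
ACTIONS = frozenset({
    "Rerun instance",
    "Set to Success",
    "Suspend/Resume instance",
    "Modify resource group",
    "Modify priority",
    "Refresh instance",
})


@dataclass(frozen=True)
class ActionRequest:
    action: str
    instance_ids: Tuple[str, ...]
    user_roles: FrozenSet[str]
    confirmed: bool = False  # set only after the user reviews the action


def can_execute(request: ActionRequest) -> Tuple[bool, str]:
    """Return (allowed, reason) for a requested O&M action."""
    if request.action not in ACTIONS:
        return False, "unknown action"
    if not request.instance_ids:
        return False, "no instances selected"
    if not (request.user_roles & ALLOWED_ROLES):
        return False, "requires the Project Owner or O&M role"
    if not request.confirmed:
        return False, "awaiting manual confirmation"
    return True, "ok"
```

In this model, a request from a user with the O&M role is still held back until `confirmed` is set, which mirrors the review step described above.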
Limits
- Project-level diagnosis or analyses involving a large number of instances may take 1–5 minutes to complete.
- Cross-workspace dependency analysis is supported, but viewing detailed results requires membership in the target workspace.