Agent - DataWorks - Alibaba Cloud Documentation Center

Agent leverages natural language interaction, combined with the deep cognitive and planning capabilities of large language models, to complete complex data integration, data development, and data governance tasks. It provides end-to-end automation from requirements to results, significantly improving work efficiency. This topic describes the key features, use cases, and core mechanisms of Agent.

Overview

Agent is built on a proprietary client. Unlike an Agent based on third-party clients, it runs directly within the relevant DataWorks modules, so you do not need to install extra software or perform complex configurations.

Through a 'describe your needs, get your results' natural language interaction, Agent delivers a requirements-as-code development experience. You can complete tasks such as data development simply by describing your requirements in natural language, which significantly boosts your productivity. The Agent workflow is as follows:

Access

Log on to the DataWorks console. In the left-side navigation pane, choose Data Development and O&M > DataStudio. Select your workspace and go to Data Studio.
On the Data Studio page, click the icon in the upper-right corner to open Data Agent Chat (Ask mode). In the lower-left corner of the dialog box, switch to Agent mode.

Quick start

Step 1: Switch to Agent mode

On the Data Studio page, click the icon in the upper-right corner to open Data Agent Chat (Ask mode). In the lower-left corner of the dialog box, switch to Agent mode.

Step 2: Select an Agent

You can click / or type / in the input box to quickly open the Agent menu and select the dedicated Agent for your current task. The Agent types include: Data Integration Agent, Data Map Agent, Data Development Agent, Data Governance Agent, and Data O&M Agent.

Note

When you open Agent from within a product module, DataWorks automatically uses the Agent that matches that module. You do not need to select one manually.

Step 3: Add context (optional)

You can type @ in the dialog box or click the @ icon in the lower-right corner to select and add the required context, which provides the Agent with richer background information.

The supported context types are:

Table: Reference the metadata of one or more tables.
Node/Code File: Reference the code within a specific node.
Data collection: Reference a data collection from Data Map.
Rules: Temporarily apply one or more quality rules to the current conversation.
Local File: Upload a local document to provide background information.

Step 4: Switch large language models (optional)

By default, Agent uses Auto mode. In this mode, the Agent schedules and allocates models automatically based on the task scenario, and can switch between multiple models within a conversation. For more information, see "4. Intelligent model scheduling" in the How it works section. To use a specific model instead, click the icon at the bottom of the dialog box and select one of the supported large language models.

Step 5: Start the conversation

Enter your request in the dialog box. You can refine and clarify your intent through multiple turns of conversation, such as by asking follow-up questions or providing additional details, until the Agent fully understands your goal and produces the desired result.

Use cases

By leveraging the deep understanding and task orchestration capabilities of large language models, the Agent covers use cases across data integration, data development, data O&M, Data Map, and data governance. The following table summarizes these scenarios.

Agent scenario	Description
Data integration	You can describe data synchronization requirements in natural language, such as Chinese or English. The system automatically parses the semantics and generates the corresponding data synchronization task configuration. This includes the source and destination data sources, table structure mappings, field filtering conditions, partitioning strategies, and scheduling parameters.
Data development	Offers a natural language-based ETL development experience that covers the entire process from requirement analysis and code generation to workflow creation and deployment.
Data O&M	Offers comprehensive health assessments and issue diagnostics for task instances. By integrating multi-dimensional analysis including dependency chains, resource levels, historical run trends, change impacts, log anomalies, and data quality, it automatically generates structured diagnostic reports.
Data Map	Improves the efficiency of finding and understanding data. Through AI-driven natural language interaction, you can quickly explore metadata across various scenarios in massive datasets.
Data governance	The DataWorks Data Governance Agent helps enterprises transition from proactive to autonomous data governance. You can issue commands in natural language instead of performing complex data analysis and configuration. The Agent converts these commands into precise governance actions, applies expert-level configurations, and executes them automatically.

Use case 1 - Data Integration Agent

Description: You can describe data synchronization requirements in natural language, such as Chinese or English. The system automatically parses the semantics and generates the corresponding data synchronization task configuration. This includes the source and destination data sources, table structure mappings, field filtering conditions, partitioning strategies, and scheduling parameters.

Procedure:

In the dialog box, enter / and select Data Integration Agent.
Describe your data synchronization requirement, including the source, destination, table names, and synchronization method. For example: "Create an offline synchronization task to sync the MySQL table ods_user_info_d to the MaxCompute table ods_user_info_d."
The Agent parses your requirement and automatically populates information such as the data source and table mappings to create a data synchronization node.
After the node is created, you can review and modify it.
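
The review step can be pictured as checking which parts of the generated configuration still need attention. The following is a minimal sketch, not the DataWorks task format: the SyncTaskConfig class and all of its field names are hypothetical, chosen only to mirror the items listed in the description above (data sources, table mappings, filter conditions, partitioning, and scheduling).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SyncTaskConfig:
    """Hypothetical container for the settings the Agent is described as filling in."""
    source_type: str
    source_table: str
    destination_type: str
    destination_table: str
    column_mapping: Dict[str, str] = field(default_factory=dict)
    filter_condition: Optional[str] = None
    partition: Optional[str] = None
    schedule: Optional[str] = None

    def fields_to_review(self) -> List[str]:
        """Return the names of settings that are still empty."""
        names = ["column_mapping", "filter_condition", "partition", "schedule"]
        return [name for name in names if not getattr(self, name)]


# Values taken from the example request in the procedure above.
config = SyncTaskConfig(
    source_type="MySQL",
    source_table="ods_user_info_d",
    destination_type="MaxCompute",
    destination_table="ods_user_info_d",
)
print(config.fields_to_review())
```

In this sketch, the request names only the source and destination, so every other setting is reported as needing review.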

Use case 2 - Data Development Agent

Description: Offers a natural language-based ETL development experience that covers the entire process from requirement analysis and code generation to workflow creation and deployment.

Procedure:

Describe your data development requirement in natural language and add context as needed. For example: "Build a user profile analysis workflow."
The Agent breaks down the task into multiple steps, such as creating nodes, generating code, and configuring dependencies, and then executes them.
For the generated node code, you can review the changes and choose to keep or discard them.
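
To illustrate what "configuring dependencies" implies for execution order, the following minimal sketch orders a set of workflow nodes so that each node runs after its upstream nodes. The node names and the dependency map are hypothetical, and the sketch uses Python's standard graphlib module. It does not describe how the Agent plans tasks internally.

```python
from graphlib import TopologicalSorter

# Hypothetical nodes for a user profile workflow.
# Each key maps to the set of nodes it depends on.
dependencies = {
    "ods_user_info": set(),
    "ods_user_orders": set(),
    "dwd_user_behavior": {"ods_user_info", "ods_user_orders"},
    "ads_user_profile": {"dwd_user_behavior"},
}


def execution_order(deps):
    """Return one valid order in which every node follows its upstream nodes."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```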

Use case 3 - Data O&M Agent

Description: Offers comprehensive health assessments and issue diagnostics for task instances. By integrating multi-dimensional analysis including dependency chains, resource levels, historical run trends, change impacts, log anomalies, and data quality, it automatically generates structured diagnostic reports.

For more information about the Data O&M Agent, see AI-powered O&M.

Use case 4 - Data Map Agent

Description: Improves the efficiency of finding and understanding data. Through AI-driven natural language interaction, you can quickly explore metadata across various scenarios in massive datasets.

Core capabilities:

Natural language search: Lets you ask questions in natural language to quickly locate target data based on business intent without needing precise keywords. For example, "Find the summary tables related to user activity."
Automatic scope adjustment: You can specify a scope in the conversation, and the Agent will automatically understand the semantics and quickly locate data within that scope. For example, "In the adm_bi project, find tables related to business operations."
In-depth data understanding: You can ask follow-up questions about target data to quickly get details such as data lineage, owners, and field definitions. For example, "What are the direct downstream dependencies of the @dws_bi_metric_di table? Who would be affected by changes to it?"
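
The lineage question in the last example distinguishes direct downstream dependencies from everything that a change could affect. The following minimal sketch shows that distinction on a small, hypothetical lineage map. Only dws_bi_metric_di comes from the example above. The other table names and the LINEAGE structure are assumptions for illustration, not Data Map output.

```python
from collections import deque

# Hypothetical lineage: each table maps to the tables that read from it directly.
LINEAGE = {
    "dws_bi_metric_di": ["ads_bi_report", "ads_bi_dashboard"],
    "ads_bi_report": ["ads_bi_export"],
    "ads_bi_dashboard": [],
    "ads_bi_export": [],
}


def direct_downstream(table):
    """Tables that consume the given table directly."""
    return list(LINEAGE.get(table, []))


def all_downstream(table):
    """Every table reachable from the given table, in breadth-first order."""
    seen = []
    queue = deque(LINEAGE.get(table, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.append(current)
        queue.extend(LINEAGE.get(current, []))
    return seen


print(direct_downstream("dws_bi_metric_di"))
print(all_downstream("dws_bi_metric_di"))
```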

Use case 5 - Data Governance Agent

Description: The DataWorks Data Governance Agent helps enterprises transition from proactive to autonomous data governance. You can issue commands in natural language instead of performing complex data analysis and configuration. The Agent converts these commands into precise governance actions, applies expert-level configurations, and executes them automatically.

Core capabilities:

Quality rule configuration: Helps you configure quality monitoring rules for specified key tables by using natural language. The Data Governance Agent analyzes the field types, business semantics, and importance of a specified table to automatically recommend and configure appropriate monitoring rules, such as primary key uniqueness, non-null constraints, and enum value range checks. This process eliminates the need for extensive data exploration and manual rule configuration.
- Example: Automatically generate quality rules for the dim_user_info core user dimension table.
- Example: For tables that start with ods_, automatically configure quality rules related to table row counts.
Quality issue governance: For quality issues automatically identified by the system in the data governance module, such as "Frequently accessed tables without quality rules" or "Tables produced by high-priority baseline tasks without quality rules," you can provide governance requirements in natural language. The system then automatically analyzes and remediates the issues.
- Example: Find frequently accessed tables that have no quality rules, then recommend and configure them.
- Example: Help me resolve issues under the quality dimension.
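
The rule types named above (primary key uniqueness, non-null constraints, and enum value range checks) can be sketched as simple predicates over rows. The following is a minimal illustration only: the function names, column names, and sample rows are hypothetical, and DataWorks quality rules are configured in the product rather than written this way.

```python
def check_unique(rows, column):
    """Primary key uniqueness: no value appears more than once."""
    values = [row[column] for row in rows]
    return len(values) == len(set(values))


def check_not_null(rows, column):
    """Non-null constraint: every row has a value for the column."""
    return all(row.get(column) is not None for row in rows)


def check_enum(rows, column, allowed):
    """Enum value range: every value belongs to the allowed set."""
    return all(row[column] in allowed for row in rows)


# Hypothetical sample rows with one duplicate key, one null, and one bad enum value.
rows = [
    {"user_id": 1, "gender": "F", "city": "Hangzhou"},
    {"user_id": 2, "gender": "M", "city": None},
    {"user_id": 2, "gender": "X", "city": "Beijing"},
]

results = {
    "user_id is unique": check_unique(rows, "user_id"),
    "user_id is not null": check_not_null(rows, "user_id"),
    "city is not null": check_not_null(rows, "city"),
    "gender is F or M": check_enum(rows, "gender", {"F", "M"}),
}
for rule, passed in results.items():
    print(rule, "passed" if passed else "failed")
```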

How it works

1. Storage management

The Data Development Agent supports creating nodes and files in either a project directory or your personal directory. To ensure proper storage management:

Storage Location Settings: In the Data Agent Settings Center, configure the default storage path for generating code files. For more information, see Personal Settings.
Conflict resolution mechanism: If the generated node type does not match the rules of the current directory (for example, if you ask to create a data integration node in your personal directory), the Agent displays a confirmation prompt and waits for your confirmation before proceeding.
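
The conflict check can be sketched as a lookup followed by a confirmation callback. This is a minimal sketch under stated assumptions: the ALLOWED_NODE_TYPES table is hypothetical (the only case taken from the text is a data integration node in a personal directory), and the "proceed" and "cancelled" outcomes are assumptions, because the text does not say what happens after you respond to the prompt.

```python
# Hypothetical rules: which node types each directory accepts.
ALLOWED_NODE_TYPES = {
    "project": {"sql", "python", "data_integration"},
    "personal": {"sql", "python"},
}


def needs_confirmation(directory, node_type):
    """True when the node type does not match the directory's rules."""
    return node_type not in ALLOWED_NODE_TYPES[directory]


def create_node(directory, node_type, confirm):
    """Ask for confirmation on a conflict; otherwise proceed immediately."""
    if needs_confirmation(directory, node_type):
        if not confirm(directory, node_type):
            return "cancelled"
    return "proceed"


# The confirm callback stands in for the prompt shown to the user.
print(create_node("personal", "data_integration", confirm=lambda d, t: False))
print(create_node("project", "data_integration", confirm=lambda d, t: False))
```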

2. Complex task handling

For development requirements with complex logic, the Agent provides status feedback throughout the entire lifecycle:

To-do list: The Agent breaks down complex tasks into multiple sub-steps and displays them as a to-do list. As the execution progresses, the status of each item is automatically updated.
Execution summary: At the end of the workflow, the Agent compiles and outputs a summary report for the entire task. This report consolidates completed operations and generated resources, making the review process more efficient.
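
The to-do list and execution summary can be pictured as a list of items whose status changes as each step runs. The following minimal sketch assumes hypothetical status names (pending, in_progress, done, failed) and a hypothetical summary layout. It does not reproduce the Agent's actual report.

```python
from dataclasses import dataclass


@dataclass
class TodoItem:
    title: str
    status: str = "pending"


def run_plan(items, actions):
    """Run each step in order, updating its status; stop at the first failure."""
    for item, action in zip(items, actions):
        item.status = "in_progress"
        try:
            action()
            item.status = "done"
        except Exception:
            item.status = "failed"
            break
    return items


def summarize(items):
    """Build a short text summary of what was completed."""
    done = sum(1 for item in items if item.status == "done")
    lines = [f"{done}/{len(items)} steps completed"]
    lines += [f"[{item.status}] {item.title}" for item in items]
    return "\n".join(lines)


def failing_step():
    raise RuntimeError("simulated failure")


todo = [TodoItem("Create nodes"), TodoItem("Generate code"), TodoItem("Configure dependencies")]
run_plan(todo, [lambda: None, failing_step, lambda: None])
print(summarize(todo))
```

Stopping at the first failed step is a design choice in this sketch. Steps after the failure stay pending, so the summary shows where the run halted.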

3. Token and performance statistics

After a task is completed, the Agent provides quantitative metrics to help you assess execution efficiency and the scale of model calls:

Task duration statistics: The system automatically records and displays the total time from the start to the end of the current task, allowing you to evaluate the efficiency of the automated process.
Token consumption: Click to view the number of input tokens and output tokens consumed during the interaction.
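
The two metrics above amount to a timer around the task plus running totals of input and output tokens across model calls. The following is a minimal sketch of that bookkeeping. The TaskStats class and its method names are hypothetical, and the token counts are made-up sample values.

```python
import time
from dataclasses import dataclass


@dataclass
class TaskStats:
    """Hypothetical accumulator for one task's duration and token usage."""
    input_tokens: int = 0
    output_tokens: int = 0
    started_at: float = 0.0
    finished_at: float = 0.0

    def start(self):
        self.started_at = time.monotonic()

    def record_call(self, input_tokens, output_tokens):
        """Add the token counts of one model call to the running totals."""
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens

    def finish(self):
        self.finished_at = time.monotonic()

    @property
    def duration_seconds(self):
        return self.finished_at - self.started_at

    @property
    def total_tokens(self):
        return self.input_tokens + self.output_tokens


stats = TaskStats()
stats.start()
stats.record_call(input_tokens=1200, output_tokens=300)
stats.record_call(input_tokens=800, output_tokens=150)
stats.finish()
print(stats.input_tokens, stats.output_tokens, stats.total_tokens)
```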

4. Intelligent model scheduling

The Agent's intelligent model allocation mechanism creates an intent-driven development experience, eliminating the need to select underlying models.

Automatic model allocation (Auto mode): By default, the Agent operates in Auto mode. In this mode, the Agent identifies and breaks down your development intent, and automatically dispatches the optimal model to handle different sub-tasks.
Dynamic multi-model coordination (Auto mode): In Auto mode, the Agent can orchestrate across different models. Based on the real-time needs of a task, the Agent flexibly switches between multiple models within a single conversation to ensure that each part of a complex task is matched with the most suitable model.
Manual model switching: Auto mode is the default, but for specific scenarios you can switch out of Auto mode and specify a particular model to process the task.
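
The relationship between Auto mode and manual selection can be sketched as a routing function in which a manual choice, when present, overrides automatic allocation. The model names, sub-task kinds, and routing table below are all hypothetical. The text does not describe how the Agent actually chooses models.

```python
# Hypothetical routing table: sub-task kind -> model name.
AUTO_ROUTES = {
    "planning": "model-reasoning",
    "code_generation": "model-coder",
    "summary": "model-fast",
}
DEFAULT_MODEL = "model-general"


def pick_model(subtask_kind, manual_model=None):
    """Use the manually selected model if there is one; otherwise route automatically."""
    if manual_model is not None:
        return manual_model
    return AUTO_ROUTES.get(subtask_kind, DEFAULT_MODEL)


# Auto mode: different sub-tasks of one conversation may use different models.
subtasks = ["planning", "code_generation", "summary"]
print([pick_model(kind) for kind in subtasks])

# Manual mode: the selected model handles every sub-task.
print([pick_model(kind, manual_model="model-coder") for kind in subtasks])
```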

References

To learn about custom Agent features, see Agent based on third-party clients.