Author: Wang Chen
Recently, two articles[1][2] have gained significant attention; both coincidentally point out that, as AI has developed to its current stage, the engineering aspect of AI applications has been underestimated.
However, "engineering" is a very general term that encompasses a wide range of content. Broadly speaking, non-algorithmic technical implementations and product design can all be categorized as engineering. This article provisionally divides engineering into product engineering and technical engineering, and attempts to simplify the construction of an AI Agent's engineering system through this lens.
Engineering = Product Engineering + Technical Engineering
The collaboration between these two components determines whether an AI Agent is "usable, easy to use, and scalable."
Product engineering focuses on the overall considerations of product philosophy, product business, interaction design, and user experience, ensuring that AI is no longer just a "black box," but that it can be perceptive, guiding, and feedback-rich, with a self-correcting mechanism. We will first deconstruct product engineering and then focus on key modules to elaborate on their role in achieving a successful AI Agent.
| Module | Definition |
|---|---|
| Demand Modeling | Clarify who the AI application serves, what problems it can solve, and avoid "using AI for the sake of using AI." |
| UI/UX Design | Transform the complex behaviors of AI into interfaces and processes that users can understand and operate. |
| Human-Machine Interaction Process | Allow the AI to "ask questions" and "confirm decisions," completing tasks rhythmically like an assistant. |
| Prompt Engineering | Make good use of prompts like a "magic wand" to enhance the quality and consistency of AI outputs. |
| Feedback Loop | Enable users to provide feedback on results, allowing the system to learn to improve or signal failures. |
| Permissions and Compliance | Control who can use what data and prevent AI abuse or data leaks. |
The first step in building an AI Agent is not choosing the model, but answering a question like a product manager: "Who is this AI supposed to help, what problems can it address, how will it solve them, to what extent can those problems be solved, and is the user willing to pay for this value?" This defines the fit between the product and the market.
Take Manus as an example[3]. It is billed as the world's first general-purpose AI agent, with the core idea of "a combination of hand and brain," emphasizing the shift of AI from a passive tool to an active collaborator.
Example: A user inputs "7-day budget of 20,000 yuan for a trip to Thailand," and Manus automatically completes currency conversion, hotel comparisons, itinerary planning, and exports a PDF manual.
This role definition requires clearly delineating AI's responsibilities and behavioral boundaries in the system prompts, ensuring that it possesses autonomy and reliability in executing tasks.
Example: A user requests an analysis of stock xx, and Manus automatically retrieves relevant data, carries out financial modeling, generates an interactive dashboard, and deploys it as an accessible website.
In demand modeling, user requirements need to be broken down into multiple sub-tasks, and corresponding execution processes and tool invocation strategies must be designed to ensure smooth task closure.
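The decomposition described above can be sketched as a simple plan structure. The sketch below uses the Thailand-trip example from earlier; the subtask names, tool names, and `plan_trip` helper are hypothetical illustrations, not Manus's actual internals.

```python
from dataclasses import dataclass, field

@dataclass
class SubTask:
    """One step of a decomposed user request, with its tool-invocation strategy."""
    name: str
    tool: str                       # which capability the agent should invoke
    depends_on: list = field(default_factory=list)

def plan_trip(request: str) -> list:
    """Hypothetical decomposition of '7-day budget of 20,000 yuan for a trip to Thailand'."""
    return [
        SubTask("convert_budget", tool="currency_api"),
        SubTask("compare_hotels", tool="hotel_search", depends_on=["convert_budget"]),
        SubTask("plan_itinerary", tool="llm", depends_on=["compare_hotels"]),
        SubTask("export_pdf", tool="pdf_renderer", depends_on=["plan_itinerary"]),
    ]

plan = plan_trip("7-day budget of 20,000 yuan for a trip to Thailand")
print([t.name for t in plan])
```

The explicit `depends_on` edges are what make task closure checkable: the agent can verify that every prerequisite finished before invoking the next tool.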
In the demand modeling phase, the responsibilities and interactions of each layer need to be defined to ensure the coordinated operation of the entire system.
Example: During the task execution process in Manus, users can close devices or add instructions at any time; Manus will adjust the task execution process according to new instructions.
This design requires considering the flexibility and controllability of human-machine interaction in demand modeling, ensuring that users have sufficient control during collaboration.
This approach to demand modeling is similar to segmenting user task processes and identifying the areas where AI excels and should intervene, avoiding the "broad and vague" approach while allowing users to experience efficiency improvements in real-time.
Just as designing service boundaries for microservices entails defining which responsibilities lie with the order unit and which with the user inventory unit, AI applications also need to specify which tasks are handled by AI and which fall under business logic; these distinctions will directly determine the final user experience.
For example, DeepSeek was the first to visualize the "thought processes" of a large model; before generating a response, it displays the model's thought chain, allowing users to see that the AI is not guessing randomly but is logically thinking. Users are no longer passively receiving results; instead, they are participating in the thought process, establishing a collaborative relationship in problem-solving.
This design effectively enhances users' trust and acceptance, especially in scenarios involving multi-step tasks, complex document summaries, and cross-reference of information.
Currently, interaction strategies such as "progressive information presentation," "visualization of thought processes," and "structured results" have become standard for AI Agents. Users can view call chains and trace reference sources; Qwen even provides an option to delete reference sources, further reducing hallucinations caused by untrustworthy internet sources.
For example, NotebookLM's core concept is: users upload their own materials, and the AI assistant responds to questions and provides suggestions based on this material, acting as a "trusted knowledge advisor," empowering users to engage in more efficient and intelligent learning and research activities.
This product positioning indicates that the system prompts behind NotebookLM must not only fulfill basic language guidance but also implement more complex task instructions and safety constraints. We can break down its system prompts design approach from several key dimensions:
**Prompt:**
> You are a research assistant, and your responsibility is to help users understand the content of the documents they upload. When answering questions, only reference the provided materials without making subjective inferences.
This role definition not only controls the output range of the model but also makes it easier for users to trust the source of the responses psychologically.
**Prompt:**
> For each question, cite the relevant sections of the document, and list their titles and paragraph indices in markdown format.
This allows users to trace back and verify sources while reading model responses, creating a positive traceable experience. Trust in AI output quality often comes from this "source-citing" design.
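The "source-citing" design can be sketched as a post-processing step that attaches document titles and paragraph indices to an answer in markdown. The passage format below is an assumption for illustration, not NotebookLM's real data model.

```python
def format_answer_with_citations(answer: str, passages: list) -> str:
    """Append a markdown citation list (title + paragraph index) to an answer."""
    lines = [answer, "", "**Sources:**"]
    for p in passages:
        lines.append(f"- *{p['title']}*, paragraph {p['paragraph']}")
    return "\n".join(lines)

out = format_answer_with_citations(
    "The report projects 12% growth.",
    [{"title": "Q3 Report", "paragraph": 4},
     {"title": "Market Outlook", "paragraph": 2}],
)
print(out)
```

Keeping the citations machine-readable (title plus index, rather than free text) is what makes the traceability clickable in a UI.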
**Prompt:**
> Please extract the five most important points from the provided document and list them consecutively, maintaining an objective and neutral language style without providing further explanations.
Task-oriented prompt arrangement has transcended traditional "conversational question-and-answer" formats, resembling a form of "task script," laying the foundational capabilities for multimodal AI applications.
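A "task script" of this kind is usually assembled programmatically rather than typed by hand. The template and its parameters below are illustrative, not taken from any specific product.

```python
def build_extraction_prompt(n_points: int, style: str = "objective and neutral") -> str:
    """Compose a task-oriented extraction prompt instead of a free-form question."""
    return (
        f"Please extract the {n_points} most important points from the provided "
        f"document and list them consecutively, maintaining an {style} language "
        "style without providing further explanations."
    )

print(build_extraction_prompt(5))
```

Parameterizing the task script this way keeps output quality and format consistent across invocations, which is the point of prompt engineering as a module.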
For instance, Monica's memory function allows users browsing recommended memory entries to click "adopt as fact". These items will then be written into the memory database for use in subsequent conversations, while unadopted items will not be forcibly memorized. This "feedback → selective absorption → next-round tuning" mechanism enhances the traditional prompt + chat model.
Monica continuously learns from user characteristics to improve the accuracy of need understanding and response correctness. Essentially, it rebuilds context awareness in conversational interactions. Just like interpersonal communication, the longer two people interact, the more familiar they become with each other, enabling better understanding of each other's expressions.
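The "feedback → selective absorption → next-round tuning" loop can be sketched as a feedback-gated memory: only entries the user explicitly adopts are injected into later context. The `MemoryStore` class and its methods below are hypothetical, not Monica's actual API.

```python
class MemoryStore:
    """Minimal sketch of a feedback-gated memory: only user-adopted items persist."""

    def __init__(self):
        self.proposed = []   # candidate memory entries surfaced to the user
        self.adopted = []    # entries the user clicked "adopt as fact" on

    def propose(self, entry: str):
        self.proposed.append(entry)

    def adopt(self, entry: str):
        if entry in self.proposed:        # only surfaced candidates can be adopted
            self.adopted.append(entry)

    def context_for_next_turn(self) -> str:
        # Unadopted proposals are NOT injected; they are simply dropped.
        return "\n".join(f"Known fact: {e}" for e in self.adopted)

mem = MemoryStore()
mem.propose("User prefers answers in Chinese")
mem.propose("User works in finance")
mem.adopt("User prefers answers in Chinese")   # the user accepts one entry
print(mem.context_for_next_turn())
```

Routing memory through an explicit adopt step keeps the user in control of what the system "knows," which is the trust-building property the feedback loop is meant to provide.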
Technical engineering validates product engineering. Just as the fast fish ate the slow fish in the internet era, in the AI era the efficiency of technical engineering determines how quickly product engineering can be validated against the market and iterated on. Technical engineering is the logistics system that supports AI applications, encompassing architecture and modularity, tool invocation mechanisms, model and service integration, traffic and access control, data management and structured outputs, safety and isolation mechanisms, and DevOps with observability.
| Module | Definition |
|---|---|
| Architecture and Modularity | Break down the AI application into small modules, where each component has clear responsibilities, allowing for easier combination and maintenance. |
| Tool Invocation Mechanism | Enable the AI to invoke databases, check weather, place orders, etc., to truly "get things done". |
| Model and Service Integration | Integrate multiple models (DeepSeek, Qwen, local large models, etc.) for unified invocation and management. |
| Traffic and Access Control | Control usage frequency and access permissions for different users and models to prevent misuse or crashes. |
| Data Management and Structured Output | Convert the AI's free text into structured data, allowing the system to use it directly or store it in a database. |
| Safety and Isolation Mechanisms | Prevent data cross-use and unauthorized operations, which is particularly critical in multi-tenant or enterprise applications. |
| DevOps and Observability | Support gray releases, feature rollbacks, and performance alarms; log what happened during each invocation so problems can be identified and optimization metrics gathered, ensuring continuous and stable operation. |
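The "Data Management and Structured Output" row above typically means validating the model's free text against an expected schema before the rest of the system consumes it. A minimal sketch using only the standard library, with a hypothetical order-lookup shape:

```python
import json
from dataclasses import dataclass

@dataclass
class OrderStatus:
    order_id: str
    status: str

def parse_model_output(raw: str) -> OrderStatus:
    """Validate the model's free-text reply against the expected JSON shape."""
    data = json.loads(raw)                          # raises if not valid JSON
    missing = {"order_id", "status"} - set(data.keys())
    if missing:
        raise ValueError(f"model output missing fields: {missing}")
    return OrderStatus(order_id=data["order_id"], status=data["status"])

result = parse_model_output('{"order_id": "A-1024", "status": "shipped"}')
print(result.status)
```

Failing fast on malformed output is deliberate: a typed object either exists or the invocation is retried, so downstream business logic never has to guess at the model's formatting.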
Aside from Python-centered ecosystems such as LangChain and LangGraph, Spring AI Alibaba provides native, enterprise-level AI application orchestration for the Java community, making it a significant tool for building modular AI applications. Its core features extend the seven foundational capabilities listed above.
Suppose you are developing an intelligent customer service system for internal enterprise use, with requirements spanning prompt management for multiple business modules, backend tool invocation, and role-based access control.
Using Spring AI Alibaba, the system can be designed as follows:
- `@Prompt` annotations define prompt templates for the various business modules (e.g., "Customer Complaint Handling," "Expense Reimbursement Queries"); each module can be treated as a component class, making independent iteration and maintenance easier.
- `@Tool` or `@Function` annotations expose backend HTTP interfaces or local Java methods as functions the LLM can call, such as checking a customer's order status or triggering a CRM update. This is similar to LangChain's tool invocation mode, but natively supports the Spring Bean lifecycle and dependency injection.
- Roles carried in the JWT (e.g., `role=admin`) let the gateway enforce access permissions. It is like carrying a digital pass labeled "I am a VIP customer": the system scans the pass to confirm identity and permissions. This pattern is common in front-end/back-end separated architectures and in enterprise systems with SSO (single sign-on).

Moreover, AI gateways represented by Higress / Alibaba Cloud API Gateway offer further capabilities through a flexible, extensible plugin mechanism, allowing users to develop custom plugins to enrich functionality.
| Category | Traditional Application | Large Model Application |
|---|---|---|
| Observable Objects | Backend Logic, Database Queries, API Calls | Prompt Input/Output, Model Inference Process, Context Changes, Thought Chains |
| Focus Points | Performance Bottlenecks, Service Status, Exception Stacks | Reasonableness of Responses, Consistency, Deviations, Hallucination, and Potential Misinterpretations |
| Observable Granularity | Function-Level, Call Chain-Level | Token-Level, Semantic Level, Behavioral Path-Level |
For example, classic applications might focus on "Does this interface time out?" while large model applications also need to address questions like "Why did this model say something inappropriate?" and "Did it misunderstand the user's intention?"
For instance, an AI medical Q&A system might not produce any crash logs, yet if the model outputs an incorrect recommendation like "not advised to take cold medicine," then such an error requires semantic-level observability.
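Semantic-level observability means checking what the model said, not whether the service crashed. A toy rule-based checker makes the idea concrete; the rule list below is purely illustrative, not a real medical policy.

```python
# Toy semantic monitor: flags outputs that violate domain safety rules,
# even though the service itself returned successfully with no crash logs.
UNSAFE_PATTERNS = [
    "not advised to take",          # illustrative rule only
    "stop taking your medication",
]

def semantic_alerts(model_output: str) -> list:
    """Return the safety rules that the model's output appears to trigger."""
    text = model_output.lower()
    return [p for p in UNSAFE_PATTERNS if p in text]

alerts = semantic_alerts("It is not advised to take cold medicine.")
print(alerts)   # a non-empty list means the output needs human review
```

Production systems would replace the keyword list with model-based evaluation, but the monitoring contract is the same: an alert fires on output content, not on service errors.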
To tackle these issues, the first step is to determine which components are involved in a single invocation, and then connect all components through a calling chain. For complete pathway diagnostics, links need to be established; when a request encounters issues, it must be quickly identified which stage failed, whether it was the AI application or internal model inference.
The second step is to construct a full-stack observability data platform that can correlate all of this data well, including not just the links but also metrics; for example, GPU utilization within the model could help discern whether the issue is at the application or model level.
Finally, we should analyze model logs for input and output information from each invocation, using this data for evaluation and analysis to validate the quality of AI applications.
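The three steps above (link the call chain, correlate metrics, log model input/output) can be sketched as one trace record per invocation. The span names and the `gpu_util` field below are hypothetical placeholders, not a real telemetry schema.

```python
import time
import uuid

def trace_invocation(user_input: str) -> dict:
    """Record one end-to-end invocation: call-chain spans + metrics + model I/O."""
    trace = {
        "trace_id": str(uuid.uuid4()),
        "spans": [],          # step 1: the call chain across components
        "metrics": {},        # step 2: correlated metrics (e.g., GPU utilization)
        "model_io": {},       # step 3: input/output kept for quality evaluation
    }
    for component in ["gateway", "agent_app", "model_inference"]:
        trace["spans"].append({"component": component, "start": time.time()})
    trace["metrics"]["gpu_util"] = 0.83               # placeholder value
    trace["model_io"] = {"input": user_input, "output": "<model response>"}
    return trace

t = trace_invocation("analyze stock xx")
print([s["component"] for s in t["spans"]])
```

Because all three kinds of data share one `trace_id`, a bad answer can be walked back from the model's output, through the GPU metrics, to the exact component where the request went wrong.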
Building on these three approaches, we can provide, from a monitoring perspective, a range of observability techniques and core focus points at each level.
More importantly, the advancement of AI Agent engineering is not merely the concern of Agent Builders, but is also related to the evolution of the entire industry. Only by continuously investing in multiple dimensions—development platforms, traffic and access management, toolchains, observability, security, etc.—and constructing reliable, stable, and reusable application infrastructure can we truly drive the large-scale implementation of upstream Agent applications, fostering the formation of a new generation of "application supply chain" ecosystems centered around large models.
[1] https://mp.weixin.qq.com/s/UF2ox3WEfehk3QDMCHqXZw
[2] https://mp.weixin.qq.com/s/WdTiY8esxUuW5cqIh_NlpQ
[3] https://blog.csdn.net/Julialove102123/article/details/146196173