In other words, a single call to the language model is a one-shot process: an input and a corresponding output. This approach suffices when the answer is already in the model's own knowledge or in one of its retrieved documents. It falls apart once a question cannot be answered within that scope, such as analysing a technology from different perspectives across multiple jurisdictions, explaining how a particular protocol was developed and what later replaced it, or checking the validity of a statement against its sources. Questions of this sort require breaking down the problem, iteratively finding relevant documents, reconciling contradictory information, and synthesising the results logically. The Deep Research agent approach does exactly this.
The loop is the pattern
The characteristic feature of the pattern is that it does not respond once; it goes around in a loop. A useful working definition: if there is no loop, there is no agent. The typical implementation follows the ReAct pattern: reason, act, observe, repeated in a loop. That is, the model thinks about the situation, performs some action (generally a tool invocation), observes what happens, incorporates that observation into the context, and repeats. Two additional layers on top of the basic loop turn it into a research agent: planning and reflection.
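The reason, act, observe cycle can be sketched in a few lines. This is only an illustration under stated assumptions: `scripted_model`, the `TOOLS` registry, and the line prefixes used in the context are hypothetical stand-ins, not any particular framework's API, and a real agent would replace the scripted decision with a language-model call.

```python
from typing import Callable

# Hypothetical stand-in for a model call. It returns either
# ("act", tool_name, tool_input) or ("finish", answer).
def scripted_model(context: list[str]) -> tuple:
    observations = [line for line in context if line.startswith("observation: ")]
    if not observations:
        return ("act", "search", "capital of France")
    return ("finish", observations[-1].removeprefix("observation: "))

# Hypothetical tool registry; a real agent would call a search API here.
TOOLS: dict[str, Callable[[str], str]] = {
    "search": lambda query: "Paris" if "France" in query else "unknown",
}

def react_loop(question: str, model=scripted_model, max_steps: int = 5) -> str:
    context = [f"question: {question}"]
    for _ in range(max_steps):
        decision = model(context)                  # reason
        if decision[0] == "finish":
            return decision[1]
        _, tool_name, tool_input = decision
        result = TOOLS[tool_name](tool_input)      # act
        context.append(f"action: {tool_name}({tool_input!r})")
        context.append(f"observation: {result}")   # observe, then go around again
    return "no answer within step budget"

print(react_loop("What is the capital of France?"))
```

The `max_steps` cap is a design choice worth keeping in any version: without it, a model that never decides to finish would loop indefinitely.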
Planning takes place before anything else and should be done only once, at a high level of abstraction. The agent analyses the original request and generates a tree of sub-requests. These can form a flat set, but they more often depend on each other, in which case later requests cannot be resolved until earlier ones are. Separating planning from execution matters because it lets the planner consider all steps in advance, while the executor concentrates on completing one step without having to think about subsequent ones.
Reflection occurs repeatedly as observations accumulate. The agent compares the results it has obtained with its goal, evaluates whether the answer to each sub-request is sufficient, identifies any gaps, and proposes additional actions.
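One way to picture the plan is as a list of sub-requests with explicit dependencies, plus a check for what is still unanswered. The sketch below is a minimal illustration: the `SubRequest` class, the helper names, and the example plan are all hypothetical, and "unanswered" is used as a deliberately crude proxy for the richer sufficiency judgement that reflection makes.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class SubRequest:
    id: str
    question: str
    depends_on: list[str] = field(default_factory=list)
    answer: Optional[str] = None  # filled in by the executor

def ready_steps(plan: list[SubRequest]) -> list[str]:
    """Unanswered sub-requests whose dependencies have all been answered."""
    answered = {step.id for step in plan if step.answer is not None}
    return [
        step.id
        for step in plan
        if step.answer is None and all(dep in answered for dep in step.depends_on)
    ]

def open_gaps(plan: list[SubRequest]) -> list[str]:
    """Reflection in its simplest form: which sub-requests still lack an answer?"""
    return [step.id for step in plan if step.answer is None]

# Hypothetical plan: the second sub-request depends on the first.
plan = [
    SubRequest("origin", "How was the protocol developed?"),
    SubRequest("successor", "What replaced it later?", depends_on=["origin"]),
]
print(ready_steps(plan))  # only "origin" can be worked on at the start
```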

What sits inside the loop
There are five elements in every implementation, regardless of approach. The planner breaks down the objective and tracks the research plan. The tool component connects the agent to the real world: web searches, browsing, extracting information from pages, searching the agent's own database, and computations done with code rather than recalled from memory. The memory, or state, holds acquired data, references, and remaining questions. A reflector, or critic, assesses progress. The fifth element is the synthesiser, which combines all gathered information into a summary report with references. These do not have to be five different systems; one system can serve multiple roles.
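To make the memory and synthesiser roles concrete, here is a small sketch of a state object and a function that turns it into a report with numbered references. The `Finding` and `ResearchState` types, the report layout, and the example sources are assumptions made for illustration only.

```python
from dataclasses import dataclass, field

@dataclass
class Finding:
    claim: str
    source: str  # where the claim came from, e.g. a URL or document title

@dataclass
class ResearchState:
    findings: list[Finding] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)

def synthesise(state: ResearchState) -> str:
    """Combine findings into a report, citing each distinct source once."""
    sources: list[str] = []
    lines: list[str] = []
    for finding in state.findings:
        if finding.source not in sources:
            sources.append(finding.source)
        lines.append(f"{finding.claim} [{sources.index(finding.source) + 1}]")
    references = [f"[{i}] {source}" for i, source in enumerate(sources, start=1)]
    return "\n".join(lines + ["", "References"] + references)

# Hypothetical findings; two of them share a source and so share a number.
state = ResearchState(findings=[
    Finding("Claim A", "source-one"),
    Finding("Claim B", "source-two"),
    Finding("Claim C", "source-one"),
])
print(synthesise(state))
```

Keeping the source next to each claim in the state is what makes the final citations cheap to produce; reconstructing them after the fact is much harder.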
Single agent or orchestra
The simplest viable solution is one model running a single ReAct loop together with an exploration module. This approach is easy to reason about, simple to debug, and generally sufficient for tasks with a clear end point. Add complexity only when there is a real need for it.
Next comes the plan-and-execute approach with parallelisation: the planner creates the dependency graph, the orchestrator walks the graph in topological order, and researcher instances run in parallel on independent sub-problems. This map-reduce style of decomposition reduces elapsed time on large questions and keeps the context of each sub-problem separate. The cost is extra complexity in coordinating multiple researcher instances, and debugging becomes much harder: non-deterministic execution across multiple branches is hard to log, and the same input can produce different outcomes from run to run. A good starting point is a single researcher instance, moving to orchestration only when it proves necessary.
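The orchestration step can be sketched with the Python standard library alone: `graphlib.TopologicalSorter` yields the sub-problems whose dependencies are done, and a thread pool runs each ready batch in parallel. The `research` function and the example graph are hypothetical; a real researcher would run its own loop with model and tool calls.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter

def research(sub_problem: str, upstream: dict[str, str]) -> str:
    # Hypothetical researcher: only records which upstream results it received.
    return f"{sub_problem} (used: {sorted(upstream)})"

def orchestrate(graph: dict[str, set[str]], workers: int = 4) -> dict[str, str]:
    """graph maps each sub-problem to the set of sub-problems it depends on."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the plan has a cycle
    results: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while sorter.is_active():
            batch = sorter.get_ready()  # everything whose dependencies are done
            futures = {
                node: pool.submit(
                    research, node, {dep: results[dep] for dep in graph.get(node, ())}
                )
                for node in batch
            }
            for node, future in futures.items():
                results[node] = future.result()
                sorter.done(node)
    return results

# Hypothetical plan: two independent jurisdictions feed one final report.
graph = {"report": {"eu", "us"}, "eu": set(), "us": set()}
results = orchestrate(graph)
print(results["report"])
```

Each researcher receives only the results of its own dependencies, which is one simple way to keep sub-problem contexts separate. This sketch waits for a whole batch before asking for the next one; a finer-grained scheduler is possible but harder to log and debug.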
What actually breaks
The first failure mode is context growth. A research session produces output after output until the context window eventually overflows. Even before that point, the accumulated unprocessed information can drown out the signal the model is trying to follow. The remedy is to rebuild the context regularly.
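One possible shape for rebuilding the context is to keep the goal, compress older observations into a summary, and keep only the most recent observations verbatim. The sketch below assumes a word count as a rough proxy for tokens and uses a hypothetical `first_words` stub where a real system would call a model to summarise; the budget and `keep_recent` values are arbitrary.

```python
def word_count(texts: list[str]) -> int:
    # Crude proxy for tokens; a real system would use the model's tokenizer.
    return sum(len(text.split()) for text in texts)

def rebuild_context(goal, observations, summarise, budget=200, keep_recent=2):
    """Return a context that keeps the goal, a summary of old observations,
    and the most recent observations unchanged."""
    if word_count([goal] + observations) <= budget:
        return [goal] + observations       # still fits, nothing to do
    cut = max(len(observations) - keep_recent, 0)
    older, recent = observations[:cut], observations[cut:]
    if not older:
        return [goal] + recent
    return [goal, summarise(older)] + recent

def first_words(texts: list[str], n: int = 5) -> str:
    # Hypothetical summariser stub: keeps the first n words of each observation.
    return "summary: " + " | ".join(" ".join(text.split()[:n]) for text in texts)

observations = [f"obs{i} " + "word " * 30 for i in range(4)]
context = rebuild_context("goal", observations, first_words, budget=50)
print(len(context))  # goal + one summary + two recent observations
```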
The second failure mode is error propagation. A bad retrieval in the second step influences every subsequent step of reasoning. Because errors propagate this way, reflection is most useful when it is grounded in something verifiable (a retrieval relevance score, tool feedback, or another search), not when it depends solely on the model reflecting on itself. A model reflecting on its own hallucination is a known failure path; reflection needs external contradiction.
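A grounded reflection step might look like the following sketch, which accepts a claim only when enough distinct sources support it with a sufficient retrieval relevance score, and queues everything else for another search. The `Evidence` type and both thresholds are assumptions chosen for illustration, not recommended values.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Evidence:
    claim: str
    source: str
    relevance: float  # retrieval relevance score in [0, 1], reported by the tool

def grounded_reflection(evidence, min_relevance=0.5, min_sources=2):
    """Split claims into accepted ones and ones that need another search,
    using external signals only (scores and source counts)."""
    sources_by_claim: dict[str, set[str]] = {}
    for item in evidence:
        sources_by_claim.setdefault(item.claim, set())
        if item.relevance >= min_relevance:
            sources_by_claim[item.claim].add(item.source)
    accepted = sorted(c for c, s in sources_by_claim.items() if len(s) >= min_sources)
    needs_search = sorted(c for c, s in sources_by_claim.items() if len(s) < min_sources)
    return accepted, needs_search

# Hypothetical evidence: X is corroborated, Y has one source, Z is weakly retrieved.
evidence = [
    Evidence("X", "a", 0.9),
    Evidence("X", "b", 0.8),
    Evidence("Y", "a", 0.9),
    Evidence("Z", "c", 0.1),
]
accepted, needs_search = grounded_reflection(evidence)
print(accepted, needs_search)
```

Nothing in this check asks the model whether it believes itself; the decision rests entirely on signals that come from outside the model.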
Reference implementations
This pattern emerged in practice in the "deep research" capabilities that several model providers shipped in 2024 and 2025. It also appears in openly available variants, including framework recipe templates (such as the plan-and-execute and orchestrator-worker graphs used in LangGraph and other libraries) and search-capable models optimized for agents.
A concrete open example is Tongyi DeepResearch from Alibaba's Tongyi Lab, released under the Apache 2.0 license in September 2025. The model is based on the Qwen3-30B-A3B mixture-of-experts architecture (around 30.5 billion parameters in total, with about 3.3 billion activated per token) and supports up to 128K tokens of context. The product itself matters less here than what it shows about the patterns described above: it uses the ReAct loop as its baseline mode and offers an alternative "Heavy" mode built on iterative research, whose core strategy is to repeatedly restructure its context. Its weights, inference scripts, and evaluations are openly available, so it can serve as a reference point for understanding how research loops work in practice.
Choosing the pattern
The Deep Research pattern is appropriate when the task is truly multi-source and multi-step, a single retrieval is not sufficient, and being wrong is costly enough to justify the additional latency and cost. For tasks that a single call with retrieval augmentation can answer, the extra time spent iterating through the loop is unnecessary. Where Deep Research is appropriate, start with the most minimal version possible: one agent, one plan, reflection with a clear signal, and citations.
Disclaimer: This article is for educational purposes only and reflects the author's own views. Any products or tools mentioned are referenced factually, not as endorsements, and details may change as the technology evolves.