
The Deep Research Agent Pattern

A single language-model call answers in one pass, which is fine for self-contained questions but breaks down on open-ended research that spans many sources.

A single call to a language model is a one-shot process: one input, one output. That suffices when the answer is already part of what the model knows or sits in a single retrieved document. It breaks down on questions that fall outside that scope, such as analysing a technology from several perspectives across multiple jurisdictions, explaining how a particular protocol was developed and what later replaced it, or checking a claim against its sources. Questions like these require decomposing the problem, finding relevant documents iteratively, reconciling contradictory information, and synthesising the results into a coherent answer. The Deep Research agent pattern does exactly this.


The loop is the pattern.

The defining feature of the pattern is that it does not answer once; it runs in a loop. A useful working definition: no loop, no agent. Most implementations follow the ReAct pattern (reason, act, observe, repeat): the model reasons about the current situation, takes an action (usually a tool call), observes the result, adds that observation to its context, and repeats. Two further layers on top of this basic loop turn it into a research agent: planning and reflection.
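As a rough illustration, the loop can be sketched in a few lines of Python. The `model` callable, the `Action`/`Finish` step types, and the toy search tool below are hypothetical stand-ins, not any real framework's API; the point is only the shape of the loop (reason over the context, act, observe, append, repeat) plus a step budget so the loop always terminates.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Action:
    """The model asks to call a tool (hypothetical step type)."""
    tool: str
    argument: str


@dataclass
class Finish:
    """The model declares it has an answer (hypothetical step type)."""
    answer: str


@dataclass
class Trace:
    # Each entry is a (thought, action, observation) triple.
    steps: list = field(default_factory=list)


def react_loop(model: Callable, tools: dict[str, Callable[[str], str]],
               question: str, max_steps: int = 8) -> tuple[str, Trace]:
    """Reason, act, observe, repeat, until the model finishes or the step budget runs out."""
    trace = Trace()
    for _ in range(max_steps):
        # Reason: the model sees the question plus every earlier observation.
        thought, step = model(question, trace.steps)
        if isinstance(step, Finish):
            return step.answer, trace
        # Act: invoke the tool the model chose.
        observation = tools[step.tool](step.argument)
        # Observe: record the result so the next reasoning step can use it.
        trace.steps.append((thought, step, observation))
    return "No answer within step budget", trace


# Toy stand-ins for illustration only: a canned search tool and a rule-based "model".
def fake_search(query: str) -> str:
    canned = {"ReAct": "ReAct interleaves reasoning traces with tool actions."}
    return canned.get(query, "no results")


def toy_model(question, steps):
    if not steps:
        return "I should search first.", Action("search", "ReAct")
    return "The last observation answers the question.", Finish(steps[-1][2])


answer, trace = react_loop(toy_model, {"search": fake_search}, "What does ReAct do?")
print(answer)            # ReAct interleaves reasoning traces with tool actions.
print(len(trace.steps))  # 1
```

With the toy model, the agent searches once, reads the observation, and finishes on the second pass. The `max_steps` cap matters in practice: a model that never decides it is done would otherwise loop indefinitely.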

Planning happens first, once, and at a high level of abstraction. The agent analyses the original request and breaks it into a tree of sub-questions. These can be independent, but more often they depend on one another, in which case a later sub-question cannot be resolved until the ones it depends on have been. Separating planning from execution matters because the planner can lay out all the steps in advance, while the executor concentrates on completing one step without having to think about the ones that follow.

Reflection, by contrast, happens repeatedly as observations accumulate. The agent compares what it has found with its goal, judges whether the answer to each sub-question is sufficient, identifies gaps, and proposes further actions.
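A plan with dependencies is naturally a directed acyclic graph, and an execution order falls out of a topological sort. The sketch below uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+) for ordering. The sub-question names are invented for illustration, and `reflect` is a deliberately crude, hypothetical placeholder (a minimum answer length) for whatever sufficiency check a real agent would apply.

```python
from graphlib import TopologicalSorter

# Illustrative plan for "how was protocol P developed, and what replaced it?".
# Each key is a sub-question; its value is the set of sub-questions it depends on.
plan: dict[str, set[str]] = {
    "origin": set(),
    "design_goals": {"origin"},
    "successor": {"origin"},
    "why_replaced": {"design_goals", "successor"},
}


def execution_order(plan: dict[str, set[str]]) -> list[str]:
    """Order sub-questions so each comes after its prerequisites.
    Raises graphlib.CycleError if the plan contains a cycle."""
    return list(TopologicalSorter(plan).static_order())


def reflect(plan: dict[str, set[str]], answers: dict[str, str], min_chars: int = 20) -> list[str]:
    """Toy reflection step: return sub-questions that are unanswered or whose answers look thin.
    The length threshold is only a placeholder for a real sufficiency check."""
    return [q for q in plan if len(answers.get(q, "")) < min_chars]


order = execution_order(plan)
print(order)  # "origin" first, "why_replaced" last

answers = {"origin": "placeholder answer with enough detail", "design_goals": "too short"}
print(reflect(plan, answers))  # ['design_goals', 'successor', 'why_replaced']
```

The gaps that `reflect` returns would go back to the planner as new or revised sub-questions.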

Figure: the deep research loop


What sits inside the loop

Every implementation, whatever its approach, has five elements. The planner decomposes the objective and tracks the research plan. The tools connect the agent to the outside world: web search, browsing, extracting information from pages, searching the agent's own private data, and running code for computations instead of relying on memorised facts. The memory, or state, holds acquired data, references, and open questions. The reflector, or critic, judges progress. The synthesiser combines everything gathered into a summary report with references. These need not be five separate systems; one system can serve several roles.
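One way to picture the memory and synthesiser roles is a small state object plus a report renderer. The `ResearchState` and `Finding` names below are hypothetical, and the sketch assumes each claim is stored together with the source it came from, so the synthesiser can assign one citation number per distinct source.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    claim: str
    source: str  # URL or document ID the claim was taken from


@dataclass
class ResearchState:
    """Illustrative memory shared by the planner, tools, reflector, and synthesiser."""
    objective: str
    findings: list[Finding] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)

    def add(self, claim: str, source: str) -> None:
        self.findings.append(Finding(claim, source))


def synthesise(state: ResearchState) -> str:
    """Render the findings as a report; each distinct source gets one citation number."""
    numbers: dict[str, int] = {}
    lines = [f"Report: {state.objective}", ""]
    for finding in state.findings:
        # First time a source appears, give it the next number; reuse it afterwards.
        n = numbers.setdefault(finding.source, len(numbers) + 1)
        lines.append(f"- {finding.claim} [{n}]")
    if state.open_questions:
        lines += ["", "Open questions:"] + [f"- {q}" for q in state.open_questions]
    lines += ["", "References:"] + [f"[{n}] {src}" for src, n in numbers.items()]
    return "\n".join(lines)


state = ResearchState("Example objective")
state.add("Claim A", "https://example.org/a")
state.add("Claim B", "https://example.org/b")
state.add("Claim C", "https://example.org/a")  # same source as Claim A, so it reuses [1]
state.open_questions.append("Is claim B still current?")
print(synthesise(state))
```

Keeping open questions in the state rather than only in the model's context is what lets the reflector find them again after the context has been trimmed.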


Single agent or orchestra

The simplest viable design is one model running one ReAct loop with a search tool. It is easy to reason about, simple to debug, and usually sufficient for tasks with a clear stopping point. Add complexity only when there is a real need for it.

The next step up is plan-and-execute with parallelisation: the planner builds a dependency graph, the orchestrator walks the graph in topological order, and independent researcher instances work on different sub-problems in parallel. This map-reduce style of decomposition cuts wall-clock time on large questions and keeps each sub-problem's context separate. The price is coordination overhead and harder debugging: parallel, non-deterministic branches are difficult to log, and the same input can produce different results from one run to the next. A sensible path is to start with a single researcher and move to orchestration only when it proves necessary.
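Assuming the plan is stored as a dependency graph, the orchestrator can be sketched with `graphlib.TopologicalSorter`'s incremental interface (`prepare`, `get_ready`, `done`) and a thread pool from `concurrent.futures`. The `research` callable is a hypothetical stand-in for one researcher instance, and for simplicity this version runs the graph level by level, waiting for every ready sub-question to finish before releasing the next batch.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter
from typing import Callable


def orchestrate(plan: dict[str, set[str]],
                research: Callable[[str, dict[str, str]], str],
                max_workers: int = 4) -> dict[str, str]:
    """Walk the plan in topological order, running every ready sub-question in parallel.

    `research(question, prerequisite_results)` stands in for one researcher instance.
    """
    sorter = TopologicalSorter(plan)
    sorter.prepare()  # also raises graphlib.CycleError on a cyclic plan
    results: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            ready = sorter.get_ready()  # sub-questions whose prerequisites are all done
            # Map: each worker gets only its own prerequisites' results, not the whole history.
            futures = {
                q: pool.submit(research, q, {d: results[d] for d in plan.get(q, ())})
                for q in ready
            }
            # Reduce: collect this batch before releasing the next level of the graph.
            for q, future in futures.items():
                results[q] = future.result()
                sorter.done(q)
    return results


# Toy researcher for illustration: reports which prerequisite results it was given.
def fake_research(question: str, context: dict[str, str]) -> str:
    return f"{question} <- {', '.join(sorted(context)) or 'nothing'}"


results = orchestrate({"a": set(), "b": set(), "c": {"a", "b"}}, fake_research)
print(results["c"])  # c <- a, b
```

Each worker sees only the outputs of its own prerequisites, which is what keeps the sub-problem contexts separate. Running levels as batches trades a little parallelism for simpler bookkeeping; a finer-grained version would call `done` as each future completes.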


What actually breaks

The first failure mode is context growth. A research session piles up tool output after tool output until the context window eventually overflows. Even before that point, the unprocessed material can drown out the signal the model is trying to follow. The remedy is to rebuild the context regularly, for example by condensing older observations into a summary and keeping only recent material verbatim.
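A minimal sketch of periodic context rebuilding, assuming a hypothetical `summarise` callable (in practice, another model call) and a crude whitespace token count in place of the model's real tokenizer. Once the context exceeds a budget, older observations are folded into a single summary and only the most recent ones are kept verbatim.

```python
from typing import Callable


def approx_tokens(text: str) -> int:
    # Crude whitespace-based estimate; a real system would use the model's own tokenizer.
    return len(text.split())


def rebuild_context(objective: str, observations: list[str], budget: int,
                    summarise: Callable[[list[str]], str], keep_recent: int = 2) -> list[str]:
    """Shrink the context once it exceeds the budget by folding older observations into one summary.

    The objective is always kept, and the `keep_recent` newest observations stay verbatim.
    """
    total = approx_tokens(objective) + sum(approx_tokens(o) for o in observations)
    if total <= budget or len(observations) <= keep_recent:
        return [objective, *observations]
    split = len(observations) - keep_recent
    older, recent = observations[:split], observations[split:]
    return [objective, "Summary so far: " + summarise(older), *recent]


# Hypothetical summariser; in practice this would be another model call.
def toy_summarise(items: list[str]) -> str:
    return f"{len(items)} earlier observations condensed"


observations = ["page one " * 50, "page two " * 50, "page three", "page four"]
context = rebuild_context("Find X", observations, budget=40, summarise=toy_summarise)
print(context)
# ['Find X', 'Summary so far: 2 earlier observations condensed', 'page three', 'page four']
```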
The second failure mode is error propagation. A bad retrieval at step two contaminates every reasoning step that follows. Because errors compound this way, reflection is most useful when it is grounded in something verifiable (a retrieval relevance score, tool feedback, or an independent search) rather than in the model's judgement of its own output. Asking a model to reflect on its own hallucination is a known failure path; reflection needs an external source of contradiction.
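To make grounded reflection concrete, the sketch below decides on a claim using only external signals: retrieval relevance scores and whether independent sources support or contradict it. The `Evidence` type, the 0.7 relevance threshold, and the two-source rule are illustrative assumptions, not values from any particular system.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    source: str
    relevance: float  # retriever relevance score, assumed to lie in [0, 1]
    supports: bool    # whether the retrieved passage agrees with the claim


def verdict(evidence: list[Evidence], min_relevance: float = 0.7, min_sources: int = 2) -> str:
    """Judge a claim from external signals only; the model's self-assessment is never consulted.

    Returns "accept", "reject", or "search_again".
    """
    relevant = [e for e in evidence if e.relevance >= min_relevance]
    supporting = {e.source for e in relevant if e.supports}
    contradicting = {e.source for e in relevant if not e.supports}
    if contradicting and not supporting:
        return "reject"
    if len(supporting) >= min_sources and not contradicting:
        return "accept"
    # Too little relevant evidence, or sources disagree: gather more before moving on.
    return "search_again"


print(verdict([Evidence("a", 0.9, True), Evidence("b", 0.8, True)]))  # accept
print(verdict([Evidence("a", 0.9, True), Evidence("b", 0.3, True)]))  # search_again
print(verdict([Evidence("a", 0.9, False)]))                           # reject
```

Conflicting or thin evidence sends the agent back to search instead of letting a bad retrieval flow into later steps.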


Reference implementations

The pattern took shape in practice with the "deep research" features that several model providers released in 2024 and 2025. It also exists in openly available forms: as framework recipes (such as the plan-and-execute and orchestrator-worker graphs in LangGraph and other libraries) and as search-capable models optimised for agentic use.

One concrete open example is Tongyi Deep Research from Alibaba's Tongyi Lab, released under the Apache 2.0 license in September 2025. It is built on the Qwen3-30B-A3B mixture-of-experts architecture (about 30.5 billion total parameters, of which about 3.3 billion are activated per token) and supports a context of up to 128K tokens. The product itself matters less here than how it maps onto the pattern described above: its baseline mode runs a standard ReAct loop, while an alternative "Heavy" mode is built around iterative research, repeatedly restructuring its context as it goes. Its weights, inference scripts, and evaluations are openly available, which makes it a useful reference for seeing how research loops work in practice.


Choosing the pattern

The Deep Research pattern is appropriate when a task is genuinely multi-source and multi-step, a single retrieval cannot resolve it, and the cost of being wrong is high enough to justify the extra latency and expense. For tasks that a single retrieval-augmented call can answer, iterating through the loop only adds time. Where Deep Research does fit, start with the most minimal version possible: one agent, one plan, reflection with a clear signal, and citations.


Disclaimer: This article is for educational purposes only and reflects the author's own views. Any products or tools mentioned are referenced factually, not as endorsements, and details may change as the technology evolves.


PM - C2C_Yuan

