By Wang Chen
New commercial products and open-source projects emerge in the field of Agent development tools almost every month, yet the application architecture of Agents has remained relatively stable.
Models have brought cognition and autonomy, but they have also reduced the determinism and consistency of outputs. Whether they are foundation model vendors or companies providing development toolchains and operational guarantees, their work is, in essence, about improving output reliability; different team backgrounds and industry judgments have simply produced different implementation paths. Below, we review the evolution of the Agent development toolchain in four stages, connecting several well-known development tools along the way.
At the end of 2022, the release of ChatGPT let the world feel the general-intelligence potential of large language models for the first time. At that point, however, LLMs were still isolated intelligences, unable to draw on the strength of a vast developer community to accelerate the industry's development.
This was followed by the first batch of Agent frameworks, such as LangChain and LlamaIndex, which reduced development complexity through modular abstractions such as model communication, ChatClient, Prompt, structured output, and Embedding, enabling developers to quickly build chatbots, wire up context, and call models.
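As a rough illustration of what these abstractions look like (not tied to any specific framework's API; the class and function names below are hypothetical), they boil down to a prompt template, a chat client that hides model communication, and an output parser:

```python
# A minimal sketch of the abstractions early Agent frameworks provide.
# The names (ChatClient, PromptTemplate, JsonOutputParser) are illustrative,
# not the API of any particular framework.
import json
from dataclasses import dataclass


@dataclass
class PromptTemplate:
    template: str

    def format(self, **kwargs) -> str:
        return self.template.format(**kwargs)


class ChatClient:
    """Wraps model communication (HTTP calls, retries, auth) behind one method."""

    def __init__(self, model: str):
        self.model = model

    def complete(self, prompt: str) -> str:
        # Stubbed for the sketch; a real client would call the model provider's API here.
        return '{"topic": "placeholder", "urgency": "low"}'


class JsonOutputParser:
    """Turns free-form model text into structured data ('formatted output')."""

    def parse(self, text: str) -> dict:
        return json.loads(text)


prompt = PromptTemplate(
    template="Summarize the following ticket as JSON with keys 'topic' and 'urgency':\n{ticket}"
)
client = ChatClient(model="any-chat-model")
parser = JsonOutputParser()


def summarize(ticket: str) -> dict:
    return parser.parse(client.complete(prompt.format(ticket=ticket)))
```

The value of such frameworks is less in any single class than in standardizing these seams so that chatbots, retrieval, and tool calls can be composed quickly.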
In 2024, Spring AI Alibaba was launched, providing high-level AI API abstractions and cloud-native infrastructure integrations to help Java developers quickly build AI applications. Going forward, it will become part of the AgentScope ecosystem, positioned as the bridge between Spring and AgentScope, with the AgentScope Java version planned for release at the end of November this year, aligned with the capabilities of AgentScope Python.
With the rapid development of the industry, these foundational development frameworks have continued to evolve, gradually supporting or integrating capabilities such as retrieval, retrieval-augmented generation (RAG), memory, tools, evaluation, observability, and AI gateways; offering single-agent, workflow, and multi-agent development paradigms; and spawning framework-based deep-research and general-purpose agents such as DeepResearch and JManus.
Development frameworks may not be as glamorous as research, but they play an irreplaceable role in quickly bringing the vast developer community into the AI development ecosystem.
Large models are intelligent, but without tools they cannot reach into the external world; they can neither read from it nor write to it. At the same time, the application development frameworks of the first stage are not friendly to non-programmers, which hinders cross-team collaboration and the involvement of domain experts. So between 2023 and 2024, low-code and even no-code platforms such as Dify and n8n were pushed into enterprise production environments: they define task processing as workflows with if/else branches, and can even generate simple frontend pages from natural language, improving collaboration efficiency between domain experts and programmers.
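As a rough illustration of what such low-code platforms express under the hood (this schema is hypothetical and not the actual Dify or n8n format), a workflow with a conditional branch can be declared as plain data that a runtime then executes node by node:

```python
# Hypothetical workflow definition with an if/else branch, in the spirit of
# low-code tools like Dify or n8n; the schema and endpoints are illustrative only.
workflow = {
    "name": "support-ticket-triage",
    "nodes": [
        {"id": "classify", "type": "llm",
         "prompt": "Classify this ticket as 'billing' or 'technical': {{input}}"},
        {"id": "route", "type": "if",
         "condition": "{{classify.output}} == 'billing'",
         "then": "billing_queue", "else": "technical_queue"},
        {"id": "billing_queue", "type": "http",
         "request": {"method": "POST", "url": "https://example.internal/billing"}},
        {"id": "technical_queue", "type": "http",
         "request": {"method": "POST", "url": "https://example.internal/technical"}},
    ],
    "edges": [
        ["classify", "route"],
    ],
}
```

Because the definition is declarative, a domain expert can edit the branching logic in a visual canvas while programmers maintain the underlying nodes.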
At the tool level, OpenAI officially launched Function Calling in June 2023, and in November 2024 Anthropic released the MCP protocol to enable tool interoperability across models. The emergence of MCP in particular significantly activated the developer ecosystem.
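As a minimal sketch of how function calling works (the get_weather tool below is made up; its parameters are described with JSON Schema, the convention function-calling APIs generally follow), the application declares tools, the model decides when to call one, and the application executes it and feeds the result back:

```python
# A sketch of declaring a tool for function calling. The get_weather tool is
# made up for illustration; its parameters are described with JSON Schema.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name, e.g. Hangzhou"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    },
}

# The model is given this schema alongside the conversation; when it decides to
# call the tool, it returns the function name and JSON arguments, the application
# runs the real function, and the result is appended back into the context.
```

MCP standardizes the same pattern across vendors by putting tools behind a protocol, so one tool server can serve many models and clients.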
Thus, together, they pushed the Agent development toolchain into the second stage: Tools & Collaboration.
However, simply lowering the barrier to building applications and letting them call external systems through tools has not fundamentally solved output consistency and reliability. The evolution of the developer toolchain therefore entered deeper waters. In 2024, Andrej Karpathy's advocacy of context engineering struck a chord across the industry: how to select context, organize it, and dynamically adjust its structure across different tasks became the key to improving output stability, pushing the toolchain into its third stage, context engineering and reinforcement learning (RL).
System prompts, knowledge bases, tools, and memory are the important building blocks of context engineering. Even though these mechanisms have matured, outputs still fluctuate, and it is RL that turns context engineering from static templates into dynamic, intelligent strategies.
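As a rough sketch of what "selecting and organizing context" means in code (the scoring and budgeting heuristics and the names below are assumptions for illustration, not a specific framework's API), a context builder ranks candidate fragments and packs them under a token budget:

```python
# Illustrative context assembly: rank candidate fragments (retrieved docs,
# memory entries, tool descriptions) and pack them under a token budget.
from dataclasses import dataclass


@dataclass
class Fragment:
    text: str
    relevance: float  # e.g. a retrieval or reranker score
    tokens: int       # pre-computed token count


def build_context(system_prompt: str, fragments: list[Fragment],
                  budget_tokens: int = 4000) -> str:
    parts = [system_prompt]
    used = len(system_prompt) // 4  # crude token estimate, good enough for the sketch
    # Greedily keep the most relevant fragments that still fit the budget;
    # an RL-tuned policy could instead learn which fragments to keep per task
    # and how to order them, which is what "dynamic strategies" refers to.
    for frag in sorted(fragments, key=lambda f: f.relevance, reverse=True):
        if used + frag.tokens > budget_tokens:
            continue
        parts.append(frag.text)
        used += frag.tokens
    return "\n\n".join(parts)
```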
RL is hard for the industry: it depends on algorithmic capability, requires sufficient domain know-how, and faces challenges in generalization. Nevertheless, there are notable practical implementations.
Jina.AI was recently acquired by Elastic; its CEO, Dr. Han Xiao, shared Jina.AI's research on search foundation models, mainly Embeddings, Reranker, and Reader, in an article titled "The Future of Search Lies in a Bunch of Small Models."
Additionally, Alibaba Cloud's API gateway uses RL-based tool optimization and semantic retrieval to raise the call quality of batches of MCP servers and shorten call time. For example, through reranking and optional query rewriting, it pre-processes and filters the tool list before a request is sent to the large language model, improving response speed and selection accuracy in large-scale toolset scenarios while reducing token costs. The approach was evaluated on Salesforce's open-source dataset with toolsets of different sizes (50/100/200/300/400/500 tools).
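A minimal sketch of this pre-filtering idea follows (the embed() and rerank() functions are placeholders, not Alibaba Cloud's actual gateway implementation): retrieve a small candidate set of tools semantically, rerank it, and attach only the survivors to the model request:

```python
# Sketch of gateway-side tool pre-filtering before the LLM sees the tool list.
# embed() and rerank() are placeholders; a real gateway would back them with
# an embedding model and a reranker model respectively.
from typing import Callable


def filter_tools(query: str,
                 tools: list[dict],
                 embed: Callable[[str], list[float]],
                 rerank: Callable[[str, list[str]], list[float]],
                 top_k: int = 8) -> list[dict]:
    # 1. Coarse recall: cosine similarity between the query and tool descriptions.
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(x * x for x in b) ** 0.5
        return dot / (na * nb + 1e-9)

    q_vec = embed(query)
    scored = [(cosine(q_vec, embed(t["description"])), t) for t in tools]
    candidates = [t for _, t in sorted(scored, key=lambda s: s[0], reverse=True)[:top_k * 4]]

    # 2. Fine ranking: a reranker scores (query, description) pairs jointly.
    scores = rerank(query, [t["description"] for t in candidates])
    ranked = sorted(zip(scores, candidates), key=lambda s: s[0], reverse=True)

    # 3. Only the top_k tools are attached to the LLM request, cutting token
    #    costs and making tool selection easier for the model.
    return [t for _, t in ranked[:top_k]]
```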
Enterprises that excel at RL tend to treat these practices as the competitive, revenue-generating parts of their commercial products, so they do not let developers benefit as quickly as frameworks or tools do. This brings us to the fourth stage, in which foundation model vendors engage in context engineering directly.
In October 2025, OpenAI released AgentKit and the Apps SDK, and Anthropic released Claude Skills, marking Agent engineering's entry into the era of model centralization.
Claude Skills in particular combines the ability to build Skills with tool connectivity: it does not even require MCP, since a Skill can execute Python scripts that call APIs directly, and the large model can generate new Skills itself. This shifts the responsibility for Agent context engineering, including construction, execution, and operation, from developers to the model and framework side.
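As a rough illustration of the "a script instead of an MCP server" idea (the exchange-rate API and the script layout below are hypothetical, not Anthropic's actual Skills specification), a Skill might bundle a plain script that the model runs to hit an API directly:

```python
# Hypothetical script bundled inside a Skill directory; the model runs it to
# call a REST API directly instead of going through an MCP server.
# The endpoint is made up for illustration.
import json
import sys
import urllib.request


def get_exchange_rate(base: str, quote: str) -> float:
    url = f"https://api.example.com/rates?base={base}&quote={quote}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)["rate"]


if __name__ == "__main__":
    # The model invokes the script with arguments and reads stdout as the result.
    base, quote = sys.argv[1], sys.argv[2]
    print(json.dumps({"rate": get_exchange_rate(base, quote)}))
```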
Compared with the rapidly evolving Agent development toolchain, the infrastructure that the Agent application architecture maps onto remains relatively stable.
In the "AI Native Application Architecture White Paper," we described 11 key elements of an AI-native application architecture: models, development frameworks, prompts, RAG, memory, tools, AI gateways, runtime, observability, evaluation, and security. Take AI gateways, runtime, observability, and security as examples.
Rapid iteration and innovation in the toolchain improve output reliability, while runtime modules such as gateways, compute, observability, and security ensure that applications run stably, economically, and safely. It is precisely this structure of "rapid change above, stable foundation below" that lets the AI application ecosystem innovate at high speed without descending into systemic chaos.
If you want to learn more about Alibaba Cloud API Gateway (Higress), please click: https://higress.ai/en/