This article is based on a presentation by Xintong Song, Apache Flink PMC member and Staff Software Engineer at Alibaba Cloud, in the Streaming track of Community Over Code Asia 2025. It gives a detailed introduction to the background, architectural design, and application prospects of the Flink Agents project.
As artificial intelligence develops rapidly, AI applications are evolving from simple conversational interactions toward more complex and autonomous systems. The Apache Flink community recently launched a new project, Flink Agents, an AI agent framework designed specifically for event-driven scenarios. This article explores the technical architecture and application scenarios of Flink Agents, as well as its place in the AI engineering roadmap.

Flink's focus on layers 3-4 positions it as a leader in real-time AI execution, with Flink Agents specifically targeting the fourth layer through its event-driven architecture.

While there are already many AI agent frameworks available, the Apache Flink community chose to develop a new one because of the unique needs of event-driven scenarios.
Most existing AI systems are conversational agents, where users initiate interactions via natural language. Examples include AI Coding, ChatBI, and DeepResearch. These are user-triggered systems.
In contrast, event-driven agents are triggered automatically by system events or data updates. As AI matures, we expect more automation, similar to how OLAP analysis evolved from manual SQL queries to fully automated, high-throughput operations. Future AI will increasingly rely on system-generated triggers rather than human input.
Let’s look at two typical application scenarios:

In live streaming and live shopping, top-tier streams generate vast amounts of comments and messages. Traditionally, this requires a team of analysts and moderators to monitor and respond.
With event-driven AI agents, the system can process and summarize these comments in real time. For instance, it can identify frequently asked questions, detect technical issues such as audio/video sync problems, and analyze audience demographics using multimodal AI models. Based on these insights, the system can recommend product adjustments or suggest suitable background music.

The second application scenario is intelligent operations. Cloud platforms such as Alibaba Cloud's Realtime Compute already support rule-based automated operations, such as subscribing to real-time metrics and exception events during job execution and applying predefined rules to handle them.
However, traditional rule-based approaches struggle with complex issues. For example, a common error like a heartbeat timeout can have several causes: prolonged garbage collection (GC) due to JVM memory issues, node network problems, or even normal scheduling in the underlying resource cluster. Different root causes require different diagnostic methods and solutions, which static rules struggle to cover comprehensively.
Flink Agents, currently in development, aims to address these gaps by introducing AI-driven, event-based decision-making. On top of rule-based automated operations, the system can delegate diagnostic tasks to AI. The AI can use retrieval-augmented generation (RAG) to query operational knowledge bases, apply accumulated expertise to select appropriate diagnostic steps (e.g., log searches or node health checks via tool invocations), and execute low-impact actions autonomously. For high-impact operations, the AI can request human confirmation before execution.
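The split between autonomous low-impact actions and human-confirmed high-impact actions can be sketched in plain Python. This is only an illustration under stated assumptions, not the Flink Agents API: `Impact`, `ProposedAction`, `Dispatcher`, and `diagnose_heartbeat_timeout` are hypothetical names, and the diagnosis step is a hard-coded stand-in for what would be a model call backed by a knowledge base and tools.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class Impact(Enum):
    LOW = "low"
    HIGH = "high"


@dataclass
class ProposedAction:
    """An action suggested by the (hypothetical) diagnostic step."""
    name: str
    impact: Impact
    run: Callable[[], str]


@dataclass
class Dispatcher:
    """Runs low-impact actions at once and parks high-impact ones for a human."""
    executed: List[str] = field(default_factory=list)
    pending_approval: List[ProposedAction] = field(default_factory=list)

    def dispatch(self, action: ProposedAction) -> None:
        if action.impact is Impact.LOW:
            self.executed.append(action.run())
        else:
            self.pending_approval.append(action)

    def approve(self, name: str) -> bool:
        """Called once a human confirms a pending action; returns False if none matches."""
        for action in list(self.pending_approval):
            if action.name == name:
                self.pending_approval.remove(action)
                self.executed.append(action.run())
                return True
        return False


def diagnose_heartbeat_timeout(event: dict) -> List[ProposedAction]:
    """Stand-in for the AI step (assumption): returns a fixed plan for illustration."""
    return [
        ProposedAction("search_logs", Impact.LOW,
                       lambda: f"searched logs of {event['node']}"),
        ProposedAction("restart_job", Impact.HIGH,
                       lambda: f"restarted job {event['job']}"),
    ]


dispatcher = Dispatcher()
for proposed in diagnose_heartbeat_timeout({"node": "tm-3", "job": "job-42"}):
    dispatcher.dispatch(proposed)

# The log search has run; the restart waits until a human approves it.
approved = dispatcher.approve("restart_job")
```

Keeping the impact level on the proposed action, rather than inside the dispatcher, means the gating policy stays a single branch even as the set of diagnostic tools grows.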
Based on these use cases, several core requirements emerge for building effective event-driven AI agents:
They must support real-time processing, since events often require immediate action. They also need to handle large-scale event volumes, as system-generated events typically far exceed human-initiated requests in frequency.
High stability is essential, especially since event-driven agents often run 24/7 without constant oversight. Additionally, they must manage both data processing and AI processing throughout their workflow and integrate with multiple system sources to consume diverse types of events.
These requirements align closely with Apache Flink’s core strengths, including sub-millisecond latency, distributed scalability, state management, fault tolerance, and strong ecosystem support.


Initiated in 2025, the Flink Agents project is a brand-new initiative developed entirely within the Apache Flink community. It was not derived from any internal company tool but rather built from scratch.
The architecture of Flink Agents is guided by several core principles:

It maintains familiar AI agent concepts, making it accessible to developers already experienced with agent-based systems. It offers APIs in both Python and Java, supporting both Workflow and ReAct patterns for building agents.
Flink Agents integrates with major large language models, provides MCP protocol compatibility, and enables Java/Python functions to be used directly as tools. It also provides standard implementations for common components like vector stores, with the ability to extend them as needed.
In terms of runtime, Flink Agents offers a lightweight Python runtime for local development and testing, as well as a full distributed Flink runtime that supports distributed execution, state management, fault tolerance, and end-to-end consistency guarantees.

Flink Agents employs an event-centric orchestration model. Each agent consists of a series of Actions, each of which is triggered by specific Events. During execution, Actions can emit new Events, which in turn trigger additional Actions.
This architecture supports both Workflow Agents, which give developers precise control over the sequence of actions, and ReAct Agents, which delegate more autonomy to the AI model with minimal configuration.
The framework supports both built-in and user-defined Actions and Events, allowing for flexible combinations tailored to different use cases. All internal operations are driven by Events, and the framework exposes meta-events such as Action start and end notifications. These enable detailed logging and real-time monitoring through callback mechanisms.
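To make the event-centric model concrete, here is a minimal, framework-free Python sketch of the idea: actions are registered against event types, each action may emit new events, and listeners receive start and end notifications for every action. It does not use the Flink Agents API; `MiniAgent`, `InputEvent`, `SummaryEvent`, `OutputEvent`, and the listener signature are all hypothetical names chosen for illustration, and the loop runs in a single process with none of the state management or fault tolerance of the Flink runtime.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Event:
    """Base class for all events in this sketch."""


@dataclass
class InputEvent(Event):
    text: str


@dataclass
class SummaryEvent(Event):
    summary: str


@dataclass
class OutputEvent(Event):
    result: str


class MiniAgent:
    """Toy orchestrator: events trigger actions, actions emit more events."""

    def __init__(self) -> None:
        self._actions = defaultdict(list)   # event type -> list of actions
        self._listeners: List[Callable] = []

    def on(self, event_type: type) -> Callable:
        """Decorator that registers an action for one event type."""
        def register(fn: Callable) -> Callable:
            self._actions[event_type].append(fn)
            return fn
        return register

    def add_listener(self, listener: Callable) -> None:
        """Listener is called as listener(phase, action_name, event)."""
        self._listeners.append(listener)

    def _notify(self, phase: str, action_name: str, event: Event) -> None:
        for listener in self._listeners:
            listener(phase, action_name, event)

    def run(self, first_event: Event) -> List[OutputEvent]:
        queue = deque([first_event])
        outputs: List[OutputEvent] = []
        while queue:
            event = queue.popleft()
            if isinstance(event, OutputEvent):
                outputs.append(event)        # terminal event: collect, do not dispatch
                continue
            for action in self._actions[type(event)]:
                self._notify("start", action.__name__, event)
                queue.extend(action(event))  # each action returns a list of new events
                self._notify("end", action.__name__, event)
        return outputs


agent = MiniAgent()
log = []
agent.add_listener(lambda phase, name, event: log.append((phase, name)))


@agent.on(InputEvent)
def summarize(event: InputEvent) -> List[Event]:
    # Placeholder for a model call: truncate instead of summarizing.
    return [SummaryEvent(summary=event.text[:20])]


@agent.on(SummaryEvent)
def publish(event: SummaryEvent) -> List[Event]:
    return [OutputEvent(result=f"summary: {event.summary}")]


outputs = agent.run(InputEvent(text="audio is out of sync with video"))
```

In this sketch a workflow-style agent corresponds to wiring the event types by hand, as above. A ReAct-style agent would instead let a model decide inside an action which event to emit next.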
The launch of the Flink Agents project marks a significant milestone for the Apache Flink community in the field of AI. By combining Flink’s powerful stream processing capabilities with AI agent technologies, it provides a robust solution for building industrial-grade event-driven AI applications.
Although still in its early stages, Flink Agents demonstrates strong potential in both design philosophy and technical architecture. As AI continues to advance and move toward large-scale deployment, event-driven agents will play an increasingly important role.
Flink Agents offers a solid foundation for this emerging field and is worth close attention and further exploration.
For developers and enterprises looking to build large-scale, reliable AI applications, Flink Agents presents a promising new option. It builds upon the streaming strengths of Apache Flink while introducing specialized optimizations for AI workloads. With its robust architecture and growing community, Flink Agents is positioned to become a key enabler in the next generation of AI development.
If you’re interested in building scalable, autonomous AI systems in an event-driven world, Flink Agents is definitely a project to keep an eye on.

We warmly invite developers, contributors, and AI enthusiasts to join us in shaping the future of event-driven AI with Flink.
Flink Agents Project Information
GitHub Repository: https://github.com/apache/flink-agents
The first MVP release is planned for the end of September 2025. You can find the timeline and design documentation in the GitHub Discussions section: https://github.com/apache/flink-agents/discussions