Imagine a factory assembly line: products rush by on the main line while reference materials such as quality inspection manuals and parameter tables sit on nearby workbenches. Workers can consult these materials at any time while processing products. Flink's stream processing has a similar need: while processing the main data stream, we often have to consult auxiliary data that guides the processing. This is the problem FLIP-17 aims to solve by adding side input functionality to the DataStream API.
Let's look at a concrete example: suppose we're processing an online store's order stream and need to calculate final prices based on different product discount information. The discount information is stored in another data stream and updates frequently. This requires the ability to access the latest discount information while processing orders.
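The order and discount scenario can be sketched in plain Java without any Flink dependency. This is only an illustration of the idea, not the FLIP-17 API: `DiscountTable`, `update`, `rateFor`, and `finalPrice` are hypothetical names, and a simple in-memory map stands in for the side input that the discount stream would keep up to date.

```java
import java.util.HashMap;
import java.util.Map;

public class DiscountSideInputSketch {

    /** Hypothetical stand-in for a side input: the latest discount rate per product. */
    static final class DiscountTable {
        private final Map<String, Double> rateByProduct = new HashMap<>();

        // Called for every element of the discount stream; newer values overwrite older ones.
        void update(String productId, double rate) {
            rateByProduct.put(productId, rate);
        }

        // Products without a known discount are charged the full price.
        double rateFor(String productId) {
            return rateByProduct.getOrDefault(productId, 0.0);
        }
    }

    // Called for every order on the main stream; it only reads the side input.
    static double finalPrice(String productId, double listPrice, DiscountTable discounts) {
        return listPrice * (1.0 - discounts.rateFor(productId));
    }

    public static void main(String[] args) {
        DiscountTable discounts = new DiscountTable();

        discounts.update("book", 0.25);                          // first discount arrives
        System.out.println(finalPrice("book", 80.0, discounts)); // priced with the 25% discount

        discounts.update("book", 0.5);                           // discount stream sends an update
        System.out.println(finalPrice("book", 80.0, discounts)); // priced with the newer discount

        System.out.println(finalPrice("pen", 4.0, discounts));   // no discount known for this product
    }
}
```

The point of the sketch is the asymmetry between the two streams: discount events only write to the table, while order events only read from it, which is the relationship a side input is meant to express.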

This example demonstrates typical use cases for Side Inputs. It mainly solves four types of problems:
Side Inputs work like reference material racks beside factory assembly lines, where workers can look up needed information at any time. The specific workflow is:

There are several key design aspects:
Like factory workers needing to wait for technical parameters and quality standards before starting processing, the system will:
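One way to picture this waiting behavior is an operator that buffers main-stream elements until the side input has delivered its first value. The following plain-Java sketch rests on that assumption; `BufferingOperator`, `onMainElement`, and `onSideInput` are hypothetical names chosen for illustration and are not taken from the FLIP or from Flink.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class WaitForSideInputSketch {

    /** Hypothetical operator: holds back main-stream elements until the side input is ready. */
    static final class BufferingOperator {
        private final Queue<String> pending = new ArrayDeque<>();
        private final List<String> output = new ArrayList<>();
        private String sideValue; // null means the side input has not arrived yet

        // Main-stream element: process it now if possible, otherwise buffer it.
        void onMainElement(String element) {
            if (sideValue == null) {
                pending.add(element);
            } else {
                output.add(process(element));
            }
        }

        // Side-input element: remember the latest value and drain anything that was waiting.
        void onSideInput(String value) {
            sideValue = value;
            while (!pending.isEmpty()) {
                output.add(process(pending.poll()));
            }
        }

        // Placeholder processing step: tag the element with the side value that was used.
        private String process(String element) {
            return element + "@" + sideValue;
        }

        List<String> output() {
            return output;
        }
    }

    public static void main(String[] args) {
        BufferingOperator op = new BufferingOperator();
        op.onMainElement("a");   // buffered: side input not ready
        op.onMainElement("b");   // buffered
        op.onSideInput("v1");    // side input ready: "a" and "b" are processed in arrival order
        op.onMainElement("c");   // processed immediately with "v1"
        op.onSideInput("v2");    // later elements see the updated value
        op.onMainElement("d");
        System.out.println(op.output());
    }
}
```

Buffering keeps the main stream's arrival order intact, at the cost of holding elements in memory while the side input is still missing.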
Three data storage methods are provided:
For instance, creating a real-time product recommendation system:
```java
// Sketch of the API proposed in FLIP-17 (still under discussion)
// Main stream: user browsing records
DataStream<String> mainStream = ...
// Side Input: real-time product rating data
DataStream<String> sideStream = ...

// Create a Side Input wrapper
SingletonSideInput<String> productRatings =
    new SingletonSideInput<>(sideStream);

// Use Side Input in main processing logic
mainStream
    .filter(new RichFilterFunction<String>() {
        @Override
        public boolean filter(String product) {
            // Get real-time product rating
            String rating = getRuntimeContext()
                .getSideInput(productRatings);
            // Filter based on rating (shouldRecommend is a user-defined helper)
            return shouldRecommend(product, rating);
        }
    })
    .withSideInput(productRatings);
```
This FLIP is still under discussion, facing two main technical challenges:
Current development progress is shown in the following table:
| Phase | Status | Notes |
|---|---|---|
| Design Discussion | In Progress | Discussing various implementation options |
| Basic Framework | Not Started | Waiting for design finalization |
| Core Features | Not Started | Planned two-phase implementation |
| Window Support | Not Started | Second phase feature |
There are three implementation options:
Each approach has its pros and cons:
| Approach | Advantages | Disadvantages |
|---|---|---|
| Reuse TwoInputStreamOperator | Simple implementation, easy to prototype | Limited extensibility |
| StreamTask Management | Clean interfaces | Increases system complexity |
| N-ary Input | High flexibility, good extensibility | Complex implementation |
Side Inputs will bring more flexible data processing capabilities to Flink's stream processing. A factory assembly line works best when it has not just the main production line but also reference materials and auxiliary tools close at hand, which make the whole production process more efficient and precise. Although this feature is still under development, it demonstrates Flink's continued efforts to enhance stream processing capabilities.