Imagine sorting packages on a conveyor belt. Traditionally, a worker can only send packages to a single exit. When they encounter a damaged package, they must either stop the entire belt to deal with it or discard it outright. That clearly isn't flexible enough; things would be much simpler if workers could route different types of packages to different exits. This is exactly the problem Side Outputs aims to solve.
In real-world stream processing, we often encounter a typical scenario: a single operator needs to split its input into multiple streams bound for different destinations. Before Side Outputs, handling this was problematic: you could either run the stream through several `filter` operators, evaluating every record once per output, or use the old `split`/`select` API, which could only produce outputs of the same type and has since been deprecated.
Side Outputs' design is like installing a "flow splitter" on each operator, allowing different kinds of data to be routed to different streams as needed. The design rests on two core concepts:

- `OutputTag`: a typed tag that identifies a side output stream. Because each tag carries its own element type, side outputs can have types different from the main stream.
- Side output stream: inside a function such as `ProcessFunction`, records are emitted to a tag via `ctx.output(tag, value)`, and the resulting stream is later retrieved with `getSideOutput(tag)`.
| Scenario | Traditional Approach | Using Side Outputs |
|---|---|---|
| Handling Corrupted Data | Either stop task or discard | Output to dedicated error handling stream |
| Processing Late Data | Must discard | Can output to late processing stream |
| Data Classification | Requires multiple separate tasks | Complete streaming split in one task |
| Debug and Monitoring | Difficult to track specific data types | Can separately collect data of interest |
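The "flow splitter" idea in the table above can be sketched in plain Java, independent of Flink. The class below (`SideRouter` is a made-up name for illustration, not a Flink type) routes records either to a main output or to named side channels, mirroring the `collect`/`output`/`getSideOutput` trio:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch (not Flink code): a splitter with one main output
// and any number of tagged side outputs.
class SideRouter<T> {
    final List<T> main = new ArrayList<>();
    final Map<String, List<T>> sides = new HashMap<>();

    // Emit to the main output (analogous to Collector#collect)
    void collect(T record) { main.add(record); }

    // Emit to a tagged side output (analogous to Context#output)
    void output(String tag, T record) {
        sides.computeIfAbsent(tag, k -> new ArrayList<>()).add(record);
    }

    // Retrieve a side output stream (analogous to getSideOutput)
    List<T> getSideOutput(String tag) {
        return sides.getOrDefault(tag, List.of());
    }
}

public class Demo {
    public static void main(String[] args) {
        SideRouter<Integer> r = new SideRouter<>();
        for (int v : new int[]{5, -3, 12, -1}) {
            if (v < 0) r.output("invalid", v);   // bad records to a side channel
            else r.collect(v);                   // good records to the main output
        }
        System.out.println(r.main);                     // prints [5, 12]
        System.out.println(r.getSideOutput("invalid")); // prints [-3, -1]
    }
}
```

The key property this models is that bad records are neither dropped nor allowed to stop processing; they simply travel down a different channel.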
Here's an example of processing order data, showing how to use Side Outputs to handle different situations:
```java
// Define output tags (note the trailing {} — an anonymous subclass
// lets Flink capture the generic element type)
final OutputTag<Order> invalidOrders = new OutputTag<Order>("invalid-orders") {};
final OutputTag<Order> lateOrders = new OutputTag<Order>("late-orders") {};

// Process the order stream
SingleOutputStreamOperator<Order> mainStream = orderStream
    .process(new ProcessFunction<Order, Order>() {
        @Override
        public void processElement(Order order, Context ctx, Collector<Order> out) {
            // Invalid orders go to the dedicated side output
            if (!order.isValid()) {
                ctx.output(invalidOrders, order);
                return;
            }
            // Orders behind the current watermark go to the late side output
            if (order.getTimestamp() < ctx.timerService().currentWatermark()) {
                ctx.output(lateOrders, order);
                return;
            }
            // Normal orders go to the main stream
            out.collect(order);
        }
    });

// Retrieve and handle the invalid-order stream
DataStream<Order> invalidOrderStream = mainStream.getSideOutput(invalidOrders);
invalidOrderStream.addSink(new InvalidOrderHandler());

// Retrieve and handle the late-order stream
DataStream<Order> lateOrderStream = mainStream.getSideOutput(lateOrders);
lateOrderStream.addSink(new LateOrderHandler());
```
Under the hood, Side Outputs relies on a few clever design choices:

- An `OutputTag` is created as an anonymous subclass (the trailing `{}`), which lets Flink recover the tag's generic element type despite Java's type erasure.
- Side outputs are independent of the main output's type, so a single operator can emit streams of entirely different types.
- A side output stream is only materialized when `getSideOutput(...)` is called, keeping the main pipeline definition uncluttered.
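The anonymous-subclass trick deserves a closer look, since it explains the otherwise puzzling `{}` after every `new OutputTag<...>(...)`. The sketch below uses a made-up `Tag` class (not Flink's `OutputTag`) to show how a subclass's generic superclass survives erasure and can be read back via reflection:

```java
import java.lang.reflect.ParameterizedType;

// Hypothetical stand-in for OutputTag, to illustrate the type-capture idea.
abstract class Tag<T> {
    final String id;
    Tag(String id) { this.id = id; }

    // Recover T from the anonymous subclass's generic superclass.
    // For a plain `new Tag<String>("x")` this information would be erased.
    Class<?> typeArg() {
        ParameterizedType pt = (ParameterizedType) getClass().getGenericSuperclass();
        return (Class<?>) pt.getActualTypeArguments()[0];
    }
}

public class Demo {
    public static void main(String[] args) {
        // The {} creates an anonymous subclass whose superclass is Tag<String>
        Tag<String> tag = new Tag<String>("errors") {};
        System.out.println(tag.typeArg().getSimpleName()); // prints String
    }
}
```

Flink uses the same mechanism (through its `TypeHint`/`TypeInformation` machinery) so that each side output's serializer can be chosen from the tag alone.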
This feature, proposed in FLIP-13, has been implemented in Flink and is now widely used. Its implementation was done in two phases: exposing side outputs to users through `ProcessFunction`, and extending window operators with `sideOutputLateData(OutputTag)` so that records arriving after the allowed lateness are redirected to a side output instead of being silently dropped.
Side Outputs adds more flexible data processing capability to Flink, like adding multiple exits to a conveyor belt. This improvement makes data processing more elegant: no need to worry about bad data causing entire task failures, and no need to abandon late-arriving data. Through a simple API design, users can easily implement complex data streaming logic, making the entire data processing flow clearer and more efficient.