This article is part of the Technical Insights Series by Perry Ma, Product Lead, Real-time Compute for Apache Flink at Alibaba Cloud.
Imagine you are in an old library and need to find each book's "good friends" (similar books). Traditional methods require either putting all the books on one large table for comparison (limited by memory) or making copies of every book for comparison (data duplication). Neither approach is elegant. FLIP-14 aims to solve this problem by providing a smarter way to handle these "pairing" operations.
When processing graph data, we often need to compare data pairwise within the same group. For example:
(1) Calculating friend recommendations in social networks
(2) Finding triangle relationships in networks
(3) Computing similarity between items

Currently, there are two solutions, but neither is perfect:
| Solution | Advantages | Disadvantages |
|---|---|---|
| GroupReduce | High flexibility, customizable pairing logic | Needs to fit entire group in memory, prone to memory overflow |
| Self-Join | Simple implementation, automatically handled by system | Requires data duplication, produces full Cartesian product, low efficiency |
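
To make the trade-off in the table concrete, here is a minimal sketch in plain Python rather than the Flink API. The function names `pairs_via_group_reduce` and `pairs_via_self_join` and the sample records are hypothetical, chosen only for illustration. The first function holds each whole group in memory before pairing; the second builds the full per-group Cartesian product from two copies of the input and then filters it.

```python
from collections import defaultdict
from itertools import combinations


def pairs_via_group_reduce(records):
    """GroupReduce style: collect each group fully, then pair inside it."""
    groups = defaultdict(list)
    for key, value in records:
        groups[key].append(value)  # the entire group is held in memory
    result = []
    for key, values in groups.items():
        for a, b in combinations(values, 2):  # each unordered pair once
            result.append((key, a, b))
    return result


def pairs_via_self_join(records):
    """Self-join style: join the input with itself on the key, then filter."""
    left = list(records)
    right = list(records)  # second copy of the input (data duplication)
    joined_rows = 0
    result = []
    for k1, a in left:
        for k2, b in right:
            if k1 != k2:
                continue
            joined_rows += 1  # counts the full Cartesian product per group
            if a < b:  # drop self-pairs and mirrored duplicates
                result.append((k1, a, b))
    return result, joined_rows


# Hypothetical sample data: (group key, value), values distinct and orderable.
records = [("g1", 1), ("g1", 2), ("g1", 3), ("g2", 7), ("g2", 8)]
reduce_pairs = pairs_via_group_reduce(records)
join_pairs, joined_rows = pairs_via_self_join(records)
print(len(reduce_pairs), joined_rows)
```

Both functions return the same pairs, but the self-join touches every ordered combination within a group before discarding most of them, which is the inefficiency the table refers to.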
The CrossGroup operator's design is like equipping the library with a smart librarian who knows how to efficiently match books without putting them all on the table.

CrossGroup provides two processing modes for different data distribution characteristics:

For uniformly distributed data, a simple iterator can process efficiently. For skewed data, a three-phase processing approach ensures load balancing.
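
The article does not spell out the three phases, so the following is only a sketch of one plausible way to balance a skewed group, written in plain Python and not taken from the FLIP. It assumes the oversized group is cut into chunks, that one work unit is created per pair of chunks, and that pairing then happens inside each unit. The names `pairs_simple`, `pairs_chunked`, and `chunk_size` are hypothetical.

```python
from itertools import combinations, combinations_with_replacement


def pairs_simple(values):
    """Uniform case: iterate over the group and emit each pair once."""
    return list(combinations(values, 2))


def pairs_chunked(values, chunk_size):
    """Skewed case: an assumed three-step split of one oversized group."""
    # Step 1: cut the large group into fixed-size chunks.
    chunks = [values[i:i + chunk_size]
              for i in range(0, len(values), chunk_size)]
    # Step 2: create one work unit per unordered pair of chunks (i <= j).
    tasks = list(combinations_with_replacement(range(len(chunks)), 2))
    # Step 3: pair elements inside each work unit. The units are independent,
    # so a real system could distribute them across parallel workers.
    result = []
    for i, j in tasks:
        if i == j:
            result.extend(combinations(chunks[i], 2))
        else:
            result.extend((a, b) for a in chunks[i] for b in chunks[j])
    return result


values = list(range(7))
chunked = pairs_chunked(values, chunk_size=3)
simple = pairs_simple(values)
print(len(chunked), len(simple))
```

The chunked variant yields the same set of pairs as the simple iterator, while no single work unit ever needs more than two chunks at a time.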
The CrossGroup operator is particularly suitable for:
| Scenario | Example | Advantage |
|---|---|---|
| Graph Analysis | Social Network Friend Recommendations | Efficient processing of node relationships |
| Similarity Calculation | Item Recommendation Systems | Avoids unnecessary pairing |
| Network Analysis | Triangle Relationship Detection | Better memory usage efficiency |
| Bipartite Graph Processing | User-Item Association Analysis | Optimized for data skew |
This FLIP is currently in a Reopened state. The feature was initially designed to optimize multiple scenarios in Flink's Gelly (graph computation) module, including:
(1) AdamicAdar similarity calculation
(2) Jaccard index computation
(3) Triangle relationship detection
(4) Bipartite graph projection methods
However, the improvement proposal is currently under re-evaluation, and new design and implementation solutions may emerge. Interested developers can follow the latest progress on JIRA.
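
As an illustration of why a scenario such as the Jaccard index depends on pairing within groups, here is a small plain-Python sketch. It is not Gelly's implementation, and the function name `jaccard_scores` and the sample edges are hypothetical. Each vertex's neighbor set is treated as one group, every pair inside a group shares that vertex as a common neighbor, and the score is the number of shared neighbors divided by the size of the union of the two neighbor sets.

```python
from collections import defaultdict
from itertools import combinations


def jaccard_scores(edges):
    """Jaccard index for every vertex pair with at least one shared neighbor."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    # Group = neighbor set of one vertex. Every pair in the group has that
    # vertex as a common neighbor, so pairing within the group counts overlaps.
    shared = defaultdict(int)
    for vertex, group in neighbors.items():
        for a, b in combinations(sorted(group), 2):
            shared[(a, b)] += 1

    scores = {}
    for (a, b), common in shared.items():
        # |A union B| = |A| + |B| - |A intersect B|
        union = len(neighbors[a]) + len(neighbors[b]) - common
        scores[(a, b)] = common / union
    return scores


# Hypothetical undirected graph given as an edge list.
edges = [(1, 2), (1, 3), (2, 3), (3, 4)]
scores = jaccard_scores(edges)
print(scores)
```

A high-degree vertex produces a large group here, which is exactly the skewed case where materializing the group or self-joining it becomes expensive.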
The CrossGroup operator brings a more elegant data pairing solution to Flink. It's like an experienced librarian who knows both how to efficiently match books and how to choose the most appropriate matching strategy for different situations. This improvement makes Flink more efficient in handling graph analysis, similarity computation, and other scenarios, while also providing users with a simpler programming model.