Alibaba Cloud Shares New Features of Apache Flink 2.0 at Flink Forward Asia

Alibaba Cloud December 6, 2024

Enhanced data processing featured in Flink 2.0 is essential in the GenAI era



Jakarta, Indonesia, December 6, 2024 - Alibaba Cloud, the digital technology and intelligence backbone of Alibaba Group, highlighted the innovative features of the forthcoming Apache Flink 2.0 at Flink Forward Asia in Jakarta. The event was the first conference in Southeast Asia dedicated to advancements in Apache Flink, an open-source framework from the Apache Software Foundation designed to unify stream and batch processing. As a long-term contributor, Alibaba Cloud has played a pivotal role in advancing the Apache Flink community, particularly in Asia.

Feng Wang, Head of Open Data Platform at Alibaba Cloud Intelligence, expressed his enthusiasm at the event: "With the growing demand for data, the need for efficient, scalable, and unified data processing is more critical than ever. The upcoming release of Flink 2.0 represents a significant advancement since its debut in 2016, coinciding with the tenth anniversary of Flink becoming an Apache top-level project. We are more committed than ever to supporting the Apache Flink community, and we will explore integrating the enhanced capabilities of Flink 2.0 into our cloud offerings for global customers in the near future."





New Features of Apache Flink 2.0 for a New Computing Era

Scheduled for release in early 2025, Apache Flink 2.0 marks a significant evolution in data processing technology and a major step toward a unified batch and stream processing architecture. This unification simplifies computing in the cloud-native era and lays the groundwork for handling the hybrid workloads of the GenAI era, where sophisticated data processing capabilities are crucial for new AI applications.

Key contributions by Alibaba Cloud to Apache Flink 2.0 include:

Disaggregated State Storage and Management: To enable Flink’s cloud-native future, Alibaba Cloud is introducing Disaggregated State Storage and Management, a feature that uses remote storage as primary storage in Flink 2.0. This new architecture lets users handle massive datasets without worrying about local disk constraints, rescale jobs faster and more efficiently, reduce resource spikes, and take lightweight, fast checkpoints natively.

Materialized Table: This feature is designed to further streamline batch and stream data processing while providing a unified development experience. In the upcoming Flink 2.0 release, Alibaba Cloud is enhancing operational support for Materialized Tables, including connector integration with cutting-edge lake formats and production-ready schedulers.

Adaptive Batch Execution: By dynamically optimizing logical and physical plans based on execution insights, Flink 2.0 is enhancing the efficiency and performance of batch processing and Online Analytical Processing (OLAP) workloads.

Streaming Lakehouse architecture: Built on unified stream and batch processing, this architecture better supports real-time data analytics by leveraging the unified data storage, open formats and cost-effectiveness of the Lakehouse paradigm. As a result, users can better handle dynamic data updates and queries with varying levels of freshness, catering to a wide range of analytical needs.

Customer Success with Apache Flink

Mekari, Indonesia's leading software-as-a-service (SaaS) company, utilized Flink 1.0’s robust stream processing capabilities to overcome challenges in real-time data processing and integration. This enabled the company to capture data in real time from various sources and seamlessly integrate it into MaxCompute, Alibaba’s large-scale data processing platform. The integration not only reduced processing time but also enhanced the speed and quality of decision-making through reliable streaming data ingestion, improving operational efficiency and providing stakeholders with timely, accurate insights.

Wang added, "Apache Flink 2.0 reveals the future of data processing and its potential impact on the cloud+AI industry. By modernizing its components, embracing more AI innovations, and improving integrations with other Apache projects, Apache Flink has set a new benchmark for the industry. We look forward to furthering our engagement with the Apache Flink community to fully leverage the capabilities of this groundbreaking technology."

At the event, Alibaba Cloud also announced that it has open-sourced Fluss (Flink Unified Streaming Storage) on GitHub. Fluss is a streaming storage solution tailored for real-time analytics that serves as the real-time data layer in a Lakehouse architecture. It bridges the gap between data streaming and the data Lakehouse, facilitating low-latency, high-throughput data ingestion and processing. It integrates seamlessly with leading compute engines, including Apache Flink, enhancing its utility and efficiency in data management.

Alibaba Cloud has been contributing its technology to drive the development of the open-source Apache Flink community, optimizing Flink’s SQL and runtime layers and integrating Flink with other ecosystem projects. These contributions have greatly improved Flink’s scalability, reliability, stability and performance.

About Alibaba Cloud

Established in 2009, Alibaba Cloud (www.alibabacloud.com) is the digital technology and intelligence backbone of Alibaba Group. It offers a complete suite of cloud services to customers worldwide, including elastic computing, database, storage, network virtualization services, large-scale computing, security, big data analytics, machine learning and artificial intelligence (AI) services. Alibaba has been named the leading IaaS provider in Asia Pacific by revenue in U.S. dollars since 2018, according to Gartner. It has also maintained its position as one of the world’s leading public cloud IaaS service providers since 2018, according to IDC.
