Realtime Compute for Apache Flink: August 21, 2023

Last Updated: Jul 12, 2024

This topic describes the major updates and bug fixes of the Realtime Compute for Apache Flink version released on August 21, 2023.

Important

Ververica Runtime (VVR) 8.0.1 introduced in this release occasionally causes data loss in specific scenarios, which affects data accuracy. After careful evaluation, Alibaba Cloud has decided to announce the End of Support (EOS) for VVR 8.0.1. We recommend that you upgrade to VVR 8.0.5 or later at the earliest opportunity. For information about how to upgrade, see Upgrade the engine version of deployments. We will provide the necessary support and guidance to help you smoothly transition to a more secure and stable version. Thank you for your understanding and cooperation.

Overview

This release of Realtime Compute for Apache Flink includes engine updates, connector updates, performance optimization, and bug fixes.

VVR 8.0.1 is officially released to provide an enterprise-class engine based on Apache Flink 1.17.1. VVR 8.0.1 includes the new features, performance improvements, and bug fixes of Apache Flink 1.17.1. For example, the Generic Incremental Checkpoint (GIC) feature introduced in Apache Flink 1.17 improves the speed and stability of the checkpointing process, and the stability of unaligned checkpoints (UCs) under backpressure is improved so that UCs can now be used in production environments. Batch processing performance is also significantly improved.
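For reference, the two checkpointing improvements correspond to standard configuration options in open-source Apache Flink 1.17. The following sketch shows them as SQL SET statements for illustration only; in Realtime Compute for Apache Flink, such options are typically added to the Flink configuration of a deployment instead, and the enterprise-level state backend is enabled by default without additional configuration.

  -- Illustrative sketch based on open-source Apache Flink 1.17 configuration keys.
  SET 'state.backend.changelog.enabled' = 'true';     -- generic incremental checkpoints (GIC)
  SET 'execution.checkpointing.unaligned' = 'true';   -- unaligned checkpoints (UC)
  SET 'execution.checkpointing.interval' = '60s';     -- checkpoint interval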

The core architecture of the enterprise-level state backend is restructured, which provides the following benefits:

  • Performance boost: The state format, file storage system, and data cleaning strategy are optimized. This significantly reduces the pressure on local disk space and improves state access speed. As a result, the average performance of large-state deployments increases by more than 40% and the state size decreases by approximately 30%.

  • Stability enhancement: The state scaling and recovery mechanism is improved. In scenarios in which the state size is large, such as 100 GB, the interruption time caused by deployment updates can be reduced from minutes to seconds.

By default, the restructured enterprise-level state backend is used in VVR 8.0.1. No additional configuration is required.

In addition to the engine upgrade and state backend restructuring, this release includes the following features that enhance the interoperability between Realtime Compute for Apache Flink and other Alibaba Cloud storage and computing services, enhance connector capabilities, and improve system performance and stability:

  • The MongoDB Change Data Capture (CDC) connector is in public preview. You can use the connector to efficiently capture real-time incremental data and read historical data from a MongoDB database that uses the replica set or sharded cluster architecture and synchronize the data to downstream systems. The connector supports the incremental snapshot reading feature. When the connector runs, it initiates the full scan phase to read historical data in parallel. After the full scan is complete, the connector automatically switches to the incremental capture phase to read the changelog stream. This facilitates integration with downstream Flink SQL applications. The connector ensures exactly-once processing to prevent data duplication or loss. The connector also provides multiple startup modes to meet your business requirements.

  • If a deployment uses the CREATE DATABASE AS statement and new tables are created in the source database, the deployment can be restarted from its state data so that the new tables are also captured and synchronized. This prevents the additional management costs that the restart would otherwise cause.

  • The OceanBase connector is introduced. You can use the connector to create sink tables and dimension tables. OceanBase is a distributed relational hybrid transactional/analytical processing (HTAP) database developed by Alibaba Group and Ant Group. OceanBase provides various benefits, including strong consistency, high availability, high performance, online scalability, high compatibility with SQL standards and mainstream relational databases, and low costs.

  • More Tair data types are supported to enhance enterprise-class capabilities. For example, you can use TairTs in Flink deployments to create time series datasets in real time, TairVector to create vector datasets for AI applications, TairCpc to build real-time fraud detection applications, and TairRoaring to build real-time customer profiling systems.

  • Tables in Simple Log Service (SLS) catalogs can be used as sink tables. You can define SLS sink tables and write data to SLS in the same manner in which you use permanent tables.

  • Updates in the 0.5-snapshot version of Apache Paimon (invitational preview) are supported. Column type changes in the source table that are captured by Flink CDC can be applied to Apache Paimon tables.

The version upgrade is rolled out across all regions by using a canary release strategy over a two-week period. After the upgrade is complete for your region and account, you can use the new engine version for your deployments. For more information, see Upgrade the engine version of deployments. We look forward to your feedback.

Features

The following features are included in this release.

MongoDB CDC connector (public preview)

The MongoDB CDC connector can be used to create source tables to read incremental data from a MongoDB database. For more information, see MongoDB CDC connector (public preview).
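The following Flink SQL sketch shows how a MongoDB CDC source table might be declared. The connection values, field names, and database and collection names are placeholders, and only commonly used WITH options are shown; see MongoDB CDC connector (public preview) for the complete option list.

  -- Illustrative sketch: a MongoDB CDC source table. Connection values and
  -- field names are placeholders.
  CREATE TEMPORARY TABLE mongodb_orders (
    _id STRING,
    order_id BIGINT,
    amount DECIMAL(10, 2),
    PRIMARY KEY (_id) NOT ENFORCED
  ) WITH (
    'connector' = 'mongodb-cdc',
    'hosts' = 'dds-xxxxxxxx.mongodb.rds.aliyuncs.com:3717',
    'username' = 'flink_user',
    'password' = '******',
    'database' = 'order_db',
    'collection' = 'orders',
    -- Incremental snapshot reading: read historical data in parallel during the
    -- full scan phase, then switch to the changelog stream automatically.
    'scan.incremental.snapshot.enabled' = 'true'
  );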

Synchronization of new tables by using the CREATE DATABASE AS statement

If a table is added to the source database after a deployment that uses the CREATE DATABASE AS statement is started, the deployment can be restarted based on a snapshot. This way, data from the new table can be captured and synchronized. For more information, see CREATE DATABASE AS statement.
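As a reminder of the statement that this feature applies to, the following sketch synchronizes an entire MySQL database to a target catalog. The catalog names, database names, and option values are placeholders; see CREATE DATABASE AS statement for the exact syntax and for the configuration that is required to synchronize newly created tables.

  -- Illustrative sketch: synchronize all tables of a source database.
  -- Catalog names, database names, and option values are placeholders.
  CREATE DATABASE IF NOT EXISTS `holo`.`order_dw`
  AS DATABASE `mysql`.`order_db` INCLUDING ALL TABLES
  /*+ OPTIONS('server-id' = '8001-8004') */;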

CREATE TABLE AS statements between BEGIN STATEMENT SET; and END;

If you add a CREATE TABLE AS statement between the BEGIN STATEMENT SET; and END; statements in a deployment, the deployment can be restarted based on a snapshot. This way, data from new tables can be captured and synchronized. This improves flexibility and eliminates the need to increase the number of deployments. For more information, see CREATE TABLE AS statement.
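The following sketch shows the pattern that this feature applies to: multiple CREATE TABLE AS statements wrapped in a single statement set so that they run in one deployment. Catalog, database, and table names and the option values are placeholders; see CREATE TABLE AS statement for the exact syntax.

  -- Illustrative sketch: multiple CREATE TABLE AS statements in one deployment.
  BEGIN STATEMENT SET;

  CREATE TABLE IF NOT EXISTS `holo`.`order_dw`.`orders`
  AS TABLE `mysql`.`order_db`.`orders`
  /*+ OPTIONS('server-id' = '8001-8004') */;

  CREATE TABLE IF NOT EXISTS `holo`.`order_dw`.`customers`
  AS TABLE `mysql`.`order_db`.`customers`
  /*+ OPTIONS('server-id' = '8005-8008') */;

  END;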

Separate time-to-live (TTL) configuration for each stream in a regular join

The two streams in a regular join may need different TTL values in specific use cases. For example, the TTL for one stream may be set to 15 days and the TTL for the other stream to 1 day. This feature allows you to separately configure the TTL for each stream, which improves deployment stability and reduces operating costs. For more information, see Optimize Flink SQL.
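The following sketch illustrates one way to express per-stream TTL values with a join hint, assuming the STATE_TTL hint that is described in Optimize Flink SQL. Table names and TTL values are placeholders.

  -- Illustrative sketch: different state TTLs for the two sides of a regular join.
  SELECT /*+ STATE_TTL('orders' = '1d', 'customers' = '15d') */
    orders.order_id,
    customers.customer_name
  FROM orders
  JOIN customers
    ON orders.customer_id = customers.customer_id;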

Introduction of the OceanBase connector

The OceanBase connector can be used to create sink tables and dimension tables. For more information, see OceanBase connector (public preview).
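A sink table declaration might look like the following sketch. The WITH option names shown here are assumptions for illustration only; see OceanBase connector (public preview) for the actual connector options.

  -- Illustrative sketch only: the WITH option names are assumptions, not the
  -- documented OceanBase connector options.
  CREATE TEMPORARY TABLE oceanbase_sink (
    order_id BIGINT,
    amount DECIMAL(10, 2),
    PRIMARY KEY (order_id) NOT ENFORCED
  ) WITH (
    'connector' = 'oceanbase',
    'url' = 'jdbc:mysql://<host>:<port>/order_db',  -- placeholder
    'table-name' = 'orders',                        -- placeholder
    'username' = '<username>',
    'password' = '<password>'
  );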

Support for query pushdown in the SLS connector

Queries can be pushed down so that data is filtered on the source side, which improves read efficiency. For more information, see Simple Log Service connector.

Support for sink tables in SLS catalogs

SLS catalogs can be used to write data to SLS. For more information, see Manage Simple Log Service catalogs.
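For example, assuming that an SLS catalog named sls_catalog has already been created and that the catalog maps an SLS project to a database and a Logstore to a table, data could be written as in the following sketch. The project, Logstore, and column names are placeholders; see Manage Simple Log Service catalogs for details.

  -- Illustrative sketch: write query results to a Logstore through an SLS catalog.
  INSERT INTO `sls_catalog`.`my_project`.`my_logstore`
  SELECT order_id, user_id, CAST(amount AS STRING) AS amount
  FROM orders;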

Support for AnalyticDB for PostgreSQL V7.0

The AnalyticDB for PostgreSQL connector can be used to read data from and write data to an AnalyticDB for PostgreSQL V7.0 instance. For more information, see AnalyticDB for PostgreSQL connector.

Support for more Tair data types

Data types, such as TairTs, TairCpc, TairRoaring, TairVector, and TairGis, are supported. For more information, see Tair connector.

Support for a new Apache Paimon version and synchronization of column type changes to Apache Paimon

Apache Paimon 0.5-snapshot is supported. If you use Flink CDC to write data to Apache Paimon, column type changes in the source table can be applied to Apache Paimon tables.

Fixed issues

  • The "Filtering update table metadata event: Event{header=EventHeaderV4" error message appears when the MySQL connector is used to read a PolarDB for MySQL database.

  • No output is generated when a window table-valued function (TVF) is used with conditions.