Realtime Compute for Apache Flink: August 21, 2023

Last Updated: Apr 23, 2024

This topic provides the release notes for the version of Realtime Compute for Apache Flink that was released on August 21, 2023, including the major updates and bug fixes, and provides links to relevant references.

Important

A canary release has been initiated for this version and will be complete within two weeks. If you cannot use the new features in Realtime Compute for Apache Flink, the new version has not yet been rolled out to your account. If you want to upgrade at the earliest opportunity, submit a ticket to apply for an upgrade. For more information about the upgrade plan, see the most recent announcement on the right side of the homepage of the Realtime Compute for Apache Flink console.

Overview

A new official version of Realtime Compute for Apache Flink was released on August 21, 2023. This version includes engine updates, connector updates, performance optimization, and bug fixes.

This version releases Ververica Runtime (VVR) 8.0.1, a new enterprise-level Flink engine based on Apache Flink 1.17.1, and includes all new features, performance improvements, and bug fixes from Apache Flink 1.17.1. The Generic Incremental Checkpoint (GIC) feature introduced in Apache Flink 1.17 improves the speed and stability of the checkpointing procedure. Apache Flink 1.17 also improves the stability of unaligned checkpoints (UCs) under deployment backpressure, which makes UCs ready for production use, and significantly improves batch processing performance.
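
Both checkpointing features are controlled by standard Apache Flink configuration options. The following sketch shows how they can be enabled with SQL SET statements; the key names come from open source Apache Flink 1.17, and your deployment defaults may already enable them, so treat this as an illustration rather than required configuration.

```sql
-- Illustration only: open source Apache Flink 1.17 configuration keys.
-- Enable Generic Incremental Checkpoints (the changelog state backend).
SET 'state.backend.changelog.enabled' = 'true';
-- Enable unaligned checkpoints so that checkpoints complete under backpressure.
SET 'execution.checkpointing.unaligned' = 'true';
```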

In this version of Realtime Compute for Apache Flink, the core architecture of the enterprise-level state backend is restructured, which greatly improves performance and stability in the following aspects:

  • The enterprise-level state backend uses a more compact state format and file storage and a more efficient data cleansing strategy. This significantly reduces the local state storage space and improves access performance. The average performance of deployments that have state bottlenecks can be improved by more than 40%, and the state size can be reduced by about 30%.

  • The state scaling and recovery mechanism is further improved. In scenarios with large state data, such as 100 GB, the interruption time of deployment updates can be reduced from minutes to seconds.

The new version of the enterprise-level state backend is used by default in VVR 8.0.1. You do not need to change any configurations.

In addition to the upgrades to the engine kernel and state backend, this version enhances the interoperability between Flink and various Alibaba Cloud storage and computing services, provides more connector features, and improves system performance and stability. This version supports the following new and enhanced features:

  • The MongoDB Change Data Capture (CDC) connector is released in public preview. The connector can efficiently capture historical and real-time incremental data from a MongoDB database that uses the replica set or sharded cluster architecture and synchronize the data to downstream systems. The connector supports the incremental snapshot algorithm, which allows parallel reading of large amounts of historical data during the full reading phase and an automatic switch from full reading to incremental reading. Data is synchronized with exactly-once semantics to ensure that no data is lost and no duplicate data exists. In the incremental reading phase, the connector can scan complete change event streams and provide the data to downstream Flink SQL computing deployments. Multiple start offset modes are provided to meet your business requirements. A sample table definition follows this list.

  • The CREATE DATABASE AS statement allows a deployment to be restarted from its state data and continue running after a change such as the addition of a table to the source database. This prevents the loss of state data and the additional management costs of starting a new deployment.

  • The OceanBase connector is supported and can be used for result tables and dimension tables. OceanBase is a native distributed relational hybrid transactional/analytical processing (HTAP) database that is fully developed by Alibaba Group and Ant Group. OceanBase provides strong data consistency, high availability, high performance, online scalability, high compatibility with SQL standards and mainstream relational databases, and low costs.

  • More enterprise-level Tair data structures are supported. You can use Flink and TairTs to build time series datasets in real time, use Flink and TairVector to build AI vector datasets, use Flink and TairCpc to build real-time fraud detection applications, or use Flink and TairRoaring to implement a real-time customer profiling system.

  • The tables that are provided by Simple Log Service catalogs can be used as result tables. You can use permanent tables from a catalog to define and write data to Simple Log Service tables.

  • Apache Paimon is in invitational preview. This version updates Apache Paimon to 0.5-SNAPSHOT and supports column type changes to the source table when the Flink CDC connector writes data to Apache Paimon.
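
For illustration, the following Flink SQL sketch defines a MongoDB CDC source table with the incremental snapshot algorithm enabled. All table, field, endpoint, and credential values are hypothetical placeholders; see MongoDB CDC connector (public preview) for the authoritative list of options.

```sql
-- A minimal MongoDB CDC source table. All names and endpoints are placeholders.
CREATE TEMPORARY TABLE orders_source (
  _id STRING,
  order_id BIGINT,
  amount DECIMAL(10, 2),
  PRIMARY KEY (_id) NOT ENFORCED  -- MongoDB CDC tables use _id as the primary key
) WITH (
  'connector' = 'mongodb-cdc',
  'hosts' = 'dds-example.mongodb.rds.aliyuncs.com:3717',
  'username' = 'flink_user',
  'password' = '<password>',
  'database' = 'order_db',
  'collection' = 'orders',
  -- Enables parallel full reading and the automatic switch to incremental reading.
  'scan.incremental.snapshot.enabled' = 'true'
);
```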

The canary release will be complete across the entire network within two weeks. After the canary release is complete, the platform capabilities are upgraded and the new engine version appears in the Engine Version drop-down list of your deployment. You can then upgrade the engine that is used by your deployment to the new version. For more information, see Upgrade the engine version of deployments. We look forward to your feedback.

Features

| Feature | Description | References |
| --- | --- | --- |
| Public preview of the MongoDB CDC connector | The MongoDB CDC connector can be used for source tables to read full and incremental data from MongoDB. | MongoDB CDC connector (public preview) |
| Synchronization of data from new tables in the source database by using the CREATE DATABASE AS statement | If a table is added to the source database after a deployment that uses the CREATE DATABASE AS statement is started, the deployment can be restarted from its snapshot to capture the new table. This way, data from the new table is also synchronized. | CREATE DATABASE AS statement |
| Support for the CREATE TABLE AS statement between BEGIN STATEMENT SET; and END; | If you add a CREATE TABLE AS statement between BEGIN STATEMENT SET; and END; in the code of a deployment, the deployment can be restarted from a savepoint so that the newly created table is captured for data synchronization. This improves the ease of use of the CREATE TABLE AS statement and prevents an increase in the number of deployments. An example follows this table. | CREATE TABLE AS statement |
| Separate configuration of state time-to-live (TTL) for each stream in a regular join | The two input streams of a regular join often require different TTL values. For example, the state of one stream may need to be retained for 15 days, whereas the state of the other stream needs to be retained for only 1 day. Configuring the TTL separately for each stream improves deployment stability and effectively reduces resource costs. An example follows this table. | Optimize Flink SQL |
| OceanBase connector | The OceanBase connector can be used for result tables and dimension tables. | OceanBase connector (public preview) |
| Query pushdown in the Simple Log Service connector | The Simple Log Service connector can filter data at the source. This improves read efficiency. | Simple Log Service connector |
| Tables of Simple Log Service catalogs used as result tables | Simple Log Service catalogs can be used to write data to Simple Log Service. | Manage Simple Log Service catalogs |
| Support for AnalyticDB for PostgreSQL V7.0 in the AnalyticDB for PostgreSQL connector | The AnalyticDB for PostgreSQL connector supports AnalyticDB for PostgreSQL V7.0. | AnalyticDB for PostgreSQL connector |
| Writing of more data types by using the Tair connector | The Tair connector supports more data types, such as TairTs, TairCpc, TairRoaring, TairVector, and TairGis. | Tair connector |
| Upgrade of Apache Paimon and support for column type changes to the source table when the Flink CDC connector writes data to Apache Paimon | Apache Paimon is upgraded to 0.5-SNAPSHOT. When the Flink CDC connector writes data to Apache Paimon, column type changes to the source table are supported. | |
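
As noted in the table above, the per-stream TTL of a regular join is configured with a SQL hint. The following is a minimal sketch with hypothetical table names and aliases; see Optimize Flink SQL for the exact hint syntax that your engine version supports.

```sql
-- Sketch: retain the state of the orders input for 1 day and the state of
-- the users input for 15 days. Table names and aliases are hypothetical.
SELECT /*+ STATE_TTL('o' = '1d', 'u' = '15d') */
  o.order_id,
  u.user_name
FROM orders AS o
JOIN users AS u
  ON o.user_id = u.user_id;
```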

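Similarly, the following sketch shows a CREATE TABLE AS statement that is added between BEGIN STATEMENT SET; and END; after the deployment has been running. The catalog, database, and table names are hypothetical; see CREATE TABLE AS statement for the full syntax.

```sql
BEGIN STATEMENT SET;

-- The original synchronization statement.
CREATE TABLE IF NOT EXISTS `target_catalog`.`target_db`.`orders`
AS TABLE `source_catalog`.`source_db`.`orders`;

-- A statement added later. After the deployment is restarted from a savepoint,
-- data from this new table is also synchronized.
CREATE TABLE IF NOT EXISTS `target_catalog`.`target_db`.`shipments`
AS TABLE `source_catalog`.`source_db`.`shipments`;

END;
```
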
Fixed issues

  • The following issue is fixed: The error message "Filtering update table metadata event: Event{header=EventHeaderV4" appears when the MySQL source connector is used to read PolarDB for MySQL data.

  • The following issue is fixed: No output is generated when a window table-valued function (TVF) with conditions is used.