Release notes (September 19, 2022) - Realtime Compute for Apache Flink

This topic describes the major updates and bug fixes of the Realtime Compute for Apache Flink version released on September 19, 2022.

Overview

This release ships two new Ververica Runtime (VVR) versions: VVR 4.0.15 for Apache Flink 1.13 and VVR 6.0.2 for Apache Flink 1.15.

VVR 6.0.2 is the first enterprise-grade Flink engine built on Apache Flink 1.15, bringing upstream improvements to window table-valued functions, CAST functions, type systems, and JSON functions to the cloud platform.

State management is now centralized. Checkpoints and savepoints are managed independently in a status set, decoupled from the deployment lifecycle. Savepoints are no longer deleted when you cancel a deployment. The native savepoint format significantly speeds up savepoint creation and restoration and reduces storage overhead. Status set management lowers Object Storage Service (OSS) storage costs by 15–40% per year. You can also start a deployment from a savepoint that belongs to a different deployment, which simplifies A/B testing and dual-run validation.

Scheduled tuning lets you define time-based resource policies for deployments with predictable peak and off-peak hours, reducing manual intervention and labor costs.
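
As an illustration of the idea behind time-based resource policies, the following minimal Python sketch picks a resource plan from the time of day. It is not the platform's API: the names `ResourcePlan`, `ScheduledPolicy`, and `select_plan`, as well as the parallelism and memory values, are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from datetime import time
from typing import Sequence


@dataclass(frozen=True)
class ResourcePlan:
    """Hypothetical preset resource size for a deployment."""
    name: str
    parallelism: int
    memory_gb: int


@dataclass(frozen=True)
class ScheduledPolicy:
    """Hypothetical policy: apply `plan` during the window [start, end)."""
    start: time
    end: time
    plan: ResourcePlan

    def covers(self, now: time) -> bool:
        if self.start <= self.end:
            # Same-day window, for example 09:00 to 21:00.
            return self.start <= now < self.end
        # Window that wraps past midnight, for example 22:00 to 06:00.
        return now >= self.start or now < self.end


def select_plan(policies: Sequence[ScheduledPolicy],
                default: ResourcePlan,
                now: time) -> ResourcePlan:
    """Return the plan of the first policy covering `now`, else the default."""
    for policy in policies:
        if policy.covers(now):
            return policy.plan
    return default


# Illustrative values only.
PEAK = ResourcePlan("peak", parallelism=16, memory_gb=64)
OFF_PEAK = ResourcePlan("off_peak", parallelism=4, memory_gb=16)
POLICIES = [ScheduledPolicy(time(9, 0), time(21, 0), PEAK)]
```

With these assumed values, `select_plan(POLICIES, OFF_PEAK, time(10, 30))` returns the peak plan, and any time outside 09:00 to 21:00 falls back to the off-peak plan.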

This version also supports quick task restart, which provides fast recovery when a deployment fails over and thereby improves business continuity. If your business can tolerate duplicate and lost data and has strict continuity requirements, you can configure quick task restart to recover failed tasks quickly. The delay caused by deployment failover can drop from minutes to as little as milliseconds.

Warning

In this version, quick task restart cannot prevent data duplication or data loss. Make sure that your business can tolerate both before you use the feature. Quick task restart is disabled by default. To enable it for a deployment, you must add configuration items. For more information about the principles and configuration details, see Configure quick task restart.

Health score introduces a diagnostic scoring model for deployments in any state. The feature runs expert rules against your deployment and surfaces actionable suggestions.

Flink complex event processing (CEP) has been verified in production and is now available to all users. The hot update feature lets you update CEP rules during peak hours without restarting the deployment, which eliminates the roughly 10-minute interruption that risk-control workloads previously experienced when a task was re-released. CEP SQL syntax is also enhanced: the new MATCH_RECOGNIZE extensions let you express complex patterns in SQL instead of DataStream API code, and new metrics (patternMatchedTimes, patternMatchingAvgTime) give you visibility into pattern-matching behavior.
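
To make the hot update idea concrete, here is a minimal Python sketch of a matcher that picks up a newer rule version without being restarted. It is a conceptual model only and does not use the Flink CEP API: `Rule`, `RuleStore`, and `HotReloadingMatcher` are hypothetical names, the in-memory store stands in for the external database, the pattern is a simple strictly contiguous sequence of event types, and clearing partial matches on a rule change is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: a strictly contiguous sequence of event types."""
    rule_id: str
    version: int
    pattern: Tuple[str, ...]


class RuleStore:
    """In-memory stand-in for the external database that holds the rules."""

    def __init__(self) -> None:
        self._rules: Dict[str, Rule] = {}

    def put(self, rule: Rule) -> None:
        self._rules[rule.rule_id] = rule

    def latest(self, rule_id: str) -> Optional[Rule]:
        return self._rules.get(rule_id)


class HotReloadingMatcher:
    """Checks the store before each event and swaps in newer rule versions."""

    def __init__(self, store: RuleStore, rule_id: str) -> None:
        self._store = store
        self._rule_id = rule_id
        self._active: Optional[Rule] = None
        self._recent: List[str] = []
        self.reloads = 0

    def _refresh(self) -> None:
        latest = self._store.latest(self._rule_id)
        if latest is None:
            return
        if self._active is None or latest.version > self._active.version:
            self._active = latest
            # Assumption of this sketch: partial matches are dropped on reload.
            self._recent.clear()
            self.reloads += 1

    def process(self, event_type: str) -> bool:
        """Return True if the most recent events match the active pattern."""
        self._refresh()
        if self._active is None or not self._active.pattern:
            return False
        self._recent.append(event_type)
        # Keep only as many recent events as the pattern is long.
        self._recent = self._recent[-len(self._active.pattern):]
        return tuple(self._recent) == self._active.pattern
```

In this sketch, writing a rule with a higher `version` to the store changes the matcher's behavior on the next event, and the matcher object itself keeps running throughout. A real deployment would typically poll the database on an interval rather than on every event.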

Data integration: A new platform-side API is available for integrating the service with your business systems.

Performance: Dual-stream Join deployments see an average 40%+ performance improvement through automatic key-value separation inference in GeminiStateBackend. Deployment startup speed improves by an average of 15%.

Connector and catalog additions: Hive Catalog now supports Hive 2.1.0–2.3.9 and Hive 3.1.0–3.1.3. The built-in Java Database Connectivity (JDBC) connector supports source, dimension, and sink tables. Tablestore incremental log reading, AnalyticDB for MySQL catalog, and database synchronization to Kafka are also included in this release.

New features

Feature	Description	Documentation
Status set management	Status set management decouples state management from deployment start and stop operations for all stateful Flink deployments. Savepoints are no longer deleted when a deployment is stopped. You can use a dedicated management page to create and delete savepoints on a schedule.	Status set management Start an SQL deployment Start a JAR deployment Start a Python deployment Cancel a deployment
Scheduled tuning	For Flink deployments with predictable traffic peaks and valleys, you can define custom scheduling policies. At the specified times, the deployment's resources are automatically adjusted to a preset size to handle traffic fluctuations, eliminating the need for manual scaling.	Configure automatic tuning Configure a deployment
Health score	The health score feature applies expert rules to detect issues during deployment startup and execution, providing actionable recommendations. This feature helps you better understand the status of your deployments and adjust parameters accordingly.	Perform intelligent deployment diagnostics
Improved member authorization	The authorization process is improved: instead of manually entering user information, you can now select from a list of all RAM users when granting permissions.	Grant permissions on namespaces
Dynamic complex event processing (CEP)	CEP provides pattern matching capabilities for real-time data streams. This release builds on open source Flink CEP by allowing you to externalize deployment rules in a database so they can be loaded dynamically. This is exposed through the DataStream API.	JSON format for rules in dynamic CEP Quick start for dynamic CEP in Flink
Enhancement of CEP SQL	The MATCH_RECOGNIZE statement allows you to describe CEP rules using SQL. This release enhances the open source Flink MATCH_RECOGNIZE statement with new capabilities, such as outputting timed-out matches and supporting `notFollowedBy`. In addition, new metrics have been introduced: `patternMatchedTimes`: The number of times a pattern was successfully matched. `patternMatchingAvgTime`: The average time taken for pattern matching.	CEP statements
Support for database synchronization to Kafka	When you use this feature, data is synchronized to a corresponding Upsert Kafka table. You can use the table in Kafka directly instead of the MySQL table, which reduces the load on the MySQL service from multiple deployments.	Synchronize data from all tables in a MySQL database to Kafka by using Flink CDC Manage Kafka JSON catalogs
Define partitioned tables in Hologres result tables with DDL	You can use PARTITION BY to define a partitioned table when you create a Hologres result table.	CREATE TABLE AS statement
Set timeout for asynchronous requests in Hologres dimension tables	By setting the `asyncTimeoutMs` parameter for asynchronous requests, you can ensure that the application completes data requests within a specific time frame.	Hologres dimension table
Set table properties when creating tables with Hologres Catalog	Setting appropriate table properties can help the system organize and query data efficiently. When you use Hologres Catalog to create a table, you can now set physical table properties in the WITH clause.	Manage Hologres catalogs
MaxCompute sink connector supports the Binary type	The Binary data type is now supported. MaxCompute limits the length of this type to 8 MB. The MaxCompute Stream Tunnel Sink feature is added. The flush efficiency of the MaxCompute sink is optimized.	MaxCompute result table
Hive Catalog supports more Hive versions	This version supports Hive 2.1.0-2.3.9 and 3.1.0-3.1.3.	Manage Hive catalogs
Tablestore source connector released	Supports reading incremental logs from Tablestore.	Tablestore source table
JDBC connector released	The community JDBC connector is now built-in.	JDBC source table JDBC result table JDBC dimension table
Parallelism of a Message Queue for Apache RocketMQ source table can exceed the topic partition count	The parallelism of a source table can now be set higher than the number of topic partitions. This lets you pre-allocate resources for potential increases in topic partitions before consumption begins.	Message Queue for Apache RocketMQ source table
Set Message Key for Message Queue for Apache RocketMQ result tables	You can now set the message key when writing to Message Queue for Apache RocketMQ.	Message Queue for Apache RocketMQ result table
Support for AnalyticDB for MySQL Catalog	With this catalog, you can directly read metadata from AnalyticDB for MySQL without manually registering AnalyticDB for MySQL tables. This improves development efficiency and ensures data correctness.	Manage AnalyticDB for MySQL catalogs
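
The RocketMQ source parallelism entry in the table above can be pictured with a small Python sketch. It assumes a simple round-robin assignment of partitions to source subtasks, which is an illustrative choice and not necessarily the strategy the connector uses. The function name `assign_partitions` is hypothetical.

```python
from typing import Dict, List


def assign_partitions(num_partitions: int, parallelism: int) -> Dict[int, List[int]]:
    """Assign partition indexes to subtask indexes round-robin.

    Subtasks beyond the partition count receive an empty list: they are
    pre-allocated and idle until the topic gains more partitions.
    """
    if parallelism <= 0:
        raise ValueError("parallelism must be positive")
    if num_partitions < 0:
        raise ValueError("num_partitions must not be negative")
    assignment: Dict[int, List[int]] = {subtask: [] for subtask in range(parallelism)}
    for partition in range(num_partitions):
        assignment[partition % parallelism].append(partition)
    return assignment
```

Under this assumption, a topic with 4 partitions read at a parallelism of 6 leaves subtasks 4 and 5 without work. If the topic later grows to 6 partitions, every subtask has one partition and no change to the parallelism is needed.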

Performance optimization

The native savepoint format is introduced to address the timeout issues that affected canonical-format savepoints in large-state jobs. All savepoint operations benefit from the native format:

Metric	Improvement
Savepoint completion time	On average 5 to 10 times faster. The gain grows as the incremental state size shrinks. In some typical deployments, the speedup reaches 100 times.
Deployment recovery time	On average about 5 times faster. The gain grows as the state size grows.
Savepoint space overhead	Reduced by a factor of about 2 on average. The gain grows as the state size grows.
Savepoint network overhead	Reduced by a factor of 5 to 10 on average. The gain grows as the incremental state size shrinks.
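
To see what these factors mean in absolute terms, the following Python sketch converts a baseline measurement and an improvement range into an expected range. The 600-second baseline in the usage note is a made-up figure for illustration, not a measured result, and `improved_range` is a hypothetical helper.

```python
from typing import Tuple


def improved_range(baseline: float,
                   min_factor: float,
                   max_factor: float) -> Tuple[float, float]:
    """Return (best case, worst case) after dividing by the improvement factors."""
    if baseline < 0:
        raise ValueError("baseline must not be negative")
    if min_factor <= 0 or max_factor < min_factor:
        raise ValueError("factors must satisfy 0 < min_factor <= max_factor")
    # A larger factor means a larger improvement, so it gives the smaller value.
    return (baseline / max_factor, baseline / min_factor)
```

For example, if a canonical-format savepoint hypothetically took 600 seconds, `improved_range(600.0, 5, 10)` gives a range of 60 to 120 seconds for the 5 to 10 times improvement listed for completion time.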

Dual-stream Join optimization: JOIN operators in SQL streaming deployments now automatically infer whether to enable key-value separation in GeminiStateBackend based on deployment characteristics. In typical scenario benchmarks, the average performance improvement exceeds 40%. For more information, see Optimize Flink SQL and GeminiStateBackend configurations.

Deployment startup speed improves by an average of 15%.

Bug fixes

The following issues are fixed:

The modification time of a deployment was abnormally updated.
The state could not be determined after specific deployments were suspended and restarted.
JAR packages could not be uploaded from a local machine in Alibaba Finance Cloud.
The total number of resources configured for running a deployment was inconsistent with that on the Statistics page.
Users could not log on to the Logs page.
An error occurred when accessing Upsert Kafka tables via the Kafka catalog.
A NullPointerException was returned when intermediate results were used in nested operations of multiple user-defined functions (UDFs).
In MySQL CDC, abnormal chunks and out-of-memory (OOM) errors occurred, and the time zone of initialization data was inconsistent with that of incremental data. For more information, see Create a MySQL CDC source table.