Realtime Compute for Apache Flink release notes (May 29, 2024)

This document describes the major feature updates and bug fixes for the version of Realtime Compute for Apache Flink that was released on May 29, 2024.

Important

This upgrade will be rolled out in stages across all regions using a canary release strategy. For the specific upgrade schedule, see the latest announcement on the right side of the Realtime Compute for Apache Flink console. If the new features are unavailable, the upgrade has not yet reached your account. If you require an expedited upgrade, submit a ticket and we will make arrangements based on your needs.

Overview

A new version of Realtime Compute for Apache Flink was officially released on May 29, 2024. This release includes platform updates, engine updates, connector updates, performance optimizations, and bug fixes.

Platform updates

The platform updates in this release focus on enhancing system stability, operational capabilities, and usability.

You can now convert a namespace from a single availability zone to cross-zone. This eliminates the need to create a new namespace and migrate deployments, significantly simplifying how you enable cross-zone high availability.
You can now configure the state time-to-live (TTL) for individual operators in the expert mode of resource configuration. This allows for more granular control over the state TTL of different operators to achieve higher stability with fewer resources.
The Realtime Compute for Apache Flink extension for Visual Studio Code is now available. It supports an end-to-end local development workflow for Flink deployments, from development and deployment to go-live. You can also quickly synchronize deployments from your online environment.

Additionally, data lineage and the Deployments page have been further optimized.
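
As a rough illustration of why the operator-level state TTL described above can reduce state size, the following Python sketch models two operators whose state entries are dropped once they have gone unmodified for longer than the TTL. It is a toy model, not Flink code: the operator names, entry ages, and TTL values are hypothetical, and real state cleanup in Flink depends on the state backend and cleanup strategy.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class OperatorState:
    """State of one operator: hours since each entry was last written."""
    name: str
    entry_ages_hours: List[float]


def retained_entries(state: OperatorState, ttl_hours: float) -> int:
    """Count entries that have not yet outlived the TTL."""
    return sum(1 for age in state.entry_ages_hours if age <= ttl_hours)


def total_retained(states: List[OperatorState],
                   default_ttl_hours: float,
                   operator_ttl_hours: Optional[Dict[str, float]] = None) -> int:
    """Total retained entries; an operator-level TTL overrides the default."""
    overrides = operator_ttl_hours or {}
    return sum(
        retained_entries(s, overrides.get(s.name, default_ttl_hours))
        for s in states
    )


# Hypothetical deployment: the join needs a week of state, the aggregation a day.
states = [
    OperatorState("join", [1, 30, 100, 160]),
    OperatorState("group_agg", [2, 20, 50, 90, 150]),
]

# One TTL for the whole deployment versus a shorter TTL for one operator.
single_ttl = total_retained(states, default_ttl_hours=168)
per_operator = total_retained(states, default_ttl_hours=168,
                              operator_ttl_hours={"group_agg": 24})
```

In this toy data set, a single 168-hour TTL retains all 9 entries. Giving only the hypothetical aggregation operator a 24-hour TTL retains 6, and the join state is unaffected.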

Engine updates

This release officially introduces Ververica Runtime (VVR) 8.0.7, an enterprise-grade Flink engine based on Apache Flink 1.17.2. Key changes include:

Real-time lakehouse: The Apache Paimon connector SDK is upgraded to support the data lake format of Apache Paimon 0.9.
SQL enhancements: You can now use state time-to-live (TTL) hints to set individual TTLs for regular join and group aggregation operators, providing more precise control over state size. Named parameter support for user-defined functions (UDFs) improves development efficiency and reduces maintenance costs.
Connectors: The MongoDB connector is now Generally Available (GA) and ready for production use. It provides full capabilities for Change Data Capture (CDC) source tables, dimension tables, and result tables. This release also includes significant enhancements for the MySQL CDC and Redis connectors:
- MySQL CDC:
  - The op_type virtual column is now supported. It returns the operation type (+I, -U, +U, or -D) of each change record, so you can design business logic and data cleanup strategies based on the specific operation type.
  - Read performance is optimized for MySQL tables with primary keys of the DECIMAL type. In addition, processing for SourceRecord (data change records) in large tables is now parallelized to improve efficiency.
  - The source reuse feature is introduced. When enabled, Flink attempts to merge MySQL CDC source tables within the same deployment that share identical configurations (except for database name, table name, and server-id). This reduces the connection and listening load on the MySQL server.
  - When the sink.ignore-null-when-update parameter is enabled, buffered execution improves processing performance severalfold.
- Redis: When Redis is used for dimension tables and result tables where the key's data type is HashMap, multiple DDL formats for non-primary keys are now supported for better readability. You can also set key prefixes and delimiters to meet data governance requirements.
Metadata management: Because a MySQL view is a logical structure that does not support data read/write operations, view information is no longer displayed for newly created MySQL Catalogs to prevent data errors.
Security: Compatibility for Hive clusters with Kerberos enabled is extended to Hadoop 2.x versions. Additionally, sensitive information such as connector configurations is now masked in logs.
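
To illustrate how the operation types exposed by the op_type virtual column might drive business logic, the following Python sketch applies a stream of change records to an in-memory table and archives deleted rows. It is a plain-Python model rather than connector code: the record layout (op_type, key, row), the function names, and the archive-on-delete policy are assumptions made for this example.

```python
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def apply_change(table: Dict[Any, dict], archive: List[dict], record: Record) -> None:
    """Apply one change record to `table`, archiving rows that are deleted."""
    op = record["op_type"]
    key = record["key"]
    if op in ("+I", "+U"):
        # Insert, or the new image of an update: store the row.
        table[key] = record["row"]
    elif op == "-U":
        # Old image of an update: the matching +U carries the new row.
        pass
    elif op == "-D":
        # Delete: remove the row and keep a copy for later cleanup or audit.
        archive.append(table.pop(key, record["row"]))
    else:
        raise ValueError(f"unknown op_type: {op!r}")


def replay(records: List[Record]) -> Tuple[Dict[Any, dict], List[dict]]:
    """Replay change records in order and return (table, archive)."""
    table: Dict[Any, dict] = {}
    archive: List[dict] = []
    for record in records:
        apply_change(table, archive, record)
    return table, archive
```

In a real deployment, the same branching would more likely be written in SQL against the op_type column. The sketch only shows the decision structure.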

For more details on the main features in this version and their related documentation, refer to the table below. The upgrade will be rolled out in stages. Once the upgrade is complete for your account, we encourage you to upgrade your deployment's engine to this version. For instructions, see Upgrade the engine version for a deployment. We look forward to your feedback.

Key features

Feature	Description	References
Cross-zone high availability enhancements	You can now switch a namespace between single-availability-zone and cross-availability-zone types.	Cross-zone high availability
Data lineage enhancements	Field-level data lineage now supports searching by field name. When multiple results are found, you can use the Up and Down arrow keys to switch between them, making it easier to locate and view field lineage information.	View data lineage by searching for a node or field name
Creator column added to the Deployments page	On the Deployments page, you can click the icon on the right to customize the columns and add the Creator column. This column helps you accurately filter deployments, quickly identify creators when issues arise, and improve collaboration. In the Creator column, you can filter by Created by me to view all deployments that you created in the current namespace. In the search box above the deployment list, you can select Creator, enter a username, and find all deployments created by that user.	N/A
Permission management enhancements	By default, the identity (such as an Alibaba Cloud account, RAM user, or RAM role) that creates a workspace is granted the Owner role for all namespaces within it.	Authorize in the development console
State compatibility check enhancements for stateful start of SQL deployments	When you start a deployment from the latest state, the system detects whether the deployment has changed. If changes are detected, we recommend that you click "Click to detect" next to State Compatibility to run a compatibility check, and then decide how to proceed based on the result.	Start a deployment
VS Code extension for local development	The new extension provides an end-to-end local development workflow for Flink deployments. It helps you easily develop, deploy, and launch SQL, JAR, and Python deployments locally. You can also quickly synchronize deployments from the online environment.	VS Code extension for local development
Operator-level state TTL	When only some operators require a long state time-to-live (TTL), setting a single TTL for the entire deployment can lead to state bloat and wasted resources. You can now set operator-level TTLs in either of the following ways to control state size more precisely and save resources on deployments with large states: (1) Configure the TTL in the expert mode of resource configuration on the Deployments page. Note: This is supported only in Ververica Runtime (VVR) 8.0.7 and later, and only for SQL deployments that use the expert (fine-grained) resource configuration mode. (2) Use state TTL hints to set TTLs for regular join and group aggregation operators.	Configure operator parallelism, chaining, and TTL; Hints; Regular JOIN statement
Named parameter support for UDFs	Improves development efficiency and reduces maintenance costs.	Overview
MySQL CDC connector enhancements	The `op_type` virtual column is supported to retrieve data operation types. Read performance is optimized for MySQL tables with primary keys of the `DECIMAL` type. Parallel processing of `SourceRecord` for large tables is also implemented. CDC source reuse is now supported. Buffered execution is performed when the `sink.ignore-null-when-update` parameter is enabled.	MySQL
Redis connector enhancements	When Redis is used as a dimension or result table where the key's data type is `HashMap`, multiple DDL formats for non-primary keys are supported for better readability. You can now set key prefixes and delimiters to meet data governance requirements.	ApsaraDB for Tair
Buffered reading for ApsaraMQ for RocketMQ	This feature improves processing efficiency and reduces resource costs.	ApsaraMQ for RocketMQ
Views no longer supported in MySQL Catalogs	Because a MySQL view is a logical structure and does not store data, view information is no longer displayed for newly created MySQL Catalogs.	Manage MySQL Catalogs
Enhanced support for Kerberized Hive clusters	Compatibility for Hive clusters with Kerberos enabled is extended to Hadoop 2.x.	Register a Kerberized Hive cluster
Iceberg connector SDK upgraded	Supports reading and writing Apache Iceberg 1.5.	Iceberg; Manage DLF-Legacy Catalogs
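
The MySQL CDC source reuse rule described under Engine updates treats source tables in the same deployment as merge candidates when their configurations are identical except for database name, table name, and server-id. The following Python sketch models that rule as a grouping step. It is only a model under stated assumptions: the option keys follow common MySQL CDC option names, the helper functions and sample values are hypothetical, and the engine makes the actual merge decision.

```python
from typing import Dict, List, Tuple

# Options that may differ between sources that are still merge candidates.
IGNORED_OPTIONS = {"database-name", "table-name", "server-id"}


def merge_key(options: Dict[str, str]) -> Tuple[Tuple[str, str], ...]:
    """Build a comparable key from every option that must match exactly."""
    return tuple(sorted(
        (name, value) for name, value in options.items()
        if name not in IGNORED_OPTIONS
    ))


def group_reusable_sources(sources: Dict[str, Dict[str, str]]) -> List[List[str]]:
    """Group source table names whose remaining options are identical."""
    groups: Dict[Tuple[Tuple[str, str], ...], List[str]] = {}
    for table_name, options in sources.items():
        groups.setdefault(merge_key(options), []).append(table_name)
    return list(groups.values())


# Hypothetical source tables declared in one deployment.
sources = {
    "orders_src": {"hostname": "db1", "port": "3306", "username": "flink",
                   "database-name": "shop", "table-name": "orders",
                   "server-id": "5401"},
    "customers_src": {"hostname": "db1", "port": "3306", "username": "flink",
                      "database-name": "crm", "table-name": "customers",
                      "server-id": "5402"},
    "audit_src": {"hostname": "db2", "port": "3306", "username": "flink",
                  "database-name": "ops", "table-name": "audit",
                  "server-id": "5403"},
}

groups = group_reusable_sources(sources)
```

Here the first two sources fall into one group because they differ only in the ignored options. The third source points at a different host and stays on its own.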

Fixed issues

Fixed a data correctness issue caused by WHERE clause pushdown in the Hologres connector in Ververica Runtime (VVR) versions 8.0.5 and 8.0.6.
Fixed a data loss issue in the Simple Log Service (SLS) connector that occurred because the SLS source table continued to commit consumer offsets during a failover.
Fixed an issue where a ValueState without a configured TTL lost its state when used alongside a MapState that had a TTL configured.
Fixed inconsistent deserialization results for WithinType.PREVIOUS_AND_CURRENT in dynamic complex event processing (CEP).
Fixed a discrepancy in the currentEmitEventTimeLag metric between the console's monitoring page and the Flink UI.
Includes fixes for all issues resolved in Apache Flink 1.17.2. For details, see the Apache Flink 1.17.2 Release Announcement.