This topic describes the major updates and bug fixes of the Realtime Compute for Apache Flink version released on September 19, 2022.
Overview
A new version of Realtime Compute for Apache Flink was officially released on September 19, 2022. This version provides performance optimizations, bug fixes, and major updates on the platform, engine, and connectors. The following Ververica Runtime (VVR) versions have been released: VVR 4.0.15, which is based on Apache Flink 1.13, and VVR 6.0.2, which is based on Apache Flink 1.15. Below is an overview of the latest versions:
VVR 6.0.2 was officially released. This version is the first release of an enterprise-level Flink engine based on Apache Flink 1.15. Apache Flink 1.15 provides feature and performance improvements, such as enhancements of window table-valued functions, CAST functions, the type system, and JSON functions. All of these improvements are now available in the cloud service.
State management is a key concern for users. This version allows you to centrally manage the checkpoints and savepoints of a deployment in a state set. Savepoints are now created and restored significantly faster, savepoints are smaller, and the success rate and stability of these operations are greatly improved.
The behavior in which savepoints are deleted when you cancel a deployment is discontinued, and checkpoints and savepoints are now clearly distinguished: you can explicitly create and manage savepoints. In this version, optimizations in the Gemini state backend engine also reduce costs. With state set management, the Object Storage Service (OSS) storage cost is reduced by 15% to 40% per year. Moreover, the new version allows you to start and restore a deployment from a specified savepoint of another deployment, so you can more conveniently perform dual-run tests such as A/B testing.
Resource utilization has been improved. The resource tuning feature now supports a scheduled tuning policy that automatically adjusts the resources of a deployment at specific times based on your settings. If your business has clear peak and off-peak hours, this policy saves manual tuning effort.
The deployment diagnostics feature introduces the concept of a health score in the new version. The feature provides diagnostic items and suggestions for deployments in all states, and helps you operate and maintain your real-time compute deployments.
In terms of data integration, the platform provides a new API that you can use to integrate Realtime Compute for Apache Flink with your business systems.
Real-time risk management is one of the main application scenarios of Flink. The new complex event processing (CEP) features target patterns across consecutive events. These features were provided to whitelisted users in a preview version, have been verified in production, and are now released in the new version.
This version introduces a series of Flink CEP enhancements for all users. First, the hot update feature for CEP rules is the most anticipated capability. It lets you update rules at the earliest opportunity, even during peak hours, without redeploying the job, which previously interrupted the risk control business for about 10 minutes. Business availability is therefore greatly improved. Second, the CEP SQL syntax is enhanced: this release introduces a new SQL extension syntax that improves the expressive power of CEP SQL, so complex DataStream deployments can be simplified into SQL deployments, as illustrated in the sketch below. This improves development efficiency and allows CEP logic to be integrated into the lineage system of data governance. Finally, this version introduces more metrics that describe the rules in Flink CEP.
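For readers unfamiliar with CEP SQL, the following is a minimal sketch of the MATCH_RECOGNIZE style that these enhancements build on. Only standard Flink SQL syntax is shown; the table and column names are hypothetical, and the VVR-specific extensions (such as relaxed non-contiguity) are not demonstrated here.

```sql
-- A minimal MATCH_RECOGNIZE sketch: for each user, detect a login
-- followed by one or more page views and then a purchase.
-- Table and column names are hypothetical.
SELECT *
FROM user_actions
    MATCH_RECOGNIZE (
        PARTITION BY user_id
        ORDER BY event_time
        MEASURES
            A.event_time AS login_time,
            LAST(B.event_time) AS last_view_time,
            C.event_time AS purchase_time
        ONE ROW PER MATCH
        AFTER MATCH SKIP PAST LAST ROW
        PATTERN (A B+ C)
        DEFINE
            A AS A.action = 'login',
            B AS B.action = 'view',
            C AS C.action = 'purchase'
    ) AS T;
```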
In terms of performance optimization, the Flink engine can now automatically infer whether to enable the key-value separation feature for JOIN operators that join two data streams in SQL streaming deployments. The performance of deployments that involve dual-stream joins is significantly improved. The Hive versions supported by the Hive catalog are expanded to Hive 2.1.0 to 2.3.9 and Hive 3.1.0 to 3.1.3. New connectors are also provided: you can read incremental logs from Tablestore, and the Java Database Connectivity (JDBC) connector supports source tables, dimension tables, and result tables.
New features
Feature | Description | References |
State set management | State set management applies to all stateful deployments. State management is decoupled from starting and stopping a deployment, and savepoints are no longer deleted when you cancel a deployment. You can create and delete savepoints on a unified management page, including at scheduled times. | |
Scheduled tuning | The scheduled tuning feature applies to Flink deployments whose business has clear peak and off-peak hours. You can configure custom timing policies for deployments in the console, and the deployment resources are then automatically adjusted to the preset sizes at the specified times. This feature helps handle periodic traffic fluctuations and saves manual tuning effort. | |
Health score | The health score feature applies to all deployments that are starting or running. The feature detects issues in deployments and provides suggestions based on a variety of expert rules. This helps you better understand the status of a deployment and adjust its parameters. | |
Process optimization on granting permissions to an account | The process of granting permissions to an account is optimized. All RAM users are listed for you to select from when you grant permissions, so you no longer need to manually enter the information of RAM users. | |
Flink CEP | CEP is a capability for matching patterns over real-time data streams. This version is based on Flink CEP and allows CEP rules to be stored externally in a database so that they can be dynamically loaded and take effect without restarting the deployment. This capability is provided through the DataStream API. | |
Enhancement of CEP SQL | The MATCH_RECOGNIZE statement allows you to use SQL statements to describe CEP rules. This version builds on the MATCH_RECOGNIZE support of Flink CEP and provides enhanced capabilities, such as output of matches whose events do not arrive within a specified time interval and relaxed non-contiguity by using notFollowedBy(). New metrics that describe rule matching are also introduced. | |
Database synchronization that supports synchronizing data to Kafka | When you synchronize a MySQL database, the data can be written to Upsert Kafka tables that correspond to the tables in the MySQL database. Downstream deployments can then consume the Kafka sink tables instead of the MySQL tables, which reduces the load that multiple deployments place on the MySQL database. For a sketch of an Upsert Kafka sink table, see the first example after this table. | |
A DDL statement that defines a partitioned table in a Hologres result table | PARTITION BY is supported when you create a Hologres sink table, as shown in the second example after this table. | |
Timeout period for asynchronous requests in a Hologres dimension table | You can specify the asyncTimeoutMs parameter to ensure that a data request completes within a specific period of time. | |
Table attributes can be configured when you create a Hologres table | Suitable table attribute settings help you sort and query data efficiently. When you create a Hologres table, you can configure the physical table properties in the WITH clause. | |
MaxCompute sink connectors support the Binary data type | You can write data of the Binary type to MaxCompute result tables. | |
Hive Catalog supports more Hive versions. | This version supports Hive 2.1.0-2.3.9 and Hive 3.1.0-3.1.3. | |
Tablestore connector | You can read the incremental logs in Tablestore. | |
JDBC connector | A built-in JDBC connector is provided, which supports source tables, dimension tables, and result tables. | |
The parallelism of a Message Queue for Apache RocketMQ source table can be greater than the number of partitions in the topic | You can reserve resources in advance for the number of partitions that a topic may eventually have. | |
The message key of a Message Queue for Apache RocketMQ result table can be specified | You can specify the key of a Message Queue for Apache RocketMQ message when data is written. | |
AnalyticDB for MySQL catalog | You can read metadata from AnalyticDB for MySQL by using the catalog, so you no longer need to manually register AnalyticDB for MySQL tables. This improves the efficiency of deployment development and ensures data correctness. | |
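To give a sense of what the Upsert Kafka sink tables used in database synchronization look like, the following is a minimal Flink SQL sketch. The schema, topic name, and broker address are placeholders, not values from this release.

```sql
-- A minimal Upsert Kafka sink table (hypothetical schema and topic).
-- A primary key is required so that updates and deletes from the
-- MySQL source can be encoded as Kafka upserts.
CREATE TABLE orders_kafka_sink (
    order_id BIGINT,
    order_status STRING,
    update_time TIMESTAMP(3),
    PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
    'connector' = 'upsert-kafka',
    'topic' = 'orders',
    'properties.bootstrap.servers' = 'broker-host:9092',
    'key.format' = 'json',
    'value.format' = 'json'
);
```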
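The next sketch illustrates the Hologres-related DDL features above: a partitioned sink table and a dimension table with an asynchronous request timeout. The PARTITIONED BY clause is the standard Flink DDL form of the PARTITION BY capability described above; the endpoint, credentials, and the 'async' option name are placeholders or assumptions, so check the connector documentation for the exact option names in your VVR version.

```sql
-- A partitioned Hologres sink table; ds is the partition column.
-- Endpoint and credentials are placeholders. Physical table
-- properties would also be configured in this WITH clause.
CREATE TABLE holo_orders_sink (
    order_id BIGINT,
    amount DECIMAL(10, 2),
    ds STRING
) PARTITIONED BY (ds) WITH (
    'connector' = 'hologres',
    'endpoint' = '<hologres-endpoint>',
    'dbname' = '<database>',
    'tablename' = 'orders',
    'username' = '<access-key-id>',
    'password' = '<access-key-secret>'
);

-- A Hologres dimension table that uses the asyncTimeoutMs parameter
-- described above to bound asynchronous lookup requests ('async' is
-- an assumed companion option).
CREATE TABLE holo_user_dim (
    user_id BIGINT,
    user_level STRING
) WITH (
    'connector' = 'hologres',
    'endpoint' = '<hologres-endpoint>',
    'dbname' = '<database>',
    'tablename' = 'user_dim',
    'username' = '<access-key-id>',
    'password' = '<access-key-secret>',
    'async' = 'true',
    'asyncTimeoutMs' = '10000'
);
```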
Performance optimization
The native format for savepoints is introduced. This mitigates the problem that savepoints in the canonical format easily time out in jobs with large state, and makes deployments more stable. The following table describes the benefits of the native format:
Item | Benefit
Time required to create a savepoint | Savepoint creation is 5 to 10 times faster on average, and the gain grows as the incremental state shrinks; in some typical deployments it can reach 100 times.
Time required to restore a deployment | Restoration is about 5 times faster on average, and the gain grows with the state size.
Space overhead of a savepoint | The space overhead is roughly halved on average, and the saving ratio grows with the state size.
Network overhead of a savepoint | The network overhead is reduced by a factor of 5 to 10 on average, and the saving ratio grows as the incremental state shrinks.
For JOIN operators that join two data streams in SQL streaming deployments, the Flink engine can now automatically infer whether to enable the key-value separation feature based on the characteristics of the deployment. This significantly improves dual-stream join performance: in performance tests of typical scenarios, the average improvement is over 40%, with no configuration changes required, as the sketch below shows. For more information, see Optimize Flink SQL and Configurations of GeminiStateBackend.
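As an illustration, the following is the kind of regular dual-stream join that benefits from this inference. The table and column names are hypothetical, and no key-value separation settings appear in the SQL because the engine decides automatically.

```sql
-- A regular join between two continuously updating streams. The
-- engine infers whether the join state in GeminiStateBackend should
-- use key-value separation; nothing needs to be configured here.
SELECT o.order_id, o.amount, p.pay_time
FROM orders AS o
JOIN payments AS p
    ON o.order_id = p.order_id;
```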
The deployment startup speed is increased by an average of 15%.
Bug fixes
The following issues have been fixed:
The modification time of a deployment was abnormally updated.
The status of specific deployments could not be determined after the deployments were suspended and restarted.
JAR packages could not be uploaded from local machines in Alibaba Finance Cloud.
The total number of resources configured for running a deployment was inconsistent with that on the Statistics page.
The Logs page could not be accessed.
An error was reported when you tried to access Upsert Kafka tables via the Kafka catalog.
A NullPointerException error was returned when intermediate results were used in nested invocations of multiple user-defined functions (UDFs).
In MySQL CDC source tables, abnormal chunks and out-of-memory (OOM) errors occurred, and the time zone of the initial full data was inconsistent with that of the incremental data. For more information about how to configure resource parameters, see Create a MySQL CDC source table.