Flink state data compatibility - Realtime Compute for Apache Flink

Realtime Compute for Apache Flink provides a feature that checks state data compatibility and the state data migration feature. This topic describes how Realtime Compute for Apache Flink checks the compatibility between the state data that you select and an SQL deployment. This topic also describes the differences between RocksDB and Gemini state backends in terms of migration efficiency and deployment performance during state data migration.

Background information

Realtime Compute for Apache Flink saves the intermediate calculation results of an SQL deployment in the state data. The state data includes checkpoints and savepoints. Flink SQL is suitable for various stream processing scenarios. You may need to constantly modify an SQL deployment to meet your development iteration or business development requirements. After you modify an SQL deployment and restart the deployment by using the original state data, the deployment may become incompatible with the state data.

Realtime Compute for Apache Flink whose engine version is vvr-4.0.11-flink-1.13 or later provides a feature that checks state data compatibility and the state data migration feature. These features help you maximize the reuse of the original state data and quickly update your SQL deployment. If you start your deployment based on state data after you modify and publish the draft of the deployment, the system checks the compatibility between the state data that you select and the deployment. For more information, see Compatibility.

To adapt the state data to a new deployment, you must migrate the state data. Realtime Compute for Apache Flink supports two state backends: RocksDB and Gemini. For more information about the differences between RocksDB and Gemini state backends in terms of migration efficiency and deployment performance, see State data migration.

Compatibility

If you select Resume Mode in the Start Job panel of an SQL deployment, Realtime Compute for Apache Flink automatically detects the changes to the SQL statements, runtime parameter configurations, and engine version of the deployment. If a deployment change is detected, we recommend that you click Click to detect next to State Compatibility to check the compatibility between the deployment and the state data. Then, determine the subsequent actions based on the compatibility check result.

Important

If you modify an SQL deployment, you must check the compatibility between the deployment and the state data before you select Resume Mode for the deployment. If all conditions are met and the deployment is compatible with the state data, the deployment resumes.

The following check results and suggestions are provided:

Fully compatible
The current deployment is fully compatible with the most recent state data. In this case, the calculation results of the deployment that runs based on the most recent state data are the same as the calculation results of the deployment that runs based on historical data. We recommend that you start the deployment.
Partially compatible
The current deployment is partially compatible with the most recent state data. In this case, the calculation results of only specific columns of the deployment that runs based on the state data are the same as the calculation results of the deployment that runs based on historical data. The remaining columns do not have data in the state data, which leads to partial inconsistency of the calculation results. To ensure that the deployment is fully compatible with the state data, we recommend that you start the deployment based on other states or without states.
Incompatible
Warning
If you start the deployment based on the state data, the deployment may fail to start at a high probability or the running result of the deployment may not meet your expectations. Proceed with caution. We recommend that you start the deployment based on other states or without states.
Compatibility to be determined after a restart (compatibility unknown)
Warning
If you start the deployment based on the state data, the deployment may fail to start or the running result of the deployment may not meet your expectations. Proceed with caution.

State data migration

RocksDB and Gemini have the following differences in migration efficiency and deployment performance:

RocksDB
The RocksDB state backend migrates all state data when a deployment is started. When the deployment is started, the deployment is in the RUNNING state. However, the operator whose data needs to be migrated is in the INITIALIZING state and no data is consumed or processed. After all data of the operator is migrated, the operator enters the RUNNING state and starts to consume data.
Note
In this example, an aggregate function in the deployment is modified. Therefore, the RocksDB state backend migrates all state data when the deployment starts. In the preceding figure, the GroupAggregate operator is in the INITIALIZING state. In this case, the operator cannot process data.
Gemini
The RocksDB state backend migrates all state data when a deployment is started. Compared with RocksDB, the Gemini state backend migrates data based on your business requirements when a deployment is running. Therefore, the Gemini state backend migrates a specific data record in the state data only when the data record is accessed. After the deployment is started, the deployment is in the RUNNING state. The operator whose state data needs to be migrated quickly changes from the INITIALIZING state to the RUNNING state and starts to consume data. During state data migration, the transactions per second (TPS) value gradually returns to the normal level, which indicates that all state data is migrated. The Gemini state backend consumes less time to migrate state data than the RocksDB state backend.
Note
If the Gemini and RocksDB state backends are separately used to migrate the state data of the same SQL deployment, the Gemini state backend provides higher migration efficiency. The Gemini state backend can quickly change the state of the operator whose data needs to be migrated from INITIALIZING to RUNNING and allow the operator to process data. The INITIALIZING state indicates that the operator is loading data.