Fully managed Flink provides a feature that checks state data compatibility and the state data migration feature. This topic describes how fully managed Flink checks the compatibility between the state data that you select and an SQL deployment. This topic also describes the differences in migration efficiency and deployment performance during state data migration between RocksDB and Gemini state backends.

Background information

Fully managed Flink saves the intermediate calculation results of an SQL deployment in the state data. The state data includes checkpoints and savepoints. Flink SQL is suitable for various stream processing scenarios. You may need to constantly modify an SQL deployment to meet your development iteration or business development requirements. After you modify an SQL deployment and restart the deployment by using the original state data, the deployment may become incompatible with the state data.

Fully managed Flink whose engine version is vvr-4.0.11-flink-1.13 or later provides a feature that checks state data compatibility and the state data migration feature. These features help you maximize the reuse of the original state data and quickly update your SQL deployment. If you choose to start your deployment based on the state data after you modify and publish the draft, the system checks the compatibility between the state data that you select and the deployment. For more information about the check results, see Compatibility.

To adapt the state data to a new deployment, you must migrate the state data. Fully managed Flink supports two state backends: RocksDB and Gemini. For more information about the differences in migration efficiency and deployment performance between RocksDB and Gemini state backends, see State data migration.

Compatibility

The system checks the compatibility between the state data that you select and an SQL deployment. The following check results are provided:
  • Fully compatible
    The current deployment is fully compatible with the state data. In this case, the calculation results of the deployment that runs based on the state data are the same as the calculation results of the deployment that runs based on historical data. To ensure that the deployment is fully compatible with the state data, you may need to migrate the state data. For more information, see State data migration. Fully compatible in Ververica Platform (VVP) 3.0
  • Partially compatible

    The current deployment is partially compatible with the state data. In this case, the calculation results of only specific columns of the deployment that runs based on the state data are the same as the calculation results of the deployment that runs based on historical data. The remaining columns do not have data in the state data, which leads to partial inconsistency of the calculation results. To ensure that the deployment is partially compatible with the state data, you may need to migrate the state data. For more information, see State data migration.

  • Not compatible
    The current deployment is incompatible with the state data.
    Warning If you start the deployment based on the state data, the deployment may fail to start at a high probability or the running result of the deployment may not meet your expectations. Proceed with caution.
  • Restart the deployment to determine whether the deployment is compatible with the state data
    Warning If you start the deployment based on the state data, the deployment may fail to start or the running result of the deployment may not meet your expectations. Proceed with caution.

State data migration

RocksDB and Gemini have the following differences in migration efficiency and deployment performance:
  • RocksDB
    The RocksDB state backend migrates all state data when a deployment is started. When the deployment is started, the deployment is in the RUNNING state. However, the operator whose data needs to be migrated is in the INITIALIZING state and no data is consumed or processed. After all data of the operator is migrated, the operator changes to the RUNNING state and starts to consume data. RocksDB
    Note In this example, an aggregate function in the deployment is modified. Therefore, the RocksDB state backend migrates all state data when the deployment starts. In the preceding figure, the GroupAggregate operator is in the INITIALIZING state. In this case, the operator cannot process data.
  • Gemini
    The RocksDB state backend migrates all state data when a deployment is started. Compared with RocksDB, the Gemini state backend migrates data based on your business requirements when a deployment is running. Therefore, the Gemini state backend migrates a specific data record in the state data only when the data record is accessed. After the deployment is started, the deployment is in the RUNNING state. The operator whose state data needs to be migrated quickly changes from the INITIALIZING state to the RUNNING state and starts to consume data. During state data migration, the transactions per second (TPS) value gradually returns to the normal level, which indicates that all state data is migrated. The Gemini state backend consumes less time to migrate state data than the RocksDB state backend. Gemini
    Note If the Gemini and RocksDB state backends are separately used to migrate the state data of the same SQL deployment, the Gemini state backend provides higher migration efficiency. The Gemini state backend can quickly change the state of the operator whose data needs to be migrated from INITIALIZING to RUNNING and allow the operator to process data. The INITIALIZING state indicates that the operator is loading data.