This topic describes the tuning parameters that you can configure when you use GeminiStateBackend.

Background information

In most scenarios, GeminiStateBackend uses the adaptive parameter tuning feature to adjust parameter configurations automatically, so you do not need to configure them manually. In specific scenarios, you can configure parameters yourself to further optimize performance. This topic describes the parameter configurations for the following scenarios:
  • If you want to balance memory usage and performance, configure the basic parameters. For more information, see Basic parameters.
  • If your local disk space is insufficient, configure the parameters for compute-storage separation. For more information, see Parameters for compute-storage separation.
  • If a JOIN operator has a performance bottleneck, configure the parameters for key-value separation. For more information, see Parameters for key-value separation.
  • If you want to control how adaptive parameter tuning works, configure the parameters for adaptive parameter tuning. For more information, see Parameters for adaptive parameter tuning.

Basic parameters

Parameter: state.backend.gemini.memory.managed
Description: Specifies whether GeminiStateBackend automatically allocates memory based on the managed memory.
Data type: BOOLEAN
Default value: true
Supported version: Ververica Runtime (VVR) 4.0 and later
Remarks: Valid values:
  • true: The system automatically calculates the memory size of each backend based on the managed memory and the number of task slots. This is the default value.
  • false: The memory size of each backend is the sum of the values of the state.backend.gemini.total.writebuffer.size and state.backend.gemini.offheap.size parameters.

    To balance memory usage against performance, you can set this parameter to false and specify the total size of memory occupied by WriteBuffer and the size of the off-heap memory.

Note We recommend that you use the default value.

Parameter: state.backend.gemini.total.writebuffer.size
Description: The total size of memory that is occupied by WriteBuffer.
Data type: STRING
Default value: 128 MB
Supported version: VVR 4.0 and later
Remarks: This parameter takes effect only when the state.backend.gemini.memory.managed parameter is set to false. Otherwise, the total size of memory occupied by WriteBuffer is automatically calculated based on the managed memory.
The value must include a unit. Valid units: B, KB, MB, and GB.
Note
  • The unit is not case-sensitive.
  • A space is required between the value and the unit.

Parameter: state.backend.gemini.offheap.size
Description: The size of the off-heap memory that is used by GeminiStateBackend.
Note The off-heap memory used by GeminiStateBackend does not include the memory occupied by WriteBuffer.
Data type: STRING
Default value: Not configured
Supported version: VVR 4.0 and later
Remarks: This parameter takes effect only when the state.backend.gemini.memory.managed parameter is set to false. Otherwise, the size of the off-heap memory used by GeminiStateBackend is automatically calculated based on the managed memory.
The value must include a unit. Valid units: B, KB, MB, and GB.
Note
  • The unit is not case-sensitive.
  • A space is required between the value and the unit.
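When state.backend.gemini.memory.managed is set to false, the memory of each backend is the sum of two size strings that must follow the "<value> <unit>" rules above. The following Python sketch models only those documented rules: it parses a size string and adds the two values. The functions parse_size and backend_memory_bytes are hypothetical and are not part of GeminiStateBackend or Flink. It assumes binary units (1 KB = 1024 B) and whole-number values, and it does not reproduce the automatic calculation used when the parameter is true, because that formula is not documented here.

```python
import re
from typing import Optional

# Byte multipliers for the accepted units. Assumption: binary units
# (1 KB = 1024 B).
_UNITS = {"B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}

# Assumption: a whole number, then exactly one space, then a unit.
# IGNORECASE because the unit is not case-sensitive; the space is required.
_SIZE = re.compile(r"(\d+) (B|KB|MB|GB)", re.IGNORECASE)


def parse_size(text: str) -> int:
    """Convert a size string such as "128 MB" into a number of bytes."""
    match = _SIZE.fullmatch(text)
    if match is None:
        raise ValueError(f"invalid size {text!r}; expected '<value> <unit>'")
    value, unit = match.groups()
    return int(value) * _UNITS[unit.upper()]


def backend_memory_bytes(managed: bool,
                         writebuffer_size: str = "128 MB",
                         offheap_size: Optional[str] = None) -> Optional[int]:
    """Hypothetical helper: memory of one backend, in bytes."""
    if managed:
        # Derived by the system from managed memory and the number of task
        # slots; that formula is not documented here, so return None.
        return None
    if offheap_size is None:
        # Assumption: the sketch rejects a missing off-heap size because the
        # behavior in that case is not documented here.
        raise ValueError("offheap_size is required when managed is false")
    # managed = false: WriteBuffer total + GeminiStateBackend off-heap memory.
    return parse_size(writebuffer_size) + parse_size(offheap_size)


print(backend_memory_bytes(False, "128 MB", "512 mb"))  # 671088640
```

Note that "512 mb" is accepted because the unit is not case-sensitive, whereas a value such as "128MB" is rejected because the space between the value and the unit is missing.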
Note The basic configurations of checkpoints and state backends in Apache Flink also apply to GeminiStateBackend. For more information, see Checkpoints and State Backends.

Parameters for compute-storage separation

Parameter: state.backend.gemini.file.cache.type
Description: The compute-storage separation mode.
Data type: STRING
Default value: INFINITE
Supported version: VVR 4.0.11 and later
Remarks: Valid values:
  • INFINITE: Compute-storage separation is disabled. State data is stored only on local disks. This is the default value.
  • LIMITED: State data is preferentially stored on local disks. If the local disk space is insufficient, state data is stored in a distributed file system (DFS).
    If a large amount of state data causes the local disk space to run out, you can set this parameter to LIMITED and configure the state.backend.gemini.file.cache.preserved-space parameter based on the capacity of the local disks.
Note
  • The values of this parameter are case-sensitive.
  • Enabling compute-storage separation may degrade job performance. Weigh disk space against performance before you enable this feature. We recommend that you use the default value.

Parameter: state.backend.gemini.file.cache.preserved-space
Description: The threshold of available local disk space on a TaskManager for storing state data.
Data type: STRING
Default value: 2 GB
Supported version: VVR 4.0.11 and later
Remarks: If the actual available disk space is less than the value of this parameter, GeminiStateBackend stores state data in the DFS so that the state size is no longer limited by local storage.
The value must include a unit. Valid units: B, KB, MB, and GB.
Note
  • The unit is not case-sensitive.
  • A space is required between the value and the unit.
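The placement rule described above can be summarized as a simple decision. The following Python sketch is a simplified model of the documented behavior, not a description of how GeminiStateBackend is implemented: the Placement enum and the place_state function are hypothetical. It assumes binary units for the 2 GB default and takes the available disk space as an input instead of measuring it.

```python
from enum import Enum

GB = 1024 ** 3  # Assumption: binary units (1 GB = 1024 ** 3 bytes).


class Placement(Enum):
    LOCAL_DISK = "local disk"
    DFS = "DFS"


def place_state(cache_type: str, available_bytes: int,
                preserved_bytes: int = 2 * GB) -> Placement:
    """Hypothetical model of where new state data is written.

    cache_type mirrors state.backend.gemini.file.cache.type, and
    preserved_bytes mirrors state.backend.gemini.file.cache.preserved-space
    (default 2 GB).
    """
    # The mode values are case-sensitive, so compare them exactly.
    if cache_type == "INFINITE":
        # Compute-storage separation is disabled: local disks only.
        return Placement.LOCAL_DISK
    if cache_type == "LIMITED":
        # Prefer local disks until the available space drops below the
        # threshold, then fall back to the DFS.
        if available_bytes < preserved_bytes:
            return Placement.DFS
        return Placement.LOCAL_DISK
    raise ValueError(f"unsupported file.cache.type: {cache_type!r}")


print(place_state("LIMITED", available_bytes=1 * GB))   # Placement.DFS
print(place_state("LIMITED", available_bytes=10 * GB))  # Placement.LOCAL_DISK
```

Because the values are case-sensitive, the sketch rejects "limited" rather than normalizing it. In INFINITE mode, state stays on local disks however little space remains, which is why the LIMITED mode exists.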
Note The Object Storage Service (OSS) client SDK writes a file to the local disk before it uploads the file. Therefore, if OSS is used as the DFS, local disk usage may be higher than expected. When Flink creates a savepoint, each state backend generates a single file. This can result in a large uncompressed file that occupies local disk space, and in this case compute-storage separation cannot prevent the local disk from running out of space. To resolve this issue, increase the parallelism to reduce the amount of state data on each node.

Parameters for key-value separation

Parameter: state.backend.gemini.kv.separate.mode
Description: The key-value separation mode.
Data type: STRING
Default value: DISABLE
Supported version: VVR 4.0.12 and later
Remarks: Valid values:
  • DISABLE: Key-value separation is disabled. This is the default value.
  • GLOBAL_ENABLE: Key-value separation is enabled.
    You can set this parameter to GLOBAL_ENABLE to improve performance if a job that contains a JOIN operator experiences backpressure or requires performance tuning, the success rate of JOIN operations is low, and the values in the key-value pairs are large. To estimate the success rate of JOIN operations, view the Records Received and Records Sent metrics of the join nodes.
Note
  • The values of this parameter are case-sensitive.
  • If you enable key-value separation for jobs that do not meet the preceding conditions, extra overhead is generated. Therefore, we recommend that you use the default value.

Parameter: state.backend.gemini.kv.separate.value.size.threshold
Description: The value size threshold that triggers key-value separation when key-value separation is enabled.
Data type: INTEGER
Default value: 200
Supported version: VVR 4.0.12 and later
Remarks: If the value of a record reaches this threshold, the key and the value of the record are stored separately. Unit: bytes. Recommended value range: 150 to 1000. You can adjust this parameter based on the success rate of JOIN operations. If the success rate is high, you can set this parameter to a larger value.
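The two checks described above, estimating the JOIN success rate and deciding whether a record's key and value are stored separately, can be sketched as follows. This is a hypothetical model in Python, not GeminiStateBackend code. The function names are illustrative, and treating the success rate as Records Sent divided by Records Received is an assumption about how to read those metrics; one-to-many joins can push this ratio above 1.0.

```python
def join_success_rate(records_received: int, records_sent: int) -> float:
    """Rough estimate from the Records Received and Records Sent metrics.

    Assumption: the ratio of output records to input records of a join node
    approximates the success rate of JOIN operations.
    """
    if records_received <= 0:
        raise ValueError("records_received must be positive")
    return records_sent / records_received


def stores_separately(mode: str, value: bytes, threshold: int = 200) -> bool:
    """Hypothetical check: is this record's value stored apart from its key?

    mode mirrors state.backend.gemini.kv.separate.mode, and threshold mirrors
    state.backend.gemini.kv.separate.value.size.threshold (in bytes).
    """
    if mode == "DISABLE":  # the mode values are case-sensitive
        return False
    if mode == "GLOBAL_ENABLE":
        # A value whose size reaches the threshold is stored separately.
        return len(value) >= threshold
    raise ValueError(f"unsupported kv.separate.mode: {mode!r}")


def threshold_in_recommended_range(threshold: int) -> bool:
    """The recommended threshold range is 150 to 1000 bytes."""
    return 150 <= threshold <= 1000


rate = join_success_rate(records_received=1_000_000, records_sent=20_000)
print(rate)  # 0.02
print(stores_separately("GLOBAL_ENABLE", b"x" * 512))  # True
```

A low estimated success rate combined with large values is the situation in which the GLOBAL_ENABLE mode is described as helpful.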

Parameters for adaptive parameter tuning

Parameter: state.backend.gemini.auto-tune.mode
Description: The adaptive parameter tuning mode.
Data type: STRING
Default value: ACTIVE
Supported version: VVR 4.0.12 and later
Remarks: Valid values:
  • DISABLED: Adaptive parameter tuning is disabled.
  • MONITORING: GeminiStateBackend continuously monitors the job status and writes parameter tuning suggestions to the logs, but does not adjust any parameter configurations.
  • ACTIVE: GeminiStateBackend continuously monitors the job status and automatically adjusts the GeminiStateBackend-related parameters that are not configured in the flink-conf.yaml file. This is the default value.
  • FORCEFUL: GeminiStateBackend continuously monitors the job status and automatically adjusts parameter configurations, including the GeminiStateBackend-related parameters that are configured in the flink-conf.yaml file.
Note
  • The values of this parameter are not case-sensitive.
  • We recommend that you use the default value.

Parameters: state.backend.gemini.auto-tune.burst.start.x and state.backend.gemini.auto-tune.burst.end.x
Description: The start time and end time of a period during which the performance-first mode is used when adaptive parameter tuning is enabled.
Data type: STRING
Default value: Not configured
Supported version: VVR 4.0.12 and later
Remarks: Replace x in the parameter names with a number. Each start.x parameter pairs with the end.x parameter that has the same number, and each pair specifies one time period. You can configure multiple pairs to specify multiple time periods. Specify the values in the yyyy-MM-dd HH:mm:ss format.
If transactions per second (TPS) matter more to you than resource consumption, you can configure these parameters. During the specified periods, GeminiStateBackend uses the TPS-first policy to achieve higher TPS, which consumes more CPU cores and memory.
Note
  • The time specified by the state.backend.gemini.auto-tune.burst.end.x parameter must be later than the time specified by the state.backend.gemini.auto-tune.burst.start.x parameter.
  • Configuring these parameters increases resource consumption. Therefore, we recommend that you do not configure them.
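The mode rules and the pairing of burst.start.x with burst.end.x can be illustrated with a small validator. The following Python sketch is hypothetical: it reads settings from a plain dictionary that stands in for flink-conf.yaml entries, and the functions adjustable_params, burst_windows, and in_burst are illustrative names, not GeminiStateBackend APIs. Treating the start time as inclusive and the end time as exclusive is an assumption.

```python
import re
from datetime import datetime
from typing import Dict, Iterable, List, Set, Tuple

PREFIX = "state.backend.gemini.auto-tune."
# Python equivalent of the documented yyyy-MM-dd HH:mm:ss format.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"
_BURST_KEY = re.compile(re.escape(PREFIX) + r"burst\.(start|end)\.(\d+)")


def adjustable_params(mode: str, gemini_params: Iterable[str],
                      configured: Iterable[str]) -> Set[str]:
    """Which GeminiStateBackend parameters adaptive tuning may change."""
    mode = mode.upper()  # the mode values are not case-sensitive
    if mode in ("DISABLED", "MONITORING"):
        return set()  # MONITORING only writes suggestions to the logs
    if mode == "ACTIVE":
        # Parameters set in flink-conf.yaml are left alone.
        return set(gemini_params) - set(configured)
    if mode == "FORCEFUL":
        return set(gemini_params)  # includes configured parameters
    raise ValueError(f"unknown auto-tune mode: {mode!r}")


def burst_windows(conf: Dict[str, str]) -> List[Tuple[datetime, datetime]]:
    """Pair each burst.start.x with burst.end.x and validate the period."""
    starts: Dict[str, datetime] = {}
    ends: Dict[str, datetime] = {}
    for key, value in conf.items():
        match = _BURST_KEY.fullmatch(key)
        if match is None:
            continue  # not a burst parameter
        kind, index = match.groups()
        target = starts if kind == "start" else ends
        target[index] = datetime.strptime(value, TIME_FORMAT)
    if starts.keys() != ends.keys():
        raise ValueError("each burst.start.x needs a matching burst.end.x")
    windows = []
    for index in sorted(starts, key=int):
        start, end = starts[index], ends[index]
        if end <= start:
            raise ValueError(
                f"burst.end.{index} must be later than burst.start.{index}")
        windows.append((start, end))
    return windows


def in_burst(now: datetime, windows: List[Tuple[datetime, datetime]]) -> bool:
    # Assumption: the start time is inclusive and the end time is exclusive.
    return any(start <= now < end for start, end in windows)


conf = {
    PREFIX + "mode": "active",
    PREFIX + "burst.start.1": "2024-11-10 23:00:00",
    PREFIX + "burst.end.1": "2024-11-11 02:00:00",
}
windows = burst_windows(conf)
print(in_burst(datetime(2024, 11, 11, 1, 30), windows))  # True
```

The validator rejects a start.x without a matching end.x and any period whose end time is not later than its start time, which mirrors the constraint stated in the note.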