Realtime Compute for Apache Flink: Configurations of GeminiStateBackend

Last Updated: Nov 02, 2023

This topic describes the tuning parameters that you can configure when you use GeminiStateBackend, the enterprise-level state backend.

Background information

In most scenarios, the adaptive parameter tuning feature allows GeminiStateBackend to adjust its parameter configurations automatically, so manual configuration is not required. You only need to adjust a few basic configurations based on your business scenarios. For more information, see Basic parameters. In specific scenarios, you can configure additional parameters to further optimize performance. This topic covers basic parameters, memory-related parameters, and the parameters for compute-storage separation, key-value separation, and adaptive parameter tuning.

Note

For more information about enterprise-level state storage, see GeminiStateBackend. For more information about how to configure an enterprise-level state storage, see Configure the parameters that are related to state backends.

Basic parameters

Each entry below gives, in order: the parameter name, description, data type, default value, and remarks.

table.exec.state.ttl

Description: The TTL of state data in SQL deployments.

Data type: LONG

Default value:

  • Ververica Runtime (VVR) 4.0.12 or later: 129600000, which is equal to 1.5 days.

  • VVR earlier than 4.0.12: empty, which indicates that the state data does not expire.

Remarks: Unit: milliseconds. For example, if you set this parameter to 129600000, the TTL of the state data is 1.5 days. This parameter cannot be used together with the state.backend.gemini.ttl.ms parameter.

Note

We recommend that you set this parameter to the smallest value that meets your business requirements.
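
Because the TTL is expressed in milliseconds, it can help to derive the value from a readable duration instead of typing it by hand. The following is a minimal Python sketch of that arithmetic. The helper name ttl_millis is hypothetical and is not part of Flink or VVR.

```python
from datetime import timedelta


def ttl_millis(duration: timedelta) -> int:
    """Convert a duration to whole milliseconds (hypothetical helper)."""
    return duration // timedelta(milliseconds=1)


# 1.5 days is the documented default for VVR 4.0.12 or later.
default_ttl = ttl_millis(timedelta(days=1.5))
print(f"table.exec.state.ttl: {default_ttl}")  # prints "table.exec.state.ttl: 129600000"
```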

state.backend.gemini.ttl.ms

Description: The TTL of state data in DataStream deployments or Python deployments.

Data type: LONG

Default value: (none)

Remarks: Unit: milliseconds. For example, if you set this parameter to 129600000, the TTL of the state data is 1.5 days. This parameter cannot be used together with the table.exec.state.ttl parameter.

Note

We recommend that you set this parameter to the smallest value that meets your business requirements.
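
The two TTL parameters cannot be used together, so a configuration can be checked for that conflict before it is deployed. The sketch below assumes the configuration has already been loaded into a Python dict. The function name check_ttl_keys is hypothetical, and the check is an illustration, not something VVR provides.

```python
TTL_KEYS = ("table.exec.state.ttl", "state.backend.gemini.ttl.ms")


def check_ttl_keys(conf: dict) -> None:
    """Raise ValueError if both TTL parameters are set (hypothetical check)."""
    present = [key for key in TTL_KEYS if key in conf]
    if len(present) > 1:
        raise ValueError("Set only one of: " + ", ".join(present))


# A DataStream deployment that sets only the Gemini TTL passes silently.
check_ttl_keys({"state.backend.gemini.ttl.ms": 129600000})
```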

state.backend.gemini.savepoint.external-sort.local-storage.enabled

Description: Specifies whether the temporary data generated during savepoint creation is stored on a local disk.

Data type: BOOLEAN

Default value: false

Remarks: Valid values:

  • false: Temporary data is stored in a distributed file system (DFS). This is the default value.

  • true: Temporary data is stored on a local disk to minimize the interactions with the DFS and accelerate savepoint creation. If the storage usage of your local disk is less than 30% and your local disk has sufficient space to store the temporary data, we recommend that you set this parameter to true.

Note
  • Only Realtime Compute for Apache Flink that uses VVR 4.0 or later supports this parameter.

  • If savepoints are created slowly, we recommend that you configure this parameter.

Memory-related parameters

Note

The following table describes the memory-related parameters that can be configured only in VVR 4.0 and later.

Each entry below gives, in order: the parameter name, description, data type, default value, and remarks.

state.backend.gemini.memory.managed

Description: Specifies whether GeminiStateBackend automatically allocates memory based on the managed memory.

Data type: BOOLEAN

Default value: true in VVR 4.0.12 or later, false in VVR versions earlier than 4.0.12.

Remarks: Valid values:

  • true: The system automatically calculates the memory size of each backend based on the managed memory and the number of task slots.

  • false: The memory size of each backend is set to the sum of the values of the state.backend.gemini.total.writebuffer.size and state.backend.gemini.offheap.size parameters.

    You can set this parameter to false and specify the total size of memory that is occupied by WriteBuffer and the size of the off-heap memory to balance memory usage against performance.

Note
  • We recommend that you use the default value.

  • For Realtime Compute for Apache Flink that uses VVR 4.0.12 or later, the default value of this parameter is true. For Realtime Compute for Apache Flink that uses a VVR version earlier than 4.0.12, the default value of this parameter is false.

state.backend.gemini.total.writebuffer.size

Description: The total size of memory that is occupied by WriteBuffer.

Data type: STRING

Default value: 128 MB

Remarks: This parameter takes effect when the state.backend.gemini.memory.managed parameter is set to false. Otherwise, the total size of memory that is occupied by WriteBuffer is automatically calculated based on the managed memory.

When you configure this parameter, you must add a unit to the value of this parameter. The unit can be B, KB, MB, or GB.

Note
  • The unit is not case-sensitive.

  • A space is required between the value and the unit.
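
The size format described above (a number, one space, then a unit that is not case-sensitive) can be checked with a short parser. The following is a minimal Python sketch. The helper name parse_size is hypothetical, and two details are assumptions that the text does not state: that only whole numbers are accepted, and that the units are binary multiples (1 KB = 1024 B).

```python
import re

# Assumption: binary multiples (1 KB = 1024 B); the base is not stated in this topic.
_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}
# One space between the number and the unit; the unit is matched case-insensitively.
_SIZE = re.compile(r"(\d+) (B|KB|MB|GB)", re.IGNORECASE)


def parse_size(text: str) -> int:
    """Parse a value such as '128 MB' into bytes (hypothetical helper)."""
    match = _SIZE.fullmatch(text)
    if match is None:
        raise ValueError(f"expected '<number> <B|KB|MB|GB>', got {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2).upper()]


print(parse_size("128 MB"))  # prints 134217728
print(parse_size("2 gb"))    # prints 2147483648
```

A value without the space, such as "128MB", is rejected by this sketch with a ValueError.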

state.backend.gemini.offheap.size

Description: The size of the off-heap memory that is used by GeminiStateBackend.

Note

The off-heap memory that is used by GeminiStateBackend does not include the memory that is occupied by WriteBuffer.

Data type: STRING

Default value: (none)

Remarks: This parameter takes effect when the state.backend.gemini.memory.managed parameter is set to false. Otherwise, the size of the off-heap memory that is used by GeminiStateBackend is automatically calculated based on the managed memory.

When you configure this parameter, you must add a unit to the value of this parameter. The unit can be B, KB, MB, or GB.

Note
  • By default, this parameter is not configured.

  • The unit is not case-sensitive.

  • A space is required between the value and the unit.
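
When state.backend.gemini.memory.managed is set to false, the memory of each backend is the sum of the WriteBuffer size and the off-heap size, because the off-heap size does not include WriteBuffer. The sketch below only illustrates that arithmetic. The function name is hypothetical, and the 512 MB off-heap value is an example, not a recommendation.

```python
from typing import Optional


def manual_backend_memory_mb(managed: bool, writebuffer_mb: int, offheap_mb: int) -> Optional[int]:
    """Memory per backend in MB when the sizes are set manually.

    Returns None when managed is True, because the size is then derived
    from the managed memory and the number of task slots instead.
    """
    if managed:
        return None
    return writebuffer_mb + offheap_mb


# 128 MB WriteBuffer (the documented default) plus an example 512 MB of off-heap memory.
print(manual_backend_memory_mb(False, 128, 512))  # prints 640
```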

Note

The basic configurations of checkpoints and state backends in Apache Flink also apply to GeminiStateBackend. For more information, see Checkpoints and State Backends.

Parameters for compute-storage separation

Note

The following table describes the parameters for compute-storage separation that can be configured only in VVR 4.0.11 and later.

Each entry below gives, in order: the parameter name, description, data type, default value, and remarks.

state.backend.gemini.file.cache.type

Description: The compute-storage separation mode.

Data type: STRING

Default value:

  • VVR 4.0.11: INFINITE

  • VVR 4.0.12 and later: LIMITED

Remarks: Valid values:

  • INFINITE: Compute-storage separation is disabled. The state data is stored only on local disks.

  • LIMITED: The state data is preferentially stored on local disks. If the local disk space is insufficient, the state data is stored in a distributed file system (DFS).

    If the local disk space is insufficient due to a large amount of state data, you can set this parameter to LIMITED and configure the state.backend.gemini.file.cache.preserved-space parameter based on the limit on local disks.

    Note

    The values of this parameter are case-sensitive.

state.backend.gemini.file.cache.preserved-space

Description: The disk space that is available for the state data on a TaskManager.

Data type: STRING

Default value: 2 GB

Remarks: If the actual available disk space is less than the value of this parameter, GeminiStateBackend stores the state data in a DFS to eliminate the limit on the local storage.

When you configure this parameter, you must add a unit to the value of this parameter. The unit can be B, KB, MB, or GB.

Note
  • The unit is not case-sensitive.

  • A space is required between the value and the unit.
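
The interaction between state.backend.gemini.file.cache.type and state.backend.gemini.file.cache.preserved-space can be pictured as a simple decision. The following Python sketch is a simplified model of the behavior described above, not the actual GeminiStateBackend logic. The function name state_location and the disk sizes are hypothetical.

```python
GB = 1024**3  # assumption: binary gigabytes


def state_location(cache_type: str, free_disk_bytes: int, preserved_bytes: int) -> str:
    """Return 'local' or 'dfs' for state data under a simplified model."""
    if cache_type == "INFINITE":  # compute-storage separation disabled
        return "local"
    if cache_type == "LIMITED":
        # Use the DFS once the free local disk space drops below the configured value.
        return "dfs" if free_disk_bytes < preserved_bytes else "local"
    # The parameter values are case-sensitive, so 'limited' is rejected here.
    raise ValueError(f"unknown cache type: {cache_type!r}")


print(state_location("LIMITED", 1 * GB, 2 * GB))   # prints dfs
print(state_location("LIMITED", 5 * GB, 2 * GB))   # prints local
print(state_location("INFINITE", 1 * GB, 2 * GB))  # prints local
```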

Note

The Object Storage Service (OSS) Client SDK writes data from a file to local disks before the OSS Client SDK uploads the file. Therefore, if OSS is used as a DFS, unexpected usage of disk space may occur. When Flink creates a savepoint, a single state backend generates only one file. As a result, a large uncompressed file is generated and the file occupies the disk space. In this scenario, the compute-storage separation feature fails. To resolve this issue, you must increase the parallelism to reduce the size of the state data on a single node.

Parameters for key-value separation

Note

The following table describes the parameters for key-value separation that can be configured only in VVR 4.0.12 and later.

Each entry below gives, in order: the parameter name, description, data type, default value, and remarks.

state.backend.gemini.kv.separate.mode

Description: The key-value separation mode.

Data type: STRING

Default value:

  • VVR 4.0.12 and later 4.0.x versions: DISABLE

  • VVR 6.0.1 and later: SPECIFIED_TABLE_ENABLE

Remarks: Valid values:

  • DISABLE: Key-value separation is disabled.

  • GLOBAL_ENABLE: Key-value separation is enabled.

  • SPECIFIED_TABLE_ENABLE: The engine automatically enables or disables key-value separation based on the characteristics of deployment operators.

    Note

    This parameter can be set to SPECIFIED_TABLE_ENABLE only in Realtime Compute for Apache Flink that uses VVR 6.0.1 or later.

Note
  • The values of this parameter are case-sensitive.

  • In Realtime Compute for Apache Flink that uses VVR 4.0.x, if the success rate of JOIN operations in a deployment is low and the values of the key-value pairs are large, you can set this parameter to GLOBAL_ENABLE to obtain better performance.

    You can estimate the success rate of JOIN operations by comparing the Records Received and Records Sent metrics of join nodes.

  • In Realtime Compute for Apache Flink that uses VVR 6.0.1 or later, we recommend that you retain the default value of this parameter. The SQL engine automatically enables key-value separation based on the characteristics of deployment operators.

state.backend.gemini.kv.separate.value.size.threshold

Description: The value size threshold that triggers key-value separation after key-value separation is enabled.

Data type: INTEGER

Default value: 200

Remarks: If the value of a record reaches this threshold, the key and the value of the record are stored separately. The recommended value ranges from 150 to 1000. You can adjust the value of this parameter based on the success rate of JOIN operations. If the success rate of JOIN operations is high, you can set this parameter to a large value.

Unit: bytes.

Note

In Realtime Compute for Apache Flink that uses VVR 6.0.1 or later, if you have enabled the adaptive parameter tuning feature, the engine can dynamically adjust the value of this parameter based on the data characteristics. You do not need to explicitly configure this parameter.
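
The effect of the threshold can be illustrated with a one-line rule: once key-value separation is enabled, a record is stored separately when its value size reaches the threshold. The sketch below assumes that "reaches" means greater than or equal to. The function name stored_separately is hypothetical and the code is a model of the description, not the engine's implementation.

```python
def stored_separately(value_size: int, threshold: int = 200, enabled: bool = True) -> bool:
    """Return True if a value of value_size bytes would be separated from its key."""
    return enabled and value_size >= threshold


print(stored_separately(150))                  # prints False
print(stored_separately(200))                  # prints True
print(stored_separately(1024, enabled=False))  # prints False
```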

Parameters for adaptive parameter tuning

Note

The following table describes the parameters for adaptive parameter tuning that can be configured only in VVR 4.0.12 and later.

Each entry below gives, in order: the parameter name, description, data type, default value, and remarks.

state.backend.gemini.auto-tune.mode

Description: The adaptive parameter tuning mode.

Data type: STRING

Default value: ACTIVE

Remarks: Valid values:

  • DISABLED: Adaptive parameter tuning is disabled.

  • MONITORING: GeminiStateBackend continuously monitors the deployment status and provides parameter tuning suggestions in logs. GeminiStateBackend does not automatically adjust the parameter configurations.

  • ACTIVE: GeminiStateBackend continuously monitors the deployment status and automatically adjusts the configurations of the GeminiStateBackend-related parameters that are not configured in the flink-conf.yaml file. This is the default value.

  • FORCEFUL: GeminiStateBackend continuously monitors the deployment status and automatically adjusts the configurations of parameters, including the GeminiStateBackend-related parameters that are configured in the flink-conf.yaml file.

Note
  • The values of this parameter are not case-sensitive.

  • We recommend that you use the default value.
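
The four modes differ mainly in which parameters GeminiStateBackend is allowed to change. The following Python sketch models that rule under the descriptions above. The function name tunable_keys and the parameter names param.a and param.b are hypothetical placeholders, not real configuration keys.

```python
def tunable_keys(mode: str, all_keys: set, configured_keys: set) -> set:
    """Return the parameters that may be adjusted automatically in the given mode."""
    mode = mode.upper()  # the values of this parameter are not case-sensitive
    if mode in ("DISABLED", "MONITORING"):
        return set()  # MONITORING only logs suggestions
    if mode == "ACTIVE":
        return all_keys - configured_keys  # explicit settings are left untouched
    if mode == "FORCEFUL":
        return set(all_keys)  # explicit settings may be overridden as well
    raise ValueError(f"unknown auto-tune mode: {mode!r}")


all_keys = {"param.a", "param.b"}  # hypothetical parameter names
configured = {"param.a"}           # set explicitly in flink-conf.yaml
print(sorted(tunable_keys("active", all_keys, configured)))    # prints ['param.b']
print(sorted(tunable_keys("FORCEFUL", all_keys, configured)))  # prints ['param.a', 'param.b']
```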

state.backend.gemini.auto-tune.burst.start.x and state.backend.gemini.auto-tune.burst.end.x

Description: The start and end of a period of time during which the performance-first mode is used when adaptive parameter tuning is enabled.

Data type: STRING

Default value: (none)

Remarks: The letter x in the names of the parameters can be replaced by a number. start.x corresponds to end.x. You can configure multiple pairs of the two parameters to specify multiple time periods. The values of the parameters are in the yyyy-MM-dd HH:mm:ss format.

If throughput, measured in transactions per second (TPS), matters more to you than resource consumption, you can configure the two parameters. GeminiStateBackend uses the TPS-first policy during the specified periods of time to achieve higher TPS. However, more resources, namely CPU cores and memory, are consumed.

Note
  • By default, these parameters are not configured.

  • The time that is specified by the state.backend.gemini.auto-tune.burst.end.x parameter must be later than the time that is specified by the state.backend.gemini.auto-tune.burst.start.x parameter.

  • If you configure these parameters, more resources are consumed. Therefore, we recommend that you do not configure these parameters.
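
The pairing rules for the burst parameters (matching start.x and end.x, the yyyy-MM-dd HH:mm:ss format, and an end time later than the start time) can be checked before deployment. The following is a minimal Python sketch, assuming the configuration is available as a dict. The function name burst_windows and the dates are hypothetical.

```python
from datetime import datetime

PREFIX = "state.backend.gemini.auto-tune.burst."
FORMAT = "%Y-%m-%d %H:%M:%S"  # Python equivalent of yyyy-MM-dd HH:mm:ss


def burst_windows(conf: dict) -> list:
    """Collect (start, end) pairs from the start.x and end.x keys, sorted by start time."""
    windows = []
    for key, value in conf.items():
        if not key.startswith(PREFIX + "start."):
            continue
        index = key[len(PREFIX + "start."):]
        end_key = PREFIX + "end." + index
        if end_key not in conf:
            raise ValueError(f"{key} has no matching {end_key}")
        start = datetime.strptime(value, FORMAT)  # raises ValueError on a bad format
        end = datetime.strptime(conf[end_key], FORMAT)
        if end <= start:
            raise ValueError(f"{end_key} must be later than {key}")
        windows.append((start, end))
    return sorted(windows)


conf = {
    PREFIX + "start.1": "2023-11-11 00:00:00",  # example dates only
    PREFIX + "end.1": "2023-11-11 02:00:00",
    PREFIX + "start.2": "2023-12-12 00:00:00",
    PREFIX + "end.2": "2023-12-12 01:00:00",
}
print(len(burst_windows(conf)))  # prints 2
```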