
Realtime Compute for Apache Flink:Configure resources for a deployment

Last updated: Sep 12, 2024

You can configure resources for a deployment before you start the deployment. You can also modify the resource configurations of a deployment after you publish the draft for the deployment. Realtime Compute for Apache Flink supports two resource configuration modes: basic mode (coarse-grained) and expert mode (fine-grained). This topic describes how to configure deployment resources and the parameters that you can configure in the two resource configuration modes.

Usage notes

After you configure resources for a deployment, you must restart the deployment to make the configuration take effect.

Procedure

  1. Go to the page on which you can configure resources for a deployment.

    1. Log on to the Realtime Compute for Apache Flink console.

    2. Find the workspace that you want to manage and click Console in the Actions column.

    3. On the O&M > Deployments page, find the deployment that you want to manage and click its name.

    4. On the Configuration tab, click Edit in the upper-right corner of the Resources section.

  2. Modify resource parameters.

    Two resource configuration modes are supported: basic mode (coarse-grained) and expert mode (fine-grained). The following list describes the modes.

    • Basic mode: The basic mode (coarse-grained) is a static resource allocation method. In this mode, you need to specify only the total amount of resources that each TaskManager requires at startup, including CPU cores and Java Virtual Machine (JVM) memory. Realtime Compute for Apache Flink evenly allocates these resources among the slots of each TaskManager. The number of slots is specified by the taskmanager.numberOfTaskSlots parameter in the Flink configuration. The basic mode meets the requirements of most simple deployments. For information about the parameters, see Basic mode (coarse-grained).

    • Expert mode: The expert mode (fine-grained) is a dynamic resource allocation method. In this mode, you can configure the resources that are required by each slot sharing group (SSG). Realtime Compute for Apache Flink calculates the resources that each slot requires and then dynamically requests resources for the slots of TaskManagers from the available resource pool. For complex deployments, the basic mode may result in low resource utilization. In this case, use the expert mode to configure resources for each operator. This improves resource utilization and helps the deployment reach the throughput that your business requires. For information about the parameters, see Expert mode (fine-grained).

      Note

      Only SQL deployments can be configured in expert mode.

    For more information about TaskManagers, JobManagers, tasks, and slots, see Flink Architecture.

  3. In the upper-right corner of the Resources section, click Save.

  4. Restart the deployment.

    After resources are configured for the deployment, you must restart the deployment to make the configuration take effect.
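As a rough illustration of the two allocation methods described in step 2, the following Python sketch models basic mode as an even split of each TaskManager's CPU and memory among its slots, and expert mode as a sum of per-SSG slot requests (parallelism × slot profile). It is a simplified model under these assumptions, not the scheduler used by Realtime Compute for Apache Flink; the names SlotProfile, basic_mode_slot_share, and expert_mode_request and the SSG values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlotProfile:
    """CPU cores and memory (GiB) reserved for one slot (hypothetical helper type)."""
    cpu_cores: float
    memory_gib: float


def basic_mode_slot_share(tm_cpu: float, tm_memory_gib: float, slots_per_tm: int) -> SlotProfile:
    """Basic mode (simplified): every slot gets an equal share of its TaskManager."""
    if slots_per_tm <= 0:
        raise ValueError("slots_per_tm must be positive")
    return SlotProfile(tm_cpu / slots_per_tm, tm_memory_gib / slots_per_tm)


def expert_mode_request(ssgs: dict[str, tuple[int, SlotProfile]]) -> SlotProfile:
    """Expert mode (simplified): each SSG requests `parallelism` slots of its own
    profile, so the total request is the sum of parallelism x profile over all SSGs."""
    cpu = sum(parallelism * profile.cpu_cores for parallelism, profile in ssgs.values())
    memory = sum(parallelism * profile.memory_gib for parallelism, profile in ssgs.values())
    return SlotProfile(cpu, memory)


if __name__ == "__main__":
    # A TaskManager with 4 cores, 16 GiB, and 4 slots: each slot gets 1 core and 4 GiB.
    print(basic_mode_slot_share(4, 16, 4))
    # Hypothetical SSGs: a light source group and a heavier aggregation group,
    # each sized independently instead of sharing one uniform slot size.
    print(expert_mode_request({
        "source": (4, SlotProfile(0.5, 2)),
        "aggregate": (2, SlotProfile(2, 8)),
    }))
```

The point of the comparison is that basic mode forces one slot size per TaskManager, whereas expert mode lets a light SSG and a heavy SSG each request only what they need.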

Basic mode (coarse-grained)

  • Parallelism: The global parallelism of the deployment.

  • JobManager CPU: The best practices of Realtime Compute for Apache Flink show that the JobManager requires at least 0.5 CPU cores and 2 GiB of memory to ensure the stable operation of the deployment. We recommend that you configure 1 CPU core and 4 GiB of memory for the JobManager. You can configure a maximum of 16 CPU cores.

  • JobManager Memory: Unit: GiB. Minimum value: 2 GiB. Maximum value: 64 GiB.

  • TaskManager CPU: The best practices of Realtime Compute for Apache Flink show that a TaskManager requires at least 0.5 CPU cores and 2 GiB of memory to ensure the stable operation of the deployment. We recommend that you configure 1 CPU core and 4 GiB of memory for each TaskManager. You can configure a maximum of 16 CPU cores.

  • TaskManager Memory: Unit: GiB. Minimum value: 2 GiB. Maximum value: 64 GiB.

  • TaskManager Slots: The number of slots for each TaskManager.

You can use the following formulas when you configure resources for a deployment:

  • Number of CUs to be configured for a deployment = MAX(Total number of CPU cores of the JobManager and TaskManagers, Total memory size in GiB of the JobManager and TaskManagers/4).

  • Number of IP addresses required for each deployment = Number of JobManagers + Actual number of TaskManagers. Only one JobManager exists in each deployment.

  • Actual number of TaskManagers = MAX(⌈Total number of CPU cores/Default maximum number of CPU cores of each TaskManager⌉, ⌈Total memory size/Default maximum memory size of each TaskManager⌉).

    • Total number of CPU cores = Value of Parallelism/Value of TaskManager Slots × Value of TaskManager CPU.

    • Total memory size = Value of Parallelism/Value of TaskManager Slots × Value of TaskManager Memory.

    • The default maximum number of CPU cores of each TaskManager is 16.

    • The default maximum memory size of each TaskManager is 64 GiB.

  • Actual number of slots that can be allocated to each TaskManager = ⌈Value of Parallelism/Actual number of TaskManagers⌉.
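The formulas above can be combined into a small calculator. This is a minimal sketch, assuming CPU in cores, memory in GiB, the default per-TaskManager maximums of 16 cores and 64 GiB stated above, and that the CU formula uses the configured totals (JobManager plus Parallelism/TaskManager Slots × per-TaskManager values). The names plan_basic_mode and BasicModePlan are hypothetical. As the Note later in this section points out, the TaskManager formula only applies when the configured per-TaskManager values exceed the default maximums, so the sketch does not model other cases.

```python
import math
from dataclasses import dataclass
from fractions import Fraction

DEFAULT_MAX_TM_CPU = 16         # default maximum CPU cores per TaskManager
DEFAULT_MAX_TM_MEMORY_GIB = 64  # default maximum memory (GiB) per TaskManager


@dataclass(frozen=True)
class BasicModePlan:
    task_managers: int
    slots_per_task_manager: int
    ip_addresses: int
    compute_units: Fraction


def plan_basic_mode(parallelism: int, tm_slots: int, tm_cpu: float,
                    tm_memory_gib: float, jm_cpu: float,
                    jm_memory_gib: float) -> BasicModePlan:
    # Fraction keeps every division exact, so the ceilings are not skewed by float rounding.
    tm_groups = Fraction(parallelism, tm_slots)  # Parallelism / TaskManager Slots
    total_tm_cpu = tm_groups * Fraction(tm_cpu)
    total_tm_memory = tm_groups * Fraction(tm_memory_gib)

    # Actual number of TaskManagers = MAX(ceil(CPU / 16), ceil(memory / 64)).
    task_managers = max(math.ceil(total_tm_cpu / DEFAULT_MAX_TM_CPU),
                        math.ceil(total_tm_memory / DEFAULT_MAX_TM_MEMORY_GIB))
    # Actual slots per TaskManager = ceil(Parallelism / actual TaskManagers).
    slots = math.ceil(Fraction(parallelism, task_managers))

    # One JobManager per deployment plus one IP address per actual TaskManager.
    ip_addresses = 1 + task_managers

    # CUs = MAX(total CPU cores, total memory in GiB / 4), JobManager included.
    compute_units = max(Fraction(jm_cpu) + total_tm_cpu,
                        (Fraction(jm_memory_gib) + total_tm_memory) / 4)
    return BasicModePlan(task_managers, slots, ip_addresses, compute_units)


if __name__ == "__main__":
    # TaskManager values from the example that follows; the JobManager values
    # (1 core, 4 GiB) are an assumption taken from the recommended configuration.
    print(plan_basic_mode(80, 20, 22, 30, jm_cpu=1, jm_memory_gib=4))
```

With these inputs the sketch yields 6 TaskManagers with 14 slots each, 7 IP addresses, and 89 CUs (the 88 TaskManager cores plus 1 JobManager core outweigh (4 + 120)/4 = 31).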

For example, you can set Parallelism to 80, TaskManager Slots to 20, TaskManager CPU to 22, and TaskManager Memory to 30 GiB. The following figure shows the configurations for this example.

[Figure: Basic-mode resource configuration with Parallelism 80, TaskManager Slots 20, TaskManager CPU 22, and TaskManager Memory 30 GiB]

The development console of Realtime Compute for Apache Flink shows that the actual number of TaskManagers is 6 and the actual number of slots for each TaskManager is 14.

[Figure: Development console showing 6 TaskManagers with 14 slots each]

The following formulas are used to calculate the actual number of TaskManagers and the actual number of slots for each TaskManager:

  1. Actual number of TaskManagers = MAX(⌈Total number of CPU cores/Default maximum number of CPU cores of each TaskManager⌉, ⌈Total memory size/Default maximum memory size of each TaskManager⌉) = MAX(⌈Value of Parallelism/Value of TaskManager Slots × Value of TaskManager CPU/16⌉, ⌈Value of Parallelism/Value of TaskManager Slots × Value of TaskManager Memory/64⌉) = MAX(⌈80/20 × 22/16⌉, ⌈80/20 × 30/64⌉) = MAX(⌈88/16⌉, ⌈120/64⌉) = MAX(⌈5.5⌉, ⌈1.875⌉) = MAX(6, 2) = 6

  2. Actual number of slots for each TaskManager = ⌈Value of Parallelism/Actual number of TaskManagers⌉ = ⌈80/6⌉ = 14

Note
  • The formula for calculating the actual number of TaskManagers applies only if the value of TaskManager CPU or TaskManager Memory is greater than the corresponding default maximum value (16 CPU cores or 64 GiB).

  • The calculated ratios are rounded up to the nearest integers.

  • If you want to specify higher default maximum values for the memory size and CPU cores of each TaskManager, submit a ticket.

  • You can also configure the taskmanager.numberOfTaskSlots parameter in the Other Configuration field of the Parameters section on the Configuration tab of the Deployments page. This parameter works in the same manner as the TaskManager Slots parameter in the Resources section but takes precedence over it.
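The precedence rule in the last item can be sketched as follows. This is a minimal sketch, assuming that the Other Configuration field holds one `key: value` pair per line; parse_other_configuration and effective_task_slots are hypothetical helpers, not an API of Realtime Compute for Apache Flink.

```python
def parse_other_configuration(text: str) -> dict[str, str]:
    """Parse `key: value` lines; blank lines and lines starting with '#' are skipped."""
    config: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        # Split at the first colon only, so values may themselves contain colons.
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"expected 'key: value', got {raw_line!r}")
        config[key.strip()] = value.strip()
    return config


def effective_task_slots(resources_slots: int, other_configuration: str) -> int:
    """taskmanager.numberOfTaskSlots in Other Configuration overrides TaskManager Slots."""
    config = parse_other_configuration(other_configuration)
    value = config.get("taskmanager.numberOfTaskSlots")
    return int(value) if value is not None else resources_slots


if __name__ == "__main__":
    # TaskManager Slots is 20 in the Resources section, but the override wins.
    print(effective_task_slots(20, "taskmanager.numberOfTaskSlots: 4"))
```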

Expert mode (fine-grained)

Note
  • Only SQL deployments can be configured in expert mode.

  • If you modify the SQL code or resource configurations of an SQL deployment after it is deployed, you must regenerate the resource configuration plan to ensure that the deployment starts as expected.

Configure basic resources

  • JobManager CPU: The best practices of Realtime Compute for Apache Flink show that the JobManager of a deployment requires at least 0.25 CPU cores and 1 GiB of memory to ensure the stable operation of the deployment. You can configure a maximum of 16 CPU cores.

  • JobManager Memory: Unit: GiB. Example: 4 GiB. Minimum value: 1 GiB. Maximum value: 64 GiB.

  • TaskManager Slots: Not applicable.

Configure slot resources

  1. In expert mode, click Get Plan Now in the Resources section to generate a resource configuration plan.

  2. In the upper-right corner of the SLOT box, click the Edit icon.

  3. Modify the slot configurations.

    The Parallelism parameter that you configure in the Modify SLOT(default) dialog box is the parallelism of all operators in the SSG. After you configure the Parallelism parameter, the system automatically performs the following operations:

    • The system automatically applies the parallelism to all operators in the SSG.

    • The system automatically calculates the memory size that is required by the state backend, Python, and operators based on the computational logic of the deployment. Manual configurations are not required.

    Note
      • For source nodes, make sure that the number of partitions is divisible by the parallelism. For example, if a Kafka topic has 16 partitions, we recommend that you set the Parallelism parameter to 16, 8, or 4 to prevent data skew. Also, do not set the parallelism of a source node to an excessively small value. Otherwise, each source subtask must read a large amount of data, which may create a data input bottleneck and reduce the deployment throughput.

      • For nodes other than source nodes, we recommend that you configure the parallelism based on the amount of data. Set the Parallelism parameter to a large value for a node that processes a large amount of data and to a small value for a node that processes a small amount of data.

      • We recommend that you adjust the size of heap memory and off-heap memory only when an exception occurs or the deployment throughput is low, for example, when an out of memory (OOM) error or a severe garbage collection issue occurs. If you adjust heap memory and off-heap memory while your deployment runs normally, the deployment throughput is unlikely to increase significantly.

  4. Click OK.
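To see why the source-parallelism recommendation matters, the following sketch counts how many partitions each source subtask reads. It assumes, for simplicity, that partitions are dealt out round-robin across subtasks; the actual assignment used by a given connector may differ, and the helper names are hypothetical.

```python
def divisor_parallelisms(partitions: int) -> list[int]:
    """Parallelism values that divide the partition count without a remainder."""
    return [p for p in range(1, partitions + 1) if partitions % p == 0]


def partitions_per_subtask(partitions: int, parallelism: int) -> list[int]:
    """Simplified model: partition i is read by subtask i % parallelism."""
    counts = [0] * parallelism
    for partition in range(partitions):
        counts[partition % parallelism] += 1
    return counts


if __name__ == "__main__":
    print(divisor_parallelisms(16))       # candidate source parallelisms for 16 partitions
    print(partitions_per_subtask(16, 8))  # even: every subtask reads the same number
    print(partitions_per_subtask(16, 6))  # uneven: some subtasks read more partitions
```

With 16 partitions, a parallelism of 8 gives every subtask 2 partitions, whereas a parallelism of 6 leaves four subtasks with 3 partitions and two with 2, which is the kind of skew the note warns about.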

Configure operator resources

By default, all operators are placed in one SSG, so you cannot modify the resource configuration of each operator separately. To configure resources for individual operators, you must enable the Multiple SSG mode so that each operator has an independent slot. You can then configure the resources of each operator in its slot. To configure resources for an operator, perform the following steps:

  1. In the upper-right corner of the Resources section of the Configuration tab, click Edit, and set Mode to Expert.

  2. (Optional) If no resource plans are available, click Get Plan Now.


  3. Turn on Multiple SSG and click Re-fetch.

    Each operator in an SSG is assigned its own slot.


  4. Click the Edit icon in the SLOT box of the desired operator to modify the resource configuration of the operator.

  5. Click OK.
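Enabling Multiple SSG also changes how many slots a deployment needs. In Apache Flink, a job needs as many slots as the highest parallelism in each slot sharing group, summed over all groups. The following minimal sketch applies that rule to a hypothetical three-operator pipeline to compare a single SSG with one SSG per operator; the operator names and parallelism values are illustrative.

```python
def required_slots(operator_parallelism: dict[str, int], ssg_of: dict[str, str]) -> int:
    """Sum, over SSGs, of the maximum parallelism of the operators in that SSG."""
    max_per_ssg: dict[str, int] = {}
    for operator, parallelism in operator_parallelism.items():
        group = ssg_of[operator]
        max_per_ssg[group] = max(max_per_ssg.get(group, 0), parallelism)
    return sum(max_per_ssg.values())


operators = {"source": 4, "aggregate": 2, "sink": 1}
single_ssg = {name: "default" for name in operators}  # default: all operators share one SSG
multiple_ssg = {name: name for name in operators}     # Multiple SSG: one SSG per operator

if __name__ == "__main__":
    print(required_slots(operators, single_ssg))
    print(required_slots(operators, multiple_ssg))
```

With a single SSG, the three operators share 4 slots, and each slot must be large enough to host all three. With one SSG per operator, 7 slots are needed, but each slot can be sized for its own operator, which is what makes per-operator resource configuration possible.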

Configure the parallelism, chaining strategy, and TTL of an operator

Note

Only Realtime Compute for Apache Flink that uses Ververica Runtime (VVR) 8.0.7 or later supports the configuration of the time to live (TTL) for an operator.

You can configure the parallelism, chaining strategy, and TTL for a single operator.

  1. Click the icon in the VERTEX box to display vertex details.

    Note

    You can click the Edit icon in the VERTEX box to configure the Parallelism parameter for all operators in the vertex at once.

  2. Click the icon of the operator that you want to configure.

  3. Configure operator resources.


    The following list describes the parameters.

    • Parallelism: The parallelism of the operator.

    • Chaining Strategy: A chain is a logical computing chain that is formed by multiple operators. Chaining can improve the execution efficiency and performance of a deployment and reduce the overhead of data transmission and serialization between operators. In specific scenarios, you may need to break a chain to control the execution flow of the deployment and improve its performance. The following chaining strategies are supported:

      • ALWAYS: The operator can be chained with its upstream and downstream operators. This is the default value.

      • HEAD: The operator serves as the head of a chain. It is not chained with its upstream operators, but its downstream operators can be chained with it.

      • NEVER: The operator cannot be chained with its upstream or downstream operators.

    • State Expiration Time Settings: The time to live (TTL) of the operator state, in seconds, minutes, hours, or days. The default value is the state expiration time configured for the deployment. If no expiration time is specified for the deployment, 1.5 days is used. For more information about how to specify an expiration time for a deployment, see the Parameters section in Configure a deployment.

      Note
      • Only Realtime Compute for Apache Flink that uses VVR 8.0.7 or later supports this parameter.

      • Only stateful operators support the TTL configuration.

  4. Click OK.
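The effect of the three chaining strategies on a simple linear pipeline can be sketched as follows. This is a simplified model that assumes a straight pipeline with forward connections inside one SSG and checks only the strategies and the parallelism; Apache Flink applies additional conditions (for example, the data partitioning between operators and the slot sharing group). The class and function names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class ChainingStrategy(Enum):
    ALWAYS = "ALWAYS"
    HEAD = "HEAD"
    NEVER = "NEVER"


@dataclass(frozen=True)
class Operator:
    name: str
    parallelism: int
    strategy: ChainingStrategy = ChainingStrategy.ALWAYS


def can_chain(upstream: Operator, downstream: Operator) -> bool:
    """Simplified rule: the downstream operator must be ALWAYS, the upstream operator
    must not be NEVER, and both operators must have the same parallelism."""
    return (downstream.strategy is ChainingStrategy.ALWAYS
            and upstream.strategy is not ChainingStrategy.NEVER
            and upstream.parallelism == downstream.parallelism)


def build_chains(pipeline: list[Operator]) -> list[list[str]]:
    """Group a linear pipeline of operators into chains of operator names."""
    chains: list[list[str]] = []
    for index, operator in enumerate(pipeline):
        if index > 0 and can_chain(pipeline[index - 1], operator):
            chains[-1].append(operator.name)  # extend the current chain
        else:
            chains.append([operator.name])    # start a new chain
    return chains


if __name__ == "__main__":
    pipeline = [
        Operator("source", 4),
        Operator("filter", 4),
        Operator("map", 4, ChainingStrategy.HEAD),
        Operator("sink", 4),
    ]
    print(build_chains(pipeline))
```

Here HEAD on the map operator starts a new chain (map, sink) while the sink still joins it; switching map to NEVER would isolate it into a chain of its own.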

References

  • For more information about how to optimize resource configurations, see Optimize Flink SQL.

  • If you do not want to manually reconfigure resources, you can enable automatic tuning to allow the system to automatically reconfigure resources. For more information, see Configure automatic tuning.

  • You can modify the deployment configuration in the Basic, Parameters, and Logging sections of the Configuration tab. For more information, see Configure a deployment.

  • You can use the intelligent deployment diagnostics feature provided by Flink Advisor to monitor the health status of a deployment. For more information, see Perform intelligent deployment diagnostics.