Realtime Compute for Apache Flink: Configure resources for a deployment

Last Updated: Apr 08, 2024

You can configure resources and modify the resource configuration for a deployment before you start the deployment or after you publish a draft. This topic describes how to configure resources and modify the resource configuration of a deployment in basic or expert mode.

Limits

Only SQL deployments can be configured in expert mode.

Precautions

After you configure resources for a deployment, you must restart the deployment to make the configuration take effect.

Procedure

  1. Go to the page on which you can configure resources for a deployment.

    1. Log on to the Realtime Compute for Apache Flink console.

    2. Find the workspace that you want to manage and click Console in the Actions column.

    3. On the Deployments page, click the name of the desired deployment.

    4. On the Configuration tab, click Edit in the upper-right corner of the Resources section.

  2. Modify resource parameters.

    For more information about TaskManagers, JobManagers, tasks, and slots, see Apache Flink Architecture.

    Basic mode (coarse-grained)

    The basic mode is a static resource allocation method. In this mode, you only need to specify the total amount of resources that are required to start each TaskManager, namely CPU cores and Java Virtual Machine (JVM) memory. The system divides these resources evenly among the slots of each TaskManager. The number of slots is specified by the Flink configuration parameter taskmanager.numberOfTaskSlots. The basic mode meets the requirements of most simple deployments.

    Parameter

    Description

    Parallelism

    The global parallelism of the deployment.

    Job Manager CPU

    The best practices of Realtime Compute for Apache Flink show that the JobManager requires at least 0.25 CPU cores and 1 GiB of memory to ensure the stable operation of the deployment. We recommend that you configure 1 CPU core and 4 GiB of memory for the JobManager.

    Job Manager Memory

    Unit: GiB. Minimum value: 1 GiB.

    Task Manager CPU

    The best practices of Realtime Compute for Apache Flink show that a TaskManager requires at least 0.25 CPU cores and 1 GiB of memory to ensure the stable operation of the deployment. We recommend that you configure 1 CPU core and 4 GiB of memory for each TaskManager.

    Task Manager Memory

    Unit: GiB. Minimum value: 1 GiB.

    TaskManager Slots

    The number of slots for each TaskManager.

    The following formulas are used when you configure resources for a deployment:

    • Number of CUs to be configured for a deployment = MAX(Total number of CPU cores of the JobManager and TaskManagers, Total memory size of the JobManager and TaskManagers/4).

    • Number of IP addresses required for each deployment = Number of JobManagers + Actual number of TaskManagers. Only one JobManager exists in each deployment.

    • Actual number of TaskManagers = MAX(⌈Total number of CPU cores/Default maximum number of CPU cores of each TaskManager⌉, ⌈Total memory size/Default maximum memory size of each TaskManager⌉).

      • Total number of CPU cores = (Value of Parallelism/Value of TaskManager Slots) × Value of Task Manager CPU.

      • Total memory size = (Value of Parallelism/Value of TaskManager Slots) × Value of Task Manager Memory.

      • The default maximum number of CPU cores of each TaskManager is 16.

      • The default maximum memory size of each TaskManager is 64 GiB.

    • Actual number of slots that can be allocated to each TaskManager = ⌈Value of Parallelism/Actual number of TaskManagers⌉.
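    The CU formula above can be sketched in a few lines of Python. This is an illustrative calculation only, not a console API: the function names are hypothetical, the sample inputs (a JobManager and TaskManagers with 1 CPU core and 4 GiB each, Parallelism 8, TaskManager Slots 2) are assumptions, and the sketch assumes that the TaskManager totals in the CU formula are the same totals defined by the sub-formulas above.

```python
def total_taskmanager_resources(parallelism, tm_slots, tm_cpu, tm_memory_gib):
    """Total CPU cores and memory (GiB) requested for all TaskManagers."""
    # Parallelism / TaskManager Slots, with no rounding at this stage (assumption).
    tm_count = parallelism / tm_slots
    return tm_count * tm_cpu, tm_count * tm_memory_gib


def estimate_cus(jm_cpu, jm_memory_gib, tm_total_cpu, tm_total_memory_gib):
    """CUs = MAX(total CPU cores, total memory in GiB / 4)."""
    total_cpu = jm_cpu + tm_total_cpu
    total_memory_gib = jm_memory_gib + tm_total_memory_gib
    return max(total_cpu, total_memory_gib / 4)


# Hypothetical inputs: Parallelism 8, 2 slots per TaskManager,
# 1 CPU core and 4 GiB for the JobManager and for each TaskManager.
tm_cpu_total, tm_mem_total = total_taskmanager_resources(8, 2, 1, 4)
cus = estimate_cus(1, 4, tm_cpu_total, tm_mem_total)
print(cus)
```

    With the recommended ratio of 1 CPU core to 4 GiB of memory, the CPU term and the memory term of the MAX function are equal, so neither resource is over-provisioned relative to the other.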

    For example, you can set Parallelism to 80, TaskManager Slots to 20, Task Manager CPU to 22, and Task Manager Memory to 30 GiB. The following figure shows the configurations for this example.

    image

    In the development console of Realtime Compute for Apache Flink, the actual number of TaskManagers is 6 and the actual number of slots for each TaskManager is 14.

    image

    The following formulas are used to calculate the actual number of TaskManagers and the actual number of slots for each TaskManager:

    1. Actual number of TaskManagers
       = MAX(⌈Total number of CPU cores/Default maximum number of CPU cores of each TaskManager⌉, ⌈Total memory size/Default maximum memory size of each TaskManager⌉)
       = MAX(⌈(Value of Parallelism/Value of TaskManager Slots) × Value of Task Manager CPU/16⌉, ⌈(Value of Parallelism/Value of TaskManager Slots) × Value of Task Manager Memory/64⌉)
       = MAX(⌈(80/20) × 22/16⌉, ⌈(80/20) × 30/64⌉)
       = MAX(⌈88/16⌉, ⌈120/64⌉)
       = MAX(6, 2)
       = 6

    2. Actual number of slots for each TaskManager = ⌈Value of Parallelism/Actual number of TaskManagers⌉ = ⌈80/6⌉ = 14

    Note
    • The formula for calculating the actual number of TaskManagers can be used only if the values of Task Manager CPU and Task Manager Memory are greater than the default maximum values.

    • The calculated ratios are rounded up to the nearest integers.

    • If you want to specify higher default maximum values for the memory size and CPU cores of each TaskManager, submit a ticket.

    • You can also configure the taskmanager.numberOfTaskSlots parameter in the Other Configuration field of the Parameters section on the Configuration tab of the Deployments page. This parameter works in the same manner as the TaskManager Slots parameter in the Resources section, but takes precedence over it.
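    The worked example above (Parallelism 80, TaskManager Slots 20, Task Manager CPU 22, Task Manager Memory 30 GiB) can be reproduced with a short Python sketch. The function and constant names are hypothetical and exist only for illustration. The default maximums of 16 CPU cores and 64 GiB come from the formulas above.

```python
import math

DEFAULT_MAX_TM_CPU = 16         # default maximum CPU cores of each TaskManager
DEFAULT_MAX_TM_MEMORY_GIB = 64  # default maximum memory size of each TaskManager


def actual_layout(parallelism, tm_slots, tm_cpu, tm_memory_gib):
    """Return (actual number of TaskManagers, actual slots per TaskManager)."""
    requested_tms = parallelism / tm_slots
    total_cpu = requested_tms * tm_cpu
    total_memory_gib = requested_tms * tm_memory_gib
    # Each ratio is rounded up to the nearest integer before MAX is applied.
    taskmanagers = max(
        math.ceil(total_cpu / DEFAULT_MAX_TM_CPU),
        math.ceil(total_memory_gib / DEFAULT_MAX_TM_MEMORY_GIB),
    )
    slots_per_tm = math.ceil(parallelism / taskmanagers)
    return taskmanagers, slots_per_tm


taskmanagers, slots_per_tm = actual_layout(80, 20, 22, 30)
# One JobManager exists in each deployment, so add 1 for its IP address.
ip_addresses = 1 + taskmanagers
print(taskmanagers, slots_per_tm, ip_addresses)
```

    For these inputs, the sketch yields 6 TaskManagers with 14 slots each, which matches the values shown in the console, and 7 IP addresses according to the IP address formula.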

    Expert mode (fine-grained)

    Note

    Only SQL deployments can be configured in expert mode.

    In expert mode, resources are allocated dynamically. You configure the resources that each slot sharing group requires. Realtime Compute for Apache Flink then calculates the resources that each slot requires and dynamically requests the resources for those TaskManager slots from the available resource pool. For complicated deployments, the basic mode may cause low resource utilization. In such cases, use the expert mode to configure resources for each operator. This improves resource utilization and helps the deployment reach the throughput that your business requires.

    Configure basic parameters

    Parameter

    Description

    Job Manager CPU

    The best practices of Realtime Compute for Apache Flink show that the JobManager requires at least 0.25 CPU cores and 1 GiB of memory to ensure the stable operation of the deployment.

    Job Manager Memory

    Unit: GiB. Example: 4 GiB. Minimum value: 1 GiB.

    TaskManager Slots

    The number of slots for each TaskManager.

    Configure slot resources

    1. Click Edit in the upper-right corner of the Resources section. Then, set the Mode parameter to Expert.

    2. Click Get Plan Now in the resource plan section.

    3. In the upper-right corner of the SLOT box, click the Edit icon.

    4. Modify the slot configurations.

      The Parallelism parameter that you configure in the Modify SLOT(default) dialog box applies to all operators in the slot sharing group. After you configure the Parallelism parameter, the system automatically performs the following operations:

      • Applies the parallelism to all operators in the slot sharing group.

      • Calculates the memory size that the state backend, Python, and operators require based on the computational logic of the deployment. Manual configuration is not required.

      • Note
        • Make sure that the number of partitions is evenly divisible by the parallelism of the source node. For example, if a Kafka cluster has 16 partitions, we recommend that you set the Parallelism parameter to 16, 8, or 4 to prevent data skew. We also recommend that you do not set the Parallelism parameter for a source node to an excessively small value. Otherwise, the source node may have to read too much data, which can cause a data input bottleneck and reduce the deployment throughput.

        • We recommend that you configure the deployment parallelism for all nodes except source nodes based on the amount of data. If the amount of data on a node is large, we recommend that you set the Parallelism parameter to a large value. If the amount of data on a node is small, we recommend that you set the Parallelism parameter to a small value.

        • We recommend that you adjust the size of heap memory and off-heap memory only when an exception occurs or the deployment throughput is low, for example, when an out of memory (OOM) error or a severe garbage collection issue occurs in your deployment. If your deployment runs normally, adjusting the size of heap memory and off-heap memory does not significantly increase the deployment throughput.

    5. Click OK.
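    The partition guideline in the note above can be checked with a small Python sketch before you choose a value for Parallelism. The helper name is hypothetical, and the sketch only lists the values that divide the partition count evenly. It does not decide which of those values is large enough for your data volume.

```python
def even_parallelism_options(partitions):
    """List parallelism values that divide the partition count evenly, largest first."""
    return [p for p in range(partitions, 0, -1) if partitions % p == 0]


# A Kafka cluster with 16 partitions, as in the note above.
options = even_parallelism_options(16)
print(options)
```

    For 16 partitions, the list starts with 16, 8, and 4, which are the values recommended in the note. The remaining smaller values also divide evenly but may be too small for a source node.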

    Configure the parallelism for an operator

    1. In the Resources section of the Configuration tab, set Mode to Expert.

    2. Click the Edit icon in the VERTEX box and configure Parallelism for the operator.

    3. Click OK.

    Configure the operator chaining strategy

    An operator chain is a logical unit of computation that is formed by multiple operators. Chaining improves the execution efficiency and performance of deployments and reduces the overhead of data transmission and serialization between operators. In specific scenarios, you may need to break a chain to control the execution flow and performance of a deployment. You can perform the following steps to configure the operator chaining strategy:

    1. In the Resources section of the Configuration tab, set Mode to Expert.

    2. Click the Edit icon.

    3. Modify the operator chaining strategy.

      Chaining strategy

      Description

      ALWAYS (default value)

      The current operator can always be chained together with the upstream and downstream operators.

      HEAD

      The current operator is used as the head node of the chain. The operator is disconnected only from the upstream operator but is still connected to the downstream operator.

      NEVER

      The current operator is not chained with the upstream and downstream operators.

    4. Click OK.

    Configure resources for an operator

    By default, all operators are placed in one slot sharing group. Therefore, you cannot separately modify the resource configuration for each operator. If you want to configure resources for individual operators, you must configure the related parameters so that each operator has its own independent slot. This way, you can configure resources for each operator in the related slot. To configure resources for an operator, perform the following steps:

    1. On the Configuration tab of the Deployments page, click Edit in the upper-right corner of the Parameters section and add the following configuration to the Other Configuration field.

      table.exec.split-slot-sharing-group-per-vertex: 'true'

    2. Restart the deployment.

    3. On the Configuration tab, click Edit in the upper-right corner of the Resources section, set Mode to Expert, and then click Re-fetch.

    4. Click the Edit icon in the SLOT box of the desired operator to modify the resource configuration of the operator.

    5. Click OK.

  3. In the upper-right corner of the Resources section, click Save.

References

  • For more information about how to optimize resource configurations, see Optimize Flink SQL.

  • If you do not want to manually reconfigure resources, you can enable automatic tuning to allow the system to automatically reconfigure resources. For more information, see Configure automatic tuning.

  • You can modify the deployment configuration in the Basic, Parameters, and Logging sections of the Configuration tab. For more information, see Configure a deployment.

  • You can use the intelligent deployment diagnostics feature provided by Flink Advisor to monitor the health status of a deployment. For more information, see Perform intelligent deployment diagnostics.