Fully managed Flink provides an expert mode for resource configuration. In this mode, you can exercise fine-grained control over the resources that the tasks of a job require so that the job can run in different scenarios. This topic describes how to configure resources in expert mode.

Background information

Fully managed Flink supports the following resource configuration modes:
  • Basic: the resource configuration mode provided by Apache Flink. In this mode, you can configure the number of CPU cores and the memory size of the JobManager and TaskManagers, as well as the job parallelism.
  • Expert (BETA): a new resource configuration mode that is provided by fully managed Flink. In this configuration mode, you can control the resources used by jobs in a fine-grained manner to meet your business requirements for high job throughput.

    The system automatically runs jobs on native Kubernetes based on your resource configurations and determines the specifications and number of TaskManagers based on the slot specifications and the job parallelism. An example is provided after this list.

  • Auto (BETA): an automatic configuration mode that is based on the expert mode. In this configuration mode, jobs use the resource configurations that are configured in expert mode. Autopilot is also enabled.

    In auto configuration mode, you do not need to configure related resources. When you run a job, Autopilot automatically generates resource configurations for the job and adjusts the resource configurations based on the status of the job. This optimizes resource utilization of the job without affecting the health of the job. For more information about Autopilot, see Configure Autopilot.
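
For example, in expert mode with the default slot sharing, a job parallelism of 8 typically requires eight slots. If each slot is configured with 1 CPU core and 4 GiB of memory, the system provisions TaskManagers whose total resources cover 8 CPU cores and 32 GiB of memory. These numbers are illustrative only; the exact number and specifications of the TaskManagers are determined automatically by the system.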

Limits

  • Only SQL jobs support the expert mode.
  • Only fully managed Flink workspaces that run Ververica Platform (VVP) 2.5.4 or later allow you to configure the resources required by the tasks of jobs in expert mode.

Resource configuration recommendations

  • Make sure that the number of partitions of a source is evenly divisible by the parallelism of the source node. For example, if a Kafka topic has 16 partitions, we recommend that you set the parallelism to 16, 8, or 4 to prevent data skew. We also recommend that you do not set the parallelism of a source node to an excessively small value. If the parallelism is too small, each parallel instance of the source must read a large amount of data, which can cause a data input bottleneck and reduce job throughput. An example is provided after this list.
  • For all nodes other than source nodes, configure the parallelism based on the amount of data that each node processes. Use a larger parallelism for nodes that process a large amount of data and a smaller parallelism for nodes that process a small amount of data.
  • Adjust the heap memory and off-heap memory sizes only when an exception occurs or job throughput is low, for example, when an out-of-memory (OOM) error or a severe garbage collection issue occurs in your job. If your job runs normally, adjusting the heap memory and off-heap memory sizes does not significantly increase job throughput.
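
As an illustration of the first recommendation, you can align the initial job parallelism with the partition count by using the table.exec.resource.default-parallelism parameter that is described in the Procedure section. The following entry assumes a source with 16 partitions; the value is illustrative and must match your own partition count.

  table.exec.resource.default-parallelism: 16

Similarly, if an OOM error occurs, you can, for example, increase the default heap memory of a slot by using the table.exec.slot-sharing-group.prefer-heap-memory parameter, which is also described in the Procedure section.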

Procedure

  1. Go to the Resources tab.
    1. Log on to the Realtime Compute for Apache Flink console.
    2. On the Fully Managed Flink tab, find the workspace that you want to manage, and click Console in the Actions column.
    3. In the left-side navigation pane, choose Applications > Deployments.
    4. Click the name of the job for which you want to modify the resource configuration.
    5. In the upper-right corner of the job details page, click Configure.
    6. On the right side of the Draft Editor page, click the Resources tab.
    7. In the Resource Configuration pane, set Configuration Mode to Expert.
  2. Configure the resource information of JobManager.
    Configure the following resource parameters for the JobManager.
    • Job Manager CPUs: The number of CPU cores for the JobManager. Default value: 1.
    • Job Manager Memory: The memory size of the JobManager. Minimum value: 1 GiB. We recommend that you use GiB or MiB as the unit. For example, you can set this parameter to 1024 MiB or 1.5 GiB.
  3. Configure slot resources.
    1. Click Get Plan Now in the Resource Plan section.
    2. In the upper-right corner of the SLOT box, click the Edit icon.
    3. Modify the SLOT configuration information.
      Note
      • The Parallelism parameter that you configure in the Modify SLOT(default) dialog box specifies the global parallelism for the job. After you configure the Parallelism parameter, the system automatically applies the parallelism to each vertex node. You can also click the Edit icon in the upper-right corner of the VERTEX box to configure the parallelism for a specific vertex node based on your business requirements.
      • The system automatically calculates the memory size that is required by the state backend, Python, and operators based on the computational logic of the job. Manual configurations are not required.
      • By default, the system calculates the number of CPU cores required by a slot at a CPU-to-memory ratio of 1:4 based on the total memory size required by a slot. The total memory size of a slot is the sum of the heap memory, off-heap memory, and the memory size that is required by the state backend, Python, and operators.
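      For example, assume that the 1:4 ratio means one CPU core for every 4 GiB of slot memory and that a slot uses the default memory sizes that are listed in the next step: 1 GiB of heap memory, 32 MiB of off-heap memory, and 512 MiB of managed memory for the state backend. The total memory size of the slot is 1,568 MiB (1,024 MiB + 32 MiB + 512 MiB), so the system assigns approximately 0.38 CPU cores (1,568 MiB ÷ 4,096 MiB per core) to the slot. These numbers are for illustration only; the actual values are calculated by the system based on the computational logic of your job.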
  4. Modify the resource configuration generation policy.
    1. On the right side of the Draft Editor page, click the Advanced tab and add the following parameters to the Additional Configuration section.
      • table.exec.split-slot-sharing-group-per-vertex: Specifies whether to allocate a separate slot to each vertex node. Default value: false.
      • table.exec.slot-sharing-group.prefer-heap-memory: The default size of the heap memory that is required by a slot. Default value: 1 GiB.
      • table.exec.slot-sharing-group.prefer-off-heap-memory: The default size of the off-heap memory that is required by a slot. Default value: 32 MiB.
      • table.exec.state-backend.prefer-managed-memory: The size of the managed memory that is required by the state backend. Default value: 512 MiB.
      • table.exec.resource.default-parallelism: The initial job parallelism. Default value: 1.
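      For reference, a minimal sketch of what these entries might look like in the Additional Configuration section is shown below. The parameters are specified as key-value pairs, and the values here are illustrative only; adjust them to your workload.
        table.exec.split-slot-sharing-group-per-vertex: false
        table.exec.slot-sharing-group.prefer-heap-memory: 2 GiB
        table.exec.slot-sharing-group.prefer-off-heap-memory: 64 MiB
        table.exec.state-backend.prefer-managed-memory: 1 GiB
        table.exec.resource.default-parallelism: 16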
    2. Click Re-fetch Plan.
    3. Click Publish.