Fully managed Flink provides an expert mode for configuring resources. In this mode, you can exercise fine-grained control over the resources required by the tasks of a job, so that jobs can run in different scenarios. This topic describes how to configure resources in expert mode.
Background information
- Basic: the resource configuration mode provided by Apache Flink. In this mode, you can configure the memory size and the number of CPU cores for the JobManager and TaskManagers, as well as the job parallelism.
- Expert (BETA): a new resource configuration mode provided by fully managed Flink. In this mode, you can control the resources used by jobs in a fine-grained manner to meet your business requirements for high job throughput.
The system automatically runs jobs on native Kubernetes based on your resource configurations, and automatically determines the specifications and number of TaskManagers based on the slot specifications and the job parallelism.
- Auto (BETA): an automatic configuration mode based on the expert mode. In this mode, jobs use the resource configurations specified in expert mode, and Autopilot is also enabled.
In auto configuration mode, you do not need to configure resources manually. When you run a job, Autopilot automatically generates resource configurations for the job and adjusts them based on the status of the job. This optimizes resource utilization without affecting the health of the job. For more information about Autopilot, see Configure Autopilot.
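In expert mode, the number of TaskManagers follows from the slot specification and the job parallelism: each parallel subtask needs a slot, so the TaskManager count is roughly the parallelism divided by the slots per TaskManager, rounded up. The following sketch illustrates this relationship; it is a simplified model, and the function name and sizing rule are assumptions rather than the exact algorithm used by fully managed Flink:

```python
import math

def taskmanager_count(parallelism: int, slots_per_taskmanager: int) -> int:
    """Estimate how many TaskManagers are needed so that every
    parallel subtask gets one slot (simplified illustration)."""
    return math.ceil(parallelism / slots_per_taskmanager)

# A job with parallelism 16 and 4 slots per TaskManager
# needs 4 TaskManagers in this simplified model.
print(taskmanager_count(16, 4))  # -> 4
```

Rounding up means the last TaskManager may be partially idle when the parallelism is not a multiple of the slots per TaskManager, which is one reason to align the two values.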
Limits
- Only SQL jobs support the expert mode.
- Only fully managed Flink that runs Ververica Platform (VVP) 2.5.4 or later allows you to configure the resources required by the tasks of jobs in expert mode.
Resource configuration recommendations
- For source nodes, make sure that the number of partitions is evenly divisible by the job parallelism. For example, if a Kafka topic has 16 partitions, we recommend that you set the parallelism to 16, 8, or 4 to prevent data skew. We also recommend that you do not set the parallelism of a source node to an excessively small value. If the parallelism is too small, each parallel instance of the source must read a large amount of data, which can create a data input bottleneck and reduce the job throughput.
- We recommend that you configure the parallelism of all nodes other than source nodes based on the amount of data that each node processes: set a larger parallelism for nodes that process large amounts of data, and a smaller parallelism for nodes that process small amounts of data.
- Adjust the size of heap memory and off-heap memory only when an exception occurs or the job throughput is low, for example, when an out-of-memory (OOM) error or a severe garbage collection issue occurs in your job. If your job runs normally, increasing the heap or off-heap memory does not significantly improve throughput.
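When heap or off-heap memory does need to be adjusted, open-source Apache Flink exposes the relevant settings as TaskManager memory options. The fragment below is a hedged sketch using standard Apache Flink configuration keys; the sizes are placeholder values, and the expert-mode UI of fully managed Flink may expose these settings differently:

```yaml
# Sketch of Apache Flink TaskManager memory options (placeholder sizes).
# Increase task heap memory only after an OOM error or severe GC pressure.
taskmanager.memory.task.heap.size: 2048m
# Off-heap memory used directly by user code and some connectors.
taskmanager.memory.task.off-heap.size: 512m
```

As the recommendation above notes, raising these values for a job that already runs normally does not meaningfully increase throughput.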
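The partition-divisibility recommendation for source nodes can be checked programmatically. The following sketch lists the parallelism values that divide a given partition count evenly, so that each source subtask reads the same number of partitions (the function name is illustrative and not part of any Flink API):

```python
def even_parallelism_options(num_partitions: int) -> list[int]:
    """Return parallelism values that divide the partition count
    evenly, so each source subtask reads the same number of partitions."""
    return [p for p in range(1, num_partitions + 1) if num_partitions % p == 0]

# For a Kafka topic with 16 partitions, balanced choices include 16, 8, and 4.
print(even_parallelism_options(16))  # -> [1, 2, 4, 8, 16]
```

Among the balanced options, prefer the larger values for high-volume topics, in line with the recommendation above not to set the source parallelism too small.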