Custom resource configuration for Alibaba Blink real-time tasks is an advanced mode for fine-grained resource tuning in Realtime Compute for Apache Flink (BETA). Introduced by fully managed Flink, this mode allows precise resource allocation to meet job throughput demands: the system automatically runs jobs in Native K8s mode and tailors the specifications and number of Task Managers (TMs) to the Slot specifications and job parallelism. This topic describes how to customize resources for Alibaba Blink real-time tasks in Dataphin.
Alibaba Blink custom resource configuration entry
On the Dataphin home page, click Development in the top menu bar to go to the Development page.
On the Development page, use the following figure as a guide to reach the custom resource configuration page for BLINK_SQL tasks.
Configuration description
The resource configuration page displays a topology, where each block represents an individual computing task with configurable resources. Grouping nodes on the same compute machine minimizes cross-network data transfers and enhances performance.
By default, the system suggests a resource configuration.
Configure group runtime parameters
Click the icon in the upper-right corner of the desired group to open the Customize Group Runtime Parameters dialog box, then set the parameters.
Parameter
Description
core
The number of CPU cores used by a thread. Typically set to 0.25, meaning one CPU core can support up to four concurrent threads. The maximum value is 1.
heap_memory
The heap memory size of the Java application, in MB. Heap memory and its components can be tuned with JVM command-line parameters. Blink programs typically require extra heap overhead, for example heap memory allocated to program caches, so adjust the size to the scale of your program.
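As an illustration (not specific to Dataphin or Blink), the heap limits that JVM command-line parameters impose can be inspected from inside a running program; the flag values in the comment are examples, not recommendations:

```java
// Illustrative only: inspect the heap limits this JVM process was started with.
// Launch with e.g. `java -Xms256m -Xmx1024m HeapInfo` (example flag values).
public class HeapInfo {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long maxMb = rt.maxMemory() / (1024 * 1024);     // upper bound set by -Xmx
        long totalMb = rt.totalMemory() / (1024 * 1024); // heap currently committed
        System.out.println("max heap (MB): " + maxMb);
        System.out.println("committed heap (MB): " + totalMb);
    }
}
```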
parallel
The number of concurrent threads. Choose a value suited to the task: higher parallelism consumes more resources and does not always improve performance. As a rule of thumb, one compute node processes 2,000 to 4,000 records per second.
Important: If the source is TT, the number of TT queues caps the concurrency. Exceeding this limit results in an error.
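Using the 2,000 to 4,000 records-per-second rule of thumb above, a rough parallelism estimate for a target throughput can be sketched as follows; the per-node rate is the guideline from this topic, not a measured value:

```java
// Back-of-the-envelope parallelism estimate based on the per-node
// throughput rule of thumb; validate against real workload measurements.
public class ParallelEstimate {
    static int estimateParallel(long targetRecordsPerSec, long perNodeRecordsPerSec) {
        // Round up so the throughput demand is always covered.
        return (int) ((targetRecordsPerSec + perNodeRecordsPerSec - 1) / perNodeRecordsPerSec);
    }

    public static void main(String[] args) {
        // e.g. 10,000 records/sec at a conservative 3,000 records/sec per node
        System.out.println(estimateParallel(10_000, 3_000)); // prints 4
    }
}
```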
direct_memory
The direct (off-heap) memory size, in MB, used outside the JVM heap. It is required for tasks that use igraph or swift; a value between 16 and 32 MB is usually appropriate. Direct memory enables efficient reads and writes by avoiding data copies between the Java heap and the native heap.
Java NIO (New I/O) provides channels and buffers for reading and writing data. The direct_memory parameter allocates non-heap memory through native functions; a direct ByteBuffer object referenced from the Java heap can then be read directly, eliminating the need to transfer data between the Java heap and the native heap and thereby improving read and write performance.
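The direct-buffer mechanism described above can be demonstrated with standard Java NIO; this is a generic JVM illustration, not Blink-specific code:

```java
import java.nio.ByteBuffer;

// A direct ByteBuffer lives outside the Java heap, so I/O on it can skip
// the copy between the Java heap and the native heap.
public class DirectBufferDemo {
    public static void main(String[] args) {
        ByteBuffer heap = ByteBuffer.allocate(1024);         // backed by a byte[] on the heap
        ByteBuffer direct = ByteBuffer.allocateDirect(1024); // native (off-heap) memory

        direct.putInt(42);
        direct.flip(); // switch from writing to reading

        System.out.println("heap buffer is direct: " + heap.isDirect());     // false
        System.out.println("direct buffer is direct: " + direct.isDirect()); // true
        System.out.println("value read back: " + direct.getInt());           // 42
    }
}
```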
native_memory
The native memory size, in MB. It depends on several factors, including operating-system process overhead, bytecode length, thread count, garbage-collection data, and third-party packages. For example, a 32-bit operating system limits a process to roughly 3-4 GB of addressable memory, which bounds the native memory available.
Native memory stores:
Java object status for garbage collection and heap management.
JNI call information.
JIT compilation data, including Java bytecode and machine code.
Direct buffer details.
After setting the parameters, click OK.
Configure operator runtime parameters
Access the Customize Operator Runtime Parameters dialog box and configure the parameters.
For details on the core, heap_memory, parallel, direct_memory, and native_memory parameters in the Customize Operator Runtime Parameters dialog box, see Configure group runtime parameters. The table below describes only the state_size and chain_strategy parameters.
Parameter
Description
state_size
The default state data size used during job execution, typically set to 0.
chain_strategy
The node connection policy, with options:
Always: The default setting, indicating all nodes are on the same machine unless specified otherwise.
Never: Nodes are independently deployed, not sharing a machine with others.
Head: Nodes can share a machine but only as the head node in a group.
Important: Head and Never are rarely used; Always is the usual choice and the default.
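The semantics of the three options can be modeled as a small decision rule; this is an illustrative sketch of the behavior described above, not Blink's actual implementation:

```java
// Illustrative model of chain_strategy semantics; not Blink's real code.
public class ChainStrategyDemo {
    enum ChainStrategy { ALWAYS, NEVER, HEAD }

    // Can a node with this strategy be chained onto an upstream node's group?
    static boolean canChainToUpstream(ChainStrategy strategy) {
        switch (strategy) {
            case ALWAYS: return true;  // shares a machine with upstream nodes by default
            case NEVER:  return false; // always deployed independently
            case HEAD:   return false; // may share a machine only as the head of a
                                       // group, so it never chains onto an upstream node
            default:     return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(canChainToUpstream(ChainStrategy.ALWAYS)); // true
        System.out.println(canChainToUpstream(ChainStrategy.NEVER));  // false
        System.out.println(canChainToUpstream(ChainStrategy.HEAD));   // false
    }
}
```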
After setting the parameters, click OK.