You can configure the runtime resources that a real-time integration node uses when it runs. This topic describes how to configure these resources.
Procedure
On the Dataphin home page, in the top menu bar, select Development > Data Integration.
In the top menu bar, select Project (in the Dev-Prod pattern, also select Environment).
In the navigation pane on the left, select Integration > Stream Pipeline.
Click the name of the target real-time integration node. On the node's tab, click Resource Configuration in the top menu bar.
In the Resource Configuration area, set the resources for the real-time integration node.
The parameters are grouped by synchronization type:
Incremental Synchronization
Engine Version: The real-time computing engine and engine version that the node uses:
Alibaba Cloud Realtime Compute for Flink (VVP): VVP vvr-6.0.4-flink-1.15
Flink on YARN: Open Flink 1.15.3
Flink on K8s: OPEN_FLINK_K8S 1.15.3
Job Manager CPU, Task Manager CPU: The default is 1. If the real-time computing source uses Ververica Flink or Flink in K8s deployment mode, values with up to two decimal places are supported. If it uses Flink in YARN deployment mode, only positive integers are supported.
Job Manager Memory, Task Manager Memory: The default is 2 Gi. Enter either a plain number, which is interpreted as bytes, or a number followed by the unit Gi or Mi, for example 1024000, 1024 Mi, or 1.5 Gi.
Data Refresh Interval / Batch Write Interval:
Data Refresh Interval: Configured only when the destination database of the real-time integration is Hive.
If no data lake table format is selected, the default is 15 minutes, the minimum is 1 minute, and the maximum is 60 minutes. The shorter the interval for writing data to Hive object files, the more object files are created, which can degrade Hadoop cluster performance.
If the data lake table format is Hudi, the interval can be set in minutes or seconds, with a minimum of 5 seconds and a maximum of 60 minutes.
Batch Write Interval: Configured only when the destination database is MaxCompute. It is the interval at which data is written to the MaxCompute table. The default is 30 seconds, the minimum is 5 seconds, and the maximum is 60 minutes.
Note: This parameter is not supported when the destination database of the real-time integration is neither Hive nor MaxCompute.
Full Synchronization
Development Node Schedule Resource Group, Production Node Schedule Resource Group: If the project uses the Dev-Prod pattern, you can configure both the Development Node Schedule Resource Group and the Production Node Schedule Resource Group. If the project uses the Basic pattern, you can configure only a single resource group (Schedule Resource Group). The default is Project Default Resource Group (Tenant Default Resource Group). To view resource group details, click View Resource Group Details, which opens Management Center > System Settings > Resource Settings > Resource Group Settings.
Development Node Schedule Resource Group: The resources consumed when development nodes run. Resources in different resource groups are isolated from one another. After the node is submitted, you can change this setting in the properties of the node in the development environment.
Production Node Schedule Resource Group, Schedule Resource Group: The resources consumed when instances generated by the node are scheduled. Resources in different resource groups are isolated from one another. After the node is submitted, this setting can be changed only in the Operation Center of the production environment.
Note: These items can be configured only when the synchronization solution of the real-time integration node is Real-time Incremental + Full.
Click OK to finalize the resource configuration for the real-time integration node.
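To sanity-check values before entering them in the Resource Configuration area, the limits described in the parameter list can be expressed as a small validator. The following Python sketch is hypothetical: parse_memory, validate_cpu, validate_interval, and the engine labels 'vvp', 'k8s', and 'yarn' are illustrative names, not part of Dataphin. It also assumes that Mi and Gi are binary units (1 Mi = 1024 × 1024 bytes, as in Kubernetes quantities) and that intervals are expressed in seconds.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional
import re

# Hypothetical helpers for illustration only; these are not Dataphin APIs.
# They encode the resource limits listed in the parameter descriptions.

# Assumption: Mi and Gi are binary units, as in Kubernetes quantities.
_UNITS = {"": 1, "Mi": 1024 ** 2, "Gi": 1024 ** 3}
_MEMORY_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(Mi|Gi)?\s*$")


def parse_memory(value: str) -> int:
    """Convert a memory setting such as '1024000', '1024 Mi', or '1.5 Gi' to bytes."""
    match = _MEMORY_RE.match(value)
    if match is None:
        raise ValueError(f"unsupported memory value: {value!r}")
    number, unit = match.group(1), match.group(2) or ""
    if unit == "" and "." in number:
        # A bare number is a byte count, so it must be a whole number.
        raise ValueError("a value without a unit must be a whole number of bytes")
    return int(Decimal(number) * _UNITS[unit])


def validate_cpu(value: str, engine: str) -> Decimal:
    """Check a Job Manager or Task Manager CPU value.

    engine is a hypothetical label: 'vvp' (Ververica Flink), 'k8s'
    (Flink in K8s deployment mode), or 'yarn' (Flink in YARN deployment mode).
    """
    try:
        cpu = Decimal(value)
    except InvalidOperation:
        raise ValueError(f"not a number: {value!r}") from None
    if not cpu.is_finite() or cpu <= 0:
        raise ValueError("CPU must be a positive number")
    if engine == "yarn":
        if cpu != cpu.to_integral_value():
            raise ValueError("Flink on YARN accepts only positive integers")
    elif engine in ("vvp", "k8s"):
        # normalize() drops trailing zeros, so '1.500' is treated like '1.5'.
        if cpu.normalize().as_tuple().exponent < -2:
            raise ValueError("at most two decimal places are supported")
    else:
        raise ValueError(f"unknown engine: {engine!r}")
    return cpu


# Allowed interval range in seconds, keyed by (destination, data lake table format).
# Only the combinations described in the parameter list are encoded.
_INTERVAL_LIMITS = {
    ("hive", None): (60, 3600),       # Data Refresh Interval, default 900 s (15 min)
    ("hive", "hudi"): (5, 3600),      # Data Refresh Interval with Hudi
    ("maxcompute", None): (5, 3600),  # Batch Write Interval, default 30 s
}


def validate_interval(seconds: int, destination: str, lake_format: Optional[str] = None) -> int:
    """Check a Data Refresh Interval or Batch Write Interval given in seconds."""
    key = (destination.lower(), lake_format.lower() if lake_format else None)
    if key not in _INTERVAL_LIMITS:
        raise ValueError(f"no documented interval limits for {key}")
    low, high = _INTERVAL_LIMITS[key]
    if not low <= seconds <= high:
        raise ValueError(f"interval must be between {low} and {high} seconds")
    return seconds
```

For example, under the binary-unit assumption parse_memory("1.5 Gi") evaluates to 1610612736, and validate_cpu("1.5", "yarn") raises ValueError because Flink in YARN deployment mode accepts only positive integers.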