Use a Flink JAR Streaming node to run Flink real-time tasks from a JAR package. In DataWorks, select an uploaded Flink JAR resource as the job entry point, configure the entry point class and runtime parameters, and then develop and deploy a real-time data processing task. This topic describes how to develop and configure a Flink JAR Streaming node in DataWorks.
Prerequisites
You have associated a fully managed Flink compute engine in Administration. For more information, see Associate a fully managed Flink computing resource.
You have uploaded a Flink JAR resource. For more information, see Flink resources and functions.
You have created a Flink JAR Streaming node. For more information, see Create a scheduled workflow node.
You have granted the following OpenAPI permissions to the RAM user or RAM role that DataWorks uses to call the OpenAPI of Realtime Compute for Apache Flink. These permissions are used to submit and deploy node tasks to a Flink cluster.
{
  "Version": "1",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "stream:CreateDeployment",
        "stream:UpdateDeployment",
        "stream:GetDeployment",
        "stream:DeleteDeployment"
      ],
      "Resource": ["*"]
    }
  ]
}
Limitations
This node cannot be part of a workflow and must be developed and run as a standalone node.
Only serverless resource groups are supported. Legacy exclusive resource groups for scheduling are not supported.
Step 1: Configure the Flink JAR Streaming node
On the Flink JAR Streaming node edit page, configure the following parameters.
Main parameters
In the left pane of the node edit page, configure the following parameters.
Parameter | Description |
JAR file | Required. Select a Flink JAR resource from Resource Management. |
Entry point class | The entry point class for your program. If the JAR package does not specify a main class, enter the fully qualified name of the entry point class. |
Entry point main arguments | The main arguments for the job, which are passed to the main method. Multiple arguments are supported. |
Additional dependencies | Select an uploaded Flink file as an additional dependency from the drop-down list. Note: If the deployment target in the Flink compute resource is set to a Session cluster, additional dependencies do not take effect. |
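As an illustration of how the Entry point class and Entry point main arguments fit together, the following is a minimal sketch of an entry point class that reads its arguments in main. The class name SampleEntryPoint and the --key value argument convention are assumptions made for this example, not requirements of DataWorks, and the Flink pipeline itself is omitted so that the sketch compiles without Flink dependencies.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical entry point class. Its fully qualified name is what you would
// enter in "Entry point class" if the JAR manifest does not declare a main class.
final class SampleEntryPoint {

    // Parses arguments of the assumed form: --key value [--key value ...].
    static Map<String, String> parseArgs(String[] args) {
        Map<String, String> parsed = new LinkedHashMap<>();
        for (int i = 0; i < args.length; i++) {
            if (!args[i].startsWith("--")) {
                throw new IllegalArgumentException("Expected --key but got: " + args[i]);
            }
            String key = args[i].substring(2);
            // A key followed by another --key, or by nothing, is treated as a flag.
            boolean hasValue = i + 1 < args.length && !args[i + 1].startsWith("--");
            parsed.put(key, hasValue ? args[++i] : "true");
        }
        return parsed;
    }

    public static void main(String[] args) {
        // The values configured in "Entry point main arguments" arrive here.
        Map<String, String> params = parseArgs(args);
        // A real job would build and execute its Flink pipeline at this point.
        System.out.println("Job arguments: " + params);
    }
}
```

With this convention, main arguments such as --input topic_a --parallelism 4 would be available to the job as the keys input and parallelism.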
Configure Flink resources
In the Flink resource information section of the Real-Time configuration pane on the right side of the edit page, configure the following parameters based on the Resource Mode. For more information, see Configure job resources.
Parameter | Description |
Flink cluster | The name of the fully managed Flink compute resource associated in Administration. |
Flink engine version | Select an engine version based on your business requirements. |
Resource Group | Select a serverless resource group that has network connectivity with Flink. |
Resource Mode | Two resource modes are supported. For more information, see Configure job resources. |
Job Manager CPU | Based on Flink best practices, JobManager requires at least 0.5 CPU cores and 2 GiB of memory for stable operation. We recommend 1 CPU core and 4 GiB of memory, with a maximum of 16 CPU cores. |
Job Manager Memory | The memory configuration of JobManager affects its ability to handle scheduling and management tasks. The recommended range is 2 GiB to 64 GiB. |
Task Manager CPU | The CPU configuration of TaskManager affects its task processing capability. Allocate at least 0.5 CPU cores and 2 GiB of memory. We recommend 1 CPU core and 4 GiB of memory, with a maximum of 16 CPU cores. |
Task Manager Memory | The memory configuration of TaskManager determines the data volume and performance of task processing. The minimum memory size is 2 GiB, and the maximum is 64 GiB. |
Concurrency | Determines the number of parallel task executions in a Flink job. A higher concurrency can improve processing speed and resource utilization. Set this parameter based on the cluster resources and job characteristics. |
Number of slots per TaskManager | The number of slots per TaskManager determines the number of tasks that can be executed in parallel. You can adjust the slot configuration to optimize resource utilization and parallel processing capability. |
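The relationship between Concurrency, Number of slots per TaskManager, and the per-process CPU settings in the table above can be estimated with simple arithmetic. The sketch below assumes the default open source Flink behavior, in which a job needs as many slots as its parallelism, so the number of TaskManagers is the parallelism divided by the slots per TaskManager, rounded up. The class and method names are hypothetical, and the resources that the managed service actually allocates may differ.

```java
// Hypothetical helper for rough capacity planning; not part of any Flink or DataWorks API.
final class ResourceEstimate {

    // Smallest number of TaskManagers whose slots cover the requested parallelism.
    static int taskManagersNeeded(int parallelism, int slotsPerTaskManager) {
        if (parallelism < 1 || slotsPerTaskManager < 1) {
            throw new IllegalArgumentException("parallelism and slots must be at least 1");
        }
        // Integer ceiling division.
        return (parallelism + slotsPerTaskManager - 1) / slotsPerTaskManager;
    }

    // Total CPU cores: one JobManager plus all TaskManagers.
    static double totalCpu(int taskManagers, double taskManagerCpu, double jobManagerCpu) {
        return jobManagerCpu + taskManagers * taskManagerCpu;
    }

    public static void main(String[] args) {
        int taskManagers = taskManagersNeeded(10, 4);   // 10 parallel tasks, 4 slots each
        double cpu = totalCpu(taskManagers, 1.0, 1.0);  // 1 core per TaskManager and JobManager
        System.out.println("TaskManagers: " + taskManagers + ", total CPU cores: " + cpu);
    }
}
```

For example, a concurrency of 10 with 4 slots per TaskManager needs 3 TaskManagers under this assumption, which leaves 2 slots idle. Choosing a slot count that divides the concurrency evenly avoids that waste.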
(Optional) Configure script parameters
In the Script Parameters section of the Real-Time configuration pane on the right side, click Add parameters and edit the Parameter name and Parameter Value.
(Optional) Configure Flink running parameters
In the Flink running parameters section of the Real-Time configuration pane on the right side, configure the following parameters. For more information, see Configure job deployment.
Parameter | Description |
System Checkpoint Interval | This parameter specifies the interval at which Flink periodically performs system checkpoints. A shorter interval reduces failure recovery time but increases system overhead. If this parameter is not specified, system checkpoints are disabled. |
Minimum time interval between two system checkpoints | This parameter specifies the minimum pause time that Flink must wait between consecutive checkpoints to prevent overly frequent checkpoints from affecting system performance. |
State data expiration time | This parameter specifies the maximum duration that state data in a Flink job can be retained without being accessed or updated. The default value is 36 hours. Important: The default value is based on cloud best practices and differs from the open source default value (0, which means state data never expires). |
Others | Other Flink running parameters are supported. For example: |
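The interplay between System Checkpoint Interval and the minimum time interval between two system checkpoints, as described in the table above, can be sketched as a small timing model. It assumes the open source Flink semantics, in which the minimum pause is measured from the end of the previous checkpoint to the start of the next one. The class and method names are hypothetical, and the model ignores timeouts and concurrent checkpoints.

```java
// Simplified timing model for illustration; not a Flink API.
final class CheckpointTiming {

    // Earliest time (ms) at which the next checkpoint may start, given when the
    // previous checkpoint started and ended. Both constraints must hold, so the
    // later of the two candidate times wins.
    static long nextCheckpointStart(long lastStartMs, long lastEndMs,
                                    long intervalMs, long minPauseMs) {
        if (lastEndMs < lastStartMs) {
            throw new IllegalArgumentException("checkpoint cannot end before it starts");
        }
        long byInterval = lastStartMs + intervalMs;  // periodic schedule
        long byMinPause = lastEndMs + minPauseMs;    // enforced breathing room
        return Math.max(byInterval, byMinPause);
    }

    public static void main(String[] args) {
        long interval = 60_000L;  // 60 s checkpoint interval
        long minPause = 30_000L;  // 30 s minimum pause

        // Fast checkpoint (5 s): the interval decides the next start.
        System.out.println(nextCheckpointStart(0L, 5_000L, interval, minPause));
        // Slow checkpoint (45 s): the minimum pause delays the next start.
        System.out.println(nextCheckpointStart(0L, 45_000L, interval, minPause));
    }
}
```

In this model, the minimum pause only matters when checkpoints take long enough that the interval alone would leave less than the configured pause between them.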
After you complete the task configuration, click Save to save the node task.
Step 2: Start the Flink JAR Streaming node
Deploy the Flink JAR Streaming node.
Tasks must be deployed to Operation Center before they can be run. Follow the on-screen instructions to deploy the Flink JAR Streaming node. For more information, see Node and workflow deployment.
Start the Flink JAR Streaming node.
After the task is deployed, click Go to operation and maintenance below Deploy to production environment. In Operation Center, navigate to , find the task that you want to start, and click Start in the Operation column to start and monitor the real-time task.