Flink JAR Streaming node usage

Use a Flink JAR Streaming node to run Flink real-time tasks from a JAR package. In DataWorks, select an uploaded Flink JAR resource as the job entry point, configure the entry point class and runtime parameters, and then develop and deploy a real-time data processing task. This topic describes how to develop and configure a Flink JAR Streaming node in DataWorks.

Prerequisites

You have associated a fully managed Flink compute engine in Administration. For more information, see Associate a fully managed Flink computing resource.
You have uploaded a Flink JAR resource. For more information, see Flink resources and functions.
You have created a Flink JAR Streaming node. For more information, see Create a scheduled workflow node.
You have granted the following OpenAPI permissions to the RAM user or RAM role that DataWorks uses to call the OpenAPI of Realtime Compute for Apache Flink. These permissions are used to submit and deploy node tasks to a Flink cluster.
```
{
  "Version": "1",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["stream:CreateDeployment", "stream:UpdateDeployment", "stream:GetDeployment", "stream:DeleteDeployment"],
      "Resource": ["*"]
    }
  ]
}
```

Limitations

This node cannot be part of a workflow and must be developed and run as a standalone node.
Only serverless resource groups are supported. Legacy exclusive resource groups for scheduling are not supported.

Step 1: Configure the Flink JAR Streaming node

On the Flink JAR Streaming node edit page, configure the following parameters.

Main parameters

In the left pane of the node edit page, configure the following parameters.

Parameter	Description
JAR file	Required. Select a Flink JAR resource from Resource Management.
Entry point class	The entry point class for your program. If the JAR package does not specify a main class, enter the fully qualified name of the entry point class.
Entry point main arguments	The main arguments for the job, which are passed to the main method. Multiple arguments are supported.
Additional dependencies	Select an uploaded Flink file as an additional dependency from the drop-down list. Note: If the deployment target in the Flink compute resource is set to a Session cluster, additional dependencies do not take effect.
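The following is a minimal sketch of how an entry point class might receive the entry point main arguments. The class name `StreamingJobEntry`, the `--key value` argument convention, and the `--input-topic` and `--verbose` arguments are hypothetical and used only for illustration. Plain Java is used so that the example runs without Flink dependencies. A real job would build its Flink pipeline inside `main`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical entry point class. In a real JAR this class would be declared
// public, and its fully qualified name would be entered in "Entry point class".
class StreamingJobEntry {

    // Parses arguments of the assumed form: --key value [--flag]
    static Map<String, String> parseArgs(String[] args) {
        Map<String, String> parsed = new LinkedHashMap<>();
        for (int i = 0; i < args.length; i++) {
            String token = args[i];
            if (!token.startsWith("--") || token.length() == 2) {
                throw new IllegalArgumentException("Expected --key but got: " + token);
            }
            String key = token.substring(2);
            // A key that is not followed by a value is treated as a boolean flag.
            boolean hasValue = i + 1 < args.length && !args[i + 1].startsWith("--");
            parsed.put(key, hasValue ? args[++i] : "true");
        }
        return parsed;
    }

    // The main method receives the values configured in "Entry point main arguments".
    public static void main(String[] args) {
        Map<String, String> params = parseArgs(args);
        System.out.println("Parsed main arguments: " + params);
        // A real job would create its Flink execution environment here.
    }
}
```

Under these assumptions, entering `--input-topic orders --verbose` as the main arguments would give the job a map with `input-topic` set to `orders` and `verbose` set to `true`.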

Configure Flink resources

In the Flink resource information section of the Real-Time configuration pane on the right side of the edit page, configure the following parameters based on the Resource Mode. For more information, see Configure job resources.

Parameter	Description
Flink cluster	The name of the fully managed Flink compute resource associated in Administration.
Flink engine version	Select an engine version based on your business requirements.
Resource Group	Select a serverless resource group that has network connectivity with Flink.
Resource Mode	The following two modes are supported. For more information, see Configure job resources. Basic mode (default): Suitable for beginners and simple scenarios. Uses default configurations and simplified settings to quickly start and run Flink jobs. Expert mode: Provides advanced configuration options for experienced users, allowing fine-grained tuning of performance and resources to meet complex or high-performance requirements.
Job Manager CPU	Based on Flink best practices, JobManager requires at least 0.5 CPU cores and 2 GiB of memory for stable operation. We recommend 1 CPU core and 4 GiB of memory, with a maximum of 16 CPU cores.
Job Manager Memory	The memory configuration of JobManager affects its ability to handle scheduling and management tasks. The recommended range is 2 GiB to 64 GiB.
Task Manager CPU	The CPU configuration of TaskManager affects its task processing capability. TaskManager requires at least 0.5 CPU cores and 2 GiB of memory. We recommend 1 CPU core and 4 GiB of memory, with a maximum of 16 CPU cores.
Task Manager Memory	The memory configuration of TaskManager determines the data volume and performance of task processing. The minimum memory size is 2 GiB, and the maximum is 64 GiB.
Concurrency	The parallelism of the Flink job, that is, the number of parallel instances in which each task runs. A higher value can increase throughput but requires more slots. Set this parameter based on the cluster resources and job characteristics.
Number of slots per TaskManager	The number of tasks that a single TaskManager can run in parallel. Adjust the slot count to balance resource utilization against parallel processing capability.
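The relationship between the resource parameters can be sketched as follows. The class and method names are hypothetical. The CPU and memory bounds are taken from the table above. The TaskManager estimate assumes Flink's default slot sharing, in which a job needs as many slots as its parallelism. The actual number of TaskManagers that the service allocates may differ.

```java
// Hypothetical helper that checks resource values against the documented bounds
// and estimates how many TaskManagers a job needs.
class FlinkResourceSketch {

    // Bounds quoted in the parameter table.
    static final double MIN_CPU = 0.5;
    static final double MAX_CPU = 16.0;
    static final double MIN_MEMORY_GIB = 2.0;
    static final double MAX_MEMORY_GIB = 64.0;

    // Throws if the CPU or memory value falls outside the documented range.
    static void validate(String role, double cpuCores, double memoryGiB) {
        if (cpuCores < MIN_CPU || cpuCores > MAX_CPU) {
            throw new IllegalArgumentException(role + " CPU out of range: " + cpuCores);
        }
        if (memoryGiB < MIN_MEMORY_GIB || memoryGiB > MAX_MEMORY_GIB) {
            throw new IllegalArgumentException(role + " memory out of range: " + memoryGiB);
        }
    }

    // Assumption: one slot per parallel instance, so the count is
    // ceil(parallelism / slotsPerTaskManager).
    static int taskManagersNeeded(int parallelism, int slotsPerTaskManager) {
        if (parallelism < 1 || slotsPerTaskManager < 1) {
            throw new IllegalArgumentException("Values must be at least 1");
        }
        return (parallelism + slotsPerTaskManager - 1) / slotsPerTaskManager;
    }

    public static void main(String[] args) {
        validate("JobManager", 1.0, 4.0);   // recommended configuration
        validate("TaskManager", 1.0, 4.0);  // recommended configuration
        // Concurrency 10 with 4 slots per TaskManager
        System.out.println("TaskManagers needed: " + taskManagersNeeded(10, 4)); // prints 3
    }
}
```

For example, a Concurrency of 10 with 4 slots per TaskManager needs 3 TaskManagers under this assumption, and 2 of the 12 available slots stay idle.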

(Optional) Configure script parameters

In the Script Parameters section of the Real-Time configuration pane on the right side, click Add parameters and edit the Parameter name and Parameter Value.

(Optional) Configure Flink running parameters

In the Flink running parameters section of the Real-Time configuration pane on the right side, configure the following parameters. For more information, see Configure job deployment.

Parameter	Description
System Checkpoint Interval	This parameter specifies the interval at which Flink periodically performs system checkpoints. A shorter interval reduces failure recovery time but increases system overhead. If this parameter is not specified, system checkpoints are disabled.
Minimum time interval between two system checkpoints	This parameter specifies the minimum pause between the completion of one checkpoint and the start of the next. It prevents overly frequent checkpoints from affecting system performance.
State data expiration time	This parameter specifies the maximum duration that state data in a Flink job can be retained without being accessed or updated. The default value is 36 hours. Important: The default value is based on cloud best practices and differs from the open source default value (0, which means state data never expires).
Others	Other Flink running parameters are supported. For example: `taskmanager.network.memory.max:4g`.
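The interaction between the checkpoint interval, the minimum pause, and the state expiration time can be illustrated with a short sketch. The class and method names are hypothetical. The timing model is a simplification that assumes only one checkpoint runs at a time and that the pause is measured from the completion of the previous checkpoint. It does not call any Flink API.

```java
import java.time.Duration;

// Hypothetical model of the timing parameters. It does not use Flink classes.
class CheckpointTimingSketch {

    // Earliest start of the next checkpoint, measured from the start of the
    // previous one: the later of (interval) and (previous duration + minimum pause).
    static Duration nextStartOffset(Duration interval, Duration minPause, Duration lastDuration) {
        Duration earliestByPause = lastDuration.plus(minPause);
        return interval.compareTo(earliestByPause) >= 0 ? interval : earliestByPause;
    }

    // State expires once it has been idle longer than the expiration time.
    // An expiration time of 0 means that state never expires.
    static boolean isExpired(Duration idleTime, Duration expirationTime) {
        if (expirationTime.isZero()) {
            return false;
        }
        return idleTime.compareTo(expirationTime) > 0;
    }

    public static void main(String[] args) {
        Duration interval = Duration.ofSeconds(60);
        Duration minPause = Duration.ofSeconds(30);

        // A fast checkpoint (10 s): the interval decides the next start.
        System.out.println(nextStartOffset(interval, minPause, Duration.ofSeconds(10)).getSeconds()); // prints 60
        // A slow checkpoint (45 s): the minimum pause delays the next start.
        System.out.println(nextStartOffset(interval, minPause, Duration.ofSeconds(45)).getSeconds()); // prints 75

        // With the 36-hour default, state that has been idle for 37 hours is expired.
        System.out.println(isExpired(Duration.ofHours(37), Duration.ofHours(36))); // prints true
    }
}
```

In this model, a slow checkpoint pushes the next one back, so the effective checkpoint frequency can be lower than the configured interval suggests.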

After you complete the task configuration, click Save.

Step 2: Start the Flink JAR Streaming node

Deploy the Flink JAR Streaming node.
Tasks must be deployed to Operation Center before they can be run. Follow the on-screen instructions to deploy the Flink JAR Streaming node. For more information, see Node and workflow deployment.
Start the Flink JAR Streaming node.
After the task is deployed, click Go to operation and maintenance below Deploy to production environment. In Operation Center, navigate to Node O&M > Real-time Task O&M > Real-time computing tasks, find the task that you want to start, and click Start in the Operation column to start and monitor the real-time task.