The Flink JAR Batch node lets you run Flink batch jobs by submitting JAR packages. In DataWorks, you select an uploaded Flink JAR resource as the job's entry point, configure the entry point class and scheduling parameters, and then develop and deploy large-scale batch data processing jobs. This topic describes how to develop and configure a Flink JAR Batch node in DataWorks.
Prerequisites
You have bound a compute resource for Realtime Compute for Apache Flink in Administration. For more information, see Bind a fully managed Flink compute resource.
You have uploaded a Flink JAR resource. For more information, see Flink resources and functions.
You have created a Flink JAR Batch node. For more information, see Create a node for a scheduling workflow.
The RAM user or RAM role for DataWorks must have the following permissions to call Realtime Compute for Apache Flink APIs. Without these permissions, the node cannot be submitted and deployed to a Flink cluster. For more information, see Add permissions.
{
  "Version": "1",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "stream:CreateDeployment",
        "stream:UpdateDeployment",
        "stream:GetDeployment",
        "stream:DeleteDeployment"
      ],
      "Resource": ["*"]
    }
  ]
}
Limitation
Only serverless resource groups are supported. Legacy exclusive resource groups for scheduling are not supported.
Step 1: Configure the Flink JAR Batch node
On the Flink JAR Batch node editing page, configure the following parameters.
Configure main parameters
In the left pane of the node editing page, configure the following parameters.
Parameter | Description |
JAR file | Required. From the drop-down list, select a Flink JAR resource uploaded in Resource Management. |
Entry Point Class | The program's entry point class. If the JAR package does not specify a main class, you must provide the fully qualified name of the entry point class. |
Entry Point Main Arguments | Arguments passed to the main method. You can enter multiple arguments. |
Additional dependency files | From the drop-down list, select an uploaded Flink File resource to use as an additional dependency. Note: If the deployment target for the Flink compute resource is a Session cluster, additional dependency files do not take effect. |
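To illustrate how Entry Point Class and Entry Point Main Arguments relate, the following is a minimal sketch of an entry point class that reads `--key value` pairs from its main arguments. The class name `BatchJobEntryPoint` and the argument names `input` and `output` are hypothetical, and the sketch assumes that the arguments you enter in the node reach `main` as an ordinary `String[]`. It deliberately omits the Flink pipeline itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical entry point class. Because no package is declared, its
// fully qualified name is simply "BatchJobEntryPoint"; that is the value
// you would enter in the Entry Point Class field.
class BatchJobEntryPoint {

    // Parses "--key value" pairs as they might be entered in the
    // Entry Point Main Arguments field.
    static Map<String, String> parseArgs(String[] args) {
        Map<String, String> parsed = new LinkedHashMap<>();
        for (int i = 0; i < args.length; i += 2) {
            String key = args[i];
            if (!key.startsWith("--")) {
                throw new IllegalArgumentException("Expected --key, got: " + key);
            }
            if (i + 1 >= args.length) {
                throw new IllegalArgumentException("Missing value for " + key);
            }
            parsed.put(key.substring(2), args[i + 1]);
        }
        return parsed;
    }

    public static void main(String[] args) {
        Map<String, String> params = parseArgs(args);
        // Hypothetical argument names, for illustration only.
        String input = params.getOrDefault("input", "");
        String output = params.getOrDefault("output", "");
        System.out.println("input=" + input + ", output=" + output);
        // A real job would build and execute its Flink batch pipeline here.
    }
}
```

With this sketch, entering `--input in.csv --output out` as the main arguments would make `params` hold the two entries `input=in.csv` and `output=out`. Failing fast on a malformed argument list is a design choice that surfaces configuration mistakes at job startup rather than midway through a batch run.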
Configure scheduling
In the Scheduling Settings pane on the right side of the editing page, configure the following parameters.
Flink resource information
Parameter | Description |
Flink cluster | The name of the fully managed Flink compute resource bound in Administration. |
Flink engine version | Select an engine version based on your requirements. |
Resource Group | Select a serverless resource group that has network connectivity with Flink. |
Job Manager CPU | The CPU allocated to JobManager. Based on Flink best practices, JobManager requires at least 0.5 CPU cores and 2 GiB of memory for stable operation. We recommend 1 CPU core and 4 GiB of memory. The maximum is 16 CPU cores. |
Job Manager Memory | The memory allocated to JobManager affects its ability to handle scheduling and management tasks. The recommended range is 2 GiB to 64 GiB. |
Task Manager CPU | The CPU allocated to TaskManager affects its task processing capability. Allocate at least 0.5 CPU cores and 2 GiB of memory. We recommend 1 CPU core and 4 GiB of memory. The maximum is 16 CPU cores. |
Task Manager Memory | The memory allocated to TaskManager determines the data volume and performance for task processing. The minimum memory is 2 GiB, and the maximum is 64 GiB. |
Concurrency | The parallelism of the Flink job, that is, the number of tasks that can be executed in parallel. You can select Auto Infer to let the system automatically determine the parallelism based on the job characteristics. |
Maximum Slots | The maximum number of slots available for the job. This parameter limits the upper bound of resources the job can use. |
Number of slots per TaskManager | The number of slots that each TaskManager provides. This value determines how many tasks a TaskManager can execute in parallel. |
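The relationship between parallelism, slots per TaskManager, and total resources can be estimated with simple arithmetic. The following sketch is a rough planning aid, not a description of how the service allocates resources. It assumes Flink's default slot sharing, under which a job needs as many slots as its parallelism, and it takes the CPU and memory bounds from the table above. The class and method names are hypothetical.

```java
// Rough sizing helper for planning purposes only. All names are
// hypothetical; the bounds come from the parameter table above.
class FlinkSizingSketch {

    static final double MIN_CPU = 0.5;
    static final double MAX_CPU = 16.0;
    static final int MIN_MEMORY_GIB = 2;
    static final int MAX_MEMORY_GIB = 64;

    // True if the CPU setting falls within the documented bounds.
    static boolean isValidCpu(double cores) {
        return cores >= MIN_CPU && cores <= MAX_CPU;
    }

    // True if the memory setting falls within the documented bounds.
    static boolean isValidMemoryGiB(int gib) {
        return gib >= MIN_MEMORY_GIB && gib <= MAX_MEMORY_GIB;
    }

    // Number of TaskManagers needed so that every parallel task gets a
    // slot, assuming one slot per unit of parallelism (ceiling division).
    static int taskManagersNeeded(int parallelism, int slotsPerTaskManager) {
        if (parallelism <= 0 || slotsPerTaskManager <= 0) {
            throw new IllegalArgumentException("Values must be positive");
        }
        return (parallelism + slotsPerTaskManager - 1) / slotsPerTaskManager;
    }

    // Total CPU cores: one JobManager plus the given number of TaskManagers.
    static double totalCpu(double jobManagerCpu, double taskManagerCpu, int taskManagers) {
        return jobManagerCpu + taskManagerCpu * taskManagers;
    }
}
```

For example, a parallelism of 10 with 4 slots per TaskManager gives `taskManagersNeeded(10, 4) == 3`. With 1 CPU core for JobManager and 1 CPU core per TaskManager, `totalCpu(1.0, 1.0, 3)` is 4.0 cores.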
Scheduling parameters
You can configure scheduling parameters in the Scheduling Parameters section to enable dynamic parameter passing in scheduling scenarios. For more information, see Configure scheduling parameters.
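As an illustration of dynamic parameter passing, the sketch below shows how a job might consume a date value that changes with each scheduled run. It assumes that a scheduling parameter resolves to a date string in `yyyyMMdd` form and is handed to the job through Entry Point Main Arguments; this topic does not specify that mechanism, so treat it as an assumption. The class name `BizDateArgument` and the `dt=` partition layout are hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper that interprets a date passed in by the scheduler.
class BizDateArgument {

    // Parses a compact date such as "20240131" (assumed yyyyMMdd form).
    // Throws DateTimeParseException if the value has a different shape.
    static LocalDate parse(String value) {
        return LocalDate.parse(value, DateTimeFormatter.BASIC_ISO_DATE);
    }

    // Builds a partition path for the given date. The "dt=" layout is
    // an illustrative convention, not something DataWorks requires.
    static String partitionPath(String basePath, LocalDate date) {
        return basePath + "/dt=" + date.format(DateTimeFormatter.ISO_LOCAL_DATE);
    }
}
```

For example, `partitionPath("/data/orders", parse("20240131"))` returns `/data/orders/dt=2024-01-31`, so each scheduled instance reads or writes the partition that matches its own date.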
For other scheduling-related configurations, including Flink Runtime Parameters, Scheduling Policy, Scheduling Time, and Scheduling Dependencies, see the corresponding configuration descriptions in Flink SQL Batch node.
After you complete the configuration, click Save to save the node.
Step 2: Run the Flink JAR Batch node
The task must be deployed to Operation Center before it can run. Follow the on-screen instructions to deploy the Flink JAR Batch node. For more information, see Deploy a node. After the deployment, you can view the running status of scheduled instances in Operation Center.