The Flink Python Batch node runs Flink batch processing jobs from a submitted Python file. In DataWorks, you can specify the Python file path by selecting an uploaded Flink Python resource or a Flink File. After you configure the entry module and scheduling parameters, you can develop and deploy large-scale batch processing jobs based on Python. This topic describes how to develop and configure a Flink Python Batch node in DataWorks.
Prerequisites
You have associated Realtime Compute for Apache Flink compute resources in Administration. For more information, see Bind Fully-managed Flink Compute Resources.
You have uploaded a Flink Python resource. For more information, see Flink Resources and Functions.
You have created a Flink Python Batch node. For more information, see Create a node for a workflow.
You have granted the following OpenAPI permissions to the RAM user or RAM role that DataWorks uses to call Realtime Compute for Apache Flink APIs. These permissions allow DataWorks to submit the node's job and deploy it to a Flink cluster. For more information, see Add permissions.
{
  "Version": "1",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "stream:CreateDeployment",
        "stream:UpdateDeployment",
        "stream:GetDeployment",
        "stream:DeleteDeployment"
      ],
      "Resource": ["*"]
    }
  ]
}
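Before you attach the policy, it can help to confirm that the policy document contains all four actions listed above. The following is a minimal sketch that uses only the Python standard library. The helper name `missing_actions` is hypothetical and is not part of DataWorks or RAM. The check is deliberately simple: it assumes that actions are listed literally, so it does not expand wildcards such as `stream:*` and does not evaluate `Deny` statements.

```python
import json

# The four actions named in the policy above.
REQUIRED_ACTIONS = {
    "stream:CreateDeployment",
    "stream:UpdateDeployment",
    "stream:GetDeployment",
    "stream:DeleteDeployment",
}


def missing_actions(policy_text):
    """Return the required actions that no Allow statement lists literally.

    Hypothetical helper for illustration only. Wildcards and Deny
    statements are not evaluated.
    """
    policy = json.loads(policy_text)
    allowed = set()
    for statement in policy.get("Statement", []):
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        # "Action" may be a single string instead of a list.
        if isinstance(actions, str):
            actions = [actions]
        allowed.update(actions)
    return REQUIRED_ACTIONS - allowed


policy_text = """
{
  "Version": "1",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["stream:CreateDeployment", "stream:UpdateDeployment",
                 "stream:GetDeployment", "stream:DeleteDeployment"],
      "Resource": ["*"]
    }
  ]
}
"""

print(sorted(missing_actions(policy_text)))  # prints []
```

An empty result means that every required action appears in an `Allow` statement. A non-empty result lists the actions that still need to be added.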
Limitation
Only serverless resource groups are supported. Legacy exclusive resource groups for scheduling are not supported.
Step 1: Configure the Flink Python Batch node
On the editing page of the Flink Python Batch node, configure the node as follows.
Main tab parameters
In the left pane of the node editing page, configure the parameters described in the following table.
Parameter | Description |
Python file path | Required. From the drop-down list, select a Flink Python resource or a Flink File that you have uploaded to Resource Management. Flink Python resources support only |
Entry Module | The entry module of the program, such as |
Entry Point Main Arguments | The job parameters. |
Python Libraries | From the drop-down list, select an uploaded Flink File as a third-party Python package. Third-party Python packages are added to the PYTHONPATH of the Python worker process so that they can be directly accessed in Python user-defined functions. |
Python Archives | From the drop-down list, select an uploaded Flink File as an archive file. Archive files in ZIP format are supported, such as |
Additional Dependencies | From the drop-down list, select an uploaded Flink File as an additional dependency file. |
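The Entry Point Main Arguments row describes job parameters. The following is a minimal sketch of how an entry module might read such parameters, assuming that they reach the Python program as ordinary command-line arguments. The function name `parse_job_args` and the argument names `--input`, `--output`, and `--run-date` are hypothetical placeholders chosen for illustration. They are not defined by DataWorks or Flink.

```python
import argparse


def parse_job_args(argv):
    """Parse job parameters from a list of command-line tokens.

    All argument names here are hypothetical. Replace them with the
    parameters that your own job expects.
    """
    parser = argparse.ArgumentParser(description="Example batch job arguments")
    parser.add_argument("--input", required=True, help="location of the source data")
    parser.add_argument("--output", required=True, help="location for the results")
    parser.add_argument("--run-date", default=None, help="optional business date")
    return parser.parse_args(argv)


# Example: the tokens that a value such as
# "--input /tmp/in --output /tmp/out --run-date 20240101" would produce.
args = parse_job_args(
    ["--input", "/tmp/in", "--output", "/tmp/out", "--run-date", "20240101"]
)
print(args.input, args.output, args.run_date)
```

In a real entry module, you would typically pass `sys.argv[1:]` to the parser instead of a hard-coded list. Parsing into named fields keeps the job logic independent of argument order and makes a missing required argument fail early with a clear message.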
Configure scheduling
Configure the schedule settings for the node, including Flink resource information, scheduling parameters, Flink runtime parameters, scheduling policies, scheduling time, and scheduling dependencies. For more information, see Configure schedule settings.
After you complete the configuration, click Save.
Step 2: Run the Flink Python Batch node
The task must be deployed to Operation Center before it can run. Follow the on-screen instructions to deploy the Flink Python Batch node. For more information, see Deploy a node. After deployment, you can view the running status of scheduled instances in Operation Center.