Flink Python Batch node usage

The Flink Python Batch node runs Flink batch processing jobs from a submitted Python file. In DataWorks, you can specify the Python file path by selecting an uploaded Flink Python resource or a Flink File. After you configure the entry module and scheduling parameters, you can develop and deploy large-scale batch processing jobs based on Python. This topic describes how to develop and configure a Flink Python Batch node in DataWorks.

Prerequisites

You have associated Realtime Compute for Apache Flink compute resources in Administration. For more information, see Bind Fully-managed Flink Compute Resources.
You have uploaded a Flink Python resource. For more information, see Flink Resources and Functions.
You have created a Flink Python Batch node. For more information, see Create a node for a workflow.
You have granted the following OpenAPI permissions to the RAM user or RAM role that DataWorks uses to call Realtime Compute for Apache Flink APIs. These permissions allow DataWorks to submit the node's job and deploy it to a Flink cluster. For more information, see Add permissions. The policy is as follows:
```
{
  "Version": "1",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["stream:CreateDeployment", "stream:UpdateDeployment", "stream:GetDeployment", "stream:DeleteDeployment"],
      "Resource": ["*"]
    }
  ]
}
```
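
Before you attach the policy, you may want to confirm that it lists all four actions. The following is a minimal sketch that uses only the Python standard library. The helper name `missing_actions` is hypothetical and is not part of any DataWorks or RAM SDK. The check compares action names literally, so it does not evaluate wildcards, `Deny` statements, or `Resource` scoping.

```python
import json

# The four actions listed in the policy above.
REQUIRED_ACTIONS = {
    "stream:CreateDeployment",
    "stream:UpdateDeployment",
    "stream:GetDeployment",
    "stream:DeleteDeployment",
}


def missing_actions(policy_text):
    """Return the required actions that no Allow statement lists explicitly.

    Hypothetical helper for illustration only: names are compared literally.
    """
    policy = json.loads(policy_text)
    granted = set()
    for statement in policy.get("Statement", []):
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        # "Action" may be a single string or a list of strings.
        if isinstance(actions, str):
            actions = [actions]
        granted.update(actions)
    return REQUIRED_ACTIONS - granted


policy_text = """
{
  "Version": "1",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["stream:CreateDeployment", "stream:UpdateDeployment",
                 "stream:GetDeployment", "stream:DeleteDeployment"],
      "Resource": ["*"]
    }
  ]
}
"""

print(sorted(missing_actions(policy_text)))  # prints []
```

An empty result means that every required action appears in an `Allow` statement. It does not prove that the policy is attached to the correct RAM user or RAM role.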

Limitation

Only serverless resource groups are supported. Legacy exclusive resource groups for scheduling are not supported.

Step 1: Configure the Flink Python Batch node

On the Flink Python Batch node edit page, configure the following parameters.

Main tab parameters

The following parameters are in the left pane of the node edit page.

Parameter	Description
Python file path	Required. From the drop-down list, select a Flink Python resource or a Flink File that you have uploaded to Resource Management. Flink Python resources support only `.py` files. To use a `.zip` file as the main job entry, upload it as a Flink File. If the Entry Module parameter is left empty, the selected path must end with `.py`.
Entry Module	The entry module of the program, such as `example.word_count`. This parameter is required when the Python file path ends with `.zip`.
Entry Point Main Arguments	The arguments that are passed to the entry point of the job.
Python Libraries	From the drop-down list, select an uploaded Flink File as a third-party Python package. Third-party Python packages are added to the PYTHONPATH of the Python worker process so that they can be directly accessed in Python user-defined functions.
Python Archives	From the drop-down list, select an uploaded Flink File as an archive file. Archive files in ZIP format are supported, such as `.zip`, `.jar`, `.whl`, and `.egg`.
Additional Dependencies	From the drop-down list, select an uploaded Flink File as an additional dependency file.
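
The pairing rules for Python file path and Entry Module, and the way a script might consume Entry Point Main Arguments, can be sketched as follows. This example uses only the Python standard library. The function names `check_entry_config` and `parse_main_arguments` and the `--input` and `--output` arguments are hypothetical. The sketch also assumes that the main arguments reach the Python script as ordinary command-line arguments, which this topic does not state explicitly.

```python
import argparse


def check_entry_config(python_file_path, entry_module=""):
    """Return a list of problems with the path / Entry Module pairing.

    Hypothetical helper that mirrors the rules in the table above.
    """
    problems = []
    path = python_file_path.strip()
    module = entry_module.strip()
    if not path:
        problems.append("Python file path is required")
    elif path.endswith(".zip") and not module:
        problems.append("Entry Module is required when the path ends with .zip")
    elif not module and not path.endswith(".py"):
        problems.append("path must end with .py when Entry Module is empty")
    return problems


def parse_main_arguments(argv):
    """Parse job arguments; --input and --output are illustrative names."""
    parser = argparse.ArgumentParser(description="hypothetical batch job")
    parser.add_argument("--input", required=True)
    parser.add_argument("--output", required=True)
    return parser.parse_args(argv)


print(check_entry_config("word_count.py"))                   # prints []
print(check_entry_config("jobs.zip", "example.word_count"))  # prints []
print(check_entry_config("jobs.zip"))

# Assumption: in a real script this would be parse_main_arguments(sys.argv[1:]).
args = parse_main_arguments(["--input", "/tmp/in", "--output", "/tmp/out"])
print(args.input, args.output)
```

The third call reports one problem because a `.zip` path is given without an Entry Module. Running a check like this locally before you save the node is optional. DataWorks itself decides which configurations it accepts.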

Configure scheduling

To configure schedule settings, including Flink resource information, scheduling parameters, Flink runtime parameters, scheduling policies, scheduling time, and scheduling dependencies, see Configure schedule settings.

After you complete the configuration, click Save.

Step 2: Run the Flink Python Batch node

The task must be deployed to Operation Center before it can run. Follow the on-screen instructions to deploy the Flink Python Batch node. For more information, see Deploy a node. After deployment, you can view the running status of scheduled instances in Operation Center.