Flink Python Streaming node usage - DataWorks - Alibaba Cloud Documentation Center

The Flink Python Streaming node allows you to run Flink real-time tasks by submitting Python files. In DataWorks, you can select an uploaded Flink Python resource or a Flink File as the Python file path. After you configure the entry module and runtime parameters, you can develop and deploy Python-based real-time data processing tasks. This topic describes how to develop and configure a Flink Python Streaming node in DataWorks.

Prerequisites

You have associated a Realtime Compute for Apache Flink compute resource in Administration. For more information, see Bind a fully managed Flink compute resource.
You have uploaded a Flink Python resource. For more information, see Flink resources and functions.
You have created a Flink Python Streaming node. For more information, see Create a node for a scheduling workflow.
The following OpenAPI permissions have been granted to the RAM user or RAM role that is used by DataWorks to call the OpenAPI of Realtime Compute for Apache Flink. This authorization is used to submit and deploy node tasks to a Flink cluster.
```
{
  "Version": "1",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["stream:CreateDeployment", "stream:UpdateDeployment", "stream:GetDeployment", "stream:DeleteDeployment"],
      "Resource": ["*"]
    }
  ]
}
```

Limitations

This node cannot be used in a workflow. You can develop and run it only as an independent node.
Only serverless resource groups are supported. Legacy exclusive resource groups for scheduling are not supported.

Step 1: Configure Flink Python Streaming node

On the Flink Python Streaming node editor page, configure the following parameters.

Configure main parameters

In the left pane of the node editor page, configure the following parameters.

Parameter	Description
Python file address	Required. From the drop-down list, select an uploaded Flink Python resource or Flink File. Flink Python resources only support `.py` files. If you want to use a `.zip` file as the main job entry, upload it as a Flink File. This parameter must end with `.py` if the Entry Module parameter is empty.
Entry Module	The entry module of the program, such as `example.word_count`. This parameter is required when the Python file address points to a `.zip` file.
Entry Point Main Arguments	The job parameters.
Python Libraries	From the drop-down list, select an uploaded Flink File as a third-party Python package. Third-party Python packages are added to the PYTHONPATH of the Python worker process, so they can be directly accessed in Python user-defined functions.
Python Archives	From the drop-down list, select an uploaded Flink File as an archive file. Currently, ZIP-format files are supported, such as `.zip`, `.jar`, `.whl`, and `.egg`.
Additional dependency files	From the drop-down list, select an uploaded Flink File as an additional dependency file.

Configure Flink resources

The Flink resource configuration (including Flink cluster, engine version, resource group, resource mode, and Job Manager/Task Manager parameters) is the same as that for the Flink JAR Streaming node. For more information, see the configuration instructions in that topic.

(Optional) Configure script parameters

In the right-side navigation pane, under the Real-Time configuration section, click Add parameters in the Script Parameters area, and edit the parameter names and values.

(Optional) Configure Flink runtime parameters

The Flink runtime parameter configuration (including system checkpoint interval, minimum pause between checkpoints, state data expiration time, and other configurations) is the same as that for the Flink JAR Streaming node. For more information, see the configuration instructions in that topic.

After you complete the task configuration, click Save to save the node task.

Step 2: Start Flink Python Streaming node

Deploy the Flink Python Streaming node.
Tasks must be deployed to Operation Center before they can be run. Follow the on-screen instructions to deploy the Flink Python Streaming node. For more information, see Deploy a node.
Start the Flink Python Streaming node.
After the task is deployed, you can click Go to operation and maintenance below Deploy to production environment. In Operation Center, navigate to Node O&M > Real-time Task O&M > Real-time computing tasks, find the task that you want to start, and click Start in the Operation column to start the real-time task and view its running status.