The Flink Python Streaming node allows you to run Flink real-time tasks by submitting Python files. In DataWorks, you can select an uploaded Flink Python resource or a Flink File as the Python file path. After you configure the entry module and runtime parameters, you can develop and deploy Python-based real-time data processing tasks. This topic describes how to develop and configure a Flink Python Streaming node in DataWorks.
Prerequisites
You have associated a Realtime Compute for Apache Flink compute resource in Administration. For more information, see Bind a fully managed Flink compute resource.
You have uploaded a Flink Python resource. For more information, see Flink resources and functions.
You have created a Flink Python Streaming node. For more information, see Create a node for a scheduling workflow.
The following OpenAPI permissions have been granted to the RAM user or RAM role that is used by DataWorks to call the OpenAPI of Realtime Compute for Apache Flink. This authorization is used to submit and deploy node tasks to a Flink cluster.
{ "Version": "1", "Statement": [ { "Effect": "Allow", "Action": ["stream:CreateDeployment", "stream:UpdateDeployment", "stream:GetDeployment", "stream:DeleteDeployment"], "Resource": ["*"] } ] }
Limitations
This node cannot be used in a workflow. You can develop and run it only as an independent node.
Only serverless resource groups are supported. Legacy exclusive resource groups for scheduling are not supported.
Step 1: Configure Flink Python Streaming node
On the Flink Python Streaming node editor page, configure the following parameters.
Configure main parameters
In the left pane of the node editor page, configure the following parameters.
Parameter | Description |
Python file address | Required. From the drop-down list, select an uploaded Flink Python resource or Flink File. Flink Python resources only support |
Entry Module | The entry module of the program, such as |
Entry Point Main Arguments | The job parameters. |
Python Libraries | From the drop-down list, select an uploaded Flink File as a third-party Python package. Third-party Python packages are added to the PYTHONPATH of the Python worker process, so they can be directly accessed in Python user-defined functions. |
Python Archives | From the drop-down list, select an uploaded Flink File as an archive file. Currently, ZIP-format files are supported, such as |
Additional dependency files | From the drop-down list, select an uploaded Flink File as an additional dependency file. |
Configure Flink resources
The Flink resource configuration (including Flink cluster, engine version, resource group, resource mode, and Job Manager/Task Manager parameters) is the same as that for the Flink JAR Streaming node. For more information, see the configuration instructions in that topic.
(Optional) Configure script parameters
In the right-side navigation pane, under the Real-Time configuration section, click Add parameters in the Script Parameters area, and edit the parameter names and values.
(Optional) Configure Flink runtime parameters
The Flink runtime parameter configuration (including system checkpoint interval, minimum pause between checkpoints, state data expiration time, and other configurations) is the same as that for the Flink JAR Streaming node. For more information, see the configuration instructions in that topic.
After you complete the task configuration, click Save to save the node task.
Step 2: Start Flink Python Streaming node
Deploy the Flink Python Streaming node.
Tasks must be deployed to Operation Center before they can be run. Follow the on-screen instructions to deploy the Flink Python Streaming node. For more information, see Deploy a node.
Start the Flink Python Streaming node.
After the task is deployed, you can click Go to operation and maintenance below Deploy to production environment. In Operation Center, navigate to , find the task that you want to start, and click Start in the Operation column to start the real-time task and view its running status.