This topic describes how to create and start a Python streaming deployment and a Python batch deployment in the console of fully managed Flink.
Prerequisites
The RAM user or RAM role that you use to access the console of fully managed Flink has the required permissions. For more information, see Permission management.
A workspace is created. For more information, see Activate Realtime Compute for Apache Flink.
Step 1: Download a test Python file
You can download a test Python file based on the type of your deployment, together with an input data file, for subsequent operations.
Download a test Python file based on the type of your deployment.
Streaming deployment: Click word_count_streaming.py.
Batch deployment: Click word_count_batch.py.
Click Shakespeare to download the input data file.
Python packages cannot be developed in the console of fully managed Flink. Therefore, you need to develop Python packages in your on-premises environment. For more information about how to develop a Python draft, debug a Python draft, and use a connector, see Develop a Python API draft.
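The two test files implement the classic word count example: they split the input text into words and count the occurrences of each word. The core logic can be sketched in plain Python as follows. This is only an illustration of what the test files compute, not the contents of the downloaded files, which are written against the PyFlink API:

```python
import re
from collections import Counter

def word_count(text: str) -> Counter:
    """Split text into lowercase words and count occurrences,
    mirroring the logic of the word_count test files."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

sample = "To be, or not to be, that is the question"
counts = word_count(sample)
print(counts["to"])  # prints 2
print(counts["be"])  # prints 2
```

In the actual deployment, the same counting is performed by the Flink engine over the Shakespeare input data file instead of an in-memory string.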
Step 2: Upload a Python file
Log on to the Realtime Compute for Apache Flink console.
On the Fully Managed Flink tab, find the workspace that you want to manage and click Console in the Actions column.
In the left-side navigation pane, click Artifacts.
In the upper-left corner of the Artifacts page, click Upload Artifact and select the desired Python file or the test Python file that you downloaded.
You can use dependencies in Python deployments. The dependencies include custom Python virtual environments, third-party Python packages, JAR packages, and data files. For more information, see Use Python dependencies.
Step 3: Create a deployment
Streaming deployment
Log on to the Realtime Compute for Apache Flink console.
On the Fully Managed Flink tab, find the workspace that you want to manage and click Console in the Actions column.
In the left-side navigation pane, click Deployments. On the Deployments page, click Create Deployment.
In the Create Deployment dialog box, configure the parameters of the deployment. The following table describes the parameters.
Parameter: Deployment Type
Description: Select JAR or PYTHON.
Example: PYTHON

Parameter: Deployment Mode
Description: Select Stream Mode or Batch Mode.
Example: Stream Mode

Parameter: Deployment Name
Description: Enter the name of the deployment that you want to create.
Example: flink-streaming-test-python

Parameter: Engine Version
Description: The engine version of Flink that is used by the deployment. For more information about engine versions, version mappings, and important time points in the lifecycle of each version, see Engine version.
Example: vvr-6.0.7-flink-1.15

Parameter: Python Uri
Description: Click word_count_streaming.py to download the test Python file. Then, click the icon on the right side of the Python Uri field to select and upload the file.
Example: oss://flink-test-oss/artifacts/namespaces/flink-test-default/word_count_streaming.py

Parameter: Entry Module
Description: The entry module of the Python program. If the file that you upload is a .py file, you do not need to configure this parameter. If the file that you upload is a .zip file, you must configure this parameter. For example, you can set the Entry Module parameter to word_count.
Example: Not required

Parameter: Entry Point Main Arguments
Description: The OSS directory in which the input data file is stored. Note: In this example, the input data file, output file, and test Python file are stored in a bucket named flink-test-oss in the OSS console.
Example: --input oss://flink-test-oss/artifacts/namespaces/flink-test-default/Shakespeare

Parameter: Python Libraries
Description: A third-party Python package. The third-party Python package that you upload is added to the PYTHONPATH of the Python worker process so that it can be directly accessed in Python user-defined functions (UDFs). For more information about how to use third-party Python packages, see Use a third-party Python package.
Example: Not required

Parameter: Python Archives
Description: Archive files. For more information about Python Archives, see Use a custom Python virtual environment and Use data files.
Example: Not required

Parameter: Additional Dependencies
Description: The OSS bucket in which the required dependency file is stored, or the URL of the dependency file.
Example: Not required

Parameter: Deployment Target
Description: Select the desired queue or session cluster from the drop-down list. For more information, see Manage queues and Step 1: Create a session cluster. Note: Metrics of deployments that are deployed in session clusters cannot be displayed, and session clusters do not support the monitoring and alerting feature or the Autopilot feature. Session clusters are suitable for development and test environments. We recommend that you do not use session clusters in the production environment. For more information, see Debug a deployment.
Example: default-queue

Parameter: Description
Description: Optional. The description of the deployment.
Example: Not required

Parameter: Label
Description: After you specify labels for a deployment, you can search for the deployment by label key and label value on the Deployments page. You can specify a maximum of three labels for a deployment.
Example: Not required

Parameter: More Setting
Description: If you turn on the switch, you must configure the following parameters:
Kerberos Name: Select a Hive cluster that supports Kerberos authentication from the drop-down list. For more information about how to create a Hive cluster that supports Kerberos authentication, see Register a Hive cluster that supports Kerberos authentication.
principal: A Kerberos principal, which can be a user or a service. A Kerberos principal is used to uniquely identify an identity in the Kerberos authentication system.
Example: Not required
Click Deploy.
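The values in the Entry Point Main Arguments field are passed to the Python program as ordinary command-line arguments. A minimal sketch of how a draft might consume them, assuming argparse-style parsing (the --input and --output names follow this example; your own draft can define different arguments):

```python
import argparse

def parse_args(argv=None):
    """Parse the arguments supplied through Entry Point Main Arguments."""
    parser = argparse.ArgumentParser(description="Word count draft arguments")
    parser.add_argument("--input", required=True,
                        help="OSS path of the input data file")
    parser.add_argument("--output",
                        help="OSS path of the output data file (used by the batch example)")
    return parser.parse_args(argv)

# Simulate the streaming example's Entry Point Main Arguments value:
args = parse_args([
    "--input",
    "oss://flink-test-oss/artifacts/namespaces/flink-test-default/Shakespeare",
])
print(args.input)
```

The streaming example passes only --input; the batch example in the next section passes both --input and --output.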
Batch deployment
Log on to the Realtime Compute for Apache Flink console.
On the Fully Managed Flink tab, find the workspace that you want to manage and click Console in the Actions column.
In the left-side navigation pane, click Deployments. On the Deployments page, click Create Deployment.
In the Create Deployment dialog box, configure the parameters of the deployment. The following table describes the parameters.
Parameter: Deployment Type
Description: Select JAR or PYTHON.
Example: PYTHON

Parameter: Deployment Mode
Description: Select Stream Mode or Batch Mode.
Example: Batch Mode

Parameter: Deployment Name
Description: Enter the name of the deployment that you want to create.
Example: flink-batch-test-python

Parameter: Engine Version
Description: The engine version of Flink that is used by the deployment. For more information about engine versions, version mappings, and important time points in the lifecycle of each version, see Engine version.
Example: vvr-6.0.7-flink-1.15

Parameter: Python Uri
Description: Click word_count_batch.py to download the test Python file. Then, click the icon on the right side of the Python Uri field to select and upload the file.
Example: oss://flink-test-oss/artifacts/namespaces/flink-test-default/word_count_batch.py

Parameter: Entry Module
Description: The entry module of the Python program. If the file that you upload is a .py file, you do not need to configure this parameter. If the file that you upload is a .zip file, you must configure this parameter. For example, you can set the Entry Module parameter to word_count.
Example: Not required

Parameter: Entry Point Main Arguments
Description: The OSS directories in which the input data file and output data file are stored. Note: In this example, the input data file, output file, and test Python file are stored in a bucket named flink-test-oss in the OSS console. This example writes the output data to the specified OSS bucket. You only need to specify the directory and name of the output data file; you do not need to create the directory in advance.
Example: --input oss://flink-test-oss/artifacts/namespaces/flink-test-default/Shakespeare --output oss://flink-test-oss/artifacts/namespaces/flink-test-default/python-batch-quickstart-test-output.txt

Parameter: Python Libraries
Description: A third-party Python package. The third-party Python package that you upload is added to the PYTHONPATH of the Python worker process so that it can be directly accessed in Python UDFs. For more information about how to use third-party Python packages, see Use a third-party Python package.
Example: Not required

Parameter: Python Archives
Description: Archive files. For more information about Python Archives, see Use a custom Python virtual environment and Use data files.
Example: Not required

Parameter: Additional Dependencies
Description: The OSS bucket in which the required dependency file is stored, or the URL of the dependency file.
Example: Not required

Parameter: Deployment Target
Description: Select the desired queue or session cluster from the drop-down list. For more information, see Manage queues and Step 1: Create a session cluster. Note: Metrics of deployments that are deployed in session clusters cannot be displayed, and session clusters do not support the monitoring and alerting feature or the Autopilot feature. Session clusters are suitable for development and test environments. We recommend that you do not use session clusters in the production environment. For more information, see Debug a deployment.
Example: default-queue

Parameter: Description
Description: Optional. The description of the deployment.
Example: Not required

Parameter: Label
Description: After you specify labels for a deployment, you can search for the deployment by label key and label value on the Deployments page. You can specify a maximum of three labels for a deployment.
Example: Not required

Parameter: More Setting
Description: If you turn on the switch, you must configure the following parameters:
Kerberos Name: Select a Hive cluster that supports Kerberos authentication from the drop-down list. For more information about how to create a Hive cluster that supports Kerberos authentication, see Register a Hive cluster that supports Kerberos authentication.
principal: A Kerberos principal, which can be a user or a service. A Kerberos principal is used to uniquely identify an identity in the Kerberos authentication system.
Example: Not required
Click Deploy.
Step 4: Start the Python deployment
On the Deployments page in the console of fully managed Flink, find the desired deployment and click Start in the Actions column.
In the Start Job dialog box, configure the parameters. For more information about how to configure the parameters, see Start a deployment.
Click Start.
After you click Start, the deployment status changes to RUNNING or FINISHED. This indicates that the deployment runs as expected.
Important: If you use the test Python file that you uploaded before you created the deployment, the deployment enters the FINISHED state.
If you want to start a batch deployment, you must change the deployment type from STREAM to BATCH in the drop-down list on the right side of Create Draft on the Deployments page. By default, the system displays streaming deployments.
Step 5: View the computing result
Computing result of a streaming deployment
On the Deployments page, click the name of the desired deployment.
On the page that appears, click Exploration.
On the Running Task Managers tab, click the value in the Path, ID column.
On the page that appears, click the Log List tab.
In the Log Name column, find the log file whose name ends with .out and click the name of the log file.
Search for the shakespeare keyword in the log file to view the computing result.
Important: If you use the test Python file, the computing result of the streaming deployment is deleted when the streaming deployment enters the FINISHED state. You can view the computing result of the streaming deployment only when the streaming deployment is in the RUNNING state.
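If you download the .out log file from the Log List tab, you can also search it locally. A sketch, using a sample file to stand in for the downloaded log (taskmanager.out is a placeholder file name, and the sample line format is illustrative only):

```shell
# Simulate a downloaded TaskManager .out log file with sample result lines.
# In practice, download the real .out file from the Log List tab first.
printf '(shakespeare,1)\n(word,2)\n' > taskmanager.out

# Search the log for the lines of the computing result that contain the keyword.
grep -i "shakespeare" taskmanager.out
```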
Computing result of a batch deployment
Log on to the OSS console and view the computing result of the batch deployment in the directory in which the output data file is stored.
In this example, the output data file is stored in the oss://flink-test-oss/artifacts/namespaces/flink-test-default/python-batch-quickstart-test-output.txt/ directory. The names of the folders in this directory are the start date and start time of the deployment. Click a folder to view the output data file. Click the name of the desired output data file, and then click Download in the panel that appears.
The computing result of the batch deployment is a text file. After you download the output data file, you can use Notepad or Microsoft Office Word to open the file and view the computing result.
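You can also inspect the downloaded output data file programmatically. A minimal sketch, assuming each line contains a word and its count separated by a comma; the exact line format depends on the sink configured in word_count_batch.py, so treat the parsing below as an illustration:

```python
from collections import Counter

def top_words(path: str, n: int = 5):
    """Read 'word,count' lines from the downloaded output file and
    return the n most frequent words. The 'word,count' line format
    is an assumption for illustration."""
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            word, _, count = line.rpartition(",")
            if word and count.isdigit():
                counts[word] = int(count)
    return counts.most_common(n)

# Example with a locally created sample file standing in for the download:
with open("sample-output.txt", "w", encoding="utf-8") as f:
    f.write("to,5\nbe,4\nor,2\n")
print(top_words("sample-output.txt", 2))  # [('to', 5), ('be', 4)]
```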
(Optional) Step 6: Cancel the deployment
If you modify a deployment that is in the RUNNING or FINISHED state, you must cancel and then restart the deployment for the modification to take effect. If the deployment fails and cannot be recovered from the state data, you must also cancel and then restart the deployment. For more information about how to cancel a deployment, see Cancel a deployment.
References
After you create a Python deployment, you can configure automatic tuning for the deployment to improve resource utilization. For more information about automatic tuning, see Configure automatic tuning.
For more information about how to create an SQL deployment, see Getting started with a Flink SQL deployment.
For more information about how to create a JAR deployment, see Getting started with a Flink JAR deployment.
For more information about how to ingest data into data warehouses in real time, see Ingest data into data warehouses in real time.
For more information about how to build a real-time data warehouse by using Realtime Compute for Apache Flink and Hologres, see Build a real-time data warehouse by using Realtime Compute for Apache Flink and Hologres.
For more information about how to build a streaming data lakehouse by using Realtime Compute for Apache Flink and Apache Paimon, see Build a streaming data lakehouse by using Realtime Compute for Apache Flink and Apache Paimon.