
Realtime Compute for Apache Flink:Getting started with a Flink Python deployment

Last Updated:Mar 27, 2024

This topic describes how to create and start a Python streaming deployment and a Python batch deployment in the console of fully managed Flink.

Prerequisites

Step 1: Download a test Python file

Download the test Python file that matches your deployment type, together with the input data file, for use in the subsequent steps.

Python drafts cannot be developed in the console of fully managed Flink. Therefore, you must develop and debug Python drafts in your on-premises environment. For more information about how to develop a Python draft, debug a Python draft, and use a connector, see Develop a Python API draft.
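The test files implement a word count over the input text. As a rough, plain-Python sketch of that computation (the actual word_count_streaming.py and word_count_batch.py files run the equivalent logic through the Flink Python API; the helper below is only an illustration):

```python
import re
from collections import Counter

def word_count(lines):
    """Count word occurrences across lines of text.

    Plain-Python illustration of the computation that the test
    deployments perform; the real test files run the equivalent
    logic through the Flink Python API on the cluster.
    """
    counts = Counter()
    for line in lines:
        # Split each line into words and normalize the case.
        counts.update(word.lower() for word in re.findall(r"[A-Za-z']+", line))
    return dict(counts)
```

For example, word_count(["To be or not to be"]) returns {'to': 2, 'be': 2, 'or': 1, 'not': 1}.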

Step 2: Upload a Python file

  1. Log on to the Realtime Compute for Apache Flink console.

  2. On the Fully Managed Flink tab, find the workspace that you want to manage and click Console in the Actions column.

  3. In the left-side navigation pane, click Artifacts.

  4. In the upper-left corner of the Artifacts page, click Upload Artifact and select the desired Python file or the test Python file that you downloaded.

Note

You can use dependencies in Python deployments. The dependencies include custom Python virtual environments, third-party Python packages, JAR packages, and data files. For more information, see Use Python dependencies.
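For example, an archive referenced through the Python Archives parameter can be built locally before you upload it. The following sketch uses the standard zipfile module; the archive name and file contents are hypothetical:

```python
import zipfile

def build_archive(archive_path, files):
    """Write a zip archive for upload as a Python deployment dependency.

    files maps an archive-internal path to its text content. Both the
    archive name and the contents used below are hypothetical examples.
    """
    with zipfile.ZipFile(archive_path, "w") as zf:
        for name, content in files.items():
            zf.writestr(name, content)

# Build a small data-file archive that could be uploaded as an artifact.
build_archive("data.zip", {"data/stopwords.txt": "the\na\nan\n"})
```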

Step 3: Create a deployment

Streaming deployment

  1. Log on to the Realtime Compute for Apache Flink console.

  2. On the Fully Managed Flink tab, find the workspace that you want to manage and click Console in the Actions column.

  3. In the left-side navigation pane, click Deployments. On the Deployments page, click Create Deployment.

  4. In the Create Deployment dialog box, configure the parameters of the deployment. The following list describes each parameter and the value that is used in this example.

    • Deployment Type: Select JAR or PYTHON. Example: PYTHON.

    • Deployment Mode: Select Stream Mode or Batch Mode. Example: Stream Mode.

    • Deployment Name: The name of the deployment that you want to create. Example: flink-streaming-test-python.

    • Engine Version: The engine version of Flink that is used by the deployment. For more information about engine versions, version mappings, and important time points in the lifecycle of each version, see Engine version. Example: vvr-6.0.7-flink-1.15.

    • Python Uri: Click word_count_streaming.py to download the test Python file, then click the upload icon on the right side of the Python Uri field to select and upload the file. Example: oss://flink-test-oss/artifacts/namespaces/flink-test-default/word_count_streaming.py.

    • Entry Module: The entry point of the program. If the file that you upload is a .py file, you do not need to configure this parameter. If the file that you upload is a .zip file, you must configure this parameter, for example, word_count. Not required in this example.

    • Entry Point Main Arguments: The OSS directory in which the input data file is stored. In this example, the input data file, the output file, and the test Python file are stored in a bucket named flink-test-oss in the OSS console. Example: --input oss://flink-test-oss/artifacts/namespaces/flink-test-default/Shakespeare.

    • Python Libraries: A third-party Python package. The package that you upload is added to the PYTHONPATH of the Python worker process so that it can be accessed directly in Python user-defined functions (UDFs). For more information, see Use a third-party Python package. Not required in this example.

    • Python Archives: Archive files. For more information, see Use a custom Python virtual environment and Use data files. Not required in this example.

    • Additional Dependencies: The OSS bucket in which a required dependency file is stored, or the URL of the dependency file. Not required in this example.

    • Deployment Target: Select the desired queue or session cluster from the drop-down list. Metrics of deployments that run in session clusters cannot be displayed, and session clusters do not support the monitoring and alerting feature or the Autopilot feature. Session clusters are suitable for development and test environments; we recommend that you do not use them in the production environment. For more information, see Manage queues, Step 1: Create a session cluster, and Debug a deployment. Example: default-queue.

    • Description: Optional. A description for the deployment. Not required in this example.

    • Label: After you specify labels for a deployment, you can search for the deployment by label key and label value on the Deployments page. You can specify a maximum of three labels for a deployment. Not required in this example.

    • More Setting: If you turn on this switch, you must configure the following parameters. Not required in this example.

      • Kerberos Name: Select a Hive cluster that supports Kerberos authentication from the drop-down list. For more information about how to create such a cluster, see Register a Hive cluster that supports Kerberos authentication.

      • principal: A Kerberos principal, which can be a user or a service and uniquely identifies an identity in the Kerberos authentication system.

  5. Click Deploy.
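The value of Entry Point Main Arguments is passed to the Python program as command-line arguments. A minimal sketch of how a deployment script might read the --input flag, assuming it uses the standard argparse module (the actual test file may parse its arguments differently):

```python
import argparse

def parse_args(argv):
    # Parse the arguments supplied through Entry Point Main Arguments.
    parser = argparse.ArgumentParser()
    parser.add_argument("--input", required=True,
                        help="OSS directory that holds the input data file")
    return parser.parse_args(argv)
```

With the example value above, parse_args(["--input", "oss://flink-test-oss/artifacts/namespaces/flink-test-default/Shakespeare"]).input yields the OSS URI of the input data.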

Batch deployment

  1. Log on to the Realtime Compute for Apache Flink console.

  2. On the Fully Managed Flink tab, find the workspace that you want to manage and click Console in the Actions column.

  3. In the left-side navigation pane, click Deployments. On the Deployments page, click Create Deployment.

  4. In the Create Deployment dialog box, configure the parameters of the deployment. The following list describes each parameter and the value that is used in this example.

    • Deployment Type: Select JAR or PYTHON. Example: PYTHON.

    • Deployment Mode: Select Stream Mode or Batch Mode. Example: Batch Mode.

    • Deployment Name: The name of the deployment that you want to create. Example: flink-batch-test-python.

    • Engine Version: The engine version of Flink that is used by the deployment. For more information about engine versions, version mappings, and important time points in the lifecycle of each version, see Engine version. Example: vvr-6.0.7-flink-1.15.

    • Python Uri: Click word_count_batch.py to download the test Python file, then click the upload icon on the right side of the Python Uri field to select and upload the file. Example: oss://flink-test-oss/artifacts/namespaces/flink-test-default/word_count_batch.py.

    • Entry Module: The entry point of the program. If the file that you upload is a .py file, you do not need to configure this parameter. If the file that you upload is a .zip file, you must configure this parameter, for example, word_count. Not required in this example.

    • Entry Point Main Arguments: The OSS directories in which the input data file and the output data file are stored. In this example, the input data file, the output file, and the test Python file are stored in a bucket named flink-test-oss in the OSS console. To write the output data to the specified OSS bucket, you need to only specify the directory and name of the output data file; you do not need to create the directory in advance. Example: --input oss://flink-test-oss/artifacts/namespaces/flink-test-default/Shakespeare --output oss://flink-test-oss/artifacts/namespaces/flink-test-default/python-batch-quickstart-test-output.txt.

    • Python Libraries: A third-party Python package. The package that you upload is added to the PYTHONPATH of the Python worker process so that it can be accessed directly in Python user-defined functions (UDFs). For more information, see Use a third-party Python package. Not required in this example.

    • Python Archives: Archive files. For more information, see Use a custom Python virtual environment and Use data files. Not required in this example.

    • Additional Dependencies: The OSS bucket in which a required dependency file is stored, or the URL of the dependency file. Not required in this example.

    • Deployment Target: Select the desired queue or session cluster from the drop-down list. Metrics of deployments that run in session clusters cannot be displayed, and session clusters do not support the monitoring and alerting feature or the Autopilot feature. Session clusters are suitable for development and test environments; we recommend that you do not use them in the production environment. For more information, see Manage queues, Step 1: Create a session cluster, and Debug a deployment. Example: default-queue.

    • Description: Optional. A description for the deployment. Not required in this example.

    • Label: After you specify labels for a deployment, you can search for the deployment by label key and label value on the Deployments page. You can specify a maximum of three labels for a deployment. Not required in this example.

    • More Setting: If you turn on this switch, you must configure the following parameters. Not required in this example.

      • Kerberos Name: Select a Hive cluster that supports Kerberos authentication from the drop-down list. For more information about how to create such a cluster, see Register a Hive cluster that supports Kerberos authentication.

      • principal: A Kerberos principal, which can be a user or a service and uniquely identifies an identity in the Kerberos authentication system.

  5. Click Deploy.
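In batch mode, the arguments include both --input and --output, and the result is written to a file instead of to the task manager logs. The following is a plain-Python sketch of that end-to-end flow, using local file paths in place of the OSS URIs that the deployment uses (the real word_count_batch.py runs through the Flink Python API, and its exact output format may differ):

```python
import re
from collections import Counter

def run_batch(input_path, output_path):
    """Read the input file, count words, and write one 'word<TAB>count' line per word.

    Local paths stand in for the OSS URIs passed through --input and
    --output; this sketch only illustrates the shape of the computation.
    """
    with open(input_path, encoding="utf-8") as f:
        counts = Counter(w.lower() for w in re.findall(r"[A-Za-z']+", f.read()))
    with open(output_path, "w", encoding="utf-8") as f:
        for word, n in sorted(counts.items()):
            f.write(f"{word}\t{n}\n")
    return counts
```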

Step 4: Start the Python deployment

  1. On the Deployments page in the console of fully managed Flink, find the desired deployment and click Start in the Actions column.

  2. In the Start Job dialog box, configure the parameters. For more information about how to configure the parameters, see Start a deployment.

  3. Click Start.

    After you click Start, the deployment status changes to RUNNING or FINISHED, which indicates that the deployment runs as expected.

    Important
    • If you use the test Python file, the deployment enters the FINISHED state after the bounded test input data is processed.

    • If you want to start a batch deployment, you must switch the deployment type from STREAM to BATCH in the drop-down list on the right side of Create Draft on the Deployments page. By default, the system displays streaming deployments.

Step 5: View the computing result

  • Computing result of a streaming deployment

    On the Deployments page, click the name of the desired deployment and perform the following steps:

    1. On the page that appears, click Exploration.
    2. On the Running Task Managers tab, click the value in the Path, ID column.
    3. On the page that appears, click the Log List tab.
    4. In the Log Name column, find the log file whose name ends with .out and click the name of the log file.
    5. Search for the shakespeare keyword in the log file to view the computing result.


    Important

    If you use the test Python file, the computing result of the streaming deployment is deleted when the deployment enters the FINISHED state. You can view the computing result only while the streaming deployment is in the RUNNING state.

  • Computing result of a batch deployment

    Log on to the OSS console and view the computing result of a batch deployment in the directory in which the output data file is stored.

    In this example, the output data file is stored in the oss://flink-test-oss/artifacts/namespaces/flink-test-default/python-batch-quickstart-test-output.txt/ directory. The folders in this directory are named after the start date and start time of the deployment. Click a folder to view the output data file. After you click the name of the desired output data file, click Download in the panel that appears.

    The computing result of the batch deployment is a .txt file. After you download the output data file, you can open it with a text editor such as Notepad or Microsoft Word to view the computing result.

(Optional) Step 6: Cancel the deployment

If you modify a deployment that is in the RUNNING or FINISHED state, you must cancel and then restart the deployment for the modification to take effect. You must also cancel and restart a deployment that fails and cannot recover from its state data. For more information, see Cancel a deployment.

References