
Realtime Compute for Apache Flink:Use Python dependencies

Last Updated:Mar 26, 2026

Realtime Compute for Apache Flink supports several types of Python dependencies: custom Python virtual environments, third-party Python packages, JAR packages, built-in connectors and data formats, and data files.

Choose a dependency type

| Dependency type | When to use |
| --- | --- |
| Custom Python virtual environment | The pre-installed Python version does not match your requirements, or you need a specific set of packages bundled into a single environment |
| Third-party Python packages (Zip Safe) | Your package is Zip Safe and does not require compilation |
| Third-party Python packages (compiled) | Your package is a .tar.gz or source package with a setup.py file |
| JAR packages | Your Python job uses Java classes, such as connectors or Java user-defined functions (UDFs) |
| Built-in connectors, data formats, and Catalogs | You need built-in connectors, data formats, or Catalogs (such as Kafka, SLS, Avro, Parquet, Hive, or Paimon) on VVR 11.2 or later |
| Data files | Your Python UDF reads from local files at runtime |

Pre-installed Python environment

The fully managed Flink environment comes with Python pre-installed:

| Ververica Runtime (VVR) version | Python version |
| --- | --- |
| VVR 8.0.10 and earlier | Python 3.7 |
| VVR 8.0.11 and later | Python 3.9 |

For the list of pre-installed third-party packages, see Python job development.

GNU C Library (glibc) compatibility

Some Python packages require a minimum GNU C Library (glibc) version. The glibc version in the fully managed Flink environment depends on your architecture and VVR version:

X86

| VVR version | glibc version |
| --- | --- |
| VVR 8.x and earlier | glibc 2.17 |
| VVR 11.x and later | glibc 2.31 |

ARM

| VVR version | glibc version |
| --- | --- |
| VVR 11.2 and earlier | glibc 2.17 |
| VVR 11.3 and later | glibc 2.31 |

glibc is backward-compatible: an environment with a newer glibc can run dependencies built against an older one. The glibc version required by your dependency must therefore not be later than the glibc version in the environment.
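
To check which glibc version your target environment actually ships, you can query it from Python. The sketch below (an illustration, not part of the official workflow; the helper name glibc_satisfies is hypothetical) reports the local C library and compares version strings the way the compatibility rule above describes:

```python
import platform

def glibc_satisfies(required, environment):
    # glibc is backward-compatible: a dependency built against `required`
    # runs if the environment's glibc is the same version or newer.
    def parse(version):
        return tuple(int(part) for part in version.split("."))
    return parse(required) <= parse(environment)

# Report the C library of the current interpreter, e.g. ("glibc", "2.31")
# on a glibc-based Linux; other platforms may return empty strings.
print(platform.libc_ver())
```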

Use a custom Python virtual environment

If the pre-installed Python version does not meet your needs, package a custom Python virtual environment and upload it to your job.

VVR 4.x supports only Python 3.7 virtual environments. VVR 6.x and later have no version restriction.

Step 1: Build the virtual environment

The following scripts create a Python 3.10 virtual environment for a VVR 11.x job. To target a different VVR or Python version, adjust the Python version in the mamba create command and the apache-flink version accordingly. To find the Flink version that corresponds to your VVR, see Workspace management and operations.

  1. On your local machine, create a file named setup-pyflink-virtual-env.sh.

    X86

    set -e
    # Download the miniforge.sh script.
    wget "https://github.com/conda-forge/miniforge/releases/download/25.11.0-1/Miniforge3-25.11.0-1-Linux-x86_64.sh" -O "miniforge.sh"
    
    # Add execute permissions to the miniforge.sh script.
    chmod +x miniforge.sh
    
    # Install miniforge.
    ./miniforge.sh -b
    source /root/miniforge3/bin/activate
    
    # Create a Python virtual environment.
    mamba create -n venv python=3.10 -y
    eval "$(mamba shell hook --shell bash)"
    
    # Activate the Python virtual environment.
    mamba activate venv
    
    # Install the PyFlink dependency.
    # Update the PyFlink version if needed.
    pip install "apache-flink==1.20.3"
    
    # Remove unnecessary JAR packages to reduce the package size.
    find /root/miniforge3/envs/venv/lib/python3.10/site-packages/pyflink/ -name "*.jar" | xargs -r rm
    
    # Deactivate the Conda Python virtual environment.
    mamba deactivate
    
    # Package the prepared Conda Python virtual environment.
    cd /root/miniforge3/envs/ && zip -r /root/venv.zip venv

    ARM

    set -e
    # Download the miniforge.sh script.
    wget "https://github.com/conda-forge/miniforge/releases/download/25.11.0-1/Miniforge3-25.11.0-1-Linux-aarch64.sh" -O "miniforge.sh"
    
    # Add execute permissions to the miniforge.sh script.
    chmod +x miniforge.sh
    
    # Install miniforge.
    ./miniforge.sh -b
    source /root/miniforge3/bin/activate
    
    # Create a Python virtual environment.
    mamba create -n venv python=3.10 -y
    eval "$(mamba shell hook --shell bash)"
    
    # Activate the Python virtual environment.
    mamba activate venv
    
    # Install the PyFlink dependency. A prebuilt aarch64 wheel may not be
    # available, so install a JDK and the build requirements needed to
    # compile PyFlink from source.
    yum install -y java-11-openjdk-devel
    export JAVA_HOME=/usr/lib/jvm/java-11
    wget "https://raw.githubusercontent.com/apache/flink/release-1.20/flink-python/dev/dev-requirements.txt" -O dev-requirements.txt
    pip install -r dev-requirements.txt
    # Update the PyFlink version if needed.
    pip install "apache-flink==1.20.3"
    
    # Remove unnecessary JAR packages to reduce the package size.
    find /root/miniforge3/envs/venv/lib/python3.10/site-packages/pyflink/ -name "*.jar" | xargs -r rm
    
    # Deactivate the Conda Python virtual environment.
    mamba deactivate
    
    # Package the prepared Conda Python virtual environment.
    cd /root/miniforge3/envs && zip -r /root/venv.zip venv
  2. On your local machine, create a file named build.sh.

    #!/bin/bash
    set -e -x
    
    yum install -y zip wget
    
    cd /root/
    bash /build/setup-pyflink-virtual-env.sh
    mv venv.zip /build/
  3. Run the Docker build command to produce venv.zip.

    X86

    docker run -it --rm -v "$PWD":/build -w /build quay.io/pypa/manylinux_2_28_x86_64 bash ./build.sh

    ARM

    docker run -it --rm -v "$PWD":/build -w /build quay.io/pypa/manylinux_2_28_aarch64 bash ./build.sh

    After the command completes, venv.zip is generated in your working directory. You can also modify setup-pyflink-virtual-env.sh to install additional third-party packages inside the virtual environment before zipping.
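
Before uploading, you can sanity-check the archive layout locally. The configuration in Step 2 points python.executable at venv.zip/venv/bin/python, so that entry must exist relative to the archive root. A minimal sketch using Python's standard zipfile module (the helper name is hypothetical):

```python
import zipfile

def archive_has_interpreter(path, interpreter="venv/bin/python"):
    # python.executable will resolve to <archive name>/venv/bin/python at
    # runtime, so the archive must contain that entry under its root.
    with zipfile.ZipFile(path) as zf:
        return interpreter in zf.namelist()
```

For example, archive_has_interpreter("venv.zip") should return True for an archive built by the scripts above.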

Step 2: Upload and configure the virtual environment

  1. Log on to the Realtime Compute for Apache Flink console.

  2. In the Actions column of the target workspace, click Console.

  3. In the left navigation pane, click File Management and upload venv.zip.

  4. Go to Operation Center > Job O&M and click the name of the target job.

  5. On the Deployment Details tab, in the Basic Configuration section, select venv.zip for Python Archives. For SQL jobs that use Python UDFs, also add the following configuration in Parameter Settings > Other Configuration. This makes the archive path available to the Python process at runtime — without it, the UDF cannot locate the virtual environment:

    python.archives: oss://.../venv.zip
  6. In Parameter Settings > Other Configuration, add the configuration that points to the Python executable inside the archive.

    VVR 6.x and later

    python.executable: venv.zip/venv/bin/python
    python.client.executable: venv.zip/venv/bin/python

    Earlier than VVR 6.x

    python.executable: venv.zip/venv/bin/python
  7. Click Save.

Use third-party Python packages

PyPI and manylinux are third-party websites. You may experience access failures or delays when visiting them.

Choose one of the following approaches based on your package type.

Use a Zip Safe package directly

If your package is Zip Safe, upload it directly without compilation.

  1. Download the package from PyPI:

    1. Search for the target package (for example, apache-flink 1.20.3).

    2. Click the package name in the search results.

    3. In the left navigation pane, click Download files.

    4. Download the file whose name contains cp39-cp39 and manylinux, which matches the pre-installed Python 3.9.

  2. Log on to the Realtime Compute for Apache Flink console.

  3. In the Actions column of the target workspace, click Console.

  4. In the left navigation pane, click File Management and upload the package.

  5. Go to Operation Center > Job O&M, click Deploy Job > Python Job, and select the uploaded package for Python Libraries.

  6. Click Save.
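
A wheel filename encodes its compatibility: names follow the pattern {dist}-{version}(-{build})?-{python}-{abi}-{platform}.whl, so you can check a download before uploading it. A rough sketch (the helper name and defaults are hypothetical; it targets the pre-installed Python 3.9 on VVR 8.0.11 and later):

```python
def wheel_matches(filename, python_tag="cp39", platform_fragment="manylinux"):
    # Wheel names: {dist}-{version}(-{build})?-{python}-{abi}-{platform}.whl
    if not filename.endswith(".whl"):
        return False
    python, abi, plat = filename[: -len(".whl")].split("-")[-3:]
    return python_tag in python and platform_fragment in plat
```

For example, a cp39 manylinux wheel passes the check, while a Windows or cp312 wheel does not.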

Use a package that requires compilation

If your package is a .tar.gz file or a source package with a setup.py in its root directory, compile it in a manylinux-compatible environment first. The quay.io/pypa/manylinux_2_28_x86_64 image produces packages compatible with most Linux environments.

The following example compiles opencv-python-headless.

Python 3.9 is installed at /opt/python/cp39-cp39/bin/python3 in the image.
  1. Compile the package.

    1. On your local machine, create a requirements.txt file:

      opencv-python-headless
      numpy<2
    2. On your local machine, create a build.sh script:

      #!/bin/bash
      set -e -x
      
      yum install -y zip
      
      #PYBIN=/opt/python/cp37-cp37m/bin
      #PYBIN=/opt/python/cp38-cp38/bin
      PYBIN=/opt/python/cp39-cp39/bin
      #PYBIN=/opt/python/cp310-cp310/bin
      #PYBIN=/opt/python/cp311-cp311/bin
      
      "${PYBIN}/pip" install --target __pypackages__ -r requirements.txt
      cd __pypackages__ && zip -r deps.zip . && mv deps.zip ../ && cd ..
      rm -rf __pypackages__
    3. Run the Docker build command:

      X86

      docker run -it --rm -v "$PWD":/build -w /build quay.io/pypa/manylinux_2_28_x86_64 bash ./build.sh

      ARM

      docker run -it --rm -v "$PWD":/build -w /build quay.io/pypa/manylinux_2_28_aarch64 bash ./build.sh

      After the command completes, deps.zip is generated. To include additional packages, add them to requirements.txt before running the build.

  2. Upload and configure deps.zip.

    1. Log on to the Realtime Compute for Apache Flink console.

    2. In the Actions column of the target workspace, click Console.

    3. In the left navigation pane, click Files and upload deps.zip.

    4. Go to Operation Center > Job O&M and click the name of the target job. On the Deployment Details tab, in the Basic Configuration section, click Edit and select deps.zip for Python Libraries.

    5. Click Save.
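
After the build, you can confirm locally that deps.zip captured everything listed in requirements.txt by inspecting its top-level names. A minimal sketch with the standard zipfile module (the helper name is hypothetical):

```python
import zipfile

def top_level_packages(deps_zip):
    # Collect top-level package directories and single-file modules so you
    # can confirm the build captured everything in requirements.txt.
    with zipfile.ZipFile(deps_zip) as zf:
        names = zf.namelist()
    tops = set()
    for name in names:
        head = name.split("/", 1)[0]
        if "/" in name:
            tops.add(head)
        elif head.endswith(".py"):
            tops.add(head[: -len(".py")])
    return sorted(tops)
```

For the opencv-python-headless example, the result should include cv2 and numpy.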

Use JAR packages

If your Python job uses Java classes — such as a connector or a Java UDF — specify the JAR package in the job configuration.

  1. Log on to the Realtime Compute for Apache Flink console.

  2. In the Actions column of the target workspace, click Console.

  3. In the left navigation pane, click Files and upload the JAR package.

  4. Go to Operation Center > Job O&M and click the name of the target job. On the Deployment Details tab, in the Basic Configuration section, click Edit and select the JAR package for Additional Dependencies.

  5. In Parameter Settings > Other Configuration, add the classpath configuration. For multiple JAR packages, separate the paths with semicolons:

    pipeline.classpaths: 'file:///flink/usrlib/jar1.jar;file:///flink/usrlib/jar2.jar'
  6. Click Save.
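
Uploaded files are placed under /flink/usrlib/ at runtime, and pipeline.classpaths expects a semicolon-separated list of file:// URIs. If you script your deployment configuration, a small helper (hypothetical name, sketch only) can build that value from the uploaded JAR names:

```python
def classpaths_value(jar_filenames):
    # Uploaded Additional Dependencies land in /flink/usrlib/ at runtime;
    # pipeline.classpaths takes semicolon-separated file:// URIs.
    return ";".join(f"file:///flink/usrlib/{name}" for name in jar_filenames)
```

For example, classpaths_value(["jar1.jar", "jar2.jar"]) yields the value shown above.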

Use built-in connectors, data formats, and Catalogs

Built-in connectors, data formats, and Catalogs require VVR 11.2 or later.

To use built-in connectors, data formats, or Catalogs, declare them in Parameter Settings > Other Configuration using the corresponding configuration key. Separate multiple values with semicolons.

Connectors — for available connector names, see Supported connectors:

pipeline.used-builtin-connectors: kafka;sls

Data formats — for available format names, see Data formats:

pipeline.used-builtin-formats: avro;parquet

Catalogs — for available Catalog names, see Data Management:

pipeline.used-builtin-catalogs: hive-2.3.6;paimon

Click Save after adding the configuration.
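
If you generate deployment configuration programmatically, the three keys above can be rendered from lists of names. A sketch under the same VVR 11.2+ assumptions (the helper name is hypothetical; keys with no values are omitted):

```python
def builtin_dependency_config(connectors=(), formats=(), catalogs=()):
    # Render the Other Configuration entries; multiple values are
    # semicolon-separated, and empty keys are left out entirely.
    entries = {
        "pipeline.used-builtin-connectors": connectors,
        "pipeline.used-builtin-formats": formats,
        "pipeline.used-builtin-catalogs": catalogs,
    }
    return {key: ";".join(values) for key, values in entries.items() if values}
```

For example, builtin_dependency_config(connectors=["kafka", "sls"], catalogs=["paimon"]) produces the connector and Catalog lines shown above and omits the formats key.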

Use data files

Flink does not support debugging Python jobs by uploading data files.

Choose one of the following approaches based on the number of files.

Package files as a ZIP archive (Python Archives)

For multiple data files, package them into a ZIP file and use Python Archives.

  1. Log on to the Realtime Compute for Apache Flink console.

  2. In the Actions column of the target workspace, click Console.

  3. In the left navigation pane, click File Management and upload the ZIP file.

  4. Go to Operation Center > Job O&M and click the name of the target job. On the Deployment Details tab, in the Basic Configuration section, click Edit and select the ZIP file for Python Archives.

  5. In your Python UDF, access the files using the archive name as the path prefix. For example, if the archive is named mydata.zip:

    def map():
        with open("mydata.zip/mydata/data.txt") as f:
            ...

Upload files individually (Additional Dependencies)

For a small number of data files, upload them individually using Additional Dependencies.

  1. Log on to the Realtime Compute for Apache Flink console.

  2. In the Actions column of the target workspace, click Console.

  3. In the left navigation pane, click Files and upload the data file.

  4. Go to Operation Center > Job O&M and click the name of the target job. On the Deployment Details tab, in the Basic Configuration section, click Edit and select the file for Additional Dependencies.

  5. In your Python UDF, access the file at /flink/usrlib/. For example, for a file named data.txt:

    def map():
        with open("/flink/usrlib/data.txt") as f:
            ...

Configuration reference

All dependency-related configuration keys are added in Parameter Settings > Other Configuration. Separate multiple values with semicolons.

| Configuration key | Description | Example |
| --- | --- | --- |
| python.archives | Path to the virtual environment ZIP file in OSS. Required for SQL jobs using Python UDFs. | oss://.../venv.zip |
| python.executable | Path to the Python interpreter inside the archive. Required for all VVR versions. VVR 6.x and later also require python.client.executable. | venv.zip/venv/bin/python |
| python.client.executable | Path to the Python interpreter on the client side. Required for VVR 6.x and later. | venv.zip/venv/bin/python |
| pipeline.classpaths | Classpath entries for JAR packages. | file:///flink/usrlib/jar1.jar;file:///flink/usrlib/jar2.jar |
| pipeline.used-builtin-connectors | Built-in connectors to load (VVR 11.2+). | kafka;sls |
| pipeline.used-builtin-formats | Built-in data formats to load (VVR 11.2+). | avro;parquet |
| pipeline.used-builtin-catalogs | Built-in Catalogs to load (VVR 11.2+). | hive-2.3.6;paimon |

What's next