Realtime Compute for Apache Flink supports several types of Python dependencies: custom Python virtual environments, third-party Python packages, JAR packages, built-in connectors and data formats, and data files.
Choose a dependency type
| Dependency type | When to use |
|---|---|
| Custom Python virtual environment | The pre-installed Python version does not match your requirements, or you need a specific set of packages bundled into a single environment |
| Third-party Python packages (Zip Safe) | Your package is Zip Safe and does not require compilation |
| Third-party Python packages (compiled) | Your package is a .tar.gz or source package with a setup.py file |
| JAR packages | Your Python job uses Java classes, such as connectors or Java user-defined functions (UDFs) |
| Built-in connectors, data formats, and Catalogs | You need built-in connectors, data formats, or Catalogs (such as Kafka, SLS, Avro, Parquet, Hive, or Paimon) on VVR 11.2 or later |
| Data files | Your Python UDF reads from local files at runtime |
Pre-installed Python environment
The fully managed Flink environment comes with Python pre-installed:
| Ververica Runtime (VVR) version | Python version |
|---|---|
| VVR 8.0.10 and earlier | Python 3.7 |
| VVR 8.0.11 and later | Python 3.9 |
For the list of pre-installed third-party packages, see Python job development.
GNU C Library (glibc) compatibility
Some Python packages require a minimum GNU C Library (glibc) version. The glibc version in the fully managed Flink environment depends on your architecture and VVR version:
X86
| VVR version | glibc version |
|---|---|
| VVR 8.x and earlier | glibc 2.17 |
| VVR 11.x and later | glibc 2.31 |
ARM
| VVR version | glibc version |
|---|---|
| VVR 11.2 and earlier | glibc 2.17 |
| VVR 11.3 and later | glibc 2.31 |
glibc is backward compatible: binaries built against an older glibc run on systems with a newer glibc, but not the reverse. The glibc version required by your dependency must not be later than the glibc version in the environment.
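To verify the compatibility rule against a concrete environment, you can query the C library of a running interpreter with the standard library and compare versions. A minimal sketch; the `glibc_satisfies` helper is ours, not a Flink or packaging API:

```python
import platform

# Report the C library the current interpreter was linked against.
# On glibc-based systems this returns, for example, ("glibc", "2.31");
# on non-glibc systems it returns empty strings.
lib, version = platform.libc_ver()
print(lib, version)

def glibc_satisfies(required: str, available: str) -> bool:
    """Return True if the available glibc version is at least the required one."""
    to_tuple = lambda v: tuple(int(p) for p in v.split("."))
    return to_tuple(available) >= to_tuple(required)

# A dependency built against glibc 2.17 runs on a glibc 2.31 environment,
# but a dependency requiring 2.31 fails on a 2.17 environment.
print(glibc_satisfies("2.17", "2.31"))  # True
print(glibc_satisfies("2.31", "2.17"))  # False
```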
Use a custom Python virtual environment
If the pre-installed Python version does not meet your needs, package a custom Python virtual environment and upload it to your job.
VVR 4.x supports only Python 3.7 virtual environments. VVR 6.x and later have no version restriction.
Step 1: Build the virtual environment
The following scripts create a Python 3.10 virtual environment for a VVR 11.x job. To target a different VVR or Python version, adjust the Python version in the `mamba create` command and the `apache-flink` version accordingly. To find the Flink version that corresponds to your VVR, see Workspace management and operations.
- On your local machine, create a file named `setup-pyflink-virtual-env.sh`.

  **X86**

  ```bash
  set -e
  # Download the miniforge.sh script.
  wget "https://github.com/conda-forge/miniforge/releases/download/25.11.0-1/Miniforge3-25.11.0-1-Linux-x86_64.sh" -O "miniforge.sh"
  # Add execute permissions to the miniforge.sh script.
  chmod +x miniforge.sh
  # Install miniforge.
  ./miniforge.sh -b
  source /root/miniforge3/bin/activate
  # Create a Python virtual environment.
  mamba create -n venv python=3.10 -y
  eval "$(mamba shell hook --shell bash)"
  # Activate the Python virtual environment.
  mamba activate venv
  # Install the PyFlink dependency. Update the PyFlink version if needed.
  pip install "apache-flink==1.20.3"
  # Remove unnecessary JAR packages to reduce the package size.
  find /root/miniforge3/envs/venv/lib/python3.10/site-packages/pyflink/ -name "*.jar" | xargs rm
  # Deactivate the Conda Python virtual environment.
  mamba deactivate
  # Package the prepared Conda Python virtual environment.
  cd /root/miniforge3/envs/ && zip -r /root/venv.zip venv
  ```

  **ARM**

  ```bash
  set -e
  # Download the miniforge.sh script.
  wget "https://github.com/conda-forge/miniforge/releases/download/25.11.0-1/Miniforge3-25.11.0-1-Linux-aarch64.sh" -O "miniforge.sh"
  # Add execute permissions to the miniforge.sh script.
  chmod +x miniforge.sh
  # Install miniforge.
  ./miniforge.sh -b
  source /root/miniforge3/bin/activate
  # Create a Python virtual environment.
  mamba create -n venv python=3.10 -y
  eval "$(mamba shell hook --shell bash)"
  # Activate the Python virtual environment.
  mamba activate venv
  # Install a JDK and the build dependencies, which are required to build PyFlink on ARM.
  yum install -y java-11-openjdk-devel
  export JAVA_HOME=/usr/lib/jvm/java-11
  wget "https://raw.githubusercontent.com/apache/flink/release-1.20/flink-python/dev/dev-requirements.txt" -O dev-requirements.txt
  pip install -r dev-requirements.txt
  # Install the PyFlink dependency. Update the PyFlink version if needed.
  pip install "apache-flink==1.20.3"
  # Remove unnecessary JAR packages to reduce the package size.
  find /root/miniforge3/envs/venv/lib/python3.10/site-packages/pyflink/ -name "*.jar" | xargs rm
  # Deactivate the Conda Python virtual environment.
  mamba deactivate
  # Package the prepared Conda Python virtual environment.
  cd /root/miniforge3/envs && zip -r /root/venv.zip venv
  ```
- On your local machine, create a file named `build.sh`.

  ```bash
  #!/bin/bash
  set -e -x
  yum install -y zip wget
  cd /root/
  bash /build/setup-pyflink-virtual-env.sh
  mv venv.zip /build/
  ```
- Run the Docker build command to produce `venv.zip`.

  **X86**

  ```bash
  docker run -it --rm -v $PWD:/build -w /build quay.io/pypa/manylinux_2_28_x86_64 bash ./build.sh
  ```

  **ARM**

  ```bash
  docker run -it --rm -v $PWD:/build -w /build quay.io/pypa/manylinux_2_28_aarch64 bash ./build.sh
  ```

  After the command completes, `venv.zip` is generated in your working directory. You can also modify `setup-pyflink-virtual-env.sh` to install additional third-party packages inside the virtual environment before zipping.
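The interpreter paths configured in the next step, such as `venv.zip/venv/bin/python`, are resolved relative to the extracted archive, so the interpreter must sit under a top-level `venv/` directory inside the ZIP file. Before uploading, you can sanity-check the layout locally; a sketch using only the standard library (the helper name is ours, not a Flink API, and the in-memory archive stands in for the real `venv.zip`):

```python
import io
import zipfile

def archive_has_interpreter(zip_bytes: bytes, entry: str = "venv/bin/python") -> bool:
    """Check that the archive contains the interpreter at the expected relative path."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return entry in zf.namelist()

# Build a stand-in archive with the expected layout for demonstration.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("venv/bin/python", b"")

print(archive_has_interpreter(buf.getvalue()))  # True
```

For a real archive, read the file from disk: `archive_has_interpreter(open("venv.zip", "rb").read())`.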
Step 2: Upload and configure the virtual environment
- Log on to the Realtime Compute for Apache Flink console.
- In the Actions column of the target workspace, click Console.
- In the left navigation pane, click File Management and upload `venv.zip`.
- Go to Operation Center > Job O&M and click the name of the target job.
- On the Deployment Details tab, in the Basic Configuration section, select `venv.zip` for Python Archives. For SQL jobs that use Python UDFs, also add the following configuration in Parameter Settings > Other Configuration. This makes the archive path available to the Python process at runtime; without it, the UDF cannot locate the virtual environment:

  ```
  python.archives: oss://.../venv.zip
  ```

- In Parameter Settings > Other Configuration, add the configuration that points to the Python executable inside the archive.

  **VVR 6.x and later**

  ```
  python.executable: venv.zip/venv/bin/python
  python.client.executable: venv.zip/venv/bin/python
  ```

  **Earlier than VVR 6.x**

  ```
  python.executable: venv.zip/venv/bin/python
  ```

- Click Save.
Use third-party Python packages
PyPI and manylinux are third-party websites. You may experience access failures or delays when visiting them.
Choose one of the following approaches based on your package type.
Use a Zip Safe package directly
If your package is Zip Safe, upload it directly without compilation.
- Download the package from PyPI:
  - Search for the target package (for example, `apache-flink 1.20.3`).
  - Click the package name in the search results.
  - In the left navigation pane, click Download files.
  - Download the file with `cp39-cp39` and `manylinux` in the filename, which targets the pre-installed Python 3.9.
- Log on to the Realtime Compute for Apache Flink console.
- In the Actions column of the target workspace, click Console.
- In the left navigation pane, click File Management and upload the package.
- Go to Operation Center > Job O&M, click Deploy Job > Python Job, and select the uploaded package for Python Libraries.
- Click Save.
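Wheel filenames encode the interpreter and platform tags, so you can tell from the name alone whether a file suits the pre-installed Python 3.9. A rough filename check, sketched here with our own helper rather than a packaging-library API:

```python
def wheel_matches(filename: str, python_tag: str = "cp39") -> bool:
    """Rough check that a wheel targets the given CPython version and manylinux."""
    if not filename.endswith(".whl"):
        return False
    # Wheel naming: name-version(-build)?-pythontag-abitag-platformtag.whl
    parts = filename[: -len(".whl")].split("-")
    python_tags, platform_tag = parts[-3], parts[-1]
    return python_tag in python_tags.split(".") and "manylinux" in platform_tag

print(wheel_matches("numpy-1.26.4-cp39-cp39-manylinux_2_17_x86_64.whl"))    # True
print(wheel_matches("numpy-1.26.4-cp312-cp312-manylinux_2_17_x86_64.whl"))  # False
print(wheel_matches("numpy-1.26.4.tar.gz"))                                 # False
```

A `.tar.gz` source distribution fails the check by design: it needs the compilation workflow described in the next section.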
Use a package that requires compilation
If your package is a .tar.gz file or a source package with a `setup.py` in its root directory, compile it in a manylinux-compatible environment first. The `quay.io/pypa/manylinux_2_28_x86_64` image produces packages compatible with most Linux environments.

The following example compiles `opencv-python-headless`.

Python 3.9 is installed at `/opt/python/cp39-cp39/bin/python3` in the image.
- Compile the package.
  - On your local machine, create a `requirements.txt` file:

    ```
    opencv-python-headless
    numpy<2
    ```

  - On your local machine, create a `build.sh` script:

    ```bash
    #!/bin/bash
    set -e -x
    yum install -y zip

    #PYBIN=/opt/python/cp37-cp37m/bin
    #PYBIN=/opt/python/cp38-cp38/bin
    PYBIN=/opt/python/cp39-cp39/bin
    #PYBIN=/opt/python/cp310-cp310/bin
    #PYBIN=/opt/python/cp311-cp311/bin

    "${PYBIN}/pip" install --target __pypackages__ -r requirements.txt
    cd __pypackages__ && zip -r deps.zip . && mv deps.zip ../ && cd ..
    rm -rf __pypackages__
    ```

  - Run the Docker build command:

    **X86**

    ```bash
    docker run -it --rm -v $PWD:/build -w /build quay.io/pypa/manylinux_2_28_x86_64 bash ./build.sh
    ```

    **ARM**

    ```bash
    docker run -it --rm -v $PWD:/build -w /build quay.io/pypa/manylinux_2_28_aarch64 bash ./build.sh
    ```

    After the command completes, `deps.zip` is generated. To include additional packages, add them to `requirements.txt` before running the build.
- Upload and configure `deps.zip`.
  - Log on to the Realtime Compute for Apache Flink console.
  - In the Actions column of the target workspace, click Console.
  - In the left navigation pane, click Files and upload `deps.zip`.
  - Go to Operation Center > Job O&M and click the name of the target job. On the Deployment Details tab, in the Basic Configuration section, click Edit and select `deps.zip` for Python Libraries.
  - Click Save.
Use JAR packages
If your Python job uses Java classes, such as a connector or a Java UDF, specify the JAR package in the job configuration.

- Log on to the Realtime Compute for Apache Flink console.
- In the Actions column of the target workspace, click Console.
- In the left navigation pane, click Files and upload the JAR package.
- Go to Operation Center > Job O&M and click the name of the target job. On the Deployment Details tab, in the Basic Configuration section, click Edit and select the JAR package for Additional Dependencies.
- In Parameter Settings > Other Configuration, add the classpath configuration. For multiple JAR packages, separate the paths with semicolons:

  ```
  pipeline.classpaths: 'file:///flink/usrlib/jar1.jar;file:///flink/usrlib/jar2.jar'
  ```

- Click Save.
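The `pipeline.classpaths` value is a single string of semicolon-separated `file://` URLs, which is easy to mistype when a deployment references many JAR packages. A small sketch that generates the string (the helper is ours; `/flink/usrlib` is where uploaded additional dependencies are placed at runtime):

```python
def build_classpaths(jar_names, base="/flink/usrlib"):
    """Join JAR file names into a pipeline.classpaths value."""
    return ";".join(f"file://{base}/{name}" for name in jar_names)

print(build_classpaths(["jar1.jar", "jar2.jar"]))
# file:///flink/usrlib/jar1.jar;file:///flink/usrlib/jar2.jar
```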
Use built-in connectors, data formats, and Catalogs
Built-in connectors, data formats, and Catalogs require VVR 11.2 or later.
To use built-in connectors, data formats, or Catalogs, declare them in Parameter Settings > Other Configuration using the corresponding configuration key. Separate multiple values with semicolons.
Connectors (for available connector names, see Supported connectors):

```
pipeline.used-builtin-connectors: kafka;sls
```

Data formats (for available format names, see Data formats):

```
pipeline.used-builtin-formats: avro;parquet
```

Catalogs (for available Catalog names, see Data Management):

```
pipeline.used-builtin-catalogs: hive-2.3.6;paimon
```
Click Save after adding the configuration.
Use data files
Flink does not support debugging Python jobs by uploading data files.
Choose one of the following approaches based on the number of files.
Package files as a ZIP archive (Python Archives)
For multiple data files, package them into a ZIP file and use Python Archives.
- Log on to the Realtime Compute for Apache Flink console.
- In the Actions column of the target workspace, click Console.
- In the left navigation pane, click File Management and upload the ZIP file.
- Go to Operation Center > Job O&M and click the name of the target job. On the Deployment Details tab, in the Basic Configuration section, click Edit and select the ZIP file for Python Archives.
- In your Python UDF, access the files using the archive name as the path prefix. For example, if the archive is named `mydata.zip`:

  ```python
  def map():
      with open("mydata.zip/mydata/data.txt") as f:
          ...
  ```
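The archive name works as a path prefix because the runtime extracts the ZIP file into a directory named after the archive itself. You can simulate that layout locally to test the UDF's file access before deployment; a sketch with illustrative paths and names:

```python
import os
import tempfile
import zipfile

# Build a sample archive like the one uploaded to File Management.
workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "mydata.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("mydata/data.txt", "hello\n")

# Simulate the runtime layout: the archive is extracted into a
# directory that carries the archive's own name.
extract_dir = os.path.join(workdir, "extracted", "mydata.zip")
with zipfile.ZipFile(archive) as zf:
    zf.extractall(extract_dir)

# The UDF-side path "mydata.zip/mydata/data.txt" now resolves
# relative to the extraction root.
os.chdir(os.path.join(workdir, "extracted"))
with open("mydata.zip/mydata/data.txt") as f:
    print(f.read().strip())  # hello
```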
Upload files individually (Additional Dependencies)
For a small number of data files, upload them individually using Additional Dependencies.
- Log on to the Realtime Compute for Apache Flink console.
- In the Actions column of the target workspace, click Console.
- In the left navigation pane, click Files and upload the data file.
- Go to Operation Center > Job O&M and click the name of the target job. On the Deployment Details tab, in the Basic Configuration section, click Edit and select the file for Additional Dependencies.
- In your Python UDF, access the file at `/flink/usrlib/`. For example, for a file named `data.txt`:

  ```python
  def map():
      with open("/flink/usrlib/data.txt") as f:
          ...
  ```
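Because `/flink/usrlib/` exists only in the Flink runtime, it helps to parameterize the base directory so the same UDF logic can be unit-tested locally. A sketch; the function, environment variable, and parameter names are ours:

```python
import os
import tempfile

# Default to the runtime location; allow an override for local testing.
DATA_DIR = os.environ.get("DATA_DIR", "/flink/usrlib")

def read_lookup(name: str, base: str = None) -> str:
    """Read a data file shipped as an additional dependency."""
    with open(os.path.join(base or DATA_DIR, name)) as f:
        return f.read()

# Local test: point the base directory at a temporary folder.
tmp = tempfile.mkdtemp()
with open(os.path.join(tmp, "data.txt"), "w") as f:
    f.write("ok")
print(read_lookup("data.txt", base=tmp))  # ok
```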
Configuration reference
All dependency-related configuration keys are added in Parameter Settings > Other Configuration. Separate multiple values with semicolons.
| Configuration key | Description | Example |
|---|---|---|
| `python.archives` | Path to the virtual environment ZIP file in OSS. Required for SQL jobs using Python UDFs. | `oss://.../venv.zip` |
| `python.executable` | Path to the Python interpreter inside the archive. Required for all VVR versions. VVR 6.x and later also require `python.client.executable`. | `venv.zip/venv/bin/python` |
| `python.client.executable` | Path to the Python interpreter on the client side. Required for VVR 6.x and later. | `venv.zip/venv/bin/python` |
| `pipeline.classpaths` | Classpath entries for JAR packages. | `file:///flink/usrlib/jar1.jar;file:///flink/usrlib/jar2.jar` |
| `pipeline.used-builtin-connectors` | Built-in connectors to load (VVR 11.2+). | `kafka;sls` |
| `pipeline.used-builtin-formats` | Built-in data formats to load (VVR 11.2+). | `avro;parquet` |
| `pipeline.used-builtin-catalogs` | Built-in Catalogs to load (VVR 11.2+). | `hive-2.3.6;paimon` |
What's next
- For the full Python job development workflow, see Python job development.
- For an end-to-end example, see Quick start for Flink Python jobs.
- For SQL and DataStream job development, see Job development map and JAR job development.