
Realtime Compute for Apache Flink:Use Python dependencies

Last Updated:Mar 26, 2026

Realtime Compute for Apache Flink supports several types of Python dependencies: custom Python virtual environments, third-party Python packages, JAR packages, built-in connectors and data formats, and data files.

Choose a dependency type

| Dependency type | When to use |
| --- | --- |
| Custom Python virtual environment | The pre-installed Python version does not match your requirements, or you need a specific set of packages bundled into a single environment |
| Third-party Python packages (Zip Safe) | Your package is Zip Safe and does not require compilation |
| Third-party Python packages (compiled) | Your package is a .tar.gz or source package with a setup.py file |
| JAR packages | Your Python job uses Java classes, such as connectors or Java user-defined functions (UDFs) |
| Built-in connectors, data formats, and Catalogs | You need built-in connectors, data formats, or Catalogs (such as Kafka, SLS, Avro, Parquet, Hive, or Paimon) on VVR 11.2 or later |
| Data files | Your Python UDF reads from local files at runtime |

Pre-installed Python environment

The fully managed Flink environment comes with Python pre-installed:

| Ververica Runtime (VVR) version | Python version |
| --- | --- |
| VVR 8.0.10 and earlier | Python 3.7 |
| VVR 8.0.11 and later | Python 3.9 |

For the list of pre-installed third-party packages, see Python job development.

GNU C Library (glibc) compatibility

Some Python packages require a minimum GNU C Library (glibc) version. The glibc version in the fully managed Flink environment depends on your architecture and VVR version:

X86

| VVR version | glibc version |
| --- | --- |
| VVR 8.x and earlier | glibc 2.17 |
| VVR 11.x and later | glibc 2.31 |

ARM

| VVR version | glibc version |
| --- | --- |
| VVR 11.2 and earlier | glibc 2.17 |
| VVR 11.3 and later | glibc 2.31 |

glibc is backward-compatible: an environment with a newer glibc can run dependencies built against an older one. The glibc version required by your dependency must therefore not be later than the glibc version in the environment.
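
To check which glibc version your target environment actually ships, you can query it from Python. The sketch below (an illustration, not part of the official workflow; the helper name glibc_satisfies is hypothetical) reports the local C library and compares version strings the way the compatibility rule above describes:

```python
import platform

def glibc_satisfies(required, environment):
    # glibc is backward-compatible: a dependency built against `required`
    # runs if the environment's glibc is the same version or newer.
    def parse(version):
        return tuple(int(part) for part in version.split("."))
    return parse(required) <= parse(environment)

# Report the C library of the current interpreter, e.g. ("glibc", "2.31")
# on a glibc-based Linux; other platforms may return empty strings.
print(platform.libc_ver())
```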

Use a custom Python virtual environment

If the pre-installed Python version does not meet your needs, package a custom Python virtual environment and upload it to your job.

VVR 4.x supports only Python 3.7 virtual environments. VVR 6.x and later have no version restriction.

Step 1: Build the virtual environment

The following scripts create a Python 3.10 virtual environment for a VVR 11.x job. To target a different VVR or Python version, adjust the Python version in the mamba create command and the apache-flink version accordingly. To find the Flink version that corresponds to your VVR, see Workspace management and operations.

  1. On your local machine, create a file named setup-pyflink-virtual-env.sh.

    X86

    set -e
    # Download the miniforge.sh script.
    wget "https://github.com/conda-forge/miniforge/releases/download/25.11.0-1/Miniforge3-25.11.0-1-Linux-x86_64.sh" -O "miniforge.sh"
    
    # Add execute permissions to the miniforge.sh script.
    chmod +x miniforge.sh
    
    # Install miniforge.
    ./miniforge.sh -b
    source /root/miniforge3/bin/activate
    
    # Create a Python virtual environment.
    mamba create -n venv python=3.10 -y
    eval "$(mamba shell hook --shell bash)"
    
    # Activate the Python virtual environment.
    mamba activate venv
    
    # Install the PyFlink dependency.
    # Update the PyFlink version if needed.
    pip install "apache-flink==1.20.3"
    
    # Remove unnecessary JAR packages to reduce the package size.
    find /root/miniforge3/envs/venv/lib/python3.10/site-packages/pyflink/ -name "*.jar" | xargs -r rm
    
    # Deactivate the Conda Python virtual environment.
    mamba deactivate
    
    # Package the prepared Conda Python virtual environment.
    cd /root/miniforge3/envs/ && zip -r /root/venv.zip venv

    ARM

    set -e
    # Download the miniforge.sh script.
    wget "https://github.com/conda-forge/miniforge/releases/download/25.11.0-1/Miniforge3-25.11.0-1-Linux-aarch64.sh" -O "miniforge.sh"
    
    # Add execute permissions to the miniforge.sh script.
    chmod +x miniforge.sh
    
    # Install miniforge.
    ./miniforge.sh -b
    source /root/miniforge3/bin/activate
    
    # Create a Python virtual environment.
    mamba create -n venv python=3.10 -y
    eval "$(mamba shell hook --shell bash)"
    
    # Activate the Python virtual environment.
    mamba activate venv
    
    # Install the PyFlink dependency. A prebuilt aarch64 wheel may not be
    # available, so install a JDK and the build requirements needed to
    # compile PyFlink from source.
    yum install -y java-11-openjdk-devel
    export JAVA_HOME=/usr/lib/jvm/java-11
    wget "https://raw.githubusercontent.com/apache/flink/release-1.20/flink-python/dev/dev-requirements.txt" -O dev-requirements.txt
    pip install -r dev-requirements.txt
    # Update the PyFlink version if needed.
    pip install "apache-flink==1.20.3"
    
    # Remove unnecessary JAR packages to reduce the package size.
    find /root/miniforge3/envs/venv/lib/python3.10/site-packages/pyflink/ -name "*.jar" | xargs -r rm
    
    # Deactivate the Conda Python virtual environment.
    mamba deactivate
    
    # Package the prepared Conda Python virtual environment.
    cd /root/miniforge3/envs && zip -r /root/venv.zip venv
  2. On your local machine, create a file named build.sh.

    #!/bin/bash
    set -e -x
    
    yum install -y zip wget
    
    cd /root/
    bash /build/setup-pyflink-virtual-env.sh
    mv venv.zip /build/
  3. Run the Docker build command to produce venv.zip.

    X86

    docker run -it --rm -v "$PWD":/build -w /build quay.io/pypa/manylinux_2_28_x86_64 bash ./build.sh

    ARM

    docker run -it --rm -v "$PWD":/build -w /build quay.io/pypa/manylinux_2_28_aarch64 bash ./build.sh

    After the command completes, venv.zip is generated in your working directory. You can also modify setup-pyflink-virtual-env.sh to install additional third-party packages inside the virtual environment before zipping.
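
Before uploading, you can sanity-check the archive layout locally. The configuration in Step 2 points python.executable at venv.zip/venv/bin/python, so that entry must exist relative to the archive root. A minimal sketch using Python's standard zipfile module (the helper name is hypothetical):

```python
import zipfile

def archive_has_interpreter(path, interpreter="venv/bin/python"):
    # python.executable will resolve to <archive name>/venv/bin/python at
    # runtime, so the archive must contain that entry under its root.
    with zipfile.ZipFile(path) as zf:
        return interpreter in zf.namelist()
```

For example, archive_has_interpreter("venv.zip") should return True for an archive built by the scripts above.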

Step 2: Upload and configure the virtual environment

  1. Log on to the Realtime Compute for Apache Flink console.

  2. In the Actions column of the target workspace, click Console.

  3. In the left navigation pane, click File Management and upload venv.zip.

  4. Go to Operation Center > Job O&M and click the name of the target job.

  5. On the Deployment Details tab, in the Basic Configuration section, select venv.zip for Python Archives. For SQL jobs that use Python UDFs, also add the following configuration in Parameter Settings > Other Configuration. This makes the archive path available to the Python process at runtime — without it, the UDF cannot locate the virtual environment:

    python.archives: oss://.../venv.zip
  6. In Parameter Settings > Other Configuration, add the configuration that points to the Python executable inside the archive.

    VVR 6.x and later

    python.executable: venv.zip/venv/bin/python
    python.client.executable: venv.zip/venv/bin/python

    Earlier than VVR 6.x

    python.executable: venv.zip/venv/bin/python
  7. Click Save.

Use third-party Python packages

PyPI and manylinux are third-party websites. You may experience access failures or delays when visiting them.

Choose one of the following approaches based on your package type.

Use a Zip Safe package directly

If your package is Zip Safe, upload it directly without compilation.

  1. Download the package from PyPI:

    1. Search for the target package (for example, apache-flink 1.20.3).

    2. Click the package name in the search results.

    3. In the left navigation pane, click Download files.

    4. Download the file whose name contains cp39-cp39 and manylinux, which matches the pre-installed Python 3.9.

  2. Log on to the Realtime Compute for Apache Flink console.

  3. In the Actions column of the target workspace, click Console.

  4. In the left navigation pane, click File Management and upload the package.

  5. Go to Operation Center > Job O&M, click Deploy Job > Python Job, and select the uploaded package for Python Libraries.

  6. Click Save.
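
A wheel filename encodes its compatibility: names follow the pattern {dist}-{version}(-{build})?-{python}-{abi}-{platform}.whl, so you can check a download before uploading it. A rough sketch (the helper name and defaults are hypothetical; it targets the pre-installed Python 3.9 on VVR 8.0.11 and later):

```python
def wheel_matches(filename, python_tag="cp39", platform_fragment="manylinux"):
    # Wheel names: {dist}-{version}(-{build})?-{python}-{abi}-{platform}.whl
    if not filename.endswith(".whl"):
        return False
    python, abi, plat = filename[: -len(".whl")].split("-")[-3:]
    return python_tag in python and platform_fragment in plat
```

For example, a cp39 manylinux wheel passes the check, while a Windows or cp312 wheel does not.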

Use a package that requires compilation

If your package is a .tar.gz file or a source package with a setup.py in its root directory, compile it in a manylinux-compatible environment first. The quay.io/pypa/manylinux_2_28_x86_64 image produces packages compatible with most Linux environments.

The following example compiles opencv-python-headless.

Python 3.9 is installed at /opt/python/cp39-cp39/bin/python3 in the image.
  1. Compile the package.

    1. On your local machine, create a requirements.txt file:

      opencv-python-headless
      numpy<2
    2. On your local machine, create a build.sh script:

      #!/bin/bash
      set -e -x
      
      yum install -y zip
      
      #PYBIN=/opt/python/cp37-cp37m/bin
      #PYBIN=/opt/python/cp38-cp38/bin
      PYBIN=/opt/python/cp39-cp39/bin
      #PYBIN=/opt/python/cp310-cp310/bin
      #PYBIN=/opt/python/cp311-cp311/bin
      
      "${PYBIN}/pip" install --target __pypackages__ -r requirements.txt
      cd __pypackages__ && zip -r deps.zip . && mv deps.zip ../ && cd ..
      rm -rf __pypackages__
    3. Run the Docker build command:

      X86

      docker run -it --rm -v "$PWD":/build -w /build quay.io/pypa/manylinux_2_28_x86_64 bash ./build.sh

      ARM

      docker run -it --rm -v "$PWD":/build -w /build quay.io/pypa/manylinux_2_28_aarch64 bash ./build.sh

      After the command completes, deps.zip is generated. To include additional packages, add them to requirements.txt before running the build.

  2. Upload and configure deps.zip.

    1. Log on to the Realtime Compute for Apache Flink console.

    2. In the Actions column of the target workspace, click Console.

    3. In the left navigation pane, click Files and upload deps.zip.

    4. Go to Operation Center > Job O&M and click the name of the target job. On the Deployment Details tab, in the Basic Configuration section, click Edit and select deps.zip for Python Libraries.

    5. Click Save.
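
After the build, you can confirm locally that deps.zip captured everything listed in requirements.txt by inspecting its top-level names. A minimal sketch with the standard zipfile module (the helper name is hypothetical):

```python
import zipfile

def top_level_packages(deps_zip):
    # Collect top-level package directories and single-file modules so you
    # can confirm the build captured everything in requirements.txt.
    with zipfile.ZipFile(deps_zip) as zf:
        names = zf.namelist()
    tops = set()
    for name in names:
        head = name.split("/", 1)[0]
        if "/" in name:
            tops.add(head)
        elif head.endswith(".py"):
            tops.add(head[: -len(".py")])
    return sorted(tops)
```

For the opencv-python-headless example, the result should include cv2 and numpy.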

Use JAR packages

If your Python job uses Java classes — such as a connector or a Java UDF — specify the JAR package in the job configuration.

  1. Log on to the Realtime Compute for Apache Flink console.

  2. In the Actions column of the target workspace, click Console.

  3. In the left navigation pane, click Files and upload the JAR package.

  4. Go to Operation Center > Job O&M and click the name of the target job. On the Deployment Details tab, in the Basic Configuration section, click Edit and select the JAR package for Additional Dependencies.

  5. In Parameter Settings > Other Configuration, add the classpath configuration. For multiple JAR packages, separate the paths with semicolons:

    pipeline.classpaths: 'file:///flink/usrlib/jar1.jar;file:///flink/usrlib/jar2.jar'
  6. Click Save.
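
Uploaded files are placed under /flink/usrlib/ at runtime, and pipeline.classpaths expects a semicolon-separated list of file:// URIs. If you script your deployment configuration, a small helper (hypothetical name, sketch only) can build that value from the uploaded JAR names:

```python
def classpaths_value(jar_filenames):
    # Uploaded Additional Dependencies land in /flink/usrlib/ at runtime;
    # pipeline.classpaths takes semicolon-separated file:// URIs.
    return ";".join(f"file:///flink/usrlib/{name}" for name in jar_filenames)
```

For example, classpaths_value(["jar1.jar", "jar2.jar"]) yields the value shown above.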

Use built-in connectors, data formats, and Catalogs

Built-in connectors, data formats, and Catalogs require VVR 11.2 or later.

To use built-in connectors, data formats, or Catalogs, declare them in Parameter Settings > Other Configuration using the corresponding configuration key. Separate multiple values with semicolons.

Connectors — for available connector names, see Supported connectors:

pipeline.used-builtin-connectors: kafka;sls

Data formats — for available format names, see Data formats:

pipeline.used-builtin-formats: avro;parquet

Catalogs — for available Catalog names, see Data Management:

pipeline.used-builtin-catalogs: hive-2.3.6;paimon

Click Save after adding the configuration.
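
If you generate deployment configuration programmatically, the three keys above can be rendered from lists of names. A sketch under the same VVR 11.2+ assumptions (the helper name is hypothetical; keys with no values are omitted):

```python
def builtin_dependency_config(connectors=(), formats=(), catalogs=()):
    # Render the Other Configuration entries; multiple values are
    # semicolon-separated, and empty keys are left out entirely.
    entries = {
        "pipeline.used-builtin-connectors": connectors,
        "pipeline.used-builtin-formats": formats,
        "pipeline.used-builtin-catalogs": catalogs,
    }
    return {key: ";".join(values) for key, values in entries.items() if values}
```

For example, builtin_dependency_config(connectors=["kafka", "sls"], catalogs=["paimon"]) produces the connector and Catalog lines shown above and omits the formats key.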

Use data files

Flink does not support debugging Python jobs by uploading data files.

Choose one of the following approaches based on the number of files.

Package files as a ZIP archive (Python Archives)

For multiple data files, package them into a ZIP file and use Python Archives.

  1. Log on to the Realtime Compute for Apache Flink console.

  2. In the Actions column of the target workspace, click Console.

  3. In the left navigation pane, click File Management and upload the ZIP file.

  4. Go to Operation Center > Job O&M and click the name of the target job. On the Deployment Details tab, in the Basic Configuration section, click Edit and select the ZIP file for Python Archives.

  5. In your Python UDF, access the files using the archive name as the path prefix. For example, if the archive is named mydata.zip:

    def map():
        with open("mydata.zip/mydata/data.txt") as f:
            ...

Upload files individually (Additional Dependencies)

For a small number of data files, upload them individually using Additional Dependencies.

  1. Log on to the Realtime Compute for Apache Flink console.

  2. In the Actions column of the target workspace, click Console.

  3. In the left navigation pane, click Files and upload the data file.

  4. Go to Operation Center > Job O&M and click the name of the target job. On the Deployment Details tab, in the Basic Configuration section, click Edit and select the file for Additional Dependencies.

  5. In your Python UDF, access the file at /flink/usrlib/. For example, for a file named data.txt:

    def map():
        with open("/flink/usrlib/data.txt") as f:
            ...

Configuration reference

All dependency-related configuration keys are added in Parameter Settings > Other Configuration. Separate multiple values with semicolons.

| Configuration key | Description | Example |
| --- | --- | --- |
| python.archives | Path to the virtual environment ZIP file in OSS. Required for SQL jobs using Python UDFs. | oss://.../venv.zip |
| python.executable | Path to the Python interpreter inside the archive. Required for all VVR versions. VVR 6.x and later also require python.client.executable. | venv.zip/venv/bin/python |
| python.client.executable | Path to the Python interpreter on the client side. Required for VVR 6.x and later. | venv.zip/venv/bin/python |
| pipeline.classpaths | Classpath entries for JAR packages. | file:///flink/usrlib/jar1.jar;file:///flink/usrlib/jar2.jar |
| pipeline.used-builtin-connectors | Built-in connectors to load (VVR 11.2+). | kafka;sls |
| pipeline.used-builtin-formats | Built-in data formats to load (VVR 11.2+). | avro;parquet |
| pipeline.used-builtin-catalogs | Built-in Catalogs to load (VVR 11.2+). | hive-2.3.6;paimon |

What's next