PVFS - Data Lake Formation - Alibaba Cloud Documentation Center

Overview

The Paimon client supports PVFS (Paimon Virtual File System), which lets you access tables in a Data Lake Formation (DLF) catalog through standard file paths, as you would with a regular file system.

How it works

PVFS abstracts the metadata and storage structure of Paimon tables into a unified file path format, such as pvfs://<catalog_name>/<database_name>/<table_name>/.... You can use this path to directly read the underlying content of a table, including snapshots, data files, and metadata, without requiring a compute engine such as Flink or Spark.
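To make the path layout concrete, the following is a minimal sketch that splits a PVFS path into its catalog, database, table, and in-table components using only the Python standard library. It is not part of the PVFS SDK: the `PvfsPath` class, the `parse_pvfs_path` function, and the sample names (`my_catalog`, `my_db`, `my_table`, `snapshot/snapshot-1`) are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit


@dataclass(frozen=True)
class PvfsPath:
    """Hypothetical helper type; not part of the PVFS SDK."""
    catalog: str
    database: Optional[str]
    table: Optional[str]
    sub_path: str  # path below the table directory, "" if none


def parse_pvfs_path(path: str) -> PvfsPath:
    """Split pvfs://<catalog_name>/<database_name>/<table_name>/... into parts."""
    parts = urlsplit(path)
    if parts.scheme != "pvfs":
        raise ValueError(f"not a pvfs path: {path!r}")
    if not parts.netloc:
        raise ValueError(f"missing catalog name: {path!r}")

    # Drop empty segments produced by leading, trailing, or doubled slashes.
    segments = [s for s in parts.path.split("/") if s]
    database = segments[0] if len(segments) > 0 else None
    table = segments[1] if len(segments) > 1 else None
    sub_path = "/".join(segments[2:])
    return PvfsPath(parts.netloc, database, table, sub_path)


if __name__ == "__main__":
    # Sample path with made-up names, for illustration only.
    print(parse_pvfs_path("pvfs://my_catalog/my_db/my_table/snapshot/snapshot-1"))
```

The sketch treats the catalog name as the authority part of the URI and everything after the table name as an opaque sub-path. The actual SDKs may validate or normalize paths differently.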

SDK support

PVFS supports the following two SDKs:

Java SDK: Implements the Hadoop FileSystem interface and integrates with Hadoop ecosystem engines such as Hive, Spark, and Presto.
Python SDK: Based on Filesystem Spec (fsspec) and compatible with mainstream Python data tools, such as Dask, Pandas, and PyArrow.

PVFS lets developers and data engineers explore, test, and manage Paimon tables from a local machine or a script, which simplifies data lake development and operations and maintenance (O&M).

Access control

PVFS uses the unified access control policies of DLF to manage read and write permissions for files at the table level. For more information, see Configure permissions.