Data Lake Formation: Engine integration

Last Updated: Oct 22, 2025

As the unified data lake foundation of Alibaba Cloud, Data Lake Formation (DLF) integrates with mainstream big data compute engines, supporting diverse business scenarios such as real-time and offline data lakehouses and Online Analytical Processing (OLAP). DLF is deeply integrated with core engines, including Realtime Compute for Apache Flink (VVP), EMR Serverless Spark, EMR Serverless StarRocks, and EMR on ECS, and it continues to expand its ecosystem compatibility.

Integration methods

DLF provides the following three standard integration methods to offer flexible data access for different engines and users:

  1. Paimon REST: For compute engines built on Apache Paimon, this method provides a RESTful metadata service interface that complies with Paimon community standards. It supports core operations, such as table schema management and snapshot queries.

  2. Iceberg REST: For compute engines built on Apache Iceberg, this method provides a RESTful metadata service interface that complies with Iceberg community standards. It supports core operations, such as table schema management and snapshot queries. A client-side sketch of this access path follows the list.

  3. File access: This method uses the Paimon Virtual File System (PVFS) to abstract table data into standard file paths. This lets you directly read the underlying data files and metadata without needing a full compute engine. It is suitable for scripted exploration, debugging, and lightweight data processing. A file-system-style sketch appears at the end of this section.
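
For reference, the following is a minimal sketch of how a Python client might reach the Iceberg REST interface through PyIceberg's built-in REST catalog support. The endpoint URI, warehouse name, credential, and the database and table names are placeholders rather than actual DLF values; take the exact property names and authentication options from the configuration page of your DLF catalog.

```python
# Minimal sketch: connect to an Iceberg REST catalog with PyIceberg.
# The endpoint, warehouse, token, and table identifiers below are placeholders;
# substitute the values from your DLF catalog configuration.
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "dlf_rest",                                # local alias for this catalog
    **{
        "type": "rest",                        # use the REST catalog implementation
        "uri": "https://<dlf-rest-endpoint>",  # placeholder: DLF Iceberg REST endpoint
        "warehouse": "<catalog_name>",         # placeholder: target DLF catalog
        "token": "<access_token>",             # placeholder: authentication credential
    },
)

# Browse metadata through standard catalog operations.
print(catalog.list_namespaces())

# Load a table, inspect its schema, and read a small sample of rows.
table = catalog.load_table("<database>.<table>")
print(table.schema())
print(table.scan(limit=10).to_pandas())
```

Because the interface follows the Iceberg community REST specification, any engine or client with an Iceberg REST catalog implementation can be pointed at the same endpoint in a similar way.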

These three methods allow you to choose the most suitable access path based on your engine's technology stack and architectural preferences. This ensures efficient integration with the DLF data lake.
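
As a rough illustration of the file access method, the sketch below assumes that a PVFS client library is installed and registered with fsspec under a pvfs:// scheme. The scheme name, path layout, and connection option keys are assumptions for illustration only, not a confirmed DLF API; consult the PVFS documentation for the actual package and configuration.

```python
# Hedged sketch: browse a table's underlying files through a virtual file system.
# Assumes a PVFS implementation is registered with fsspec under the "pvfs" scheme;
# the scheme name, option keys, and paths below are illustrative placeholders.
import fsspec

fs = fsspec.filesystem(
    "pvfs",
    endpoint="<dlf-endpoint>",        # placeholder: DLF region endpoint
    access_key="<access_key_id>",     # placeholder: credentials
    secret_key="<access_key_secret>",
)

# List the files and directories that back a table.
for path in fs.ls("pvfs://<catalog>/<database>/<table>"):
    print(path)

# Read a single underlying file directly, without a compute engine.
with fs.open("pvfs://<catalog>/<database>/<table>/<some-file>", "rb") as f:
    print(f.read(1024))
```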