Product | Feature | Description | Related Documentation |
--- | --- | --- | --- |
DLF | Commercialization (billing enabled + SLA) | DLF began commercial operations in late December 2025. After commercialization, billing is enabled and a Service-Level Agreement (SLA) is provided. | |
DLF | Public preview instructions (free preview/how to enable) | Instructions on how to participate in the free public preview of DLF and how to enable the service. | |
DLF | DLF 3.0: Omni Catalog | DLF 3.0 features Omni Catalog for multi-engine access and unified metadata management. | Alibaba Cloud DLF 3.0: An intelligent, omni-modal data lakehouse management platform for the AI era |
DLF | DLF 3.0: Omni-modal data lakehouse management (structured/unstructured) | Expands from structured data to unified management and administration of unstructured data such as text, images, audio, and video. | Alibaba Cloud DLF 3.0: An intelligent, omni-modal data lakehouse management platform for the AI era |
DLF | DLF 3.0: Intelligent storage and performance optimization (for AI workloads) | Features data organization, storage, and performance optimization capabilities for AI workloads. | Alibaba Cloud DLF 3.0: An intelligent, omni-modal data lakehouse management platform for the AI era |
Realtime Compute for Apache Flink | Access DLF using Flink SQL (Paimon REST) | Connect Flink SQL to a DLF Catalog using a Paimon REST Catalog. | |
Realtime Compute for Apache Flink | Access DLF using Flink DataStream (Paimon REST) | Access a DLF Catalog from Flink DataStream jobs using a Paimon REST Catalog. | |
Realtime Compute for Apache Flink | Access DLF using Flink SQL (Iceberg REST) | Connect Flink SQL to a DLF Catalog for Iceberg tables using an Iceberg REST Catalog. | |
EMR Serverless Spark | Use a DLF Catalog in EMR Serverless Spark | Configure and use a DLF Catalog as the metadata catalog in EMR Serverless Spark. | |
OpenSearch | Use a DLF Catalog in OpenSearch | Use a DLF Catalog in OpenSearch for data lake metadata access, synchronization, and indexing. | |
DataWorks | OpenData: Unified management of objects such as metadata, instances, and members | Use OpenData as a unified entry point to manage objects such as metadata, instances, and members in a workspace. | |
DataWorks | OpenData: Feature overview | Outlines the capabilities and usage of OpenData. | |
DataWorks | OpenData: Table schema and object field descriptions | Provides descriptions of OpenData table schemas and fields to facilitate integration and custom development. | |
DataWorks | OpenLake quick start (DLF-based) | A quick start guide for the OpenLake solution, which features unified metadata in DLF and multi-engine integration. | |
DataWorks | EMR Serverless Spark environment preparation: Select the DLF metadata service | When you create or configure a Serverless Spark environment in DataWorks, you can select DLF as the metadata service (DLF Catalog). | |
DataWorks | Offline sync task: Field vectorization (embedding) | Vectorize fields in the synchronization pipeline to generate vector fields for downstream vector retrieval or knowledge bases. | |
DataWorks | Node: PAI Flow node (scheduling/orchestration) | Orchestrate and schedule PAI Flow workflows in DataWorks. | |
DataWorks | Data source: Milvus (vector database read/write/sync) | DataWorks supports Milvus as a data source for reading, writing, and synchronizing vector data. | |
EMR Serverless Spark | Data catalog: Add HMS, DLF-Legacy, and DLF 3.0 Catalogs at the same time | The platform-level data catalog is enhanced to support adding multiple types of catalogs at the same time. This facilitates unified metadata and cross-system access. | |
EMR Serverless Spark | Lake table read/write: DLF table read/write optimization | The engine layer is optimized for reading from and writing to DLF tables to improve the lake table access experience. | |
EMR Serverless Spark | Storage access: Passwordless access to pvfs | Supports passwordless access to pvfs, which facilitates integrated access on the lake storage side. | |
EMR Serverless Spark | Lake format: Added/Enhanced support for the Lance file format | Adds support for the Lance file format for scenarios involving AI and vector data. | |
EMR Serverless Spark | Paimon: Optimization and lineage enhancement | Optimizes Paimon and enhances its lineage capabilities. You can view Resilient Distributed Dataset (RDD) lineage and other information in DataWorks. | |
EMR Serverless Spark | Kyuubi Gateway: Associate authorization tokens with a DLF 3.0 Catalog | The gateway layer supports associating authorization tokens with a DLF 3.0 Catalog for unified user authentication. | |
EMR Serverless Spark | DLF 2.5: Full support for PaimonCatalog and IcebergCatalog | The DLF 2.5 Catalog fully supports PaimonCatalog and IcebergCatalog. | |
EMR Serverless Spark | DLF Lance tables: New support | Adds support for DLF Lance tables in DLF 2.5 Catalog scenarios. | |
EMR Serverless Spark | Workspace: Support for adding multiple DLF Catalogs for federated queries | The workspace level supports adding multiple DLF (formerly DLF 2.5) Catalogs to allow federated queries. | |
EMR Serverless Spark | Livy Gateway: Read from DLF Catalog by default | Livy Gateway reads metadata from a DLF Catalog by default. This allows jobs submitted through Livy to directly access lake tables. | |
EMR Serverless Spark | Manage data catalogs (HMS / DLF 1.0 / DLF 2.5) | Manage data catalogs in the console. Entry points are provided for operations such as adding, viewing, and deleting catalogs. | |
EMR Serverless StarRocks | V1.19: Associate with the DLF data lake service when creating an instance | You can associate an instance with the DLF data lake service when you create the instance. This enables metadata and permission linkage in the data lakehouse. | |
Realtime Compute for Apache Flink | Manage Paimon Catalogs (can connect to DLF) | Create, view, and delete Paimon Catalogs. You can directly access Paimon tables in DLF. | |
Hologres | Access Paimon Catalogs based on DLF 2.0 | Access and manage Paimon Catalogs (such as External Database) through the DLF REST metastore. | |
Hologres | DLF_FDW: Read from and write to OSS (data lake acceleration) | Read from and write to an OSS data lake through a dlf_fdw foreign table. | |
Hologres | Serverless Lakehouse: Paimon-based solution | Instructions on how to build a Serverless Lakehouse solution using Hologres and Paimon. | |
OpenSearch | Vector Search Edition: Data Lake Formation (DLF) | Guidelines on how to build vector indexes and perform retrieval from data lake tables such as DLF and Paimon tables. | Synchronize vector data from OpenLake-DLF to Alibaba Cloud OpenSearch |
OpenSearch | Retrieval Engine Edition: Data Lake Formation (DLF) | Guidelines on how to build indexes and perform retrieval from data lake tables such as DLF and Paimon tables. | |
DataWorks | DataStudio (new version): Serverless StarRocks node | Provides a Serverless StarRocks node (including an SQL node) in DataWorks to develop and schedule StarRocks jobs and SQL. | |
EMR Serverless StarRocks | StarRocks Manager supports associating users with RAM Roles | StarRocks users can be associated with RAM Roles to adapt to DLF's RAM Role-based access control and data lakehouse permission linkage. | |
EMR Serverless StarRocks | Kernel 3.3.13-1.2.0 (2025-10-29): DLF Iceberg Catalog support | New in Lakehouse: Supports DLF Iceberg Catalog, which allows StarRocks to directly connect to the Iceberg metadata catalog managed by DLF. | |
EMR Serverless StarRocks | Kernel 3.3.13-1.2.0 (2025-10-29): Paimon Native Writer | New in Lakehouse: Supports Paimon Native Writer for enhanced write performance in data lakehouse processing. This can be used with DLF and Paimon catalogs. | |
EMR Serverless StarRocks | Kernel 3.3.20-1.3.0 (2025-12-10): Paimon DV V2 / Native Reader | New in Lakehouse: Supports Paimon Deletion Vector V2. Supports Native Reader for reading from and writing to Paimon Format Tables to enhance lake table read/write capabilities. | |
MaxCompute | DLF + OSS external schema: Host foreign table metadata in DLF | Manage metadata such as schemas and tables on OSS through DLF. MaxCompute queries OSS foreign tables through an External Schema. This feature requires DLF 2.6 or later. | |
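Several of the engine-integration rows above (for example, the Flink SQL and Paimon REST Catalog entries) boil down to registering a DLF-hosted REST catalog in the engine. The following is a minimal sketch of that pattern in Flink SQL, using the Apache Paimon REST catalog option keys; the endpoint, catalog name, and credential-provider values are placeholders and assumptions, not verified settings, so check the DLF and Paimon documentation for the exact properties in your region.

```sql
-- Sketch: register a DLF-backed Paimon REST catalog in Flink SQL.
-- Property keys follow the Apache Paimon REST catalog convention;
-- the endpoint and warehouse values below are placeholders.
CREATE CATALOG dlf_paimon WITH (
  'type' = 'paimon',
  'metastore' = 'rest',
  'uri' = '<dlf-rest-endpoint>',     -- assumption: your region's DLF REST endpoint
  'warehouse' = '<catalog-name>',    -- assumption: the name of your DLF catalog
  'token.provider' = 'dlf'           -- assumption: DLF-based credential provider
);

USE CATALOG dlf_paimon;
-- Tables managed in the DLF catalog are then queryable directly, e.g.:
-- SELECT * FROM my_db.my_table LIMIT 10;
```

The same shape applies to the Iceberg REST entries: only the catalog `type` and the engine-specific option keys change, while the DLF REST endpoint and catalog name stay the same.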