What is Data Lake Formation?

Last Updated: Nov 27, 2025

Data Lake Formation (DLF) is a fully managed platform that unifies data and metadata management and storage, and also provides data access control and storage analysis and optimization. DLF integrates seamlessly with Alibaba Cloud big data analytics engines to break down data silos. With DLF, you can quickly build and manage cloud-native data lakes and OpenLake solutions. By unifying metadata, lake table formats, and storage management, DLF significantly simplifies the O&M involved in building and operating data lakes, so that you can focus on business innovation and data insights.

Features

  • Unified metadata and storage

    Provides a single set of lakehouse metadata and storage that is shared among compute engines, allowing data to flow seamlessly between integrated products (see the sketch after this list).

  • Unified permission management

    Offers a unified set of permission configurations for lakehouse tables. This lets you define permissions once and enforce them across all services.

  • Storage optimization

    Optimizes storage efficiency through strategies like file compaction, expired snapshot cleanup, expired partition cleanup, and orphaned file cleanup.

  • Comprehensive ecosystem

    Deeply integrates with Alibaba Cloud products, including stream and batch processing engines, for an out-of-the-box experience that enhances usability and simplifies operations.
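
To make the "define once, share across engines" idea concrete, the following PySpark sketch registers an Apache Paimon catalog against a shared warehouse path and creates a database and table through it. This is a minimal illustration, not the definitive DLF setup: the catalog name dlf_catalog, the OSS warehouse path, and the table schema are assumptions, and the DLF-specific endpoint and credential settings are omitted because they depend on your environment.

```python
# A minimal sketch, assuming the Paimon Spark runtime jar is on the classpath
# and storage credentials for the warehouse path are configured separately.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("dlf-shared-catalog-sketch")
    # "dlf_catalog" and the warehouse URI are placeholders; real values come
    # from your DLF / Paimon configuration.
    .config("spark.sql.catalog.dlf_catalog", "org.apache.paimon.spark.SparkCatalog")
    .config("spark.sql.catalog.dlf_catalog.warehouse", "oss://your-bucket/warehouse")
    .getOrCreate()
)

# Once the catalog is registered, plain SQL operates on the shared metadata.
spark.sql("USE dlf_catalog")
spark.sql("CREATE DATABASE IF NOT EXISTS sales")
spark.sql(
    """
    CREATE TABLE IF NOT EXISTS sales.orders (
        order_id BIGINT,
        amount   DOUBLE,
        dt       STRING
    ) PARTITIONED BY (dt)
    """
)
spark.sql("SHOW TABLES IN sales").show()
```

Another engine, such as a Flink job, configured against the same catalog would list the same sales.orders table, which is what unified metadata and storage means in practice.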

Architecture

  • Metadata management: Use the console to view and manage the metadata databases in your data lake, and create new databases to organize your metadata and integrate it with third-party applications.

  • Permission management: Strengthen access control over your lakehouse data to ensure its security. DLF supports permission management at three levels: catalog, database, and table.

  • Storage optimization: Supports lakehouse table optimization strategies such as file compaction, expired snapshot cleanup, expired partition cleanup, and orphaned file cleanup. These strategies reduce storage costs and improve query efficiency (see the sketch after this list).
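
The optimization strategies above correspond to table-level retention and compaction settings that DLF manages for you. As a rough illustration, under the assumption that the lakehouse tables are Apache Paimon tables (see the Benefits section), the following sketch creates a table with Paimon options for snapshot expiration, partition expiration, and periodic full compaction; the catalog name, table name, and values are placeholders, not DLF defaults.

```python
# A rough sketch of the table-level knobs behind these optimization strategies,
# expressed as Apache Paimon table options. DLF applies such strategies as a
# managed service; the names and values below are illustrative assumptions.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("dlf-storage-optimization-sketch")
    .config("spark.sql.catalog.dlf_catalog", "org.apache.paimon.spark.SparkCatalog")
    .config("spark.sql.catalog.dlf_catalog.warehouse", "oss://your-bucket/warehouse")
    .getOrCreate()
)

spark.sql("USE dlf_catalog")
spark.sql("CREATE DATABASE IF NOT EXISTS sales")
spark.sql(
    """
    CREATE TABLE IF NOT EXISTS sales.orders_retention_demo (
        order_id BIGINT,
        amount   DOUBLE,
        dt       STRING
    ) PARTITIONED BY (dt)
    TBLPROPERTIES (
        -- Expired snapshot cleanup: retain snapshots for one day only.
        'snapshot.time-retained' = '1 d',
        -- Expired partition cleanup: drop partitions older than 30 days,
        -- parsing the dt partition value with the given formatter.
        'partition.expiration-time' = '30 d',
        'partition.timestamp-formatter' = 'yyyy-MM-dd',
        -- File compaction: force a full compaction every 10 commits.
        'full-compaction.delta-commits' = '10'
    )
    """
)
```

Note that the three-part identifier dlf_catalog.sales.orders_retention_demo mirrors the catalog, database, and table levels at which DLF enforces permissions.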

Benefits

  • Fully managed service: DLF offers unified Paimon metadata and storage management that works out of the box, requires no O&M, and covers the full data lifecycle.

  • Enterprise-level security: DLF provides dual control over APIs and data permissions across multiple abstraction levels, ensuring your data is secure and compliant.

  • Flexible optimization strategies: Supports flexible lakehouse table optimization strategies, including file compaction and data cleanup, to significantly improve access performance and lower storage costs.

  • Rich ecosystem: Built on a deep integration with Paimon, DLF provides a fully managed service for managing metadata and storage. It seamlessly connects with Alibaba Cloud's compute engines and AI products, forming a powerful ecosystem.

Use cases

Data lakehouse

A data lakehouse combines the benefits of a data warehouse and a data lake. This architecture handles diverse data types while delivering high-performance analytics. You can use a data lakehouse to process large volumes of historical and real-time data. The processed data can then serve as a shared resource, allowing different teams to access it on demand while maintaining robust data security.

Traditional big data use cases

DLF is ideal for traditional big data use cases, including data lake computing and analytics. Common applications include offline big data analysis, real-time analysis, machine learning, and log file analysis. Providing a unified metadata and storage service, DLF simplifies and accelerates building a data lake and governing your data.