
Data Lake Formation: What is Data Lake Formation?

Last Updated: Jan 03, 2025

Alibaba Cloud Data Lake Formation (DLF) is a fully managed service that helps you quickly build cloud-based data lakes and lakehouses. It provides unified metadata management, unified permission and security management, and one-click data exploration. With DLF, you can build and manage cloud-native data lakes and lakehouses, integrate seamlessly with a variety of compute engines, break down data silos, and gain business insights from your data.

Pricing

  • The data exploration, permission management, and lake management features of DLF are in free public preview and are not billed.

  • The metadata management feature is billed on a pay-as-you-go basis. Storage of up to 1 million metadata objects per month is free; charges apply to quantities beyond this limit. For more information, see Billing.

  • Up to 1 million API requests per month are free; charges apply to requests beyond this limit. For more information, see Billing.
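The free quotas above work as simple thresholds: only usage beyond 1 million in a month is charged. The following is a minimal sketch of that arithmetic. It assumes each quota is applied independently to a single monthly count, and it deliberately computes only the billable quantity, because unit prices, tiering, and how DLF counts metadata objects over a month are not described here (see Billing). The function names are hypothetical.

```python
# Monthly free quotas stated in the Pricing section.
FREE_METADATA_OBJECTS = 1_000_000
FREE_API_REQUESTS = 1_000_000


def billable(usage: int, free_quota: int) -> int:
    """Return the part of a monthly usage count that exceeds the free quota.

    Hypothetical helper: assumes usage is one monthly total and that the
    quota simply subtracts from it (never going below zero).
    """
    if usage < 0:
        raise ValueError("usage must be non-negative")
    return max(0, usage - free_quota)


def monthly_overage(metadata_objects: int, api_requests: int) -> dict:
    """Quantities that would be charged for one month, before any unit price."""
    return {
        "metadata_objects": billable(metadata_objects, FREE_METADATA_OBJECTS),
        "api_requests": billable(api_requests, FREE_API_REQUESTS),
    }


if __name__ == "__main__":
    print(monthly_overage(metadata_objects=1_250_000, api_requests=800_000))
    # {'metadata_objects': 250000, 'api_requests': 0}
```

Multiplying each billable quantity by the applicable unit price from the Billing page would give an estimated charge.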

Architecture

  • Data catalog: View and manage the data catalogs in the data lake in the console.

  • Databases, tables, and functions: View and manage database, table, and function information in the data lake in the console. You can also manage metadata by calling API operations such as CreateDatabase and CreateTable, which lets you integrate DLF with third-party applications. Multi-version management is supported, and metadata can be generated automatically through metadata extraction.

  • Data permission management: Strengthens permission control over data in the lake to keep it secure. Permissions can be granted at five levels of granularity: data catalog, database, table, column, and function.

  • Data lake management: Provides analysis and optimization suggestions for data stored in the lake, strengthens data lifecycle management, reduces usage costs, and simplifies data O&M.

  • Data exploration: Provides one-click data exploration that supports Spark 3.0 SQL syntax. You can save query history, preview data, export results, and generate TPC-DS test datasets with one click.
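The five permission levels listed under data permission management nest logically: a catalog contains databases, a database contains tables and functions, and a table contains columns. The sketch below models that nesting as a small data structure with a check for whether a grant at one level reaches a requested resource. It is a hypothetical illustration, not the DLF permission API: the `Resource` class, the `covers` function, and especially the rule that a grant on a parent scope applies to everything beneath it are assumptions. Consult the DLF permission documentation for the actual inheritance behavior.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class Resource:
    """Hypothetical address of a resource at one of the five permission levels."""
    catalog: str
    database: Optional[str] = None
    table: Optional[str] = None
    column: Optional[str] = None
    function: Optional[str] = None

    def __post_init__(self) -> None:
        # Each narrower level requires its parent level to be set.
        if (self.table is not None or self.function is not None) and self.database is None:
            raise ValueError("tables and functions live inside a database")
        if self.column is not None and self.table is None:
            raise ValueError("columns live inside a table")
        if self.table is not None and self.function is not None:
            raise ValueError("a resource is either a table (or column) or a function")

    @property
    def level(self) -> str:
        # The narrowest level that is set determines the granularity.
        for name in ("column", "function", "table", "database"):
            if getattr(self, name) is not None:
                return name
        return "catalog"


def covers(grant: Resource, request: Resource) -> bool:
    """True if every level pinned by the grant matches the request.

    Assumption (not from the DLF docs): a grant on a parent scope
    applies to every resource beneath it.
    """
    return all(
        getattr(grant, f.name) is None or getattr(grant, f.name) == getattr(request, f.name)
        for f in fields(Resource)
    )


if __name__ == "__main__":
    db_grant = Resource(catalog="default", database="sales")
    col = Resource(catalog="default", database="sales", table="orders", column="amount")
    fn = Resource(catalog="default", database="sales", function="to_usd")
    other = Resource(catalog="default", database="hr", table="staff")
    tbl_grant = Resource("default", "sales", table="orders")

    print(col.level)                # column
    print(covers(db_grant, col))    # True
    print(covers(db_grant, fn))     # True
    print(covers(db_grant, other))  # False
    print(covers(tbl_grant, fn))    # False
```

Keeping the levels as optional fields makes the granularity explicit: a grant pins only the levels it names, and anything left unset acts as a wildcard under the assumed inheritance rule.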

Scenarios

Scenario 1: Building a cloud-based data lake

By integrating DLF with E-MapReduce and Object Storage Service (OSS), you can quickly build a cloud-based data lake.

Scenario 2: Building a data lakehouse architecture

By integrating DLF with MaxCompute, DataWorks, and E-MapReduce, you can quickly build a data lakehouse architecture.

Scenario 3: Building a fully managed lakehouse data architecture

By integrating DLF with Databricks and OSS, you can build a fully managed lakehouse data architecture in the cloud.

Scenario 4: Data analysis

You can use metadata extraction and data exploration to quickly analyze and explore structured and semi-structured data stored in OSS.