
Data Lake Formation: Get started with DLF

Last Updated: Feb 27, 2026

This guide walks you through creating your first catalog in Data Lake Formation (DLF), then shows how to ingest and analyze data in your data lakehouse.

Prerequisites

Before you begin, make sure that you have:

  • Completed the one-time DLF setup (service activation and authorization)

  • The following permissions, if you use a RAM user:

    • API permissions: The AliyunDLFFullAccess permission policy, or a policy that includes catalog-related authorization actions. For details, see RAM authorization action reference.

    • Data permissions: The super_administrator or admin system role, or a custom role with catalog-related permissions. For details, see Configure data permissions.
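A RAM user needs both kinds of permission listed above: an API grant and a data grant. The following is a minimal sketch of that two-part check, assuming a simplified model of a user's grants. `RamUser`, `missing_prerequisites`, and the boolean flags that stand in for custom policies and custom roles are hypothetical and do not correspond to any DLF or RAM API.

```python
from dataclasses import dataclass, field

# Names taken from this guide; everything else here is a hypothetical model.
FULL_ACCESS_POLICY = "AliyunDLFFullAccess"
ADMIN_ROLES = {"super_administrator", "admin"}


@dataclass
class RamUser:
    """Hypothetical model of a RAM user's grants (illustration only)."""
    policies: set[str] = field(default_factory=set)
    # Assumption: a custom policy with catalog-related actions is modeled as a flag.
    has_custom_catalog_policy: bool = False
    roles: set[str] = field(default_factory=set)
    # Assumption: a custom role with catalog-related permissions is modeled as a flag.
    has_custom_catalog_role: bool = False


def missing_prerequisites(user: RamUser) -> list[str]:
    """Return which of the two permission prerequisites the user lacks."""
    missing = []
    api_ok = FULL_ACCESS_POLICY in user.policies or user.has_custom_catalog_policy
    data_ok = bool(user.roles & ADMIN_ROLES) or user.has_custom_catalog_role
    if not api_ok:
        missing.append("API permissions")
    if not data_ok:
        missing.append("data permissions")
    return missing


# Usage: a user with only the API policy still lacks data permissions.
print(missing_prerequisites(RamUser(policies={"AliyunDLFFullAccess"})))  # ['data permissions']
```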

Create a catalog

A catalog is the top-level container for organizing metadata in your data lakehouse. When you create a catalog, choose a storage redundancy type based on your use case, data volume, availability requirements, and budget.

  1. Log on to the DLF console.

  2. On the Catalogs page, click Create Catalog.

  3. Configure the following parameters.

    • Catalog Name: A unique name for the catalog.

    • Description: A description of the catalog.

    • Storage Type: Fixed to Standard Storage.

    • Storage Redundancy Type: The redundancy policy for your data. See the storage redundancy options below.

    Storage redundancy options:

    • LRS (Locally Redundant Storage), the default: Stores data in a single zone. If that zone becomes unavailable, the data is inaccessible.

    • ZRS (Zone-Redundant Storage): Replicates data across multiple zones within a region for higher availability. Costs more than LRS.
    Important: After a catalog is created, you cannot change its redundancy type from ZRS to LRS.

  4. Read the Terms of Service, select the option to accept them, and then click Create Catalog.

For more information, see Manage catalogs.
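The rules stated in the steps above (a unique name, Storage Type fixed to Standard Storage, LRS as the default redundancy, and no change from ZRS to LRS after creation) can be captured in a small validation sketch. `CatalogConfig`, `Redundancy`, `validate_new_catalog`, and `redundancy_change_allowed` are hypothetical names, not part of any DLF SDK. The sketch checks uniqueness against a caller-supplied set of existing names, and it treats every redundancy change other than ZRS to LRS as permitted only because this guide does not rule it out; that part is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Redundancy(Enum):
    LRS = "LRS"  # single zone; the console default
    ZRS = "ZRS"  # multiple zones within a region; costs more than LRS


@dataclass(frozen=True)
class CatalogConfig:
    """Hypothetical model of the Create Catalog form (not a DLF SDK type)."""
    name: str
    description: str = ""
    redundancy: Redundancy = Redundancy.LRS  # LRS is the default
    storage_type: str = "Standard"           # fixed to Standard Storage


def validate_new_catalog(cfg: CatalogConfig, existing_names: set[str]) -> None:
    """Raise ValueError if the form violates the rules described in this guide."""
    if not cfg.name:
        raise ValueError("Catalog Name is required")
    if cfg.name in existing_names:
        raise ValueError(f"catalog name {cfg.name!r} is already in use")
    if cfg.storage_type != "Standard":
        raise ValueError("Storage Type is fixed to Standard Storage")


def redundancy_change_allowed(current: Redundancy, target: Redundancy) -> bool:
    """ZRS -> LRS is forbidden; other transitions are assumed allowed (not confirmed)."""
    return not (current is Redundancy.ZRS and target is Redundancy.LRS)


# Usage: a new ZRS catalog passes validation but can never be moved back to LRS.
cfg = CatalogConfig(name="my_catalog", redundancy=Redundancy.ZRS)
validate_new_catalog(cfg, existing_names={"sales"})
print(redundancy_change_allowed(cfg.redundancy, Redundancy.LRS))  # False
```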

Ingest data into your data lakehouse

After you create a catalog, sync raw data to your data lakehouse by using tools such as Flink CDC and DataWorks Data Integration.
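Change data capture (CDC) tools such as Flink CDC read a source database's change log and replay each insert, update, and delete against the target table by primary key. The toy sketch below shows only that replay logic, with an in-memory dict standing in for a lakehouse table. `ChangeEvent` and `apply_changes` are hypothetical and are not the Flink CDC or DataWorks Data Integration APIs.

```python
from dataclasses import dataclass
from typing import Literal, Optional

Op = Literal["insert", "update", "delete"]


@dataclass
class ChangeEvent:
    """Hypothetical change-log record (illustration only)."""
    op: Op
    key: int                    # primary key of the changed row
    row: Optional[dict] = None  # new row image; None for deletes


def apply_changes(table: dict[int, dict], events: list[ChangeEvent]) -> dict[int, dict]:
    """Replay events in order on a copy of `table`.

    Inserts and updates are treated as upserts by primary key;
    deletes remove the key if it exists.
    """
    result = dict(table)
    for ev in events:
        if ev.op in ("insert", "update"):
            result[ev.key] = ev.row
        elif ev.op == "delete":
            result.pop(ev.key, None)
        else:
            raise ValueError(f"unknown op: {ev.op}")
    return result


events = [
    ChangeEvent("insert", 1, {"id": 1, "status": "new"}),
    ChangeEvent("insert", 2, {"id": 2, "status": "new"}),
    ChangeEvent("update", 1, {"id": 1, "status": "paid"}),
    ChangeEvent("delete", 2),
]
print(apply_changes({}, events))  # {1: {'id': 1, 'status': 'paid'}}
```

Because events are replayed in order, the target ends up matching the source's latest state for each primary key.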

Analyze data in your data lakehouse

Query and extract insights from your data lakehouse by using the following engines:

  • EMR Serverless Spark: batch read and write operations

  • Realtime Compute for Apache Flink: streaming read and write operations

  • EMR Serverless StarRocks: interactive analytical queries that extract insights from your data
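The engines above differ mainly in access pattern. A batch read returns the table's contents as of one point in time and then finishes, while a streaming read keeps emitting new data as commits arrive. The toy model below illustrates only that contrast. `ToyTable`, `batch_read`, and `stream_read` are hypothetical and do not reflect how Spark, Flink, or StarRocks actually read lakehouse tables.

```python
from typing import Iterator


class ToyTable:
    """Append-only table whose history is a list of commits (hypothetical model)."""

    def __init__(self) -> None:
        self.commits: list[list[dict]] = []

    def commit(self, rows: list[dict]) -> None:
        self.commits.append(rows)


def batch_read(table: ToyTable) -> list[dict]:
    """Batch: read every row committed so far, once, then stop."""
    return [row for rows in table.commits for row in rows]


def stream_read(table: ToyTable, start: int = 0) -> Iterator[list[dict]]:
    """Streaming: lazily yield each commit from `start` onward.

    A real streaming job waits for new commits; this generator stops
    when it reaches the latest commit so the example terminates.
    """
    next_commit = start
    while next_commit < len(table.commits):
        yield table.commits[next_commit]
        next_commit += 1


t = ToyTable()
t.commit([{"id": 1}])
stream = stream_read(t)
print(next(stream))   # [{'id': 1}]
t.commit([{"id": 2}])
print(next(stream))   # [{'id': 2}]
print(batch_read(t))  # [{'id': 1}, {'id': 2}]
```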

Next steps