
Data Lake Formation: Get started with DLF

Last Updated: Nov 05, 2025

This topic describes how to get started with Data Lake Formation (DLF).

Prerequisites

  • You have completed the steps in Set up DLF.

    Note

    Activation and authorization are performed only once, when you first set up the DLF environment.

  • To manage catalogs as a RAM user, you must have the following permissions:

    • API permissions: You must be granted the AliyunDLFFullAccess permission policy or a policy that contains catalog-related authorization actions. For more information, see RAM authorization action reference.

    • Data permissions: You must be granted the super_administrator or admin system role, or a custom role that has catalog-related permissions. For more information, see Configure data permissions.

Create a catalog

Create a catalog based on your use case, data volume, reliability, and budget requirements.

  1. Log on to the DLF console.

  2. On the Catalogs page, click Create Catalog and configure the following parameters.

    • Catalog Name: Enter a unique name for the catalog.

    • Description: Enter a description for the catalog.

    • Storage Type: Fixed to Standard Storage.

    • Storage Redundancy Type: Select a redundancy policy for your data:

      • LRS (Locally Redundant Storage): (Default) Stores data in a single zone. If the zone is unavailable, the data becomes inaccessible.

      • ZRS (Zone-Redundant Storage): Replicates data across multiple zones within a region for higher availability.

      Note
      • You cannot change the redundancy type from ZRS to LRS after the catalog is created.
      • ZRS provides higher data availability but also incurs higher costs.

  3. Read and agree to the Terms of Service, and then click Create Catalog.

For more information, see Manage catalogs.
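After the catalog is created, you can create databases and tables in it from your compute engine. The following PySpark sketch is only an illustration: it assumes that the engine (for example, EMR Serverless Spark) is already configured to use DLF as its metadata catalog, and the names my_catalog, sales_db, and orders are hypothetical placeholders.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("dlf-quickstart").getOrCreate()

    # Create a database and a table in the DLF-managed catalog.
    # "my_catalog", "sales_db", and "orders" are placeholder names.
    spark.sql("CREATE DATABASE IF NOT EXISTS my_catalog.sales_db")
    spark.sql("""
        CREATE TABLE IF NOT EXISTS my_catalog.sales_db.orders (
            order_id BIGINT,
            amount   DOUBLE,
            order_dt STRING
        )
    """)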

Ingest data into a data lakehouse

Use tools such as Flink CDC or the Data Integration service of DataWorks to synchronize raw data to your data lakehouse.
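If you prefer a code-based batch ingestion job instead of Flink CDC or Data Integration, the following PySpark sketch shows the general pattern. The OSS path and table names are hypothetical, and the engine is assumed to be configured with DLF as its catalog.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("dlf-ingest").getOrCreate()

    # Read raw CSV files from object storage (the path is a placeholder).
    raw = spark.read.option("header", "true").csv("oss://my-bucket/raw/orders/")

    # Write the data into a table that is managed by the DLF catalog.
    raw.writeTo("my_catalog.sales_db.orders_raw").createOrReplace()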

Analyze data in a data lakehouse

Use EMR Serverless Spark for batch reads and writes, Realtime Compute for Apache Flink for streaming reads and writes, and EMR Serverless StarRocks to extract insights from your data.
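As an illustration of a batch analysis job, the following PySpark sketch aggregates the hypothetical orders_raw table from the ingestion example above. Streaming analysis with Realtime Compute for Apache Flink and queries with EMR Serverless StarRocks follow their own product documentation.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("dlf-analyze").getOrCreate()

    # Aggregate daily revenue from the ingested table (names are placeholders).
    daily_revenue = spark.sql("""
        SELECT order_dt, SUM(CAST(amount AS DOUBLE)) AS revenue
        FROM my_catalog.sales_db.orders_raw
        GROUP BY order_dt
        ORDER BY order_dt
    """)
    daily_revenue.show()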