Get started with DLF

This topic describes how to get started with Data Lake Formation (DLF).

Prerequisites

You have set up DLF.
Note
Activation and authorization are required only once, when you first set up your DLF environment.
To manage catalogs as a RAM user, you must have the following permissions:
- API permissions: You must be assigned the AliyunDLFFullAccess permission policy or a policy that contains catalog-related authorization actions. For more information, see RAM authorization action reference.
- Data permissions: You must be granted the super_administrator or admin system role, or a custom role that has catalog-related permissions. For more information, see Configure data permissions.
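The two requirements above apply together: a RAM user needs both the API permissions and the data permissions. The following is a minimal Python sketch of that check, intended only as a local checklist. The function `missing_permissions` and its parameters are hypothetical and are not part of any DLF or RAM SDK; only the policy name and the system role names come from this topic. Which custom policies or roles actually carry catalog-related permissions is something you must confirm yourself and pass in.

```python
# Names taken from the prerequisites in this topic.
FULL_ACCESS_POLICY = "AliyunDLFFullAccess"
SYSTEM_ROLES = frozenset({"super_administrator", "admin"})


def missing_permissions(policies, roles,
                        custom_catalog_policies=(),
                        custom_catalog_roles=()):
    """Return the permission categories a RAM user still lacks.

    Hypothetical helper: it inspects only the names passed in and does
    not call DLF or RAM. The custom_* arguments list policies and roles
    that you have verified include catalog-related permissions.
    """
    policies, roles = set(policies), set(roles)
    missing = []

    # API permissions: the full-access policy or a verified custom policy.
    has_api = (FULL_ACCESS_POLICY in policies
               or bool(policies & set(custom_catalog_policies)))
    if not has_api:
        missing.append("API permissions")

    # Data permissions: a system role or a verified custom role.
    has_data = bool(roles & (SYSTEM_ROLES | set(custom_catalog_roles)))
    if not has_data:
        missing.append("Data permissions")

    return missing
```

For example, calling `missing_permissions({"AliyunDLFFullAccess"}, {"admin"})` returns an empty list, while a user with the policy but no suitable role gets `["Data permissions"]`.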

Create a catalog

Create a catalog based on your use case, data volumes, service reliability, and budget requirements.

Log on to the DLF console.

On the Catalogs page, click Create Catalog and configure the following parameters.

Configuration Item	Description
Catalog Name	Enter a unique name for the catalog.
Description	Enter a description for the catalog.
Storage Type	Fixed to Standard Storage.
Storage Redundancy Type	Select a redundancy policy for your data. LRS (locally redundant storage, default): stores data in a single zone; if that zone becomes unavailable, the data is inaccessible. ZRS (zone-redundant storage): replicates data across multiple zones within a region for higher availability. Note: You cannot change the redundancy type from ZRS to LRS after the catalog is created. ZRS provides higher data availability but also incurs higher costs.

Read and select the Terms of Service, then click Create Catalog.

For more information, see Manage catalogs.
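Before filling in the console form, it can help to write down the intended settings and check them against the parameter table above. The following is a minimal Python sketch of such a pre-flight check. The `CatalogSettings` class, `validate`, `redundancy_change_blocked`, and the `existing_names` argument are hypothetical names used for illustration; they do not correspond to a DLF API, and the catalog itself is still created in the console.

```python
from dataclasses import dataclass

# Allowed values taken from the parameter table in this topic.
STORAGE_TYPE = "Standard Storage"
REDUNDANCY_TYPES = ("LRS", "ZRS")


@dataclass(frozen=True)
class CatalogSettings:
    """Hypothetical record of the values to enter in the console."""
    name: str
    description: str = ""
    storage_type: str = STORAGE_TYPE   # fixed value per the table
    redundancy: str = "LRS"            # LRS is the default per the table


def validate(settings, existing_names=()):
    """Return a list of problems; an empty list means the settings look valid."""
    problems = []
    if not settings.name.strip():
        problems.append("Catalog Name must not be empty")
    elif settings.name in existing_names:
        problems.append("Catalog Name must be unique")
    if settings.storage_type != STORAGE_TYPE:
        problems.append("Storage Type is fixed to Standard Storage")
    if settings.redundancy not in REDUNDANCY_TYPES:
        problems.append("Storage Redundancy Type must be LRS or ZRS")
    return problems


def redundancy_change_blocked(current, target):
    """True for the one change this topic rules out: ZRS to LRS."""
    return current == "ZRS" and target == "LRS"
```

Because a change from ZRS to LRS is not possible after creation, the redundancy type is worth settling before you click Create Catalog; `redundancy_change_blocked("ZRS", "LRS")` returns `True` to make that constraint explicit.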

Ingest data into a data lakehouse

Use tools such as Flink CDC and Data Integration in DataWorks to sync raw data to your data lakehouse.

Analyze data in a data lakehouse

Use EMR Serverless Spark for batch reads and writes, Realtime Compute for Apache Flink for streaming reads and writes, and EMR Serverless StarRocks to extract insights from data.