This topic describes how to get started with Data Lake Formation (DLF).
Prerequisites
You have set up DLF. For more information, see Set up DLF.
Note: Activation and authorization are performed only once, when you first set up the DLF environment.
To manage catalogs as a RAM user, you must have the following permissions:
- API permissions: You have been granted the AliyunDLFFullAccess policy or a custom policy that contains catalog-related authorization actions (see the sketch after this list). For more information, see RAM authorization action reference.
- Data permissions: You have been granted the super_administrator or admin system role, or a custom role that has catalog-related permissions. For more information, see Configure data permissions.
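If you use a custom policy instead of AliyunDLFFullAccess, the sketch below shows the general shape of a catalog-scoped RAM policy document. The dlf:* action names are illustrative assumptions, not confirmed values; verify the exact actions in the RAM authorization action reference before use.

```python
import json

# Illustrative RAM policy document for catalog management.
# The dlf:* action names below are assumptions; confirm them against
# the RAM authorization action reference for your DLF version.
catalog_policy = {
    "Version": "1",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "dlf:CreateCatalog",
                "dlf:GetCatalog",
                "dlf:ListCatalogs",
                "dlf:UpdateCatalog",
                "dlf:DeleteCatalog",
            ],
            "Resource": "*",
        }
    ],
}

# Paste the printed JSON into the RAM console when creating a custom policy.
print(json.dumps(catalog_policy, indent=2))
```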
Create a catalog
Create a catalog based on your use case, data volumes, service reliability, and budget requirements.
Log on to the DLF console.
On the Catalogs page, click Create Catalog and configure the following parameters.
- Catalog Name: Enter a unique name for the catalog.
- Description: Enter a description for the catalog.
- Storage Type: Fixed to Standard Storage.
- Storage Redundancy Type: Select a redundancy policy for your data:
  - LRS (Locally Redundant Storage) (default): Stores data in a single zone. If the zone is unavailable, the data becomes inaccessible.
  - ZRS (Zone-Redundant Storage): Replicates data across multiple zones within a region for higher availability.
  Note: You cannot change the redundancy type from ZRS to LRS after the catalog is created. ZRS provides higher data availability but also incurs higher costs.
Read and agree to the Terms of Service, and then click Create Catalog.
For more information, see Manage catalogs.
Ingest data into a data lakehouse
Use tools such as Flink CDC and DataWorks Data Integration to sync raw data to your data lakehouse, as in the sketch below.
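As a minimal sketch of the Flink path, the PyFlink job below captures changes from a MySQL table and writes them into a Paimon table in a DLF-backed catalog. It assumes the MySQL CDC and Paimon connectors are on the Flink classpath; the catalog options ('metastore' = 'dlf' and 'warehouse'), the names lakehouse, ods, and orders, and all connection values are assumptions or placeholders, not confirmed settings.

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

# Streaming job that continuously mirrors a MySQL table into the lakehouse.
t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Source: change data capture from MySQL. The connection values are
# placeholders; replace them with your own database settings.
t_env.execute_sql("""
    CREATE TABLE orders_src (
        order_id BIGINT,
        amount DECIMAL(10, 2),
        PRIMARY KEY (order_id) NOT ENFORCED
    ) WITH (
        'connector' = 'mysql-cdc',
        'hostname' = 'mysql.example.internal',
        'port' = '3306',
        'username' = 'flink_user',
        'password' = '******',
        'database-name' = 'shop',
        'table-name' = 'orders'
    )
""")

# Sink: a Paimon catalog backed by DLF. The 'metastore' = 'dlf' and
# 'warehouse' values are assumptions; check the catalog options
# documented for your engine version.
t_env.execute_sql("""
    CREATE CATALOG lakehouse WITH (
        'type' = 'paimon',
        'metastore' = 'dlf',
        'warehouse' = 'my_dlf_catalog'
    )
""")
t_env.execute_sql("CREATE DATABASE IF NOT EXISTS lakehouse.ods")
t_env.execute_sql("""
    CREATE TABLE IF NOT EXISTS lakehouse.ods.orders (
        order_id BIGINT,
        amount DECIMAL(10, 2),
        PRIMARY KEY (order_id) NOT ENFORCED
    )
""")

# Replicate inserts, updates, and deletes; wait() blocks while the
# streaming job runs.
t_env.execute_sql(
    "INSERT INTO lakehouse.ods.orders SELECT * FROM orders_src"
).wait()
```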
Analyze data in a data lakehouse
Use EMR Serverless Spark to run batch read and write operations, Realtime Compute for Apache Flink to read and write streaming data, and EMR Serverless StarRocks to extract insights from your data.
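For the batch path, here is a minimal PySpark sketch that reads the ingested table and writes an aggregate back to the lakehouse. The catalog wiring (org.apache.paimon.spark.SparkCatalog plus the metastore and warehouse keys) and the names lakehouse, ods.orders, and ads.order_totals are assumptions carried over from the ingestion sketch above; on EMR Serverless Spark the DLF catalog is typically preconfigured, in which case the builder configuration can be omitted.

```python
from pyspark.sql import SparkSession

# Batch analysis of a table registered in the DLF catalog. The
# spark.sql.catalog.* keys and values below are assumptions; drop them
# if your EMR Serverless Spark workspace already wires up the catalog.
spark = (
    SparkSession.builder
    .appName("dlf-batch-analysis")
    .config("spark.sql.catalog.lakehouse", "org.apache.paimon.spark.SparkCatalog")
    .config("spark.sql.catalog.lakehouse.metastore", "dlf")
    .config("spark.sql.catalog.lakehouse.warehouse", "my_dlf_catalog")
    .getOrCreate()
)

# Batch read: aggregate the orders ingested by the streaming job.
totals = spark.sql("""
    SELECT order_id, SUM(amount) AS total_amount
    FROM lakehouse.ods.orders
    GROUP BY order_id
""")

# Batch write: persist the result as a new lakehouse table.
spark.sql("CREATE DATABASE IF NOT EXISTS lakehouse.ads")
totals.writeTo("lakehouse.ads.order_totals").createOrReplace()
```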