This guide walks you through creating your first catalog in Data Lake Formation (DLF), then shows how to ingest and analyze data in your data lakehouse.
Prerequisites
Before you begin, make sure that you have:
Completed the DLF setup (activation and authorization, required only once)
(If using a RAM user) The following permissions:
API permissions: The AliyunDLFFullAccess permission policy, or a policy that includes catalog-related authorization actions. For details, see RAM authorization action reference.
Data permissions: The super_administrator or admin system role, or a custom role with catalog-related permissions. For details, see Configure data permissions.
Create a catalog
A catalog is the top-level container for organizing metadata in your data lakehouse. When you create a catalog, choose a storage redundancy type based on your use case, data volumes, availability, and budget requirements.
Log on to the DLF console.
On the Catalogs page, click Create Catalog.
Configure the following parameters.
| Parameter | Description |
| --- | --- |
| Catalog Name | A unique name for the catalog. |
| Description | A description of the catalog. |
| Storage Type | Fixed to Standard Storage. |
| Storage Redundancy Type | The redundancy policy for your data. See the following table for details. |

Storage redundancy options:

| Option | Behavior | Default |
| --- | --- | --- |
| LRS (Locally Redundant Storage) | Stores data in a single zone. If the zone becomes unavailable, data is inaccessible. | Yes |
| ZRS (Zone-Redundant Storage) | Replicates data across multiple zones within a region for higher data availability. Incurs higher costs than LRS. | No |

Important: After a catalog is created, you cannot change the redundancy type from ZRS to LRS.
Read and select the Terms of Service, then click Create Catalog.
For more information, see Manage catalogs.
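The catalog parameters and the redundancy restriction described above can be modeled as a small sketch for planning purposes. This is not DLF API code: `CatalogSpec`, `Redundancy`, and `can_change_redundancy` are hypothetical names introduced here for illustration. The guide only states that a ZRS to LRS change is not possible; treating every other transition as allowed is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Redundancy(Enum):
    LRS = "LRS"  # locally redundant: data in a single zone (console default)
    ZRS = "ZRS"  # zone-redundant: data replicated across zones in a region


@dataclass(frozen=True)
class CatalogSpec:
    """Hypothetical holder for the console form fields; not a DLF API type."""
    name: str
    description: str = ""
    storage_type: str = "Standard Storage"  # fixed value per the parameter table
    redundancy: Redundancy = Redundancy.LRS


def can_change_redundancy(current: Redundancy, target: Redundancy) -> bool:
    """Return False for the ZRS -> LRS change, which the guide rules out.

    Assumption: all other transitions are treated as allowed.
    """
    return not (current is Redundancy.ZRS and target is Redundancy.LRS)
```

For example, `CatalogSpec(name="demo_catalog")` picks up the LRS default, and `can_change_redundancy(Redundancy.ZRS, Redundancy.LRS)` returns `False`, which is a reminder to settle the redundancy type before you create the catalog.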
Ingest data into your data lakehouse
After you create a catalog, sync raw data to your data lakehouse by using tools such as Flink CDC and DataWorks Data Integration.
Analyze data in your data lakehouse
Query and extract insights from your data lakehouse by using the following engines:
EMR Serverless Spark -- batch read and write operations
Realtime Compute for Apache Flink -- streaming read and write operations
EMR Serverless StarRocks -- extract insights from data
Next steps
Manage catalogs -- create, modify, and delete catalogs
Configure data permissions -- set up roles and access control
RAM authorization action reference -- review available permission actions