Data Lake Formation: Data Catalog

Last Updated: Mar 26, 2026

A data catalog is the top-level metadata entity in Data Lake Formation (DLF). Each catalog can contain multiple databases and acts as an isolation boundary — binding different E-MapReduce (EMR) clusters to separate catalogs keeps their metadata invisible to each other.

Manage catalogs

Create a catalog

Note

The Location field only accepts Object Storage Service (OSS) paths. If your default storage path is not on OSS, leave this field blank.

  1. Log on to the Data Lake Formation console.

  2. In the left-side navigation pane, choose Metadata > Metadata.

  3. Click the Catalog List tab, and click New Catalog.

  4. Configure the following fields, and click OK.

    Field | Required | Description
    Catalog ID | Yes | A unique identifier for the catalog. Duplicate IDs are not allowed.
    Description | No | A description of the catalog.
    Location | No | The default storage path for the catalog. Only OSS paths are supported. Leave this field blank if your default storage is not OSS.

View catalogs

  1. In the left-side navigation pane, choose Metadata > Metadata.

  2. Click the Catalog List tab to see all catalogs.

Edit a catalog

Only Description and Location are editable.

  1. In the left-side navigation pane, choose Metadata > Metadata.

  2. Click the Catalog List tab.

  3. In the Actions column of the target catalog, click Edit.

  4. Update Description or Location, and click OK.

Delete a catalog

Warning

Deleting a catalog is irreversible. The data cannot be recovered.

  1. In the left-side navigation pane, choose Metadata > Metadata.

  2. Click the Catalog List tab.

  3. In the Actions column of the target catalog, click Delete.

  4. In the confirmation dialog box, click Delete.

Bind an EMR cluster to a catalog

Each EMR cluster reads metadata from the catalog specified in its compute engine configuration.

Warning

Switching a cluster to a different catalog invalidates all existing database and table references in that cluster. Any running jobs that depend on those references will fail. Assess the impact carefully before switching.

The following table shows which engines require separate configuration and which inherit Hive settings automatically.

Engine | Config file | Needs separate config | Version notes
Hive | core-site.xml | Yes | None
Spark | hive-site.xml | Yes | EMR 5.6.0, 3.40.0, and earlier use the Hive config automatically
Presto | hive.properties | Yes | Supported only in EMR 5.8.0, 3.42.0, and later
Impala | N/A | No | Uses the Hive config automatically

Hive engine

  1. In the core-site.xml file of the Hive service, add the following configuration item. For more information, see Manage configuration items.

    Key | Value
    dlf.catalog.id | The Catalog ID of the DLF catalog
  2. Save and deploy the configuration.

    1. Click Save, and then click Deploy Client Configuration.

    2. In the dialog box, enter an Execution Reason and click OK.

  3. Restart the Hive service.

    1. On the Hive service configuration page, click More > Restart.

    2. In the dialog box, enter an Execution Reason and click OK.

    After a successful restart, the Hive service status changes to Healthy.
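The steps above add a single property to core-site.xml. A minimal sketch of the resulting entry is shown below; the value `my_catalog` is a placeholder, not a real catalog ID:

```xml
<!-- core-site.xml (Hive service): binds this cluster's Hive metadata
     lookups to one DLF catalog. Replace "my_catalog" with the Catalog ID
     you created in the DLF console. -->
<property>
  <name>dlf.catalog.id</name>
  <value>my_catalog</value>
</property>
```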

Spark engine

Modify the hive-site.xml file of the Spark service using the same steps as Hive engine.

Note

For EMR 5.6.0, 3.40.0, and earlier versions, Spark uses the Hive configuration automatically. No separate Spark configuration is needed.
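For the EMR versions that do require it, the Spark binding is the same property added to hive-site.xml. A minimal sketch, with `my_catalog` as a placeholder Catalog ID:

```xml
<!-- hive-site.xml (Spark service): same dlf.catalog.id property as the
     Hive engine. "my_catalog" is a placeholder for your Catalog ID. -->
<property>
  <name>dlf.catalog.id</name>
  <value>my_catalog</value>
</property>
```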

Presto engine

Modify the hive.properties file of the Presto service using the same steps as Hive engine.

Note

Presto catalog binding is supported only in EMR 5.8.0, 3.42.0, and later versions.
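Because hive.properties uses the key=value properties format rather than XML, the same configuration item looks like this. A minimal sketch, with `my_catalog` as a placeholder Catalog ID:

```properties
# hive.properties (Presto service): bind the Hive connector to a DLF catalog.
# Replace "my_catalog" with your Catalog ID.
dlf.catalog.id=my_catalog
```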

Impala engine

No configuration changes are needed for Impala. It uses the Hive configuration automatically.