After you configure a Data Lake Formation (DLF) catalog, you can access the tables of a DLF instance in the console of fully managed Flink. This topic describes how to configure, view, and delete a DLF catalog in the console of fully managed Flink.

Background information

Alibaba Cloud Data Lake Formation (DLF) is a unified metadata management service. You can use DLF to manage tables in open source formats, such as Iceberg, Hudi, Delta, Parquet, ORC, and Avro. Common compute engines of Alibaba Cloud E-MapReduce, such as Apache Spark, Apache Flink, Apache Hive, and Apache Presto, can be integrated with DLF.

Prerequisites

The Alibaba Cloud DLF service is activated.

Limits

  • Only Flink compute engine versions vvr-4.0.12-flink-1.13 and later support DLF catalogs.
  • Fully managed Flink can manage only the Iceberg and Hudi data lake formats in DLF catalogs.

Configure a DLF catalog

You can configure a DLF catalog on the UI or by executing an SQL statement. We recommend that you configure a DLF catalog on the UI.

Configure a DLF catalog on the UI

  1. Log on to the Realtime Compute for Apache Flink console.
  2. Go to the Create Catalog dialog box.
    1. On the Fully Managed Flink tab, find the workspace that you want to manage and click Console in the Actions column.
    2. In the left-side navigation pane, click Draft Editor.
    3. On the left side of the Draft Editor page, click the Schemas tab.
    4. Click the Create icon and select Create Catalog from the drop-down list.
  3. Create a DLF catalog.
    1. In the Create Catalog dialog box, click DLF.
    2. Configure the parameters. Example values appear after this procedure.
      | Parameter | Description | Required | Remarks |
      | --- | --- | --- | --- |
      | catalogname | The name of the DLF catalog. | Yes | Set the value to a custom name. |
      | access.key.id | The AccessKey ID of your Alibaba Cloud account. | Yes | For more information about how to obtain the AccessKey ID, see Obtain an AccessKey pair. |
      | access.key.secret | The AccessKey secret of your Alibaba Cloud account. | Yes | For more information about how to obtain the AccessKey secret, see Obtain an AccessKey pair. |
      | warehouse | The default OSS path for tables in the DLF catalog. | Yes | The path must be in the oss://<bucket>/<object> format, where bucket is the name of the OSS bucket that you created and object is the path in which your data is stored. Log on to the OSS console to view the bucket name and object name. |
      | oss.endpoint | The endpoint of OSS. | Yes | For more information, see Regions and endpoints. |
      | dlf.endpoint | The endpoint of the DLF service. | Yes | |
      | dlf.region-id | The ID of the region in which the DLF service resides. | Yes | Make sure that the region matches the endpoint that you specified for dlf.endpoint. |
    3. Click OK.
  4. Click the Refresh icon to refresh the page and view the DLF catalog that you created.
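
For reference, the following hypothetical values illustrate the expected formats. The China (Hangzhou) region and the OSS bucket named my-dlf-bucket are assumptions for illustration; look up the actual endpoints for your region in Regions and endpoints.

  • catalogname: dlf
  • warehouse: oss://my-dlf-bucket/dlf-warehouse
  • oss.endpoint: oss-cn-hangzhou-internal.aliyuncs.com
  • dlf.endpoint: dlf-vpc.cn-hangzhou.aliyuncs.com
  • dlf.region-id: cn-hangzhou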

Configure a DLF catalog by executing an SQL statement

  1. Log on to the Realtime Compute for Apache Flink console.
  2. On the Fully Managed Flink tab, find the workspace that you want to manage and click Console in the Actions column.
  3. In the left-side navigation pane, click Draft Editor.
  4. In the upper-left corner of the Draft Editor page, click New. In the New Draft dialog box, select STREAM / SQL from the Type drop-down list.
  5. In the script editor, enter the following statement to create a DLF catalog (a filled-in example appears after this procedure):
    CREATE CATALOG dlf WITH (
       'type' = 'dlf',
       'access.key.id' = '<YourAliyunAccessKeyId>',
       'access.key.secret' = '<YourAliyunAccessKeySecret>',
       'warehouse' = '<YourAliyunOSSLocation>',
       'oss.endpoint' = '<YourAliyunOSSEndpoint>',
       'dlf.region-id' = '<YourAliyunDLFRegionId>',
       'dlf.endpoint' = '<YourAliyunDLFEndpoint>'
    );
    | Parameter | Description | Required | Remarks |
    | --- | --- | --- | --- |
    | catalogname | The name of the DLF catalog. | Yes | Set the value to a custom name. |
    | type | The type of the catalog. | Yes | Set the value to dlf. |
    | access.key.id | The AccessKey ID of your Alibaba Cloud account. | Yes | For more information about how to obtain the AccessKey ID, see Obtain an AccessKey pair. |
    | access.key.secret | The AccessKey secret of your Alibaba Cloud account. | Yes | For more information about how to obtain the AccessKey secret, see Obtain an AccessKey pair. |
    | warehouse | The default OSS path for tables in the DLF catalog. | Yes | The path must be in the oss://<bucket>/<object> format, where bucket is the name of the OSS bucket that you created and object is the path in which your data is stored. Log on to the OSS console to view the bucket name and object name. |
    | oss.endpoint | The endpoint of OSS. | Yes | For more information, see Regions and endpoints. |
    | dlf.endpoint | The endpoint of the DLF service. | Yes | |
    | dlf.region-id | The ID of the region in which the DLF service resides. | Yes | Make sure that the region matches the endpoint that you specified for dlf.endpoint. |
  6. Click Execute.
    After the statement is executed, the message "Query has been executed" appears.
  7. On the left side of the Draft Editor page, click the Schemas tab.
  8. Click the Refresh icon to refresh the page and view the DLF catalog that you created.
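
For reference, the following statement shows step 5 with hypothetical values filled in. The bucket name my-dlf-bucket, the warehouse path, and the China (Hangzhou) endpoints are assumptions for illustration; replace them with your own AccessKey pair, OSS path, and the endpoints for your region.

    -- A filled-in sketch of the CREATE CATALOG statement (hypothetical values).
    CREATE CATALOG dlf WITH (
       'type' = 'dlf',
       'access.key.id' = '<YourAliyunAccessKeyId>',
       'access.key.secret' = '<YourAliyunAccessKeySecret>',
       'warehouse' = 'oss://my-dlf-bucket/dlf-warehouse',         -- assumed bucket and path
       'oss.endpoint' = 'oss-cn-hangzhou-internal.aliyuncs.com',  -- assumed region
       'dlf.region-id' = 'cn-hangzhou',                           -- assumed region
       'dlf.endpoint' = 'dlf-vpc.cn-hangzhou.aliyuncs.com'        -- assumed region
    );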

View the metadata of the DLF catalog

After you configure the DLF catalog, you can perform the following steps to view the DLF metadata.

  1. Log on to the Realtime Compute for Apache Flink console.
  2. On the Fully Managed Flink tab, find the workspace that you want to manage and click Console in the Actions column.
  3. In the left-side navigation pane, click Draft Editor.
  4. On the left side of the Draft Editor page, click the Schemas tab.
  5. Select the DLF catalog whose metadata you want to view from the drop-down list in the menu bar. In this example, the DLF catalog named dlf is used.
  6. View information about databases, tables, and functions in the DLF catalog.
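
If you prefer SQL over the Schemas tab, you can browse the same metadata with Flink's standard metadata statements. The following is a minimal sketch; dlf is the catalog name from this example, and dlf_testdb assumes the database that is created in the next section.

    SHOW CATALOGS;    -- the DLF catalog appears in the returned list
    USE CATALOG dlf;  -- switch the current catalog to the DLF catalog
    SHOW DATABASES;   -- list the databases in the catalog
    USE dlf_testdb;   -- assumes the database created in the next section
    SHOW TABLES;      -- list the tables in the current database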

Use the DLF catalog

  • Create a database and tables.
    1. On the Draft Editor page, create a streaming SQL job and write the following code:
      CREATE DATABASE dlf.dlf_testdb;
      
      CREATE TABLE dlf.dlf_testdb.iceberg (
        id    BIGINT,
        data  STRING,
        dt    STRING
      ) PARTITIONED BY (dt) WITH (
        'connector' = 'iceberg'
      );
      
      CREATE TABLE dlf.dlf_testdb.hudi (
        id    BIGINT PRIMARY KEY NOT ENFORCED,
        data  STRING,
        dt    STRING
      ) PARTITIONED BY (dt) WITH (
        'connector' = 'hudi'
      );
    2. Select the three DDL statements in sequence and click Execute to create the dlf_testdb database, the dlf_testdb.iceberg table, and the dlf_testdb.hudi table.
      Note
      • You can execute these DDL statements on a session cluster of fully managed Flink that you create.
      • After you create the database and tables, you can view the database and tables on the Schemas tab of the Draft Editor page.
  • Insert data into the tables.
    1. On the Draft Editor page, create a streaming SQL job and write the following code:
      INSERT INTO dlf.dlf_testdb.iceberg VALUES (1, 'AAA', '2022-02-01'), (2, 'BBB', '2022-02-01');
      INSERT INTO dlf.dlf_testdb.hudi VALUES (1, 'AAA', '2022-02-01'), (2, 'BBB', '2022-02-01');
    2. Click Validate to verify the SQL syntax.
    3. Click Publish.
  • Read data from the tables.
    1. On the Draft Editor page, create a streaming SQL job and write the following code:
      SELECT * FROM dlf.dlf_testdb.iceberg LIMIT 2;
      SELECT * FROM dlf.dlf_testdb.hudi LIMIT 2;
    2. Click Validate to verify the SQL syntax.
    3. Click Execute. After the SQL statement is executed, you can view the returned data records in the console of fully managed Flink.
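
When you no longer need the example objects, you can remove them with standard Flink SQL DDL statements. The following is a sketch that assumes the dlf_testdb database and tables created above; before you drop tables that hold real data, check how your deployment handles the underlying OSS files.

    DROP TABLE dlf.dlf_testdb.iceberg;
    DROP TABLE dlf.dlf_testdb.hudi;
    DROP DATABASE dlf.dlf_testdb;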

Delete a DLF catalog

Note After you delete a DLF catalog, jobs that are running are not affected. However, jobs that use a table of the catalog can no longer find the table when those jobs are published or restarted. Proceed with caution when you delete a DLF catalog.

You can delete a DLF catalog on the UI or by executing an SQL statement. We recommend that you delete a DLF catalog on the UI.

Delete a DLF catalog on the UI

  1. Log on to the Realtime Compute for Apache Flink console.
  2. On the Fully Managed Flink tab, find the workspace that you want to manage and click Console in the Actions column.
  3. In the left-side navigation pane, click Draft Editor.
  4. On the left side of the Draft Editor page, click the Schemas tab.
  5. Select the DLF catalog that you want to delete from the drop-down list in the menu bar, and click the Delete icon.
  6. In the dialog box that appears, click Delete.
  7. Click the Refresh icon to refresh the page and check whether the DLF catalog is deleted.

Delete a DLF catalog by executing an SQL statement

  1. Log on to the Realtime Compute for Apache Flink console.
  2. On the Fully Managed Flink tab, find the workspace that you want to manage and click Console in the Actions column.
  3. In the left-side navigation pane, click Draft Editor.
  4. In the upper-left corner of the Draft Editor page, click New. In the New Draft dialog box, select STREAM / SQL from the Type drop-down list.
  5. In the script editor, enter the following statement:
    DROP CATALOG ${catalog_name};

    Replace catalog_name with the name of the DLF catalog that you want to delete from the console of fully managed Flink.

  6. Click Execute.
  7. On the left side of the Draft Editor page, click the Schemas tab.
  8. Click the Refresh icon to refresh the page and check whether the DLF catalog is deleted.
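
For example, to delete the catalog named dlf that was created earlier in this topic, the statement in step 5 becomes:

    DROP CATALOG dlf;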