After you configure a Data Lake Formation (DLF) catalog, you can access the tables
of a DLF instance in the console of fully managed Flink. This topic describes how
to configure, view, and delete a DLF catalog in the console of fully managed Flink.
Background information
Alibaba Cloud Data Lake Formation (DLF) is a unified metadata management service. You can
use DLF to manage tables in open source formats, such as Iceberg, Hudi, Delta, Parquet,
ORC, and Avro. The common compute engines of Alibaba Cloud E-MapReduce, such as Apache
Spark, Apache Flink, Apache Hive, and Apache Presto, can be integrated with DLF.
Prerequisites
The Alibaba Cloud DLF service is activated.
Limits
- Only Flink compute engines that run vvr-4.0.12-flink-1.13 or later support DLF catalogs.
- Fully managed Flink can manage only the Iceberg and Hudi data lake formats in DLF catalogs.
Configure a DLF catalog
You can configure a DLF catalog on the UI or by executing an SQL statement. We recommend
that you configure a DLF catalog on the UI.
Configure a DLF catalog on the UI
- Log on to the Realtime Compute for Apache Flink console.
- Go to the Create Catalog dialog box.
- On the Fully Managed Flink tab, find the workspace that you want to manage and click Console in the Actions column.
- In the left-side navigation pane, click Draft Editor.
- On the left side of the Draft Editor page, click the Schemas tab.
- Click the add icon and select Create Catalog from the drop-down list.
- Create a DLF catalog.
- In the Create Catalog dialog box, click DLF.
- Configure the parameters.

Parameter | Description | Required | Remarks
--- | --- | --- | ---
catalogname | The name of the DLF catalog. | Yes | Set the value to a custom name.
access.key.id | The AccessKey ID of your Alibaba Cloud account. | Yes | For more information about how to obtain the AccessKey ID, see Obtain an AccessKey pair.
access.key.secret | The AccessKey secret of your Alibaba Cloud account. | Yes | For more information about how to obtain the AccessKey secret, see Obtain an AccessKey pair.
warehouse | The default OSS path for tables in the DLF catalog. | Yes | The path must be in the oss://<bucket>/<object> format, where bucket is the name of the OSS bucket that you created and object is the path in which your data is stored. You can log on to the OSS console to view the bucket and object names.
oss.endpoint | The endpoint of OSS. | Yes | For more information, see Regions and endpoints.
dlf.endpoint | The endpoint of the DLF service. | Yes | None.
dlf.region-id | The ID of the region in which the DLF service resides. | Yes | Make sure that the region matches the endpoint that you specify for dlf.endpoint.
- Click OK.
- Click the refresh icon to refresh the page and view the DLF catalog that you created.
Configure a DLF catalog by executing an SQL statement
- Log on to the Realtime Compute for Apache Flink console.
- On the Fully Managed Flink tab, find the workspace that you want to manage and click Console in the Actions column.
- In the left-side navigation pane, click Draft Editor.
- In the upper-left corner of the Draft Editor page, click New. In the New Draft dialog box, select STREAM / SQL from the Type drop-down list.
- In the script editor, enter the following statement to create a DLF catalog:
CREATE CATALOG dlf WITH (
  'type' = 'dlf',
  'access.key.id' = '<YourAliyunAccessKeyId>',
  'access.key.secret' = '<YourAliyunAccessKeySecret>',
  'warehouse' = '<YourAliyunOSSLocation>',
  'oss.endpoint' = '<YourAliyunOSSEndpoint>',
  'dlf.region-id' = '<YourAliyunDLFRegionId>',
  'dlf.endpoint' = '<YourAliyunDLFEndpoint>'
);
Parameter | Description | Required | Remarks
--- | --- | --- | ---
catalogname | The name of the DLF catalog. | Yes | Set the value to a custom name. In this example, dlf is used.
type | The type of the catalog. | Yes | Set the value to dlf.
access.key.id | The AccessKey ID of your Alibaba Cloud account. | Yes | For more information about how to obtain the AccessKey ID, see Obtain an AccessKey pair.
access.key.secret | The AccessKey secret of your Alibaba Cloud account. | Yes | For more information about how to obtain the AccessKey secret, see Obtain an AccessKey pair.
warehouse | The default OSS path for tables in the DLF catalog. | Yes | The path must be in the oss://<bucket>/<object> format, where bucket is the name of the OSS bucket that you created and object is the path in which your data is stored. You can log on to the OSS console to view the bucket and object names.
oss.endpoint | The endpoint of OSS. | Yes | For more information, see Regions and endpoints.
dlf.endpoint | The endpoint of the DLF service. | Yes | None.
dlf.region-id | The ID of the region in which the DLF service resides. | Yes | Make sure that the region matches the endpoint that you specify for dlf.endpoint.
- Click Execute.
After the statement is executed, the message "Query has been executed" appears.
- On the left side of the Draft Editor page, click the Schemas tab.
- Click the refresh icon to refresh the page and view the DLF catalog that you created.
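For reference, the following is a filled-in sketch of the statement. The AccessKey pair and the OSS bucket path are hypothetical placeholders, and the endpoints assume the China (Hangzhou) region; replace all of the values with your own.
CREATE CATALOG dlf WITH (
  'type' = 'dlf',
  'access.key.id' = 'LTAI****************',      -- hypothetical placeholder
  'access.key.secret' = '******************',    -- hypothetical placeholder
  'warehouse' = 'oss://my-dlf-bucket/warehouse', -- hypothetical bucket and path
  'oss.endpoint' = 'oss-cn-hangzhou-internal.aliyuncs.com',
  'dlf.region-id' = 'cn-hangzhou',
  'dlf.endpoint' = 'dlf.cn-hangzhou.aliyuncs.com'
);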
View the metadata of the DLF catalog
After you configure the DLF catalog, you can perform the following steps to view the
DLF metadata.
- Log on to the Realtime Compute for Apache Flink console.
- On the Fully Managed Flink tab, find the workspace that you want to manage and click Console in the Actions column.
- In the left-side navigation pane, click Draft Editor.
- On the left side of the Draft Editor page, click the Schemas tab.
- Select the DLF catalog whose metadata you want to view from the drop-down list in
the menu bar. In this example, the DLF catalog named dlf is used.
- View information about databases, tables, and functions in the DLF catalog.
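You can also browse the metadata by executing SQL statements on the Draft Editor page. The following is a minimal sketch that assumes the example catalog named dlf and the dlf_testdb database that is created in the next section:
-- Set the DLF catalog as the current catalog.
USE CATALOG dlf;
-- List the databases and tables in the catalog.
SHOW DATABASES;
USE dlf_testdb;
SHOW TABLES;
-- View the schema of a table.
DESCRIBE iceberg;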
Use the DLF catalog
- Create a database and tables.
- On the Draft Editor page, create a streaming SQL job and write the following code:
CREATE DATABASE dlf.dlf_testdb;

CREATE TABLE dlf.dlf_testdb.iceberg (
  id   BIGINT,
  data STRING,
  dt   STRING
) PARTITIONED BY (dt) WITH (
  'connector' = 'iceberg'
);

CREATE TABLE dlf.dlf_testdb.hudi (
  id   BIGINT PRIMARY KEY NOT ENFORCED,
  data STRING,
  dt   STRING
) PARTITIONED BY (dt) WITH (
  'connector' = 'hudi'
);
- Select the three DDL statements in sequence and click Execute to create the dlf_testdb database, the dlf_testdb.iceberg table, and the dlf_testdb.hudi table.
Note
- You can create databases and tables on a session cluster of fully managed Flink that
you create.
- After you create the database and tables, you can view the database and tables on
the Schemas tab of the Draft Editor page.
- Insert data into the tables.
- On the Draft Editor page, create a streaming SQL job and write the following code:
INSERT INTO dlf.dlf_testdb.iceberg VALUES (1, 'AAA', '2022-02-01'), (2, 'BBB', '2022-02-01');
INSERT INTO dlf.dlf_testdb.hudi VALUES (1, 'AAA', '2022-02-01'), (2, 'BBB', '2022-02-01');
- Click Validate to verify the SQL syntax.
- Click Publish.
- Read data from the tables.
- On the Draft Editor page, create a streaming SQL job and write the following code:
SELECT * FROM dlf.dlf_testdb.iceberg LIMIT 2;
SELECT * FROM dlf.dlf_testdb.hudi LIMIT 2;
- Click Validate to verify the SQL syntax.
- Click Execute. After the SQL statement is executed, you can view the returned data records in the
console of fully managed Flink.
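The preceding INSERT statements write a fixed set of rows. To test a continuous write, you can use the built-in datagen connector as a source, as in the following sketch. The source table name, the generator rate, and the pinned partition value are illustrative assumptions, not required values.
-- Hypothetical in-memory source that emits random rows for testing.
CREATE TEMPORARY TABLE datagen_source (
  id   BIGINT,
  data STRING
) WITH (
  'connector' = 'datagen',
  'rows-per-second' = '5'
);

-- Continuously append generated rows to the Hudi table. The partition
-- value dt is pinned so that all test rows land in a single partition.
INSERT INTO dlf.dlf_testdb.hudi
SELECT id, data, '2022-02-01' AS dt
FROM datagen_source;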
Delete a DLF catalog
Note After you delete a DLF catalog, jobs that are running are not affected. However,
jobs that use a table of the catalog can no longer find the table after the jobs are
published or restarted. Proceed with caution when you delete a DLF catalog.
You can delete a DLF catalog on the UI or by executing an SQL statement. We recommend
that you delete a DLF catalog on the UI.
Delete a DLF catalog on the UI
- Log on to the Realtime Compute for Apache Flink console.
- On the Fully Managed Flink tab, find the workspace that you want to manage and click Console in the Actions column.
- In the left-side navigation pane, click Draft Editor.
- On the left side of the Draft Editor page, click the Schemas tab.
- Select the DLF catalog that you want to delete from the drop-down list in the menu bar, and click the delete icon.
- In the dialog box that appears, click Delete.
- Click the refresh icon to refresh the page and check whether the DLF catalog is deleted.
Delete a DLF catalog by executing an SQL statement
- Log on to the Realtime Compute for Apache Flink console.
- On the Fully Managed Flink tab, find the workspace that you want to manage and click Console in the Actions column.
- In the left-side navigation pane, click Draft Editor.
- In the upper-left corner of the Draft Editor page, click New. In the New Draft dialog box, select STREAM / SQL from the Type drop-down list.
- In the script editor, enter the following statement:
DROP CATALOG ${catalog_name};
In the preceding statement, replace catalog_name with the name of the DLF catalog that
you want to delete. A complete example is provided after this procedure.
- Click Execute.
- On the left side of the Draft Editor page, click the Schemas tab.
- Click the refresh icon to refresh the page and check whether the DLF catalog is deleted.
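For example, to delete the catalog named dlf that is used in this topic, execute the following statement:
DROP CATALOG dlf;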