After you create a Data Lake Formation (DLF) catalog, you can access the tables of a DLF instance in the console of fully managed Flink. This topic describes how to create, view, use, and delete a DLF catalog in the console of fully managed Flink.
Background information
Data Lake Formation (DLF) is a unified metadata management service provided by Alibaba Cloud. You can use DLF to manage tables in open source formats, such as Iceberg, Hudi, Delta, Parquet, ORC, and Avro.
This topic describes the operations that you can perform to manage DLF catalogs.
Prerequisites
The Alibaba Cloud DLF service is activated.
Limits
DLF catalogs are supported only in Realtime Compute for Apache Flink whose engine version is vvr-4.0.12-flink-1.13 or later.
Fully managed Flink can manage only the Iceberg and Hudi data lake formats in DLF catalogs.
Create a DLF catalog
You can create a DLF catalog on the UI or by executing an SQL statement. We recommend that you create a DLF catalog on the UI.
Create a DLF catalog on the UI
Go to the Catalogs page.
Log on to the Realtime Compute for Apache Flink console.
On the Fully Managed Flink tab, find the workspace that you want to manage and click Console in the Actions column.
In the left-side navigation pane, click Catalogs.
On the Catalog List page, click Create Catalog.
In the Create Catalog dialog box, click DLF on the Built-in Catalog tab in the Choose Catalog Type step and click Next.
Create a DLF catalog.
Configure the catalog information. All parameters except more configuration are required.

catalogname: The name of the DLF catalog. Set the value to a custom name.

access.key.id: The AccessKey ID of the Alibaba Cloud account that is used to access Object Storage Service (OSS). For more information about how to obtain the AccessKey ID, see Obtain an AccessKey pair.

access.key.secret: The AccessKey secret of the Alibaba Cloud account that is used to access OSS. For more information about how to obtain the AccessKey secret, see Obtain an AccessKey pair.

warehouse: The default OSS path in which tables in the DLF catalog are stored. The path must be in the oss://<bucket>/<object> format, where bucket is the name of the OSS bucket that you created and object is the path in which your data is stored. Note: Log on to the OSS console to view your bucket name and object name.

oss.endpoint: The endpoint of OSS, such as oss-cn-hangzhou.aliyuncs.com. For more information, see Regions and endpoints. Note: We recommend that you set oss.endpoint to a virtual private cloud (VPC) endpoint of OSS. For example, if you select the China (Hangzhou) region, set oss.endpoint to oss-cn-hangzhou-internal.aliyuncs.com. If you want to access OSS across VPCs, follow the instructions that are described in How does fully managed Flink access a service across VPCs?

dlf.endpoint: The endpoint of the DLF service. Note: We recommend that you set dlf.endpoint to the VPC endpoint of DLF. For example, if you select the China (Hangzhou) region, set dlf.endpoint to dlf-vpc.cn-hangzhou.aliyuncs.com. If you want to access DLF across VPCs, follow the instructions that are described in How does fully managed Flink access a service across VPCs?

dlf.region-id: The ID of the region in which the DLF service resides. Note: Make sure that the region matches the endpoint that you specify for dlf.endpoint.

more configuration: Optional. Other parameters that you want to configure for the DLF catalog, such as dlf.catalog.id, which specifies a particular DLF catalog if you have multiple DLF catalogs. Separate multiple parameters with line feeds. Example: dlf.catalog.id:my_catalog.

Click Confirm.
View the catalog that you created in the Catalogs pane on the left side of the Catalog List page.
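To confirm that the catalog is registered, you can also run a quick check from an SQL draft. A minimal sketch; the result lists every catalog in the workspace, including the new DLF catalog:

-- List all registered catalogs; the new DLF catalog should appear in the result.
SHOW CATALOGS;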
Create a DLF catalog by executing an SQL statement
Create a blank streaming draft. For more information, see Develop an SQL draft.
In the script editor, enter a statement to create a DLF catalog.
CREATE CATALOG dlf WITH (
  'type' = 'dlf',
  'access.key.id' = '<YourAliyunAccessKeyId>',
  'access.key.secret' = '<YourAliyunAccessKeySecret>',
  'warehouse' = '<YourAliyunOSSLocation>',
  'oss.endpoint' = '<YourAliyunOSSEndpoint>',
  'dlf.region-id' = '<YourAliyunDLFRegionId>',
  'dlf.endpoint' = '<YourAliyunDLFEndpoint>'
);
The following list describes the parameters in the statement. All parameters are required.

catalogname: The name of the DLF catalog. Set the value to a custom name.

type: The type of the catalog. Set the value to dlf.

access.key.id: The AccessKey ID of your Alibaba Cloud account. For more information about how to obtain the AccessKey ID, see Obtain an AccessKey pair.

access.key.secret: The AccessKey secret of your Alibaba Cloud account. For more information about how to obtain the AccessKey secret, see Obtain an AccessKey pair.

warehouse: The default OSS path in which tables in the DLF catalog are stored. The path must be in the oss://<bucket>/<object> format, where bucket is the name of the OSS bucket that you created and object is the path in which your data is stored. Note: Log on to the OSS console to view your bucket name and object name.

oss.endpoint: The endpoint of OSS. For more information, see Regions and endpoints. Note: We recommend that you set oss.endpoint to a VPC endpoint of OSS. For example, if you select the China (Hangzhou) region, set oss.endpoint to oss-cn-hangzhou-internal.aliyuncs.com. If you want to access OSS across VPCs, follow the instructions that are described in How does fully managed Flink access a service across VPCs?

dlf.endpoint: The endpoint of the DLF service. Note: We recommend that you set dlf.endpoint to the VPC endpoint of DLF. For example, if you select the China (Hangzhou) region, set dlf.endpoint to dlf-vpc.cn-hangzhou.aliyuncs.com. If you want to access DLF across VPCs, follow the instructions that are described in How does fully managed Flink access a service across VPCs?

dlf.region-id: The ID of the region in which the DLF service resides. Note: Make sure that the region matches the endpoint that you specify for dlf.endpoint.
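For reference, the following sketch shows the same statement with sample values filled in for the China (Hangzhou) region. Every value is a placeholder, not a default; substitute your own AccessKey pair, bucket, and region.

-- Sample catalog definition; all values below are hypothetical placeholders.
CREATE CATALOG dlf WITH (
  'type' = 'dlf',
  'access.key.id' = '<yourAccessKeyId>',          -- AccessKey ID of your account
  'access.key.secret' = '<yourAccessKeySecret>',  -- AccessKey secret of your account
  'warehouse' = 'oss://my-bucket/dlf-warehouse',  -- hypothetical OSS bucket and path
  'oss.endpoint' = 'oss-cn-hangzhou-internal.aliyuncs.com',  -- VPC endpoint of OSS in China (Hangzhou)
  'dlf.region-id' = 'cn-hangzhou',
  'dlf.endpoint' = 'dlf-vpc.cn-hangzhou.aliyuncs.com'        -- VPC endpoint of DLF in China (Hangzhou)
);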
Select the statement that is used to create the catalog, and click Run on the left side of the code.
In the Catalogs pane on the left side of the Catalog List page, view the catalog that you created.
View a DLF catalog
After you create the DLF catalog, you can perform the following steps to view the DLF metadata.
Go to the Catalogs page.
Log on to the Realtime Compute for Apache Flink console.
On the Fully Managed Flink tab, find the workspace that you want to manage and click Console in the Actions column.
In the left-side navigation pane, click Catalogs.
On the Catalog List page, find the desired catalog and view the Name and Type columns of the catalog.
Note: If you want to view the databases and tables in the catalog, click View in the Actions column.
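If you prefer SQL, you can inspect the same metadata from an SQL draft. A minimal sketch, assuming a catalog named dlf and a database named dlf_testdb; both names are examples:

USE CATALOG dlf;  -- switch the current catalog to the DLF catalog
SHOW DATABASES;   -- list the databases in the catalog
USE dlf_testdb;   -- switch to an example database
SHOW TABLES;      -- list the tables in the database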
Use a DLF catalog
Create a DLF database and tables
Create a DLF table on the UI
Go to the Catalogs page.
Log on to the Realtime Compute for Apache Flink console.
On the Fully Managed Flink tab, find the workspace that you want to manage and click Console in the Actions column.
In the left-side navigation pane, click Catalogs.
On the Catalog List page, find the desired catalog and click View in the Actions column.
On the page that appears, find the desired database and click View in the Actions column.
On the page that appears, click Create Table.
On the Built-in tab of the Create Table dialog box, click Connection Type and select a table type.
Click Next.
Enter the table creation statement and configure related parameters. Sample code:
CREATE DATABASE dlf.dlf_testdb;

CREATE TABLE dlf.dlf_testdb.iceberg (
  id   BIGINT,
  data STRING,
  dt   STRING
) PARTITIONED BY (dt) WITH (
  'connector' = 'iceberg'
);

CREATE TABLE dlf.dlf_testdb.hudi (
  id   BIGINT PRIMARY KEY NOT ENFORCED,
  data STRING,
  dt   STRING
) PARTITIONED BY (dt) WITH (
  'connector' = 'hudi'
);
Click Confirm.
Create a DLF table by executing an SQL statement
Create a blank streaming draft. For more information, see Develop an SQL draft.
In the script editor, enter the table creation statement.
CREATE DATABASE dlf.dlf_testdb;

CREATE TABLE dlf.dlf_testdb.iceberg (
  id   BIGINT,
  data STRING,
  dt   STRING
) PARTITIONED BY (dt) WITH (
  'connector' = 'iceberg'
);

CREATE TABLE dlf.dlf_testdb.hudi (
  id   BIGINT PRIMARY KEY NOT ENFORCED,
  data STRING,
  dt   STRING
) PARTITIONED BY (dt) WITH (
  'connector' = 'hudi'
);
Select the DDL statements one at a time in the preceding order, and click Run on the left side of the code.
After the dlf_testdb database and the dlf_testdb.iceberg and dlf_testdb.hudi tables are created, click the Catalogs tab on the left side of the SQL Editor page to view the created database and tables.
You can also create the databases and tables on a session cluster of fully managed Flink that you have created.
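To double-check the schemas of the new tables, you can describe them from the same draft. A minimal sketch that assumes the names used above:

-- Print the schema of each newly created table.
DESCRIBE dlf.dlf_testdb.iceberg;
DESCRIBE dlf.dlf_testdb.hudi;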
Write data
INSERT INTO dlf.dlf_testdb.iceberg VALUES (1, 'AAA', '2022-02-01'), (2, 'BBB', '2022-02-01');
INSERT INTO dlf.dlf_testdb.hudi VALUES (1, 'AAA', '2022-02-01'), (2, 'BBB', '2022-02-01');
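The preceding statements insert a few fixed rows. For a continuous write, you can attach an unbounded source instead. The following sketch uses the built-in datagen connector of Apache Flink; datagen_source is a hypothetical name, and the dt partition value is derived from the current date:

-- An unbounded test source that emits random rows at a fixed rate.
CREATE TEMPORARY TABLE datagen_source (
  id   BIGINT,
  data STRING
) WITH (
  'connector' = 'datagen',
  'rows-per-second' = '10'
);

-- Continuously write the generated rows into the Iceberg table,
-- deriving the dt partition column from the current date.
INSERT INTO dlf.dlf_testdb.iceberg
SELECT id, data, DATE_FORMAT(CURRENT_TIMESTAMP, 'yyyy-MM-dd') AS dt
FROM datagen_source;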
Read data
SELECT * FROM dlf.dlf_testdb.iceberg LIMIT 2;
SELECT * FROM dlf.dlf_testdb.hudi LIMIT 2;
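By default, a SELECT statement on these tables runs as a streaming query. If your environment honors the SET statement, you can switch to batch execution so that the scan finishes after it reads the existing data. This is a sketch; whether SET takes effect depends on your engine version and draft type:

-- Run the scan as a finite batch job instead of a streaming job.
SET 'execution.runtime-mode' = 'batch';
SELECT * FROM dlf.dlf_testdb.iceberg WHERE dt = '2022-02-01' LIMIT 2;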
Delete a DLF catalog
Deleting a DLF catalog does not affect deployments that are running. However, deployments that use a table of the catalog can no longer find the table when they are published or restarted. Proceed with caution when you delete a DLF catalog.
You can delete a DLF catalog on the UI or by executing an SQL statement. We recommend that you delete a DLF catalog on the UI.
Delete a DLF catalog on the UI
Go to the Catalogs page.
Log on to the Realtime Compute for Apache Flink console.
On the Fully Managed Flink tab, find the workspace that you want to manage and click Console in the Actions column.
In the left-side navigation pane, click Catalogs.
On the Catalog List page, find the desired catalog and click Delete in the Actions column.
In the message that appears, click Delete.
View the Catalogs pane on the left side of the Catalog List page to check whether the catalog is deleted.
Delete a DLF catalog by executing an SQL statement
Create a blank streaming draft. For more information, see Develop an SQL draft.
In the script editor, enter the following command:
DROP CATALOG ${catalog_name}
In this statement, catalog_name is the name of the DLF catalog that you want to delete.
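For example, to delete the catalog named dlf that is used in the examples of this topic:

-- Drop the sample catalog; the name dlf is an example.
DROP CATALOG dlf;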
Right-click the statement that is used to delete the catalog and choose Run from the shortcut menu.
View the Catalogs pane on the left side of the Catalog List page to check whether the catalog is deleted.