After you configure an Apache Paimon catalog, you can directly access the Apache Paimon tables stored in Data Lake Formation (DLF) from Realtime Compute for Apache Flink. This topic describes how to create, view, and delete an Apache Paimon catalog and manage Apache Paimon databases and tables in the development console of Realtime Compute for Apache Flink.
Usage notes
Only Ververica Runtime (VVR) 8.0.5 or later allows you to create and configure Apache Paimon catalogs and tables.
Object Storage Service (OSS) stores the files of Apache Paimon tables, including data files and metadata files. Make sure that you have activated OSS and that the storage class of the associated OSS bucket is Standard. For more information, see Get started with the OSS console and Overview of storage classes.
Important: While you can use the OSS bucket specified when you create your Realtime Compute for Apache Flink workspace, we recommend that you create and use a separate OSS bucket in the same region. This improves data isolation and minimizes the risk of misoperations.
The AccessKey pair used to register the Paimon catalog must belong to an account that has read and write access to your OSS bucket and DLF catalog.
After you create or delete a catalog, database, or table by using SQL statements, you can click the refresh icon to update the Catalogs page.
The following table lists the compatibility between Apache Paimon and VVR versions.
| Apache Paimon version | VVR version |
| --- | --- |
| 1.1 | 11 |
| 1.0 | 8.0.11 |
| 0.9 | 8.0.7, 8.0.8, 8.0.9, and 8.0.10 |
| 0.8 | 8.0.6 |
| 0.7 | 8.0.5 |
| 0.6 | 8.0.3 and 8.0.4 |
Create an Apache Paimon catalog
Create an Apache Paimon Filesystem catalog
Console
Go to the Catalogs page.
Log on to the Realtime Compute for Apache Flink console. Find the workspace that you want to manage and click Console in the Actions column.
In the left-side navigation pane, click Catalogs.
Click Create Catalog. On the Built-in Catalog tab of the wizard that appears, choose Apache Paimon and click Next.
In the Configure Catalog step, configure the parameters.
SQL commands
Execute the following SQL statement in the SQL editor. For detailed instructions, see Scripts.
CREATE CATALOG `my-catalog` WITH (
  'type' = 'paimon',
  'metastore' = 'filesystem',
  'warehouse' = '<warehouse>',
  'fs.oss.endpoint' = '<fs.oss.endpoint>',
  'fs.oss.accessKeyId' = '<fs.oss.accessKeyId>',
  'fs.oss.accessKeySecret' = '<fs.oss.accessKeySecret>'
);

The following table describes the parameters in the SQL statement.
General

| Parameter | Description | Required | Remarks |
| --- | --- | --- | --- |
| my-catalog | The name of the Apache Paimon catalog. | Yes | Enter a custom name. |
| type | The type of the catalog. | Yes | Set the value to paimon. |
| metastore | The metadata storage type. | Yes | Valid values: filesystem (creates an Apache Paimon Filesystem catalog), dlf (creates an Apache Paimon DLF catalog), maxcompute (creates an Apache Paimon MaxCompute catalog), and sync (creates an Apache Paimon Sync catalog). |
OSS

| Parameter | Description | Required | Remarks |
| --- | --- | --- | --- |
| warehouse | The data warehouse directory in OSS. | Yes | Format: oss://<bucket>/<object>, where bucket is the name of the OSS bucket that you created and object is the path in which your data is stored. You can view both values in the OSS console. |
| fs.oss.endpoint | The endpoint of OSS. | Yes | If OSS resides in the same region as your Realtime Compute for Apache Flink workspace, use the VPC endpoint. If they are not in the same region, use the public endpoint. |
| fs.oss.accessKeyId | The AccessKey ID of the Alibaba Cloud account or RAM user that has read and write permissions on OSS. | Yes | For more information, see Create an AccessKey pair. |
| fs.oss.accessKeySecret | The AccessKey secret of the Alibaba Cloud account or RAM user that has read and write permissions on OSS. | Yes | For more information, see Create an AccessKey pair. |

Note: You must explicitly set the fs.oss.* parameters if the OSS bucket specified by the warehouse parameter does not reside in the same region as the Realtime Compute for Apache Flink workspace, or if an OSS bucket within another Alibaba Cloud account is used. For more information about how to obtain the required values, see Regions, endpoints, and open ports and Create an AccessKey pair.
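For reference, the following filled-in example registers a Filesystem catalog and switches to it. All values are placeholders for illustration: the bucket, path, and endpoint are hypothetical, and the ${secret_values.*} references assume that the AccessKey pair is hosted as secret values in your workspace; if it is not, paste the AccessKey values directly.

CREATE CATALOG `paimon-fs` WITH (
  'type' = 'paimon',
  'metastore' = 'filesystem',
  -- Hypothetical bucket and path; use your own OSS bucket.
  'warehouse' = 'oss://my-bucket/paimon-warehouse',
  -- VPC endpoint for a bucket in the same region as the workspace (example region: cn-hangzhou).
  'fs.oss.endpoint' = 'oss-cn-hangzhou-internal.aliyuncs.com',
  'fs.oss.accessKeyId' = '${secret_values.oss-ak-id}',
  'fs.oss.accessKeySecret' = '${secret_values.oss-ak-secret}'
);

-- Switch to the new catalog so that subsequent statements run against it.
USE CATALOG `paimon-fs`;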
Create an Apache Paimon DLF catalog
DLF
Create a Paimon catalog in the DLF console. See Get started with DLF.
Register the Paimon catalog in Flink's Development Console.
Note: This operation creates a mapping to your DLF catalog. Creating or deleting the catalog in Flink does not affect the actual data in DLF.
Log on to the Realtime Compute for Apache Flink management console.
Click your workspace name to open the Development Console.
Register your catalog using one of the following methods:
UI
In the left navigation menu, click Catalogs.
On the Catalog List page, click Create Catalog.
In the Create Catalog wizard, select Apache Paimon, and then click Next.
Set metastore to DLF. For catalog name, select the DLF catalog that you want to connect to.
Click Confirm.
SQL commands
In the Scripts SQL editor, copy and run the following SQL statement to register a DLF catalog in Flink.

CREATE CATALOG `flink_catalog_name` WITH (
  'type' = 'paimon',
  'metastore' = 'rest',
  'token.provider' = 'dlf',
  'uri' = 'http://cn-hangzhou-vpc.dlf.aliyuncs.com',
  'warehouse' = 'dlf_test'
);

The following table describes the connector options:
| Option | Description | Required | Example |
| --- | --- | --- | --- |
| type | The catalog type. Set this option to paimon. | Yes | paimon |
| metastore | The catalog metastore. Set this option to rest. | Yes | rest |
| token.provider | The token provider. Set this option to dlf. | Yes | dlf |
| uri | The REST URI of the DLF catalog service. Format: http://[region-id]-vpc.dlf.aliyuncs.com. See Region ID in Regions and endpoints. | Yes | http://ap-southeast-1-vpc.dlf.aliyuncs.com |
| warehouse | The name of the DLF Paimon catalog. | Yes | dlf_test |
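After the catalog is registered, you can verify it from the SQL editor. A quick check, assuming the catalog name from the example above:

USE CATALOG `flink_catalog_name`;
-- List the databases that the DLF catalog exposes.
SHOW DATABASES;
-- Replace my_db with one of the listed databases before browsing its tables.
USE `my_db`;
SHOW TABLES;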
DLF-Legacy
Note: The DLF-Legacy catalog must reside in the same region as your Realtime Compute for Apache Flink workspace.
Register the catalog in the development console of Realtime Compute for Apache Flink.
UI
Go to the Catalogs page.
Log on to Realtime Compute for Apache Flink's Management Console.
Click Console in the Actions column of your workspace.
The Development Console appears.
In the left navigation pane, click Catalogs.
Click Create Catalog.
In the Create Catalog wizard, on the Built-in Catalog tab, click Apache Paimon and click Next.
For metastore, select dlf.
For catalog name, select your DLF-Legacy catalog.
SQL commands
Log on to Realtime Compute for Apache Flink's Management Console.
Click Console in the Actions column of your workspace.
The Development Console appears.
In the left-side navigation pane, open the SQL editor and execute the following SQL statement. For detailed instructions, see Scripts.
CREATE CATALOG `my-catalog` WITH (
  'type' = 'paimon',
  'metastore' = 'dlf',
  'warehouse' = '<warehouse>',
  'dlf.catalog.id' = '<dlf.catalog.id>',
  'dlf.catalog.accessKeyId' = '<dlf.catalog.accessKeyId>',
  'dlf.catalog.accessKeySecret' = '<dlf.catalog.accessKeySecret>',
  'dlf.catalog.endpoint' = '<dlf.catalog.endpoint>',
  'dlf.catalog.region' = '<dlf.catalog.region>',
  'fs.oss.endpoint' = '<fs.oss.endpoint>',
  'fs.oss.accessKeyId' = '<fs.oss.accessKeyId>',
  'fs.oss.accessKeySecret' = '<fs.oss.accessKeySecret>'
);

Replace the placeholder values with your actual values:
General

| Parameter | Description | Required | Remarks |
| --- | --- | --- | --- |
| my-catalog | The name of the Apache Paimon catalog. | Yes | Enter a custom name. |
| type | The type of the catalog. | Yes | Set the value to paimon. |
| metastore | The metadata storage type. | Yes | Set the value to dlf. |
OSS

| Parameter | Description | Required | Remarks |
| --- | --- | --- | --- |
| warehouse | The data warehouse directory in OSS. | Yes | Format: oss://<bucket>/<object>, where bucket is the name of the OSS bucket that you created and object is the path in which your data is stored. You can view both values in the OSS console. |
| fs.oss.endpoint | The endpoint of OSS. | Yes | If DLF resides in the same region as your Realtime Compute for Apache Flink workspace, use the VPC endpoint. If they are not in the same region, use the public endpoint. If you want to store Apache Paimon tables in OSS-HDFS (Apsara File Storage for HDFS), set this parameter in the format cn-<region>.oss-dls.aliyuncs.com. Example: cn-hangzhou.oss-dls.aliyuncs.com. |
| fs.oss.accessKeyId | The AccessKey ID of the Alibaba Cloud account or RAM user that has read and write permissions on OSS. | Yes | For more information about how to obtain the required information, see Regions, endpoints, and open ports and Create an AccessKey pair. |
| fs.oss.accessKeySecret | The AccessKey secret of the Alibaba Cloud account or RAM user that has read and write permissions on OSS. | Yes | For more information, see Create an AccessKey pair. |
DLF

| Parameter | Description | Required | Remarks |
| --- | --- | --- | --- |
| dlf.catalog.id | The ID of the DLF data directory. | Yes | You can view the ID of the data directory in the DLF console. |
| dlf.catalog.accessKeyId | The AccessKey ID that is used to access the DLF service. | Yes | For more information, see Create an AccessKey pair. |
| dlf.catalog.accessKeySecret | The AccessKey secret that is used to access the DLF service. | Yes | For more information, see Create an AccessKey pair. |
| dlf.catalog.endpoint | The endpoint of DLF. | Yes | For more information, see Supported regions and endpoints. If DLF resides in the same region as your Realtime Compute for Apache Flink workspace, use the VPC endpoint. If they are not in the same region, use the public endpoint. |
| dlf.catalog.region | The region in which DLF is deployed. | Yes | For more information, see Supported regions and endpoints. Make sure that the value of this parameter matches the endpoint specified by the dlf.catalog.endpoint parameter. |
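For reference, a filled-in example for a workspace in the cn-hangzhou region is shown below. The catalog ID, bucket, and AccessKey values are placeholders, and the endpoint values are assumptions based on the region; confirm them against Supported regions and endpoints before use.

CREATE CATALOG `paimon-dlf` WITH (
  'type' = 'paimon',
  'metastore' = 'dlf',
  -- Hypothetical bucket and path; use your own OSS bucket.
  'warehouse' = 'oss://my-bucket/paimon-warehouse',
  -- Copy the data directory ID from the DLF console.
  'dlf.catalog.id' = 'my_dlf_catalog_id',
  'dlf.catalog.accessKeyId' = '<yourAccessKeyId>',
  'dlf.catalog.accessKeySecret' = '<yourAccessKeySecret>',
  -- Assumed VPC endpoint; must match the region below.
  'dlf.catalog.endpoint' = 'dlf-vpc.cn-hangzhou.aliyuncs.com',
  'dlf.catalog.region' = 'cn-hangzhou',
  'fs.oss.endpoint' = 'oss-cn-hangzhou-internal.aliyuncs.com',
  'fs.oss.accessKeyId' = '<yourAccessKeyId>',
  'fs.oss.accessKeySecret' = '<yourAccessKeySecret>'
);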
Manage an Apache Paimon database
You can manage an Apache Paimon database by executing the following commands on the SQL Editor page. For more information, see Scripts.
Create a database
After you create an Apache Paimon catalog, a database named default is automatically created in the catalog.

-- Replace my-catalog with the name of the actual Apache Paimon catalog.
USE CATALOG `my-catalog`;
-- Replace my_db with a custom database name.
CREATE DATABASE `my_db`;

Drop a database
Important: You cannot drop the default database from a Paimon catalog in DLF. You can drop the default database only from a Paimon catalog of the Filesystem type.
-- Replace my-catalog with the name of the actual Apache Paimon catalog.
USE CATALOG `my-catalog`;
-- Replace my_db with the name of the database that you want to drop.
-- Drop an empty database.
DROP DATABASE `my_db`;
-- Drop the database and all the associated tables.
DROP DATABASE `my_db` CASCADE;
Manage Apache Paimon tables
Create an Apache Paimon table
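For example, the following statement creates a partitioned Paimon primary key table in the SQL editor. This is a minimal sketch: the table name, columns, and the bucket option are illustrative, not prescribed by the catalog.

-- Replace the catalog, database, table, and column names with your own.
CREATE TABLE `my-catalog`.`my_db`.`orders` (
  order_id BIGINT,
  buyer_id BIGINT,
  order_amount DECIMAL(10, 2),
  dt STRING,
  -- Paimon primary keys are declarative; Flink does not enforce them.
  PRIMARY KEY (dt, order_id) NOT ENFORCED
) PARTITIONED BY (dt) WITH (
  -- Illustrative table option: fix the bucket count for the table.
  'bucket' = '4'
);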
Modify the schema of an Apache Paimon table
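Paimon supports schema evolution through Flink ALTER TABLE statements. The sketch below assumes the orders table from the previous example; the exact set of supported changes depends on your Paimon and VVR versions.

-- Add a nullable column.
ALTER TABLE `my-catalog`.`my_db`.`orders` ADD (buyer_name STRING);
-- Rename a column.
ALTER TABLE `my-catalog`.`my_db`.`orders` RENAME buyer_name TO customer_name;
-- Change a table option, for example the snapshot retention time.
ALTER TABLE `my-catalog`.`my_db`.`orders` SET ('snapshot.time-retained' = '2 h');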
Drop an Apache Paimon table
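A minimal example follows, again assuming the orders table from above. Dropping a Paimon table through the catalog generally also removes the table files from the warehouse directory, so double-check the name before you run it.

-- Replace the names with those of the table that you want to drop.
DROP TABLE `my-catalog`.`my_db`.`orders`;
-- Avoid an error if the table does not exist.
DROP TABLE IF EXISTS `my-catalog`.`my_db`.`orders`;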
View or drop an Apache Paimon catalog
In the Realtime Compute for Apache Flink console, find the workspace that you want to manage and click Console in the Actions column.
In the left-side navigation pane, click Catalogs. A list of Apache Paimon catalogs is displayed.
View an Apache Paimon catalog: In the Catalog List section, find the catalog that you want to manage to view its Name and Type. To see the databases and tables within the catalog, click View in the Actions column.
Drop an Apache Paimon catalog: In the Catalog List section, find the catalog that you want to manage and click Delete in the Actions column.
Note: After the Apache Paimon catalog is deleted, only the catalog information on the Catalogs page in the Flink namespace is deleted. The data files of the Apache Paimon tables are retained. You can re-create the catalog by executing an SQL statement and then use the Apache Paimon tables in it again.
You can also drop an Apache Paimon catalog by executing the DROP CATALOG `<catalog name>`; command on the SQL Editor page. For more information, see Scripts.
References
After you register an Apache Paimon table in Flink, you can read data from or write data to Paimon. For more information, see Write data to and consume data from a Paimon table.
If the built-in catalogs of Realtime Compute for Apache Flink cannot meet your business requirements, you can use custom catalogs. For more information, see Manage custom catalogs.
For more information about common optimization methods for Apache Paimon primary key tables and append scalable tables in different scenarios, see Performance optimization.
