This topic describes how to access a Data Lake Formation (DLF) catalog from a Flink DataStream job using Paimon REST in Realtime Compute for Apache Flink.
Prerequisites
You have created a fully managed Flink workspace. For more information, see Activate Realtime Compute for Apache Flink.
The Flink workspace and DLF are in the same VPC.
The paimon-flink-*.jar file of version 1.1 or later has been downloaded from the Apache Paimon website.
The paimon-oss-*.jar file of version 1.1 or later has been downloaded from Apache Paimon Filesystems.
Create a DLF Catalog
For more information, see DLF Quick Start.
Connect to the catalog from a DataStream job
For more information about how to develop and debug a Flink JAR job, see Develop a JAR job.
In a DataStream job, import the paimon-oss-*.jar and paimon-flink-*.jar files in one of the following two ways:
Upload as additional files: When you submit the Flink job, upload the two JAR files as additional dependencies.
Import using Maven: Add the dependencies for paimon-oss-*.jar and paimon-flink-*.jar to your project's pom.xml file.

```xml
<dependencies>
    <dependency>
        <groupId>org.apache.paimon</groupId>
        <artifactId>paimon-flink-${flink.main.version}</artifactId>
        <version>${paimon.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.paimon</groupId>
        <artifactId>paimon-oss</artifactId>
        <version>${paimon.version}</version>
    </dependency>
</dependencies>
```

Parameters:
${paimon.version}: The version number of Paimon. Specify version 1.1 or later.
${flink.main.version}: The major version of Flink. Use the version that corresponds to the Ververica Runtime (VVR) version of your job.

Submit a job to VVR 8.x:

```xml
<properties>
    <paimon.version><!-- Specify a version of 1.1 or later here. --></paimon.version>
    <flink.main.version>1.17</flink.main.version>
</properties>
```

Submit a job to VVR 11.x:

```xml
<properties>
    <paimon.version><!-- Specify a version of 1.1 or later here. --></paimon.version>
    <flink.main.version>1.20</flink.main.version>
</properties>
```
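The VVR-to-Flink mapping above can be captured in a small helper for build tooling. This is a sketch: the class and method names are hypothetical, and it covers only the two VVR lines documented in this topic.

```java
import java.util.Map;

public class VvrFlinkVersion {
    // Maps a VVR major version to the flink.main.version value for pom.xml.
    // Only the mappings documented in this topic are included; other VVR
    // versions would need to be added as they are published.
    private static final Map<Integer, String> VVR_TO_FLINK = Map.of(
            8, "1.17",
            11, "1.20");

    static String flinkMainVersion(int vvrMajor) {
        String flinkVersion = VVR_TO_FLINK.get(vvrMajor);
        if (flinkVersion == null) {
            throw new IllegalArgumentException("Unknown VVR major version: " + vvrMajor);
        }
        return flinkVersion;
    }

    public static void main(String[] args) {
        System.out.println(flinkMainVersion(8));  // prints 1.17
        System.out.println(flinkMainVersion(11)); // prints 1.20
    }
}
```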
To create the catalog, use the following configuration.

```java
import org.apache.paimon.catalog.Catalog;
import org.apache.paimon.flink.FlinkCatalogFactory;
import org.apache.paimon.options.Options;

Options options = new Options();
options.set("type", "paimon");
options.set("metastore", "rest");
options.set("uri", "dlf_uri");
options.set("warehouse", "your_catalog");
options.set("token.provider", "dlf");
options.set("dlf.access-key-id", "***");
options.set("dlf.access-key-secret", "***");
Catalog catalog = FlinkCatalogFactory.createPaimonCatalog(options);
```

The following table describes the parameters.
| Parameter | Description | Required | Example |
| --- | --- | --- | --- |
| type | The catalog type. Set this parameter to paimon. | Yes | paimon |
| metastore | The metastore type. Set this parameter to rest. | Yes | rest |
| uri | The URI used to access the DLF REST Catalog Server. The format is http://[region-id]-vpc.dlf.aliyuncs.com. For more information about region IDs, see Endpoints. | Yes | http://cn-hangzhou-vpc.dlf.aliyuncs.com |
| warehouse | The name of the DLF catalog. | Yes | dlf_test |
| token.provider | The token provider. Set this parameter to dlf. | Yes | dlf |
| dlf.access-key-id | The AccessKey ID of your Alibaba Cloud account or Resource Access Management (RAM) user. For more information, see View the AccessKey pair information of a RAM user. | Yes | - |
| dlf.access-key-secret | The AccessKey secret of your Alibaba Cloud account or RAM user. | Yes | - |
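Hard-coding AccessKey values in job code is convenient for testing but unsafe in shared or version-controlled projects. One common alternative, shown here as a sketch, is to assemble the option map from environment variables; the DLF_ACCESS_KEY_ID and DLF_ACCESS_KEY_SECRET variable names and the DlfCatalogOptions class are assumptions, not part of DLF or Paimon.

```java
import java.util.HashMap;
import java.util.Map;

public class DlfCatalogOptions {
    // Builds the key-value options for the Paimon REST catalog, reading the
    // AccessKey pair from the environment so secrets stay out of the JAR.
    // DLF_ACCESS_KEY_ID / DLF_ACCESS_KEY_SECRET are hypothetical names:
    // set them in the deployment environment before the job starts.
    static Map<String, String> buildOptions(String regionId, String warehouse) {
        Map<String, String> opts = new HashMap<>();
        opts.put("type", "paimon");
        opts.put("metastore", "rest");
        // VPC endpoint format from the table above:
        // http://[region-id]-vpc.dlf.aliyuncs.com
        opts.put("uri", "http://" + regionId + "-vpc.dlf.aliyuncs.com");
        opts.put("warehouse", warehouse);
        opts.put("token.provider", "dlf");
        opts.put("dlf.access-key-id",
                System.getenv().getOrDefault("DLF_ACCESS_KEY_ID", ""));
        opts.put("dlf.access-key-secret",
                System.getenv().getOrDefault("DLF_ACCESS_KEY_SECRET", ""));
        return opts;
    }

    public static void main(String[] args) {
        Map<String, String> opts = buildOptions("cn-hangzhou", "dlf_test");
        System.out.println(opts.get("uri")); // prints http://cn-hangzhou-vpc.dlf.aliyuncs.com
    }
}
```

Each entry in the returned map can then be applied to the Paimon Options object with options.set(key, value) before the catalog is created.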
Important: Because DLF supports access only from a VPC, you cannot run DataStream jobs on a local machine for testing. Test the jobs in a cluster that is in the same VPC as DLF.