This topic describes how to develop a Realtime Compute for Apache Flink job that uses the DataStream API to write data to a Data Lake Formation (DLF) catalog through the Paimon REST catalog.
Prerequisites
You have created a Realtime Compute for Apache Flink workspace. See Create a workspace.
Your Flink workspace and DLF catalog reside in the same region.
The VPC of your Flink workspace is included in the DLF VPC whitelist. For more information, see Configure a VPC whitelist.
You have downloaded the Paimon bundled JAR paimon-flink-*.jar (version 1.1 or later) from the Apache Paimon website.
You have downloaded paimon-oss-*.jar (version 1.1 or later) from Apache Paimon Filesystems.
Create a DLF catalog
See Get started with DLF.
Develop a DataStream program
After development is complete, package the program and its dependencies into a JAR file and upload it to Realtime Compute for Apache Flink for execution. For more information, see Develop a JAR job.
Include the Paimon JAR files paimon-oss-*.jar and paimon-flink-*.jar as dependencies in your Flink project. You can do this in one of two ways:
Upload to console: Upload the dependency files as additional dependencies in the Realtime Compute for Apache Flink console when you deploy your job.
Use Maven: Include the dependencies directly in your project's pom.xml file:
```xml
<dependencies>
    <dependency>
        <groupId>org.apache.paimon</groupId>
        <artifactId>paimon-flink-${flink.main.version}</artifactId>
        <version>${paimon.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.paimon</groupId>
        <artifactId>paimon-oss</artifactId>
        <version>${paimon.version}</version>
    </dependency>
</dependencies>
```
Parameters:
${paimon.version}: The Paimon version (1.1 or later).
${flink.main.version}: The Flink major version that corresponds to your Ververica Runtime (VVR) version.
Run jobs on VVR 8.x:
```xml
<properties>
    <paimon.version><!-- Specify a version 1.1 or later. --></paimon.version>
    <flink.main.version>1.17</flink.main.version>
</properties>
```
Run jobs on VVR 11.x:
```xml
<properties>
    <paimon.version><!-- Specify a version 1.1 or later. --></paimon.version>
    <flink.main.version>1.20</flink.main.version>
</properties>
```
Register your DLF catalog in Flink:
```java
import org.apache.paimon.catalog.Catalog;
import org.apache.paimon.flink.FlinkCatalogFactory;
import org.apache.paimon.options.Options;

Options options = new Options();
options.set("type", "paimon");
options.set("metastore", "rest");
options.set("uri", "dlf_uri");
options.set("warehouse", "your_catalog");
options.set("token.provider", "dlf");
options.set("dlf.access-key-id", "***");
options.set("dlf.access-key-secret", "***");
Catalog catalog = FlinkCatalogFactory.createPaimonCatalog(options);
```
Parameter descriptions:
| Parameter | Description | Required | Example |
| --- | --- | --- | --- |
| type | The catalog type. Do not change this value. | Yes | paimon |
| metastore | The metastore type for DLF. Set this to rest. | Yes | rest |
| uri | The REST endpoint of the DLF catalog service, in the format http://[region-id]-vpc.dlf.aliyuncs.com. For information about region IDs, see Regions and endpoints. | Yes | http://ap-southeast-1-vpc.dlf.aliyuncs.com |
| warehouse | The Paimon catalog name. | Yes | dlf_test |
| token.provider | The token provider. Set this to dlf. | Yes | dlf |
| dlf.access-key-id | Your AccessKey ID for authentication. For more information, see View the information about AccessKey pairs of a RAM user. | Yes | |
| dlf.access-key-secret | Your AccessKey secret for authentication. | Yes | |
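If the target table does not exist yet, you can create it through the registered catalog before writing. The following is a minimal sketch that assumes a hypothetical database my_db and table my_table with an illustrative two-column schema; these names and columns are not part of the original topic:
```java
import org.apache.paimon.catalog.Catalog;
import org.apache.paimon.catalog.Identifier;
import org.apache.paimon.schema.Schema;
import org.apache.paimon.types.DataTypes;

public class CreatePaimonTable {
    // "my_db" and "my_table" are placeholder names for this sketch.
    public static void createTable(Catalog catalog) throws Exception {
        // Create the database; true means do nothing if it already exists.
        catalog.createDatabase("my_db", true);

        // Define an illustrative schema with a primary key.
        Schema schema = Schema.newBuilder()
                .column("id", DataTypes.INT())
                .column("name", DataTypes.STRING())
                .primaryKey("id")
                .build();

        // Create the table; true means do nothing if it already exists.
        catalog.createTable(Identifier.create("my_db", "my_table"), schema, true);
    }
}
```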
Important: Because DLF supports only VPC access, you cannot test your DataStream programs on a local machine. Test them in a cluster that resides in the same VPC as DLF.
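Once the catalog is registered and the table exists, you can attach a Paimon sink to your DataStream. The following end-to-end sketch uses Paimon's public FlinkSinkBuilder API and reuses the placeholder database and table from the previous example together with the example values from the parameter table; the class name, source data, and checkpoint interval are illustrative:
```java
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.DataTypes;
import org.apache.flink.table.types.DataType;
import org.apache.flink.types.Row;
import org.apache.paimon.catalog.Catalog;
import org.apache.paimon.catalog.Identifier;
import org.apache.paimon.flink.FlinkCatalogFactory;
import org.apache.paimon.flink.sink.FlinkSinkBuilder;
import org.apache.paimon.options.Options;
import org.apache.paimon.table.Table;

public class WriteToDlfTable {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60_000); // Paimon commits data when a checkpoint completes.

        // Placeholder source; replace it with your own changelog DataStream<Row>.
        DataStream<Row> input = env
                .fromElements(Row.of(1, "Alice"), Row.of(2, "Bob"))
                .returns(Types.ROW_NAMED(new String[] {"id", "name"}, Types.INT, Types.STRING));

        // Register the DLF catalog as shown above (example values).
        Options options = new Options();
        options.set("type", "paimon");
        options.set("metastore", "rest");
        options.set("uri", "http://ap-southeast-1-vpc.dlf.aliyuncs.com");
        options.set("warehouse", "dlf_test");
        options.set("token.provider", "dlf");
        options.set("dlf.access-key-id", "***");
        options.set("dlf.access-key-secret", "***");
        Catalog catalog = FlinkCatalogFactory.createPaimonCatalog(options);

        // Get the placeholder table created earlier and attach a Paimon sink to it.
        Table table = catalog.getTable(Identifier.create("my_db", "my_table"));
        DataType rowType = DataTypes.ROW(
                DataTypes.FIELD("id", DataTypes.INT()),
                DataTypes.FIELD("name", DataTypes.STRING()));
        new FlinkSinkBuilder(table).forRow(input, rowType).build();

        env.execute("Write to Paimon DLF table");
    }
}
```
Note that Paimon commits data only when a checkpoint completes, so checkpointing must be enabled for written records to become visible in the table.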