Use the Flink DataStream API to write data to Apache Paimon tables managed by a Data Lake Formation (DLF) catalog over Paimon REST.
Prerequisites
Before you begin, ensure that you have:
- A Realtime Compute for Apache Flink workspace (see Create a workspace)
- A DLF catalog in the same region as your Flink workspace (see Get started with DLF)
- The VPC of your Flink workspace added to the DLF VPC whitelist (see Configure a VPC whitelist)
- The Paimon bundled JAR paimon-flink-*.jar, version 1.1 or later (download from Apache Paimon)
- The OSS filesystem JAR paimon-oss-*.jar, version 1.1 or later (download from Apache Paimon Filesystems)
DLF supports VPC access only. Test your DataStream programs in a cluster that is in the same VPC as DLF, not on a local machine.
How it works
1. Add paimon-flink-*.jar and paimon-oss-*.jar as dependencies in your Flink project.
2. Register a DLF catalog in your program using FlinkCatalogFactory.
3. Package the program as a JAR and deploy it in Realtime Compute for Apache Flink.
Step 1: Add dependencies
Include paimon-oss-*.jar and paimon-flink-*.jar in your project using one of the following methods:
Upload to the console
Upload the JARs as additional dependencies in the Realtime Compute for Apache Flink console when you deploy your job. No changes to pom.xml are required.
Add Maven dependencies
Add the following to your project's pom.xml:
<properties>
<!-- Specify 1.1 or later. See the version table below. -->
<paimon.version>YOUR_PAIMON_VERSION</paimon.version>
<flink.main.version>YOUR_FLINK_VERSION</flink.main.version>
</properties>
<dependencies>
<dependency>
<groupId>org.apache.paimon</groupId>
<artifactId>paimon-flink-${flink.main.version}</artifactId>
<version>${paimon.version}</version>
</dependency>
<dependency>
<groupId>org.apache.paimon</groupId>
<artifactId>paimon-oss</artifactId>
<version>${paimon.version}</version>
</dependency>
</dependencies>
Use the following values for ${flink.main.version} based on your Ververica Runtime (VVR) version:
| VVR version | ${flink.main.version} |
|---|---|
| VVR 8.x | 1.17 |
| VVR 11.x | 1.20 |
Step 2: Register the DLF catalog
Use FlinkCatalogFactory.createPaimonCatalog to register the DLF catalog in your program. The uri parameter points to the DLF VPC endpoint for your region. For endpoint values, see Regions and endpoints.
Options options = new Options();
options.set("type", "paimon");
options.set("metastore", "rest");
options.set("uri", "http://<region-id>-vpc.dlf.aliyuncs.com"); // e.g., http://ap-southeast-1-vpc.dlf.aliyuncs.com
options.set("warehouse", "<your-catalog-name>"); // The Paimon catalog name in DLF
options.set("token.provider", "dlf");
options.set("dlf.access-key-id", "<your-access-key-id>");
options.set("dlf.access-key-secret", "<your-access-key-secret>");
Catalog catalog = FlinkCatalogFactory.createPaimonCatalog(options);
For AccessKey credentials, see View the information about AccessKey pairs of a RAM user.
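Putting the steps together, the following is a minimal end-to-end sketch that registers the catalog, creates a table through it, and writes a bounded DataStream of RowData into that table with Paimon's FlinkSinkBuilder. The database my_db, the table orders, and its columns are hypothetical placeholders; replace the endpoint, catalog name, and credentials with your own values.

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.data.GenericRowData;
import org.apache.flink.table.data.RowData;
import org.apache.flink.table.data.StringData;
import org.apache.paimon.catalog.Catalog;
import org.apache.paimon.catalog.Identifier;
import org.apache.paimon.flink.FlinkCatalogFactory;
import org.apache.paimon.flink.sink.FlinkSinkBuilder;
import org.apache.paimon.options.Options;
import org.apache.paimon.schema.Schema;
import org.apache.paimon.table.Table;
import org.apache.paimon.types.DataTypes;

public class WriteToPaimonDlf {
    public static void main(String[] args) throws Exception {
        // Register the DLF catalog (same options as in Step 2).
        Options options = new Options();
        options.set("type", "paimon");
        options.set("metastore", "rest");
        options.set("uri", "http://<region-id>-vpc.dlf.aliyuncs.com");
        options.set("warehouse", "<your-catalog-name>");
        options.set("token.provider", "dlf");
        options.set("dlf.access-key-id", "<your-access-key-id>");
        options.set("dlf.access-key-secret", "<your-access-key-secret>");
        Catalog catalog = FlinkCatalogFactory.createPaimonCatalog(options);

        // Create the target table if it does not exist (hypothetical schema).
        Identifier id = Identifier.create("my_db", "orders");
        catalog.createDatabase("my_db", true);
        catalog.createTable(
                id,
                Schema.newBuilder()
                        .column("order_id", DataTypes.BIGINT())
                        .column("customer", DataTypes.STRING())
                        .primaryKey("order_id")
                        .build(),
                true);
        Table table = catalog.getTable(id);

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Paimon commits data on checkpoints in streaming mode.
        env.enableCheckpointing(60_000);

        // A toy bounded source; in practice this is a Kafka source or a transformation.
        DataStream<RowData> input =
                env.fromElements(
                        (RowData) GenericRowData.of(1L, StringData.fromString("alice")),
                        GenericRowData.of(2L, StringData.fromString("bob")));

        new FlinkSinkBuilder(table).forRowData(input).build();
        env.execute("write-to-paimon-dlf");
    }
}
```

Because DLF is reachable only over VPC, this program cannot be run from a local machine; deploy it to your Flink workspace as described in Step 3.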
Catalog parameters
| Parameter | Description | Required | Example |
|---|---|---|---|
| type | Catalog type. Set to paimon. | Yes | paimon |
| metastore | Metastore type for DLF. Set to rest. | Yes | rest |
| uri | DLF VPC endpoint. Format: http://[region-id]-vpc.dlf.aliyuncs.com | Yes | http://ap-southeast-1-vpc.dlf.aliyuncs.com |
| warehouse | The Paimon catalog name in DLF. | Yes | dlf_test |
| token.provider | Token provider. Set to dlf. | Yes | dlf |
| dlf.access-key-id | AccessKey ID used for authentication. See View the information about AccessKey pairs of a RAM user. | Yes | — |
| dlf.access-key-secret | AccessKey secret used for authentication. | Yes | — |
Step 3: Build and deploy
Package the program and all dependencies into a JAR, then upload and run it in Realtime Compute for Apache Flink. See Develop a JAR job.
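If you bundle the Paimon JARs into the job JAR instead of uploading them as additional dependencies, one common approach (illustrative, not the only one) is the Maven Shade plugin, added to the pom.xml from Step 1:

```xml
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <version>3.5.1</version>
      <executions>
        <execution>
          <phase>package</phase>
          <goals><goal>shade</goal></goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```

Dependencies already provided by the Flink runtime (such as flink-streaming-java) should be declared with <scope>provided</scope> so they are not packaged into the job JAR.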
What's next
- Configure a VPC whitelist: add your Flink workspace VPC to the DLF allowlist
- Regions and endpoints: find the correct DLF endpoint for your region
- Develop a JAR job: package and deploy your Flink program