
Data Lake Formation:Ingest data to DLF with Flink DataStream APIs

Last Updated:Nov 25, 2025

This topic describes how to develop a Realtime Compute for Apache Flink job that uses the DataStream API to write data to a Data Lake Formation (DLF) catalog via Paimon REST.

Prerequisites

  • You have created a Realtime Compute for Apache Flink workspace. See Create a workspace.

  • Your Flink workspace and DLF catalog reside in the same region.

  • The VPC of your Flink workspace is on the DLF VPC whitelist. For more information, see Configure a VPC whitelist.

  • You have downloaded the Paimon bundled JAR paimon-flink-*.jar (version 1.1 or later) from the Apache Paimon website.

  • You have downloaded paimon-oss-*.jar (version 1.1 or later) from Apache Paimon Filesystems.

Create a DLF catalog

See Get started with DLF.

Develop a DataStream program

After development, package the program and its dependencies into a JAR file and upload it to Realtime Compute for Apache Flink for execution. For more information, see Develop a JAR job.

  1. Include the Paimon JAR files paimon-oss-*.jar and paimon-flink-*.jar as dependencies in your Flink project in one of the following ways:

    • Upload to console: Upload the dependency files as additional dependencies in the Realtime Compute for Apache Flink console when you deploy your job.

    • Use Maven: Include the dependencies directly in your project's pom.xml file:

      <dependencies>
          <dependency>
              <groupId>org.apache.paimon</groupId>
              <artifactId>paimon-flink-${flink.main.version}</artifactId>
              <version>${paimon.version}</version>
          </dependency>
      
          <dependency>
              <groupId>org.apache.paimon</groupId>
              <artifactId>paimon-oss</artifactId>
              <version>${paimon.version}</version>
          </dependency>
      </dependencies>

      Parameters:

      • ${paimon.version}: The Paimon version (1.1 or later).

      • ${flink.main.version}: The Flink major version based on your Ververica Runtime (VVR) version.

        Run jobs on VVR 8.x

        <properties>
          <paimon.version><!-- Specify a version 1.1 or later. --></paimon.version>
          <flink.main.version>1.17</flink.main.version>
        </properties>

        Run jobs on VVR 11.x

        <properties>
          <paimon.version><!-- Specify a version 1.1 or later. --></paimon.version>
          <flink.main.version>1.20</flink.main.version>
        </properties>
  2. Register your DLF catalog in Flink.

    Options options = new Options();
    options.set("type", "paimon");
    options.set("metastore", "rest");
    options.set("uri", "dlf_uri");
    options.set("warehouse", "your_catalog");
    options.set("token.provider", "dlf");
    options.set("dlf.access-key-id", "***");
    options.set("dlf.access-key-secret", "***");
    Catalog catalog = FlinkCatalogFactory.createPaimonCatalog(options);

    Parameter descriptions:

    | Parameter | Description | Required? | Example |
    | --- | --- | --- | --- |
    | type | The catalog type. Set this to paimon. | Yes | paimon |
    | metastore | The metastore type for DLF. Set this to rest. | Yes | rest |
    | uri | The REST endpoint of the DLF catalog service, in the format http://[region-id]-vpc.dlf.aliyuncs.com. For information about region IDs, see Regions and endpoints. | Yes | http://ap-southeast-1-vpc.dlf.aliyuncs.com |
    | warehouse | The Paimon catalog name. | Yes | dlf_test |
    | token.provider | The token provider. Set this to dlf. | Yes | dlf |
    | dlf.access-key-id | The AccessKey ID used for authentication. For more information, see View the information about AccessKey pairs of a RAM user. | Yes | None |
    | dlf.access-key-secret | The AccessKey secret used for authentication. | Yes | None |
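
    After the catalog is registered, you can use Paimon's Catalog API to create a database and a table to write to. The following is a minimal sketch: the database name my_db, table name orders, and column definitions are placeholders, not values from this topic.

    ```java
    import org.apache.paimon.catalog.Catalog;
    import org.apache.paimon.catalog.Identifier;
    import org.apache.paimon.schema.Schema;
    import org.apache.paimon.types.DataTypes;

    public class CreatePaimonTable {
        // `catalog` is the DLF catalog created with
        // FlinkCatalogFactory.createPaimonCatalog(options) as shown above.
        // The database name, table name, and columns are placeholders.
        static void createTable(Catalog catalog) throws Exception {
            catalog.createDatabase("my_db", true); // true: ignore if it already exists

            Schema schema = Schema.newBuilder()
                    .column("order_id", DataTypes.BIGINT())
                    .column("amount", DataTypes.DOUBLE())
                    .primaryKey("order_id")
                    .build();

            catalog.createTable(Identifier.create("my_db", "orders"), schema, true);
        }
    }
    ```

    The Catalog API calls declare checked exceptions (for example, when a database already exists), so the method above declares throws Exception.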

    Important

    Because DLF supports only VPC access, you cannot test DataStream programs on a local machine. Test them in a cluster that is in the same VPC as DLF.
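
  3. Write a DataStream to the table.

    The write itself can be sketched with Paimon's FlinkSinkBuilder. This fragment is illustrative rather than a complete job: it assumes a table my_db.orders (BIGINT order_id, DOUBLE amount) already exists in the catalog, and the input stream and checkpoint interval are placeholders.

    ```java
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.table.api.DataTypes;
    import org.apache.flink.types.Row;
    import org.apache.paimon.catalog.Catalog;
    import org.apache.paimon.catalog.Identifier;
    import org.apache.paimon.flink.sink.FlinkSinkBuilder;
    import org.apache.paimon.table.Table;

    public class WriteToPaimon {
        // `catalog` is the DLF catalog registered earlier; the table name
        // and schema are placeholders.
        static void run(Catalog catalog) throws Exception {
            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.getExecutionEnvironment();
            // Paimon commits data on checkpoints, so checkpointing must be enabled.
            env.enableCheckpointing(60_000);

            // Placeholder input stream; replace with your real source.
            DataStream<Row> input = env.fromElements(
                    Row.of(1L, 9.9),
                    Row.of(2L, 19.9));

            // Look up the target table from the registered catalog.
            Table table = catalog.getTable(Identifier.create("my_db", "orders"));

            new FlinkSinkBuilder(table)
                    .forRow(input, DataTypes.ROW(
                            DataTypes.FIELD("order_id", DataTypes.BIGINT()),
                            DataTypes.FIELD("amount", DataTypes.DOUBLE())))
                    .build();

            env.execute("write-to-paimon-dlf");
        }
    }
    ```

    The forRow call maps the Flink Row stream onto the Paimon table schema; field names and types in the ROW type must match the target table.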