This topic describes how to deploy a JAR job in Realtime Compute for Apache Flink to migrate a Paimon FileSystem catalog to DLF.
Prerequisites
A fully managed Flink workspace is created. For more information, see Activate Realtime Compute for Apache Flink.
A DLF data catalog is created. For more information, see Create a data catalog.
Procedure
Step 1: Create a JAR job
Log on to the Realtime Compute for Apache Flink management console.
In the list of fully managed Flink workspaces, click the name of your workspace.
In the navigation pane on the left, choose .
Click Deploy Job, select JAR Job, and configure the following parameters:

| Parameter | Description | Example |
| --- | --- | --- |
| Deployment Mode | This parameter is fixed to Batch Mode. | Batch Mode |
| Deployment Name | The name of the JAR job. | migrate_paimon |
| Engine Version | The Realtime Compute engine version. | vvr-8.0.11-flink-1.17 |
| JAR URI | The paimon-flink-action JAR package. | Upload the paimon-flink-action-1.3-SNAPSHOT-for-clone-20250909.jar package. If you have uploaded it before, select it from the drop-down list. |
| Entry Point Class | The entry point class of the program. | Leave this parameter empty. |
| Entry Point Main Arguments | The parameters passed to the main method. | Leave this parameter empty for now. The values depend on the migration scope. For more information, see Step 2. |
| Additional Dependencies | The path or file name of the dependency file to attach. | Upload the paimon-ali-vvr-8.0-vvp-1.3-ali-SNAPSHOT-for-clone-20250909.jar package. If you have uploaded it before, select it from the drop-down list. |

Note: For more information about deployment parameters, see Deploy a JAR job.
Click Deploy to create the JAR job.
Step 2: Adjust parameters and start the job
A Flink job can migrate an entire catalog, an entire database, or a single table. Adjust the Entry Point Main Arguments parameter based on your migration goal.
On the Job O&M page, find the JAR job that you created and click Details.
On the Deployment Details page, click Edit in the upper-right corner and specify the Entry Point Main Arguments parameter. The arguments vary based on your migration goal. The full argument template is as follows:

clone --parallelism '<parallelism>' --database '<database-name>' --table '<table-name>' --catalog_conf 'metastore=filesystem' --catalog_conf 'warehouse=<warehouse>' --catalog_conf 'fs.oss.endpoint=<fs.oss.endpoint>' --catalog_conf 'fs.oss.accessKeyId=<fs.oss.accessKeyId>' --catalog_conf 'fs.oss.accessKeySecret=<fs.oss.accessKeySecret>' --target_database '<target-database-name>' --target_table '<target-table-name>' --target_catalog_conf 'metastore=rest' --target_catalog_conf 'warehouse=<target-warehouse>' --target_catalog_conf 'uri=<dlf.next.endpoint>' --target_catalog_conf 'token.provider=dlf' --target_catalog_conf 'dlf.access-key-id=<dlf.access-key-id>' --target_catalog_conf 'dlf.access-key-secret=<dlf.access-key-secret>' --clone_from 'paimon' --where '<filter-spec>'

The following table describes the configuration items.
| Configuration Item | Description | Required | Remarks |
| --- | --- | --- | --- |
| parallelism | The parallelism of the job. | No | Example: 16 |
| database-name | The name of the database in the FileSystem catalog to clone. | No | Example: my_database |
| table-name | The name of the table in the FileSystem catalog to clone. | No | Example: my_table |
| warehouse | The OSS path of the FileSystem catalog to clone. | Yes | The format is oss://<bucket>/<object>, where bucket is the name of your OSS bucket and object is the path where your data is stored. You can view the bucket and object names in the OSS console. |
| fs.oss.endpoint | The endpoint of the OSS service. | Yes | For more information about how to obtain the endpoint, see Regions and endpoints. OSS example: oss-cn-hangzhou-internal.aliyuncs.com. OSS-HDFS example: cn-hangzhou.oss-dls.aliyuncs.com. |
| fs.oss.accessKeyId | The AccessKey ID of the Alibaba Cloud account or RAM user that has read and write permissions on OSS. | Yes | Use an existing AccessKey or create a new one. For more information, see Create an AccessKey. Note: To reduce the risk of a leak, the AccessKey secret is displayed only when you create it and cannot be retrieved later. Store it securely. |
| fs.oss.accessKeySecret | The AccessKey secret of the Alibaba Cloud account or RAM user that has read and write permissions on OSS. | Yes | |
| target-database-name | The name of the target DLF database. | No | Example: target_database |
| target-table-name | The name of the target DLF table. | No | Example: target_table |
| target-warehouse | The name of the target DLF data catalog. | Yes | You can view the data catalog name in the DLF console. For more information, see Data catalogs. |
| dlf.next.endpoint | The endpoint of the DLF service. | Yes | For more information, see Endpoints. Example: cn-hangzhou-vpc.dlf.aliyuncs.com |
| dlf.access-key-id | The AccessKey ID used to access the DLF service. | Yes | Use an existing AccessKey or create a new one. For more information, see Create an AccessKey. Note: To reduce the risk of a leak, the AccessKey secret is displayed only when you create it and cannot be retrieved later. Store it securely. |
| dlf.access-key-secret | The AccessKey secret used to access the DLF service. | Yes | |
| clone_from | The type of the source table to clone. | Yes | Set the value to 'paimon'. |
| filter-spec | The partition filter condition used during cloning. | No | Example: dt = '2024-10-01' |
Important:
- To migrate an entire database, do not set the table-name and target-table-name parameters.
- To migrate an entire data catalog, do not set the database-name and target-database-name parameters.
- When you migrate an entire data catalog or database, you can exclude specific tables by setting the --excluded_tables <excluded-tables-spec> parameter. Example: my_db.my_tbl,my_db2.my_tbl2. Do not set this parameter for single-table migrations.
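For illustration, the following is an example argument string for a single-table migration. The database, table, bucket, catalog, and partition values shown here are hypothetical placeholders; substitute your own values, and keep the AccessKey placeholders until you fill in real credentials.

```
clone --parallelism '16' --database 'my_database' --table 'my_table' --catalog_conf 'metastore=filesystem' --catalog_conf 'warehouse=oss://my-bucket/paimon-warehouse' --catalog_conf 'fs.oss.endpoint=oss-cn-hangzhou-internal.aliyuncs.com' --catalog_conf 'fs.oss.accessKeyId=<fs.oss.accessKeyId>' --catalog_conf 'fs.oss.accessKeySecret=<fs.oss.accessKeySecret>' --target_database 'target_database' --target_table 'target_table' --target_catalog_conf 'metastore=rest' --target_catalog_conf 'warehouse=my_dlf_catalog' --target_catalog_conf 'uri=cn-hangzhou-vpc.dlf.aliyuncs.com' --target_catalog_conf 'token.provider=dlf' --target_catalog_conf 'dlf.access-key-id=<dlf.access-key-id>' --target_catalog_conf 'dlf.access-key-secret=<dlf.access-key-secret>' --clone_from 'paimon' --where "dt = '2024-10-01'"
```

Because the --where value contains single quotes, it is wrapped in double quotes here. For a full-database migration, omit --table, --target_table, and typically --where; for a full-catalog migration, also omit --database and --target_database.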
After you configure the parameters, click Save on the Deployment Details page.
On the Job O&M page, click Start next to the JAR job. Then, start the job with the default parameters.
Step 3: Verify the result
When the job status changes to Finished, log on to the DLF console and verify that the migration was successful.
For a full catalog migration: Check that the catalog structure, databases, and tables in DLF are consistent with those in the FileSystem catalog.
For a full database migration: Check that the database and table structures in DLF are consistent with those in the FileSystem catalog.
For a single table migration: Check that the table structure in DLF is consistent with that in the FileSystem catalog.
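As a sketch of one way to inspect the result, you can also query the target catalog from a Flink SQL session. The catalog options below mirror the target_catalog_conf values passed to the clone job; the catalog name dlf_catalog and the 'type' = 'paimon' option are assumptions for illustration, not values from this topic.

```
-- Hypothetical verification from a Flink SQL session; the WITH options
-- mirror the target_catalog_conf values used by the clone job.
CREATE CATALOG dlf_catalog WITH (
  'type' = 'paimon',                -- assumed catalog type
  'metastore' = 'rest',
  'uri' = '<dlf.next.endpoint>',
  'warehouse' = '<target-warehouse>',
  'token.provider' = 'dlf',
  'dlf.access-key-id' = '<dlf.access-key-id>',
  'dlf.access-key-secret' = '<dlf.access-key-secret>'
);
USE CATALOG dlf_catalog;
SHOW DATABASES;                     -- expect the migrated databases
-- USE <target-database-name>; then SHOW TABLES; to list migrated tables
```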