Configure Paimon Catalog as a Metadata Governance Source - DataWorks

Configure a Paimon Catalog data source to let DataWorks discover and govern Paimon table metadata stored in Object Storage Service (OSS) — including self-declared catalogs that aren't managed by Data Lake Formation (DLF).

Many Flink-based workloads declare Paimon catalogs directly in the Flink engine, with metadata and data stored in OSS rather than DLF. These self-declared catalogs fall outside DataWorks' standard data source management. The Paimon Catalog data source fills this gap: it reads metadata from OSS-backed Paimon catalogs and surfaces those assets in Data Map for unified governance.
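To make "metadata stored in OSS" concrete: a Paimon filesystem catalog keeps each database as a `<database>.db` directory under the warehouse path, and each table as a subdirectory holding `schema/`, `snapshot/`, and `manifest/` folders. The sketch below derives database and table names from a list of object keys that follow this layout. It illustrates the directory convention only and is not a description of how DataWorks collects metadata; the `list_tables` helper and the sample keys are hypothetical.

```python
from typing import Iterable, Set, Tuple


def list_tables(object_keys: Iterable[str], warehouse_prefix: str) -> Set[Tuple[str, str]]:
    """Derive (database, table) pairs from object keys under a warehouse prefix."""
    prefix = warehouse_prefix.strip("/") + "/"
    tables = set()
    for key in object_keys:
        if not key.startswith(prefix):
            continue
        parts = key[len(prefix):].split("/")
        # Expected shape: <database>.db/<table>/schema/schema-<N>
        if (
            len(parts) == 4
            and parts[0].endswith(".db")
            and parts[2] == "schema"
            and parts[3].startswith("schema-")
        ):
            tables.add((parts[0][: -len(".db")], parts[1]))
    return tables


# Hypothetical object keys, as an OSS listing of the bucket might return them.
sample_keys = [
    "path/warehouse/sales.db/orders/schema/schema-0",
    "path/warehouse/sales.db/orders/snapshot/snapshot-1",
    "path/warehouse/sales.db/customers/schema/schema-0",
    "path/warehouse/README.txt",
]
print(sorted(list_tables(sample_keys, "path/warehouse")))
```

Only keys that contain a schema file are counted, so stray objects in the warehouse directory are ignored.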

Limits

Network: Only serverless resource groups are supported.
Scope: This data source supports metadata collection and governance only. It does not support data integration sync tasks. To read from or write to Paimon tables in a sync task, use a DLF or OSS data source instead.

Add a Paimon Catalog data source

Prerequisites

Before you begin, ensure that you have:

An OSS bucket containing the Paimon catalog warehouse data
The full warehouse path, such as oss://bucket/path/warehouse
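Because a wrong warehouse path is an easy mistake to make, it can help to check the path's shape before entering it. The following is a minimal sketch that splits an `oss://` path into bucket and key prefix using the standard library; the `split_warehouse_path` helper is hypothetical and checks only the format, not whether the bucket or path exists.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_warehouse_path(path: str) -> Tuple[str, str]:
    """Split an oss:// warehouse path into (bucket, key_prefix)."""
    parsed = urlparse(path)
    if parsed.scheme != "oss":
        raise ValueError(f"expected an oss:// path, got: {path!r}")
    if not parsed.netloc:
        raise ValueError(f"missing bucket name in: {path!r}")
    # Drop leading and trailing slashes so the prefix can be compared with object keys.
    return parsed.netloc, parsed.path.strip("/")


bucket, key_prefix = split_warehouse_path("oss://bucket/path/warehouse")
print(bucket, key_prefix)
```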

Step 1: Open the Data Sources page

Log on to the DataWorks console and select the target region. In the left navigation pane, click Workspace, find the target workspace, and then click Manage in the Actions column.
On the Management Center page, click Data Sources in the left navigation pane.

Step 2: Create the data source

On the Data Sources page, click Add Data Source.
In the dialog box, search for and select Paimon Catalog.

Step 3: Configure parameters

Catalog connection parameters

Parameter	Required	Description
Data Source Name	Yes	A custom identifier for this data source, such as `paimon_finance`.
Catalog	Yes	The name of the catalog as declared on the compute engine, such as `paimon-catalog`. The name must match the engine-side declaration exactly so that metadata maps correctly.
MetaStore	Yes	The metastore type. Currently, only Filesystem is supported.
Filesystem	Yes	The file storage type. Currently, only OSS is supported.
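For reference, the engine-side declaration that the Catalog, MetaStore, and Filesystem parameters correspond to might look like the Flink SQL rendered below. This is a sketch under assumptions: it uses the open-source Paimon catalog options (`type`, `metastore`, `warehouse`, `fs.oss.endpoint`), omits credential options, and uses a placeholder endpoint; your actual declaration may differ. The `render_create_catalog` helper is hypothetical.

```python
def render_create_catalog(catalog_name: str, warehouse: str, endpoint: str) -> str:
    """Render a Flink SQL CREATE CATALOG statement for a filesystem-backed Paimon catalog."""
    # Backticks let the catalog name contain characters such as hyphens.
    quoted_name = catalog_name.replace("`", "``")
    return (
        f"CREATE CATALOG `{quoted_name}` WITH (\n"
        "  'type' = 'paimon',\n"
        "  'metastore' = 'filesystem',\n"
        f"  'warehouse' = '{warehouse}',\n"
        f"  'fs.oss.endpoint' = '{endpoint}'\n"
        ");"
    )


ddl = render_create_catalog(
    catalog_name="paimon-catalog",
    warehouse="oss://bucket/path/warehouse",
    endpoint="oss-cn-hangzhou-internal.aliyuncs.com",  # placeholder endpoint
)
print(ddl)
```

The catalog name in the rendered statement (`paimon-catalog`) is the value to enter in the Catalog field, and the `warehouse` value is the one to enter in the Warehouse field.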

Storage access parameters

Parameter	Required	Description
Access Mode	Yes	How DataWorks authenticates to OSS. See Access mode options below.
Region	Yes	The region of the OSS bucket. Select a bucket in the same region as the workspace to avoid cross-region latency. To connect to a bucket in a different region, establish a VPC peering connection or connect via a public endpoint.
Endpoint	Yes	The OSS domain name. For configuration guidance, see Overview of endpoints and network connectivity.
Warehouse	Yes	The full OSS path to the Paimon catalog warehouse, such as `oss://bucket/path/warehouse`. Click the folder icon to the right of the field to browse and select a path. An incorrect path causes metadata collection to fail.
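A mismatch between Region and Endpoint is easy to overlook. The sketch below extracts the region ID from the common OSS endpoint forms `oss-<region>.aliyuncs.com` and `oss-<region>-internal.aliyuncs.com` so that it can be compared with the selected region. It is an illustration only: other endpoint forms exist and are reported as unrecognized (`None`), and the `endpoint_region` helper is hypothetical.

```python
import re
from typing import Optional

_ENDPOINT_PATTERN = re.compile(r"oss-([a-z0-9-]+?)(-internal)?\.aliyuncs\.com")


def endpoint_region(endpoint: str) -> Optional[str]:
    """Return the region ID embedded in a standard OSS endpoint, or None if unrecognized."""
    # Strip an optional scheme and any trailing path.
    host = re.sub(r"^https?://", "", endpoint.strip()).split("/")[0]
    match = _ENDPOINT_PATTERN.fullmatch(host)
    return match.group(1) if match else None


def endpoint_matches_region(endpoint: str, region_id: str) -> bool:
    """Check that the endpoint belongs to the given region ID, such as cn-hangzhou."""
    return endpoint_region(endpoint) == region_id


print(endpoint_region("oss-cn-hangzhou-internal.aliyuncs.com"))
print(endpoint_matches_region("https://oss-cn-shanghai.aliyuncs.com", "cn-hangzhou"))
```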

Access mode options

RAM Role Authorization Mode: DataWorks accesses OSS by assuming a RAM role. Use this mode to control access through role-based permissions without embedding credentials. For setup instructions, see Configure a data source in RAM role authorization mode.
AccessKey Mode: DataWorks accesses OSS using the AccessKey of the currently logged-on account.

Step 4: Test connectivity

After saving the configuration, test connectivity to confirm the data source can reach OSS through the serverless resource group.

Connected: the configuration is correct and DataWorks can reach the catalog.
Connection failed: a diagnostic tool appears. Common causes include:
- Incorrect AccessKey or RAM role configuration
- Missing IP address whitelist entry for the resource group
- Missing NAT Gateway for the resource group's network
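To separate network causes from credential causes, a basic TCP reachability probe can be run from a host in the same network as the resource group. This is a rough sketch, not a replacement for the built-in diagnostic tool: it only confirms that the endpoint accepts connections on port 443 and says nothing about AccessKey, RAM role, or whitelist settings. The `can_connect` helper and the endpoint shown are assumptions.

```python
import socket


def can_connect(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


if __name__ == "__main__":
    # Placeholder endpoint; substitute the value entered in the Endpoint field.
    print(can_connect("oss-cn-hangzhou-internal.aliyuncs.com"))
```

A `False` result points toward routing, NAT, or DNS problems; a `True` result suggests looking at credentials or whitelist settings instead.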

In standard mode, both the development and production environments must show Connected. Otherwise, subsequent operations, such as metadata collection, will fail.

What's next

After configuring the data source, go to Data Map to collect metadata. Once collected, you can view and govern the Paimon table assets in Data Map.