Configure a Paimon Catalog data source to let DataWorks discover and govern Paimon table metadata stored in Object Storage Service (OSS) — including self-declared catalogs that aren't managed by Data Lake Formation (DLF).
Many Flink-based workloads declare Paimon catalogs directly in the Flink engine, with metadata and data stored in OSS rather than DLF. These self-declared catalogs fall outside DataWorks' standard data source management. The Paimon Catalog data source fills this gap: it reads metadata from OSS-backed Paimon catalogs and surfaces those assets in Data Map for unified governance.
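For orientation, a self-declared catalog of this kind is typically created with a Flink SQL `CREATE CATALOG` statement. The sketch below renders such a statement from Python. The helper name `render_create_catalog` is hypothetical, the endpoint value is a placeholder, and the option keys (`type`, `metastore`, `warehouse`, `fs.oss.endpoint`) follow the Apache Paimon Flink documentation as commonly described, so verify them against your engine version. Credential options are deliberately left out.

```python
def render_create_catalog(name: str, warehouse: str, endpoint: str) -> str:
    """Render a Flink SQL statement declaring a filesystem-backed Paimon catalog.

    Assumption: option keys follow the Apache Paimon Flink docs; check them
    against the Paimon and Flink versions you actually run.
    """
    options = {
        "type": "paimon",
        "metastore": "filesystem",    # metadata lives next to the data in OSS
        "warehouse": warehouse,       # same path you later enter as Warehouse
        "fs.oss.endpoint": endpoint,  # placeholder endpoint, not a recommendation
    }
    body = ",\n".join(f"  '{key}' = '{value}'" for key, value in options.items())
    # Backticks let the catalog name contain characters such as hyphens.
    return f"CREATE CATALOG `{name}` WITH (\n{body}\n);"


print(render_create_catalog(
    "paimon-catalog",
    "oss://bucket/path/warehouse",
    "oss-cn-hangzhou.aliyuncs.com",
))
```

The catalog name and warehouse path used here are the same values that the data source later asks for in the Catalog and Warehouse fields.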
Limits
- Network: Only serverless resource groups are supported.
- Scope: This data source supports Collect Metadata and governance only. It does not support data integration sync tasks. To read from or write to Paimon tables in a sync task, use a DLF or OSS data source instead.
Add a Paimon Catalog data source
Prerequisites
Before you begin, ensure that you have:
- An OSS bucket containing the Paimon catalog warehouse data
- The full warehouse path, such as oss://bucket/path/warehouse
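Because an incorrect path is an easy mistake to make, it can help to check the shape of the warehouse path before entering it. The following is a minimal sketch using only the Python standard library. The function name `split_warehouse_path` is hypothetical, and the check covers only the path format, not whether the bucket or directory exists.

```python
from urllib.parse import urlparse


def split_warehouse_path(path: str) -> tuple[str, str]:
    """Split an oss:// warehouse path into (bucket, key prefix).

    Raises ValueError when the scheme is not oss or the bucket is missing.
    """
    parsed = urlparse(path)
    if parsed.scheme != "oss":
        raise ValueError(f"expected an oss:// path, got: {path!r}")
    if not parsed.netloc:
        raise ValueError(f"no bucket name found in: {path!r}")
    # The leading slash is dropped so the prefix reads like an object key.
    return parsed.netloc, parsed.path.lstrip("/")


print(split_warehouse_path("oss://bucket/path/warehouse"))
# ('bucket', 'path/warehouse')
```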
Step 1: Open the Data Sources page
- Log on to the DataWorks console and select the target region. In the left navigation pane, click Workspace, find the target workspace, and then click Manage in the Actions column.
- On the Management Center page, click Data Sources in the left navigation pane.
Step 2: Create the data source
- On the Data Sources page, click Add Data Source.
- In the dialog box, search for and select Paimon Catalog.
Step 3: Configure parameters
Catalog connection parameters
| Parameter | Required | Description |
|---|---|---|
| Data Source Name | Yes | A custom identifier for this data source, such as paimon_finance. |
| Catalog | Yes | The name of the catalog as declared on the compute engine side, such as paimon-catalog. The name must match the one declared on the compute engine so that metadata maps correctly. |
| MetaStore | Yes | The metastore type. Currently, only Filesystem is supported. |
| Filesystem | Yes | The file storage type. Currently, only OSS is supported. |
Storage access parameters
| Parameter | Required | Description |
|---|---|---|
| Access Mode | Yes | How DataWorks authenticates to OSS. See Access mode options below. |
| Region | Yes | The region of the OSS bucket. Select a bucket in the same region as the workspace to avoid cross-region latency. To connect to a bucket in a different region, establish a VPC peering connection or connect via a public endpoint. |
| Endpoint | Yes | The OSS domain name. For configuration guidance, see Overview of endpoints and network connectivity. |
| Warehouse | Yes | The full OSS path to the Paimon catalog warehouse, such as oss://bucket/path/warehouse. Click the folder icon to the right of the field to browse and select a path. An incorrect path causes metadata collection to fail. |
Access mode options
- RAM Role Authorization Mode: DataWorks accesses OSS using a RAM role. Use this mode to control access through role-based permissions without embedding credentials. For setup instructions, see Configure a data source in RAM role authorization mode.
- AccessKey Mode: DataWorks accesses OSS using the AccessKey of the currently logged-on account as the access identity.
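The parameter tables and access mode options above can be collected into a simple pre-flight checklist. The sketch below is illustrative only: DataWorks is configured through the console, and the dictionary layout and the `check_parameters` helper are assumptions made for this example, not a DataWorks API. The required fields and allowed values are taken from the tables in this step.

```python
REQUIRED = [
    "Data Source Name", "Catalog", "MetaStore", "Filesystem",
    "Access Mode", "Region", "Endpoint", "Warehouse",
]

# Allowed values as listed in the parameter tables and access mode options.
SUPPORTED = {
    "MetaStore": {"Filesystem"},
    "Filesystem": {"OSS"},
    "Access Mode": {"RAM Role Authorization Mode", "AccessKey Mode"},
}


def check_parameters(params: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    for name in REQUIRED:
        if not params.get(name, "").strip():
            problems.append(f"{name} is required")
    for name, allowed in SUPPORTED.items():
        value = params.get(name)
        if value and value not in allowed:
            problems.append(f"{name} must be one of {sorted(allowed)}")
    warehouse = params.get("Warehouse", "")
    if warehouse and not warehouse.startswith("oss://"):
        problems.append("Warehouse must be an oss:// path")
    return problems


example = {
    "Data Source Name": "paimon_finance",
    "Catalog": "paimon-catalog",
    "MetaStore": "Filesystem",
    "Filesystem": "OSS",
    "Access Mode": "RAM Role Authorization Mode",
    "Region": "cn-hangzhou",                     # placeholder region
    "Endpoint": "oss-cn-hangzhou.aliyuncs.com",  # placeholder endpoint
    "Warehouse": "oss://bucket/path/warehouse",
}
print(check_parameters(example))
```

A check like this catches only missing or unsupported values. It cannot confirm that the catalog name matches the compute engine or that the warehouse path exists.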
Step 4: Test connectivity
After saving the configuration, test connectivity to confirm the data source can reach OSS through the serverless resource group.
- Connected: the configuration is correct and DataWorks can reach the catalog.
- Connection failed: a diagnostic tool appears. Common causes include:
  - Incorrect AccessKey or RAM role configuration
  - Missing IP address whitelist entry for the resource group
  - Missing NAT Gateway for the resource group's network
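When a failure looks network-related, a plain TCP reachability check against the OSS endpoint can help narrow things down. The sketch below uses only the Python standard library, and the helper name `can_reach` is hypothetical. Note the limits: it tests reachability from the machine where it runs, not from the serverless resource group, and it says nothing about credentials or permissions.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

For example, calling `can_reach` with the host name from the Endpoint field returns False when that host cannot be resolved or reached from the current network.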
What's next
After configuring the data source, go to Data Map to collect metadata. Once collected, you can view and govern the Paimon table assets in Data Map.