
Dataphin: Create a Paimon data source

Last Updated: Oct 24, 2025

After you create a Paimon data source, Dataphin can read business data from Paimon or write data to Paimon. This topic describes how to create a Paimon data source.

Permission requirements

Only the super administrator, Data Source Administrator, Division Architect, and project administrator roles, as well as custom global roles that have the Create Data Source permission, can create data sources.

Limits

  • Paimon data sources cannot be accessed by using a data source code or as physical tables of a compute source.

  • Only HDFS storage is supported.

Procedure

  1. On the Dataphin homepage, click Management Center > Datasource Management in the top navigation bar.

  2. On the Datasource page, click +Create Data Source.

  3. On the Create Data Source page, in the Big Data section, select Paimon.

    If you have recently used Paimon, you can also select it in the Recently Used section, or enter the keyword Paimon in the search box to quickly find it.

  4. On the Create Paimon Data Source page, configure the basic information of the data source.

    Parameter

    Description

    Datasource Name

    The name must meet the following requirements:

    • It can contain only Chinese characters, letters, digits, underscores (_), and hyphens (-).

    • It cannot exceed 64 characters in length.

    Datasource Code

    After you configure the data source code, you can reference tables in the data source in Flink SQL nodes in the format datasource_code.table_name or datasource_code.schema.table_name. To automatically access the data source of the corresponding environment (production or development), use the variable format ${datasource_code}.table or ${datasource_code}.schema.table. For more information, see Develop nodes that use Dataphin data source tables.

    Important
    • The data source code cannot be modified after it is configured.

    • You can preview data on the object details page in the asset directory and asset inventory only after the data source code is configured.

    • In Flink SQL, only MySQL, Hologres, MaxCompute, Oracle, StarRocks, Hive, and SelectDB data sources are supported.
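    As a generic illustration of the data source code syntax (note that, per the limitation above, Paimon is not among the data source types supported in Flink SQL), the following sketch uses a hypothetical MySQL data source code mysql_ds and a hypothetical table orders:

    ```sql
    -- Fixed form: always resolves to the mysql_ds data source.
    SELECT order_id, amount FROM mysql_ds.orders;

    -- Variable form: ${mysql_ds} resolves to the production or development
    -- data source, depending on the environment in which the task runs.
    SELECT order_id, amount FROM ${mysql_ds}.orders;
    ```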

    Version

    Select the version of the Paimon data source. The following versions are supported: Aliyun EMR 3.x Hive 2.3.5, Aliyun EMR 5.x Hive 3.1.x, CDH 6.x Hive 2.1.1, CDP 7.x Hive 3.1.3, and AsiaInfo DP 5.x Hive 3.1.0.

    Data Source Description

    A brief description of the data source. It cannot exceed 128 characters.

    Data Source Configuration

    Select which data sources to configure:

    • If the data source distinguishes between production and development data sources, select Production + Development Data Source.

    • If the data source does not distinguish between production and development data sources, select Production Data Source.

    Tag

    You can use tags to categorize and label data sources. For information about how to create tags, see Manage data source tags.

  5. Configure the connection parameters between the data source and Dataphin.

    If you set Data Source Configuration to Production + Development Data Source, you must configure connection information for both the production and the development data source. If you set it to Production Data Source, you only need to configure connection information for the production data source.

    Note

    Typically, configure the production and development data sources as different data sources to isolate the two environments and reduce the impact of the development data source on the production data source. However, Dataphin also supports configuring them as the same data source with identical parameter values.

    Parameter

    Description

    Catalog Configuration

    Catalog Type

    Only Hive is supported. This value cannot be modified.

    Warehouse

    Enter the root storage path for Paimon tables: the value of the fs.defaultFS parameter in core-site.xml followed by the value of the hive.metastore.warehouse.dir parameter in hive-site.xml.

    Note

    Object Storage Service (OSS) is not supported.
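    For example, if the two configuration files contain the following illustrative values, the resulting warehouse path would be hdfs://nameservice1/user/hive/warehouse (all names are assumptions for this sketch):

    ```xml
    <!-- core-site.xml (illustrative value) -->
    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://nameservice1</value>
    </property>

    <!-- hive-site.xml (illustrative value) -->
    <property>
      <name>hive.metastore.warehouse.dir</name>
      <value>/user/hive/warehouse</value>
    </property>
    ```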

    Hive Thrift Uri

    Enter the value of the hive.metastore.uris parameter in hive-site.xml.
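    The value comes from a property such as the following in hive-site.xml (host and port are illustrative):

    ```xml
    <property>
      <name>hive.metastore.uris</name>
      <value>thrift://metastore-host:9083</value>
    </property>
    ```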

    Metadata Configuration

    Metadata Retrieval Method

    Both the Metadata Database and HMS (Hive Metastore Service) methods are supported.

    • Metadata Database Method

      • Database Type: Only MySQL is supported. Supported versions: MySQL 5.1.43, MySQL 5.6/5.7, and MySQL 8.

      • JDBC URL: Enter the JDBC URL of the metadata database. The connection format is jdbc:mysql://host:port/dbname.

      • Username, Password: Enter the username and password for accessing the metadata database.

    • HMS Method

      • Authentication Method: Supports No Authentication, LDAP, and Kerberos methods.

        Note

        When using the Kerberos method, you need to enable the Kerberos option in Cluster Configuration.

      • hive-site.xml: Upload the hive-site.xml configuration file. This file is also used if real-time development is enabled.

      • Keytab File: For the Kerberos method, you need to upload the Keytab File.

      • Principal: For the Kerberos method, you need to enter the Principal parameter.
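    For the Metadata Database method, a JDBC URL that follows the format above might look like this (host, port, and database name are illustrative):

    ```
    jdbc:mysql://metadb-host:3306/hivemeta
    ```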

    Cluster Configuration

    NameNode

    Enter the NameNode address of the cluster.

    To add multiple NameNodes, click +Add.

    Configuration File

    Upload the hdfs-site.xml and core-site.xml configuration files of the cluster.

    Kerberos

    To access the cluster through Kerberos, enable this configuration item and configure the following information.

    • Kerberos Configuration Method: Select the Kerberos configuration method for the cluster. Both KDC Server and krb5 file configuration are supported.

      • KDC Server: For the KDC Server configuration method, you need to enter the address of the KDC Server. Multiple configuration items are supported, separated by semicolons (;).

      • Krb5 File Configuration: For the krb5 file configuration method, you need to upload the krb5 configuration file.

    • HDFS Configuration: Enter the HDFS configuration information of the cluster.

      • HDFS Keytab File: Upload the HDFS keytab file of the cluster.

      • HDFS Principal: Enter the principal for the cluster's Kerberos authentication, for example, XXXX/hadoopclient@xxx.xxx.
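    For the krb5 file configuration method, the uploaded file is a standard krb5.conf. A minimal illustrative fragment (realm and KDC host are assumptions) might look like this:

    ```ini
    [libdefaults]
      default_realm = EXAMPLE.COM

    [realms]
      EXAMPLE.COM = {
        kdc = kdc-host:88
      }
    ```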

    Hive Configuration

    JDBC URL

    Enter the JDBC URL of Hive. The connection format is jdbc:hive2://host:port/dbname.
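    Following the format above, an illustrative Hive JDBC URL (host, port, and database name are assumptions) might be:

    ```
    jdbc:hive2://hive-host:10000/default
    ```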

    Username, Password

    For non-Kerberos access to the cluster, enter the authentication username and password for Hive.

    Note

    To ensure normal task execution, make sure the user you enter has the required data permissions.

    Hive Keytab File

    For Kerberos access to the cluster, upload the keytab file for Hive.

    Hive Principal

    For Kerberos access to the cluster, enter the Kerberos authentication principal, such as XXXX/hadoopclient@xxx.xxx.

    Configuration File

    Upload the hive-site.xml configuration file for Hive.

    Important

    Flink SQL tasks ignore the authentication information configured here and instead use the authentication information of the Flink engine to access the Hive data source.

  6. Select the Default Resource Group, which is used to run tasks related to the current data source, such as database SQL tasks, offline database migration, and data preview.

  7. Click Test Connection, or click OK to save the configuration and complete the creation of the Paimon data source.

    If you click Test Connection, the system tests whether the data source can connect to Dataphin. If you click OK directly, the system automatically tests the connection to all selected clusters, but the data source is created even if those connection tests fail.