
Dataphin: Create a Paimon data source

Last Updated: Oct 24, 2025

After you create a Paimon data source, Dataphin can read business data from Paimon or write data to Paimon. This topic describes how to create a Paimon data source.

Permission requirements

Only the super administrator, Data Source Administrator, Division Architect, and project administrator roles, as well as custom global roles that have the Create Data Source permission, can create data sources.

Limits

  • Paimon data sources cannot be accessed by using a data source code or as physical tables of a compute source.

  • Only HDFS storage is supported.

Procedure

  1. On the Dataphin homepage, click Management Center > Datasource Management in the top navigation bar.

  2. On the Datasource page, click +Create Data Source.

  3. On the Create Data Source page, in the Big Data section, select Paimon.

    If you have recently used Paimon, you can also select it in the Recently Used section, or enter the keyword Paimon in the search box to quickly find it.

  4. On the Create Paimon Data Source page, configure the basic information of the data source.

    Parameter

    Description

    Datasource Name

    The name must meet the following requirements:

    • It can contain only Chinese characters, letters, digits, underscores (_), and hyphens (-).

    • It cannot exceed 64 characters in length.

    Datasource Code

    After you configure the data source code, you can reference tables in the data source in Flink SQL nodes in the format datasource_code.table_name or datasource_code.schema.table_name. To automatically access the data source of the corresponding environment (production or development), use the variable format ${datasource_code}.table or ${datasource_code}.schema.table. For more information, see Develop nodes that use Dataphin data source tables.

    Important
    • The data source code cannot be modified after it is configured.

    • You can preview data on the object details page in the asset directory and asset inventory only after the data source code is configured.

    • In Flink SQL, only MySQL, Hologres, MaxCompute, Oracle, StarRocks, Hive, and SelectDB data sources are supported.
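    As a generic illustration of the data source code syntax (note that, per the limitation above, Paimon is not among the data source types supported in Flink SQL), the following sketch uses a hypothetical MySQL data source code mysql_ds and a hypothetical table orders:

    ```sql
    -- Fixed form: always resolves to the mysql_ds data source.
    SELECT order_id, amount FROM mysql_ds.orders;

    -- Variable form: ${mysql_ds} resolves to the production or development
    -- data source, depending on the environment in which the task runs.
    SELECT order_id, amount FROM ${mysql_ds}.orders;
    ```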

    Version

    Select the version of the Paimon data source. The following versions are supported: Aliyun EMR 3.x Hive 2.3.5, Aliyun EMR 5.x Hive 3.1.x, CDH 6.x Hive 2.1.1, CDP 7.x Hive 3.1.3, and AsiaInfo DP 5.x Hive 3.1.0.

    Data Source Description

    A brief description of the data source. It cannot exceed 128 characters.

    Data Source Configuration

    Select which data sources to configure:

    • If the data source distinguishes between production and development data sources, select Production + Development Data Source.

    • If the data source does not distinguish between production and development data sources, select Production Data Source.

    Tag

    You can use tags to categorize and label data sources. For information about how to create tags, see Manage data source tags.

  5. Configure the connection parameters between the data source and Dataphin.

    If you set Data Source Configuration to Production + Development Data Source, you must configure connection information for both the production and the development data source. If you set it to Production Data Source, you only need to configure connection information for the production data source.

    Note

    Typically, configure the production and development data sources as different data sources to isolate the two environments and reduce the impact of the development data source on the production data source. However, Dataphin also supports configuring them as the same data source with identical parameter values.

    Parameter

    Description

    Catalog Configuration

    Catalog Type

    Only Hive is supported. This value cannot be modified.

    Warehouse

    Enter the root storage path for Paimon tables: the value of the fs.defaultFS parameter in core-site.xml followed by the value of the hive.metastore.warehouse.dir parameter in hive-site.xml.

    Note

    Object Storage Service (OSS) is not supported.
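    For example, if the two configuration files contain the following illustrative values, the resulting warehouse path would be hdfs://nameservice1/user/hive/warehouse (all names are assumptions for this sketch):

    ```xml
    <!-- core-site.xml (illustrative value) -->
    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://nameservice1</value>
    </property>

    <!-- hive-site.xml (illustrative value) -->
    <property>
      <name>hive.metastore.warehouse.dir</name>
      <value>/user/hive/warehouse</value>
    </property>
    ```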

    Hive Thrift Uri

    Enter the value of the hive.metastore.uris parameter in hive-site.xml.
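    The value comes from a property such as the following in hive-site.xml (host and port are illustrative):

    ```xml
    <property>
      <name>hive.metastore.uris</name>
      <value>thrift://metastore-host:9083</value>
    </property>
    ```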

    Metadata Configuration

    Metadata Retrieval Method

    Both the Metadata Database and HMS (Hive Metastore Service) methods are supported.

    • Metadata Database Method

      • Database Type: Only MySQL is supported. Supported versions: MySQL 5.1.43, MySQL 5.6/5.7, and MySQL 8.

      • JDBC URL: Enter the JDBC URL of the metadata database. The connection format is jdbc:mysql://host:port/dbname.

      • Username, Password: Enter the username and password for accessing the metadata database.

    • HMS Method

      • Authentication Method: Supports No Authentication, LDAP, and Kerberos methods.

        Note

        When using the Kerberos method, you need to enable the Kerberos option in Cluster Configuration.

      • hive-site.xml: Upload the hive-site.xml configuration file. This file is also used if real-time development is enabled.

      • Keytab File: For the Kerberos method, you need to upload the Keytab File.

      • Principal: For the Kerberos method, you need to enter the Principal parameter.
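    For the Metadata Database method, a JDBC URL that follows the format above might look like this (host, port, and database name are illustrative):

    ```
    jdbc:mysql://metadb-host:3306/hivemeta
    ```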

    Cluster Configuration

    NameNode

    Enter the NameNode address of the cluster.

    To add multiple NameNodes, click +Add.

    Configuration File

    Upload the hdfs-site.xml and core-site.xml configuration files of the cluster.

    Kerberos

    To access the cluster through Kerberos, enable this configuration item and configure the following information.

    • Kerberos Configuration Method: Select the Kerberos configuration method for the cluster. Both KDC Server and krb5 file configuration are supported.

      • KDC Server: For the KDC Server configuration method, you need to enter the address of the KDC Server. Multiple configuration items are supported, separated by semicolons (;).

      • Krb5 File Configuration: For the krb5 file configuration method, you need to upload the krb5 configuration file.

    • HDFS Configuration: Enter the HDFS configuration information of the cluster.

      • HDFS Keytab File: Upload the HDFS keytab file of the cluster.

      • HDFS Principal: Enter the principal for the cluster's Kerberos authentication, for example, XXXX/hadoopclient@xxx.xxx.
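    For the krb5 file configuration method, the uploaded file is a standard krb5.conf. A minimal illustrative fragment (realm and KDC host are assumptions) might look like this:

    ```ini
    [libdefaults]
      default_realm = EXAMPLE.COM

    [realms]
      EXAMPLE.COM = {
        kdc = kdc-host:88
      }
    ```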

    Hive Configuration

    JDBC URL

    Enter the JDBC URL of Hive. The connection format is jdbc:hive2://host:port/dbname.
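    Following the format above, an illustrative Hive JDBC URL (host, port, and database name are assumptions) might be:

    ```
    jdbc:hive2://hive-host:10000/default
    ```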

    Username, Password

    For non-Kerberos access to the cluster, enter the authentication username and password for Hive.

    Note

    To ensure normal task execution, make sure the user you enter has the required data permissions.

    Hive Keytab File

    For Kerberos access to the cluster, upload the keytab file for Hive.

    Hive Principal

    For Kerberos access to the cluster, enter the Kerberos authentication principal, such as XXXX/hadoopclient@xxx.xxx.

    Configuration File

    Upload the hive-site.xml configuration file for Hive.

    Important

    Flink SQL tasks ignore the authentication information configured here and instead use the authentication information of the Flink engine to access the Hive data source.

  6. Select the Default Resource Group, which is used to run tasks related to the current data source, such as database SQL tasks, offline database migration, and data preview.

  7. Click Test Connection, or click OK to save the configuration and complete the creation of the Paimon data source.

    If you click Test Connection, the system tests whether the data source can connect to Dataphin. If you click OK directly, the system automatically tests the connection to all selected clusters, but the data source is created even if those connection tests fail.