Dataphin: Initialize the metadata warehouse using Transwarp TDH as the compute engine

Last Updated: Nov 18, 2025

The Dataphin metadata warehouse is a data warehouse in Dataphin that centrally manages business metadata and the corresponding compute engine metadata. It resides in a Dataphin project within the metadata warehouse tenant (OPS tenant) and consists of periodic data integration nodes, SQL script nodes, and Shell nodes. Metadata warehouse initialization is the process of configuring the compute engine type for the Dataphin system and initializing the metadata. This topic describes how to use Transwarp TDH as the compute engine to initialize the metadata warehouse.

Prerequisites

  • To use Transwarp TDH as the compute engine for the metadata warehouse, you must enable access to the metadatabase or provide a Hive Metastore service for metadata retrieval.

  • If you use TDH Inceptor as the metadata warehouse, or if you use TDH Inceptor as the metadata retrieval method during initialization, the following conditions must be met:

    • A project named dataphin_meta is created in TDH Inceptor.

    • The user configured for TDH Inceptor during metadata warehouse initialization must have permissions to create and write to tables in the dataphin_meta project.

    • The compute engine account requires read permission for the materialized tables in the dataphin_meta project.


Background information

Dataphin supports metadata retrieval through a direct connection to the metadatabase or the Hive Metastore Service. The following table compares the advantages and disadvantages of each method.


| Metadata retrieval method | Advantages and disadvantages |
| --- | --- |
| Direct metadatabase connection | High performance: Connects directly to the underlying metadatabase. Bypassing the intermediate Hive Metastore Service (HMS) improves client performance when retrieving metadata and reduces network latency. More open: When you query the metastore through the HMS service, you can only use the methods provided by the metastore client. A direct connection to the metadatabase lets you freely use SQL for queries. |
| Hive Metastore Service | More secure: You can enable Kerberos authentication for the metastore. Clients must then use Kerberos authentication to read data from the metastore. More flexible: Clients are only aware of the HMS service, not the underlying metadatabase. This lets you switch the underlying database at any time without changing the client configuration. |
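
To make the "More open" point concrete, the following sketch runs the kind of ad hoc SQL join that a direct metadatabase connection allows. It is an illustration under stated assumptions, not Dataphin's own retrieval logic: an in-memory SQLite database stands in for the metadatabase, the DBS and TBLS tables are modeled loosely on a small subset of the Hive metastore schema, and the sample rows and function names are made up. A real deployment would connect to MySQL, PostgreSQL, or Inceptor, and the schema may differ by version.

```python
import sqlite3


def list_tables(conn, db_name):
    """Return (table name, table type) pairs for one database, sorted by name."""
    query = """
        SELECT t.TBL_NAME, t.TBL_TYPE
        FROM TBLS t
        JOIN DBS d ON d.DB_ID = t.DB_ID
        WHERE d.NAME = ?
        ORDER BY t.TBL_NAME
    """
    return conn.execute(query, (db_name,)).fetchall()


def build_demo_metadatabase():
    """Create an in-memory stand-in with two metastore-style tables (assumed subset)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE DBS (DB_ID INTEGER PRIMARY KEY, NAME TEXT);
        CREATE TABLE TBLS (TBL_ID INTEGER PRIMARY KEY, DB_ID INTEGER,
                           TBL_NAME TEXT, TBL_TYPE TEXT);
        INSERT INTO DBS VALUES (1, 'dataphin_meta'), (2, 'default');
        INSERT INTO TBLS VALUES
            (10, 1, 'dim_dataphin_table', 'MANAGED_TABLE'),
            (11, 1, 'dim_dataphin_partition', 'MANAGED_TABLE'),
            (12, 2, 'scratch', 'EXTERNAL_TABLE');
    """)
    return conn


if __name__ == "__main__":
    # Sample data only; real table names and types come from your cluster.
    conn = build_demo_metadatabase()
    for name, table_type in list_tables(conn, "dataphin_meta"):
        print(name, table_type)
```

Through HMS, the same lookup would be limited to the calls that the metastore client exposes, which is the trade-off described above.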

Limits

  • When you select MySQL metadatabase, Inceptor metadatabase, or HMS to retrieve metadata, some of the retrieved metadata may be missing or inaccurate.

    • When you retrieve metadata using MySQL metadatabase or HMS, the following information cannot be retrieved:

      • Data volume information for asset overview, data plates, and projects.

      • Table data volume, partition data volume, and number of partition records in the asset catalog.

      • Storage-related metrics for resource administration.

      • Data volume and number of records for dim_dataphin_table and dim_dataphin_partition in the metadata warehouse shared model.

    • When you retrieve metadata using the TDH Inceptor System library, the following information cannot be retrieved:

      • Number of partition records in the asset catalog.

      • Number of records for dim_dataphin_table and dim_dataphin_partition in the metadata warehouse shared model.

  • When you use TDH Inceptor as the compute engine for the metadata warehouse, Dataphin does not support user-defined functions (UDFs).

    If you add a JAR package with a duplicate name for UDF registration, the Inceptor service may stop responding and fail to restart. If you add a JAR package that has a different name but contains identical class files, the UDF execution results may be unpredictable. For these reasons, Dataphin does not support UDFs when TDH Inceptor is used as the compute engine for the metadata warehouse. To add UDFs, use the TDH Inceptor client, and ensure that the names of UDFs and their class names are unique within the cluster.
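
The two failure modes described above (duplicate JAR names, and identical class files under different JAR names) can be checked before JARs are handed to the TDH Inceptor client. The following is a minimal sketch and not part of Dataphin or TDH: the function names find_udf_conflicts and make_jar, and the sample JAR paths and class names, are hypothetical. It reads each JAR as a ZIP archive and reports JAR file names that occur more than once and class files that appear in more than one JAR.

```python
import io
import posixpath
import zipfile
from collections import defaultdict


def find_udf_conflicts(jars):
    """Report duplicate JAR names and class files shared by several JARs.

    `jars` maps a JAR path to a path or file-like object readable by zipfile.
    """
    paths_by_name = defaultdict(list)
    jars_by_class = defaultdict(list)
    for path, source in jars.items():
        paths_by_name[posixpath.basename(path)].append(path)
        with zipfile.ZipFile(source) as archive:
            for entry in archive.namelist():
                if entry.endswith(".class"):
                    jars_by_class[entry].append(path)
    duplicate_names = {n: sorted(p) for n, p in paths_by_name.items() if len(p) > 1}
    duplicate_classes = {c: sorted(p) for c, p in jars_by_class.items() if len(p) > 1}
    return duplicate_names, duplicate_classes


def make_jar(class_entries):
    """Build an in-memory JAR (a ZIP archive) holding empty class entries."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for entry in class_entries:
            archive.writestr(entry, b"")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    # Hypothetical JARs: two share a file name, two share a class file.
    jars = {
        "team_a/udfs.jar": make_jar(["com/example/Upper.class"]),
        "team_b/udfs.jar": make_jar(["com/example/Lower.class"]),
        "team_c/text.jar": make_jar(["com/example/Upper.class"]),
    }
    names, classes = find_udf_conflicts(jars)
    print("duplicate JAR names:", names)
    print("duplicate class files:", classes)
```

The check only compares names inside the archives that you pass in. It does not know which JARs are already registered on the cluster, so that list would have to be collected separately.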

Permissions

Only accounts with the metadata warehouse tenant super administrator or system administrator role can initialize the system.

Important

Keep the account and password for the metadata warehouse tenant super administrator or system administrator secure. Exercise caution when you perform operations while logged on as the metadata warehouse tenant super administrator.

Procedure

  1. In the top menu bar of the Dataphin homepage, select Management Center > System Settings.

  2. In the navigation pane, select System O&M > Metadata Warehouse Settings.

  3. On the Metadata Deployment wizard, click Start.

  4. On the Select Initialization Engine Type page, select the Transwarp TDH 6.x or Transwarp TDH 9.3.x engine type.

    Important

    If you switch to an incompatible compute engine, the administration feature is unavailable. If the metadata warehouse is already initialized, the engine that was used for the last successful initialization is selected by default.

  5. Click Next.

  6. On the Parameter Checking page, configure the following parameters. The parameters are the same for Transwarp TDH 6.x and Transwarp TDH 9.3.x.

    Area

    Parameter

    Description

    Cluster Configuration

    NameNode

    Manages the file system namespace and client access permissions in Hadoop Distributed File System (HDFS).

    1. Click Add.

    2. In the Add NameNode dialog box, enter the hostname and port number for the NameNode, and then click OK.

      The system automatically generates the corresponding format. Example: host=start,webUiPort=50070,ipcPort=8020.

    Configuration File

    Upload the cluster configuration file to configure cluster parameters. The system supports uploading cluster configuration files such as core-site.xml and hdfs-site.xml.

    History Log

    The log path for the cluster. Example: /tmp/hadoop-yarn/staging/history/done.

    Authentication Type

    The authentication method. Supported methods are No Authentication and Kerberos. Kerberos is an identity authentication protocol based on symmetric key technology and is often used for authentication between cluster components. Enabling Kerberos improves cluster security.

    If you select Kerberos, you must also configure the Kerberos Configuration Method and HDFS parameters.

    • Kerberos Configuration Method

      • KDC Server: Enter the unified service address of the Key Distribution Center (KDC) used for Kerberos authentication. To enter multiple addresses, separate them with commas (,).

      • Krb5 File Configuration: Upload the krb5 file for Kerberos authentication.

    • HDFS Keytab File: Upload the HDFS keytab file.

    • HDFS Principal: Enter the principal name for Kerberos authentication. Example: XXXX/hadoopclient@xxx.xxx.

    Inceptor Configuration

    JDBC URL

    Enter the Java Database Connectivity (JDBC) URL to connect to Inceptor.

    Authentication Type

    The authentication method for Inceptor. Select from No Authentication, LDAP, or Kerberos based on your engine configuration.

    • No Authentication: No authentication is required. Configure the username and password to access Inceptor.

    • LDAP: Lightweight Directory Access Protocol (LDAP) authentication. Configure the username and password to access Inceptor.

    • Kerberos: Available only when the cluster authentication type is Kerberos. You must upload a keytab file and configure the principal.

      • Keytab File: Upload the keytab file for Kerberos authentication.

      • Principal: The principal name for Kerberos authentication.

    Metadatabase Configuration

    Metadata Retrieval Method

    The method for retrieving metadata. Supported methods are direct metadatabase connection and Hive Metastore Service (HMS). If you use HMS with Kerberos authentication, upload the keytab file and enter the principal.

    • Keytab File: The keytab file for Hive metastore Kerberos authentication.

    • Principal: The principal for Hive metastore Kerberos authentication.

    Database Type

    This parameter is required when you use the direct metadatabase connection method. Supported databases are MySQL, PostgreSQL, and Inceptor.

    MySQL: Supported versions are 5.1.43, 5.6/5.7, and 8.0.

    Inceptor: Supports No Authentication, LDAP, and Kerberos authentication.

    JDBC URL

    Enter the JDBC connection address for the target database. Examples:

    MySQL database connection address format: jdbc:mysql://host:port/dbname.

    Inceptor database connection address format: jdbc:hive2://host:port/dbname.

    Username, Password

    The username and password for the target database. If the Inceptor authentication method is No Authentication, only the username is required. For Kerberos authentication, upload the keytab file and enter the principal.

    Metadata Production Project

    Meta Project

    The logical project for metadata production and processing. Set this to dataphin_meta. Keep the name unchanged during reinitialization to prevent failures.

  7. Click Test Connection. After the connection test is successful, click Next.

  8. On the initialization page, click Start.

    Note

    System initialization takes about 15 minutes.

  9. After a success message appears, click Finish to complete the configuration.
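
Before you enter the values from step 6, it can help to sanity-check them offline. The following sketch is an illustration only and not a Dataphin API: the helper names and the sample host names are hypothetical, and the checks cover only the formats quoted in this topic, namely the generated NameNode entry (host=...,webUiPort=...,ipcPort=...) and the jdbc:mysql:// and jdbc:hive2:// address formats. It does not replace Test Connection.

```python
import re

# Address formats quoted in this topic: jdbc:mysql://host:port/dbname and
# jdbc:hive2://host:port/dbname. Real URLs may carry extra parameters,
# which this deliberately strict pattern rejects.
JDBC_PATTERN = re.compile(
    r"^jdbc:(?P<scheme>mysql|hive2)://"
    r"(?P<host>[^:/\s]+):(?P<port>\d+)/(?P<db>[^/?;\s]+)$"
)


def format_namenode(host, web_ui_port, ipc_port):
    """Build a NameNode entry in the shape shown in the step 6 example."""
    return f"host={host},webUiPort={web_ui_port},ipcPort={ipc_port}"


def parse_namenode(entry):
    """Split a NameNode entry back into a host and integer ports."""
    fields = dict(part.split("=", 1) for part in entry.split(","))
    return {
        "host": fields["host"],
        "webUiPort": int(fields["webUiPort"]),
        "ipcPort": int(fields["ipcPort"]),
    }


def parse_jdbc_url(url):
    """Return scheme, host, port, and database name, or raise ValueError."""
    match = JDBC_PATTERN.match(url)
    if match is None:
        raise ValueError(f"unexpected JDBC URL format: {url!r}")
    parts = match.groupdict()
    parts["port"] = int(parts["port"])
    return parts


if __name__ == "__main__":
    # "start", 50070, and 8020 come from the example in this topic;
    # the Inceptor host name and port below are placeholders.
    print(format_namenode("start", 50070, 8020))
    print(parse_jdbc_url("jdbc:hive2://inceptor-host:10000/dataphin_meta"))
```

The strict pattern is a design choice: it flags anything that deviates from the documented formats so that you can review it by hand, at the cost of rejecting URLs with additional connection parameters.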

What to do next

After you initialize the system metadata, you can set the compute engine for the Dataphin instance. When the metadata warehouse engine is set to Transwarp TDH, you can set the business tenant engine to any engine type except MaxCompute. For more information, see Compute settings.