The Dataphin metadata warehouse is a data warehouse in Dataphin that centrally manages business metadata and the corresponding compute engine metadata. It resides in a Dataphin project within the metadata warehouse tenant (OPS tenant) and consists of periodic data integration nodes, SQL script nodes, and Shell nodes. Metadata warehouse initialization is the process of configuring the compute engine type for the Dataphin system and initializing the metadata. This topic describes how to use Transwarp TDH as the compute engine to initialize the metadata warehouse.
Prerequisites
To use Transwarp TDH as the metadata warehouse, you must enable access to the metadatabase or provide a Hive Metastore service for metadata retrieval.
If you use TDH Inceptor as the metadata warehouse, or if you use TDH Inceptor as the metadata retrieval method during initialization, the following conditions must be met:
A project named dataphin_meta is created in TDH Inceptor.
The user configured for TDH Inceptor during metadata warehouse initialization must have permissions to create and write to tables in the dataphin_meta project.
The compute engine account requires read permission for the materialized tables in the dataphin_meta project.

Background information
Dataphin supports metadata retrieval through a direct connection to the metadatabase or the Hive Metastore Service. The following table compares the advantages and disadvantages of each method.

Metadata retrieval method | Advantages and disadvantages |
Direct metadatabase connection | High performance: Directly connects to the underlying metadatabase. This bypasses the intermediate Hive Metastore Service (HMS) and improves client performance when retrieving metadata and reduces network latency. More open: When you query the metastore through the HMS service, you can only use the methods provided by the metastore client. A direct connection to the metadatabase lets you freely use SQL for queries. |
Hive Metastore Service | More secure: You can enable Kerberos authentication for the metastore. Clients must then use Kerberos authentication to read data from the metastore. More flexible: Clients are only aware of the HMS service, not the underlying metadatabase. This lets you switch the underlying database at any time without changing the client configuration. |
Limits
When you select MySQL metadatabase, Inceptor metadatabase, or HMS to retrieve metadata, some of the retrieved metadata may be missing or inaccurate.
When you retrieve metadata using MySQL metadatabase or HMS, the following information cannot be retrieved:
Data volume information for asset overview, data plates, and projects.
Table data volume, partition data volume, and number of partition records in the asset catalog.
Storage-related metrics for resource administration.
Data volume and number of records for dim_dataphin_table and dim_dataphin_partition in the metadata warehouse shared model.
When you retrieve metadata using the TDH Inceptor System library, the following information cannot be retrieved:
Number of partition records in the asset catalog.
Number of records for dim_dataphin_table and dim_dataphin_partition in the metadata warehouse shared model.
When you use TDH Inceptor as the compute engine for the metadata warehouse, Dataphin does not support user-defined functions (UDFs).
If you add a JAR package with a duplicate name for UDF registration, the Inceptor service may stop responding and fail to restart. If you add a JAR package that has a different name but contains identical class files, the UDF execution results may be unpredictable. Therefore, Dataphin does not support UDFs when TDH Inceptor is used as the compute engine for the metadata warehouse. To add UDFs, you must add them using the TDH Inceptor client. Ensure that the names of UDFs and their class names are unique within the cluster.
Permissions
Only accounts with the metadata warehouse tenant super administrator or system administrator role can initialize the system.
Keep the account and password for the metadata warehouse tenant super administrator or system administrator secure. Exercise caution when you perform operations while logged on as the metadata warehouse tenant super administrator.
Procedure
In the top menu bar of the Dataphin homepage, select Management Center > System Settings.
In the navigation pane, select System O&M > Metadata Warehouse Settings.
On the Metadata Deployment wizard, click Start.
On the Select Initialization Engine Type page, select the Transwarp TDH 6.x or Transwarp TDH 9.3.x engine type.
ImportantIf you switch to an incompatible compute engine, the administration feature is unavailable. If the metadata warehouse is already initialized, the engine that was used for the last successful initialization is selected by default.
Click Next.
On the Parameter Checking page, configure the following parameters. The parameters are the same for Transwarp TDH 6.x and Transwarp TDH 9.3.x.
Area
Parameter
Description
Cluster Configuration
NameNode
Manages the file system namespace and client access permissions in Hadoop Distributed File System (HDFS).
Click Add.
In the Add NameNode dialog box, enter the hostname and port number for the NameNode, and then click OK.
The system automatically generates the corresponding format. Example:
host=start,webUiPort=50070,ipcPort=8020.
Configuration File
Upload the cluster configuration file to configure cluster parameters. The system supports uploading cluster configuration files such as core-site.xml and hdfs-site.xml.
History Log
The log path for the cluster. Example:
tmp/hadoop-yarn/staging/history/done.Authentication Type
The authentication method. Supported methods are No Authentication and Kerberos. Kerberos is an identity authentication protocol based on symmetric key technology and is often used for authentication between cluster components. Enabling Kerberos improves cluster security.
If you select Kerberos, you must also configure the Kerberos Configuration Method and HDFS parameters.
Kerberos Configuration Method
KDC Server: Enter the unified service address of the Key Distribution Center (KDC) to assist with Kerberos authentication. You can enter multiple addresses separated by commas (,).
Krb5 File Configuration: Upload the krb5 file for Kerberos authentication.
HDFS Keytab File: Upload the HDFS keytab file.
HDFS Principal: Enter the principal name for Kerberos authentication. Example:
XXXX/hadoopclient@xxx.xxx.
Inceptor Configuration
JDBC URL
Enter the Java Database Connectivity (JDBC) URL to connect to Inceptor.
Authentication Type
The authentication method for Inceptor. Select from No Authentication, LDAP, or Kerberos based on your engine configuration.
No Authentication: No authentication is required. Configure the username and password to access Inceptor.
LDAP: Lightweight Directory Access Protocol (LDAP) authentication. Configure the username and password to access Inceptor.
Kerberos: The cluster authentication method must be Kerberos. For Kerberos tasks, you must upload a keytab file and configure the principal address.
Keytab File: Upload the keytab file for Kerberos authentication.
Principal: The principal name for Kerberos authentication.
Metadatabase Configuration
Metadata Retrieval Method
The method for retrieving metadata. Supported methods are direct metadatabase connection and Hive Metastore Service (HMS). If you use HMS with Kerberos authentication, upload the keytab file and enter the principal.
Keytab File: The keytab file for Hive metastore Kerberos authentication.
Principal: The principal for Hive metastore Kerberos authentication.
Database Type
This parameter is required when you use the direct metadatabase connection method. Supported databases are MySQL, PostgreSQL, and Inceptor.
MySQL: Supported versions are 5.1.43, 5.6/5.7, and 8.0.
Inceptor: Supports No Authentication, LDAP, and Kerberos authentication.
JDBC URL
Enter the JDBC connection address for the target database. Examples:
MySQL database connection address format:
jdbc:mysql://host:port/dbname.Inceptor database connection address format:
jdbc:hive2://host:port/dbname.Username, Password
The username and password for the target database. If the Inceptor authentication method is No Authentication, only the username is required. For Kerberos authentication, upload the keytab file and enter the principal.
Metadata Production Project
Meta Project
The logical project for metadata production and processing. Set this to dataphin_meta. Keep the name unchanged during reinitialization to prevent failures.
Click Test Connection. After the connection test is successful, click Next.
On the initialization page, click Start.
NoteSystem initialization takes about 15 minutes.
After a success message appears, click Finish to complete the configuration.
What to do next
After you initialize the system metadata, you can set the compute engine for the Dataphin instance. When the metadata warehouse engine is set to Transwarp TDH, you can set the business tenant engine to any engine type except MaxCompute. For more information, see Compute settings.