
Dataphin: Metadata Center settings

Last Updated: Jan 05, 2026

All metadata acquisition tasks for tenants run in the metadata warehouse tenant. Before you can use the Metadata Center feature, you must complete the initial setup in the metadata warehouse tenant. This setup specifies the compute source for running metadata acquisition tasks. This topic describes how to configure the Metadata Center.

Limits

  • The compute engine type for the Metadata Center must match the engine type of the metadata warehouse.

  • The Metadata Center feature supports the following compute engines: MaxCompute, E-MapReduce 5.x Hadoop, E-MapReduce 3.x Hadoop, CDH 5.x Hadoop, CDH 6.x Hadoop, Cloudera Data Platform 7.x, Huawei FusionInsight 8.x Hadoop, and AsiaInfo DP5.3 Hadoop.

  • After the Metadata Center is initialized, you cannot reinitialize it.

Permissions

A super administrator or system administrator of the metadata warehouse tenant can initialize the Metadata Center.

Glossary

  • Metadata: Data about data, including technical, business, and management metadata. It describes the characteristics, source, format, and relationships of data to help you retrieve, use, and maintain the data.

  • Metadata Center: A system that extracts, processes, centrally stores, and manages metadata from various business systems to support data governance and improve data organization, retrieval, and analysis within an organization.

Initialize the Metadata Center

  1. Log on to the metadata warehouse tenant as a super administrator or system administrator.

  2. On the Dataphin homepage, in the top menu bar, choose Management Center > System Settings.

  3. In the navigation pane on the left, under System O&M, click Metadata Center Settings to open the Metadata Center Initialization Configuration page.

  4. Based on the compute engine of the metadata warehouse, select a compute source type for the Metadata Center initialization. The supported engines are MaxCompute and Hadoop.

    MaxCompute

    • Compute source type: Select the MaxCompute compute engine.

    • Endpoint: Configure the endpoint for the MaxCompute region where the Dataphin instance is located. For details about MaxCompute endpoints for different regions and network types, see MaxCompute Endpoint.

    • Project Name: Enter the name of the MaxCompute project, not the DataWorks workspace. To find the project name, log on to the MaxCompute console, switch the region in the upper-left corner, and check the project management tab.

    • AccessKey ID, AccessKey secret: Enter the AccessKey ID and AccessKey secret of an account that has permissions to access the MaxCompute project. Use an existing AccessKey, or see Create an AccessKey to create a new one.

      Note

      To reduce the risk of a leak, the AccessKey secret is displayed only when you create it and cannot be retrieved later. Make sure to store it securely.

      • To ensure a stable connection between the Dataphin project and the MaxCompute project, use the AccessKey pair of a MaxCompute project administrator.

      • To ensure successful metadata acquisition, do not change the AccessKey pair for the MaxCompute project.
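Before you submit the MaxCompute form, it can help to sanity-check the values locally. The sketch below is illustrative only: the helper name and the validation rules are assumptions, not checks that Dataphin itself performs.

```python
# Pre-flight check for the MaxCompute initialization parameters described above.
# The validation rules here are illustrative assumptions, not Dataphin's own checks.

def check_maxcompute_params(endpoint: str, project: str,
                            access_key_id: str, access_key_secret: str) -> list:
    """Return a list of human-readable problems; an empty list means the basics look OK."""
    problems = []
    if not endpoint.startswith(("http://", "https://")):
        problems.append("Endpoint should be a full URL for your region's MaxCompute service")
    if not project:
        problems.append("Project Name is required (the MaxCompute project, not the DataWorks workspace)")
    if not access_key_id or not access_key_secret:
        problems.append("Both AccessKey ID and AccessKey secret are required")
    return problems

print(check_maxcompute_params("service.example.com", "meta_dw", "AK_ID", "AK_SECRET"))
```

Running the check before pasting values into the console catches the most common slip, an endpoint copied without its scheme.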

    Hadoop

    • Compute source type:

      • HDFS cluster storage: This option supports the E-MapReduce 5.x Hadoop, E-MapReduce 3.x Hadoop, CDH 5.x Hadoop, CDH 6.x Hadoop, Cloudera Data Platform 7.x, Huawei FusionInsight 8.x Hadoop, and AsiaInfo DP5.3 Hadoop compute engines.

      • OSS-HDFS cluster storage: This option supports only the E-MapReduce 5.x Hadoop compute engine.

    • Cluster configuration

      HDFS cluster storage

      • NameNode: The NameNode manages the file system namespace in HDFS and access permissions for external clients.

        1. Click Add.

        2. In the Add NameNode dialog box, enter the hostname and port number of the NameNode, and then click OK.

          After you enter the information, the system automatically generates the configuration in the required format, such as host=hostname,webUiPort=50070,ipcPort=8020.

      • Configuration File: Upload cluster configuration files, such as core-site.xml and hdfs-site.xml, to configure cluster parameters. If you use the HMS method to retrieve metadata, you must upload the hdfs-site.xml, hive-site.xml, and core-site.xml files. If the compute engine is FusionInsight 8.X or E-MapReduce 5.x Hadoop, you must also upload the hivemetastore-site.xml file.

      • History Log: Configure the log path for the cluster. Example: tmp/hadoop-yarn/staging/history/done.

      • Authentication Type: Supports No Authentication and Kerberos authentication. Kerberos is an identity authentication protocol that uses symmetric key technology. It is often used for authentication between cluster components. Enabling Kerberos improves cluster security.

        If you enable Kerberos authentication, configure the following parameters:

        • Kerberos configuration method

          • KDC Server: Enter the unified service address of the Key Distribution Center (KDC) to assist with Kerberos authentication.

          • krb5 file configuration: Upload the krb5 file for Kerberos authentication.

        • HDFS configuration

          • HDFS Keytab File: Upload the HDFS keytab file.

          • HDFS Principal: Enter the principal for Kerberos authentication. Example: XXXX/hadoopclient@xxx.xxx.

      OSS-HDFS cluster storage

      • Cluster storage: You can check the cluster storage type in the following ways:

        • Before you create a cluster: View the available storage types on the E-MapReduce 5.x cluster creation page.

        • After you create a cluster: View the storage type on the details page of the E-MapReduce 5.x Hadoop cluster.

      • Cluster storage root directory: Enter the root directory of the cluster storage. You can obtain it from the E-MapReduce 5.x Hadoop cluster information.

        Important

        If the path that you enter includes an Endpoint, Dataphin uses that Endpoint by default. If the path does not include an Endpoint, the bucket-level Endpoint configured in core-site.xml is used. If a bucket-level Endpoint is not configured, the global Endpoint in core-site.xml is used. For more information, see Alibaba Cloud OSS-HDFS Service (JindoFS Service) Endpoint Configuration.

      • Configuration File: Upload cluster configuration files, such as core-site.xml and hive-site.xml, to configure cluster parameters. If you use the HMS method to retrieve metadata, you must upload the hive-site.xml, core-site.xml, and hivemetastore-site.xml files.

      • History Log: Configure the log path for the cluster. Example: tmp/hadoop-yarn/staging/history/done.

      • AccessKey ID, AccessKey secret: Enter the AccessKey ID and AccessKey secret used to access OSS. Use an existing AccessKey, or see Create an AccessKey to create a new one.

        Note

        To reduce the risk of a leak, the AccessKey secret is displayed only when you create it and cannot be retrieved later. Make sure to store it securely.

        Important

        The AccessKey pair that you configure here takes precedence over the AccessKey pair configured in the core-site.xml file.

      • Authentication Type: Supports No Authentication and Kerberos authentication. Kerberos is an identity authentication protocol that uses symmetric key technology. It is often used for authentication between cluster components. Enabling Kerberos improves cluster security. If you enable Kerberos authentication, you must upload the krb5 file.
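The NameNode entries described above are stored in the format host=hostname,webUiPort=50070,ipcPort=8020. Dataphin generates this string for you, but a small sketch of parsing it back into a dictionary shows how the three fields relate (the helper is illustrative, not part of Dataphin):

```python
# Parse a generated NameNode entry such as "host=hostname,webUiPort=50070,ipcPort=8020"
# into a dict. Illustrative helper; the format itself comes from the Dataphin UI.

def parse_namenode_entry(entry: str) -> dict:
    result = {}
    for pair in entry.split(","):
        key, _, value = pair.partition("=")
        result[key.strip()] = value.strip()
    return result

cfg = parse_namenode_entry("host=nn01.example.com,webUiPort=50070,ipcPort=8020")
# host is the NameNode hostname, webUiPort the HTTP UI port (50070 is the
# classic HDFS default), and ipcPort the RPC port clients connect to (8020).
print(cfg["host"], cfg["ipcPort"])
```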

    • Hive configuration

      • JDBC URL: Enter the Java Database Connectivity (JDBC) URL for connecting to Hive.

      • Authentication Type: If you set the cluster authentication method to No Authentication, you can set the Hive authentication method to No Authentication or LDAP. If you set the cluster authentication method to Kerberos, you can set the Hive authentication method to No Authentication, LDAP, or Kerberos.

        Note

        You can configure the authentication method if the compute engine is E-MapReduce 3.x, E-MapReduce 5.x, Cloudera Data Platform 7.x, AsiaInfo DP5.3, or Huawei FusionInsight 8.X.

      • Username, Password: The username and password for accessing Hive.

        • No Authentication: Enter a username.

        • LDAP Authentication: Enter a username and password.

        • Kerberos Authentication: These fields are not required.

      • Hive Keytab File: Required if you enable Kerberos authentication. Upload the keytab file, which you can obtain from the Hive server.

      • Hive Principal: Required if you enable Kerberos authentication. Enter the Kerberos authentication principal that corresponds to the Hive keytab file. Example: XXXX/hadoopclient@xxx.xxx.

      • Execution engine: Select an execution engine as needed. The supported execution engines vary based on the compute engine:

        • E-MapReduce 3.X: MapReduce, Spark.

        • E-MapReduce 5.X: MapReduce, Tez.

        • CDH 5.X: MapReduce.

        • CDH 6.X: MapReduce, Spark, Tez.

        • FusionInsight 8.X: MapReduce.

        • AsiaInfo DP 5.3 Hadoop: MapReduce.

        • Cloudera Data Platform 7.x: Tez.

        Note

        After you set the execution engine, the compute settings, compute source, and nodes in the metadata warehouse tenant use the specified Hive execution engine. If you reinitialize the settings, these components are reset to use the newly specified execution engine.
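The JDBC URL and authentication settings above combine in a predictable way. The sketch below shows how a HiveServer2 JDBC URL typically differs per authentication mode; port 10000 and the ;principal= suffix are common HiveServer2 conventions, not Dataphin-specific requirements, so confirm the exact values against your cluster.

```python
# Compose a HiveServer2 JDBC URL for the three authentication modes described above.
# Port 10000 and the ";principal=" suffix follow common HiveServer2 conventions;
# verify the actual host, port, and principal against your own cluster.

def hive_jdbc_url(host: str, database: str = "default",
                  auth: str = "none", principal: str = "") -> str:
    base = f"jdbc:hive2://{host}:10000/{database}"
    if auth == "kerberos":
        # Kerberos: the Hive service principal is appended to the URL;
        # no username/password fields are needed.
        return f"{base};principal={principal}"
    # "none" and "ldap": the URL is plain; the username (and, for LDAP,
    # the password) are supplied separately in the Dataphin form.
    return base

print(hive_jdbc_url("hs2.example.com"))
print(hive_jdbc_url("hs2.example.com", auth="kerberos",
                    principal="hive/hs2.example.com@EXAMPLE.COM"))
```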

    • Metadata retrieval method

      You can retrieve metadata using the metadatabase or Hive Metastore Service (HMS). The required configuration depends on the method you select.

      • Retrieve metadata from a metadatabase

        • Database type: Only MySQL is supported as the database type for the Hive metadatabase. Supported MySQL versions: MySQL 5.1.43, MySQL 5.6/5.7, and MySQL 8.

        • JDBC URL: Enter the JDBC URL of the destination database. Example: jdbc:mysql://host:port/dbname.

        • Username, Password: The username and password of the destination database.

      • Retrieve metadata from HMS

        If you use the HMS method and Kerberos is enabled, you must upload the keytab file and specify the principal.

        • Keytab File: The Kerberos authentication keytab file for the Hive metastore.

        • Principal: The Kerberos authentication principal for the Hive metastore.
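With the metadatabase method, the crawler reads the Hive metastore's relational schema directly; in that schema, tables such as DBS and TBLS hold the database and table records. The sketch below simulates that join in SQLite using a simplified two-column subset of the real metastore schema; it illustrates the idea only and is not Dataphin's actual query.

```python
# Simulate the Hive metastore's DBS/TBLS tables in SQLite and join them the way
# a metadatabase-based crawler might. Simplified subset of the real metastore schema.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DBS  (DB_ID INTEGER PRIMARY KEY, NAME TEXT);
    CREATE TABLE TBLS (TBL_ID INTEGER PRIMARY KEY, DB_ID INTEGER, TBL_NAME TEXT);
    INSERT INTO DBS  VALUES (1, 'sales');
    INSERT INTO TBLS VALUES (10, 1, 'orders'), (11, 1, 'customers');
""")

# List every table together with its database, as a metadata crawl would.
rows = conn.execute("""
    SELECT d.NAME, t.TBL_NAME
    FROM TBLS t JOIN DBS d ON t.DB_ID = d.DB_ID
    ORDER BY t.TBL_NAME
""").fetchall()
print(rows)  # [('sales', 'customers'), ('sales', 'orders')]
```

This is also why the metadatabase method only needs a MySQL JDBC URL and credentials: the crawl is ordinary SQL against the metastore's own tables, with no Hive session involved.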

  5. After you configure the required parameters, click Test Connection to verify that Dataphin can connect to the configured compute source.

  6. After the connection test passes, click OK and Start Initialization. The system then checks for the required permissions and the metadata warehouse initialization configuration.

    Permissions: Checks whether the current user is a super administrator or system administrator of the metadata warehouse tenant.

    Metadata warehouse initialization configuration: Checks whether the metadata warehouse is successfully initialized.

  7. After the checks pass, the initialization process starts. This process creates the compute source, project, and data source, and then runs the initial DDL statements. After the process is complete, the Metadata Center is initialized.

References

After you initialize the Metadata Center, you can collect metadata from databases and import it into Dataphin for analysis and management. For more information, see Create and manage metadata acquisition tasks.